Matei Zaharia
6607f546cc
Added an option to spread out jobs in the standalone mode.
2012-11-08 23:13:12 -08:00
Matei Zaharia
66cbdee941
Fix for connections not being reused (from Josh Rosen)
2012-11-08 09:53:40 -08:00
tdas
52d21cb682
Removed unnecessary files.
2012-11-08 11:35:40 +00:00
tdas
cc2a65f547
Fixed bug in InputStreamsSuite
2012-11-08 11:17:57 +00:00
Imran Rashid
809b2bb1fe
fix bug in getting slave id out of mesos
2012-11-08 00:34:28 -08:00
Matei Zaharia
bb1bce7924
Various fixes to standalone mode and web UI:
...
- Don't report a job as finishing multiple times
- Don't show state of workers as LOADING when they're running
- Show start and finish times in web UI
- Sort web UI tables by ID and time by default
2012-11-07 16:49:53 -08:00
Tathagata Das
fc3d0b602a
Added FailureTestsuite for testing multiple, repeated master failures.
2012-11-06 17:23:31 -08:00
Matei Zaharia
e2b8477487
Made Akka timeout and message frame size configurable, and upped the defaults
2012-11-06 15:58:05 -08:00
Denny
485803d740
Merge branch 'dev' of github.com:radlab/spark into kafka
2012-11-06 09:41:45 -08:00
Denny
0c1de43fc7
Working on kafka.
2012-11-06 09:41:42 -08:00
Tathagata Das
f8bb719cd2
Added a few more comments to the checkpoint-related functions.
2012-11-05 17:53:56 -08:00
Tathagata Das
395167f2b2
Made more bug fixes for checkpointing.
2012-11-05 16:11:50 -08:00
Tathagata Das
72b2303f99
Fixed major bugs in checkpointing.
2012-11-05 11:41:36 -08:00
Tathagata Das
d154238789
Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds.
2012-11-04 12:12:06 -08:00
Matei Zaharia
dfce7e74a7
Merge pull request #298 from JoshRosen/fix/ec2-existing-cluster-check
...
Fix check for existing instances during spark-ec2 launch
2012-11-03 18:35:26 -07:00
Josh Rosen
594eed31c4
Fix check for existing instances during EC2 launch.
2012-11-03 17:02:47 -07:00
Tathagata Das
596154eabe
Merge branch 'dev-checkpoint' into dev
2012-11-02 17:05:22 -07:00
Tathagata Das
3fb5c9ee24
Fixed serialization bug in countByWindow, added countByKey and countByKeyAndWindow, and added testcases for them.
2012-11-02 12:12:25 -07:00
Matei Zaharia
590e4aa9cb
Merge pull request #296 from shivaram/block-manager-fix
...
Remove unnecessary hash-map put in MemoryStore
2012-11-01 11:54:23 -07:00
Matei Zaharia
4a47d1a476
Merge pull request #297 from JoshRosen/fix/ec2-spot-instances
...
Cancel spot instance requests when exiting spark-ec2
2012-11-01 11:31:18 -07:00
Shivaram Venkataraman
a7d967a1ca
Remove unnecessary hash-map put in MemoryStore
2012-11-01 10:46:38 -07:00
Tathagata Das
34e569f40e
Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added to tests to ensure that jobs running on an RDD on which checkpointing is in progress does hurt the result of the job.
2012-10-31 00:56:40 -07:00
Josh Rosen
96c9bcfd8d
Cancel spot instance requests when exiting spark-ec2.
2012-10-30 23:32:38 -07:00
Tathagata Das
0dcd770fdc
Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them.
2012-10-30 16:09:37 -07:00
Denny
ceec1a1a6a
Nicer storage level format on RDD page
2012-10-29 15:03:01 -07:00
Denny
eb95212f4d
code Formatting
2012-10-29 14:57:32 -07:00
Denny
531ac136bf
BlockManager UI.
2012-10-29 14:53:47 -07:00
Tathagata Das
ac12abc17f
Modified RDD API to make dependencies a var (therefore can be changed to checkpointed hadoop rdd) and othere references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more).
2012-10-29 11:55:27 -07:00
Josh Rosen
2ccf3b6652
Fix PySpark hash partitioning bug.
...
A Java array's hashCode is based on its object
identify, not its elements, so this was causing
serialized keys to be hashed incorrectly.
This commit adds a PySpark-specific workaround
and adds more tests.
2012-10-28 22:30:28 -07:00
Josh Rosen
7859879aaa
Bump required Py4J version and add test for large broadcast variables.
2012-10-28 16:48:25 -07:00
Tathagata Das
1b900183c8
Added save operations to DStreams.
2012-10-27 18:55:50 -07:00
Matei Zaharia
51477e8874
Merge pull request #294 from JoshRosen/docs/quickstart
...
Fix minor typos in quickstart and Scala programming guides
2012-10-27 16:56:39 -07:00
Josh Rosen
33bea24f8e
Fix Spark groupId in Scala Programming Guide.
2012-10-26 15:01:28 -07:00
root
e782187b4a
Don't throw an error in the block manager when a block is cached on the master due to
...
a locally computed operation
Conflicts:
core/src/main/scala/spark/storage/BlockManagerMaster.scala
2012-10-26 00:33:45 -07:00
Tathagata Das
650d717544
Merge branch 'dev' of github.com:radlab/spark into dev
2012-10-25 13:03:18 -07:00
Matei Zaharia
863a55ae42
Merge remote-tracking branch 'public/master' into dev
...
Conflicts:
core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
core/src/main/scala/spark/KryoSerializer.scala
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/executor/Executor.scala
core/src/main/scala/spark/network/Connection.scala
core/src/main/scala/spark/network/ConnectionManagerTest.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/NewHadoopRDD.scala
core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockMessage.scala
core/src/main/scala/spark/storage/BlockStore.scala
core/src/main/scala/spark/storage/StorageLevel.scala
core/src/main/scala/spark/util/AkkaUtils.scala
project/SparkBuild.scala
run
2012-10-24 23:21:00 -07:00
Tathagata Das
926e05b030
Added tests for the file input stream.
2012-10-24 23:14:37 -07:00
Matei Zaharia
f63a40fd99
Strip leading mesos:// in URLs passed to Mesos
2012-10-24 21:52:13 -07:00
Tathagata Das
ed71df46cd
Minor fixes.
2012-10-24 16:49:40 -07:00
Tathagata Das
1ef6ea2513
Added tests for testing network input stream.
2012-10-24 14:44:20 -07:00
Matei Zaharia
d290e964ea
Merge pull request #281 from rxin/memreport
...
Added a method to report slave memory status; force serialize accumulator update in local mode.
2012-10-23 22:04:35 -07:00
Matei Zaharia
0bd20c63e2
Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev
...
Conflicts:
core/src/main/scala/spark/Dependency.scala
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/ShuffledRDD.scala
2012-10-23 22:01:45 -07:00
Matei Zaharia
7849216bba
Merge pull request #286 from JoshRosen/ec2-error-handling
...
Allow EC2 script to stop/destroy cluster after master/slave failures
2012-10-23 21:15:43 -07:00
Matei Zaharia
46b87dfc3a
Merge pull request #292 from tomdz/tweaked-run-file
...
Tweaked run file to live more happily with typesafe's debian package
2012-10-23 21:14:06 -07:00
Tathagata Das
020d643484
Renamed the streaming testsuites.
2012-10-23 16:24:05 -07:00
Tathagata Das
0e5d9be4df
Renamed APIs to create queueStream and fileStream.
2012-10-23 15:17:05 -07:00
Tathagata Das
c2731dd3ef
Updated StateDStream api to use Options instead of nulls.
2012-10-23 15:10:27 -07:00
Tathagata Das
19191d178d
Renamed the network input streams.
2012-10-23 14:40:24 -07:00
Josh Rosen
c4aa10154e
Fix minor typos in quick start guide.
2012-10-23 13:49:52 -07:00
Tathagata Das
a6de5758f1
Modified API of NetworkInputDStreams and got ObjectInputDStream and RawInputDStream working.
2012-10-23 01:41:13 -07:00