2626 commits

Author SHA1 Message Date
Matei Zaharia 173e0354c0 Detect correctly when one has disconnected from a standalone cluster.
SPARK-617 #resolve
2012-11-11 21:06:57 -08:00
tdas 052d0b800f Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-11 22:56:14 +00:00
Denny 68e0a88282 Merge branch 'master' into blockmanagerUI 2012-11-11 14:00:02 -08:00
Denny b829fba749 Merge branch 'master' into blockmanagerUI
Conflicts:
	core/src/main/twirl/spark/deploy/worker/index.scala.html
2012-11-11 13:59:40 -08:00
Tathagata Das 46222dc56d Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams. 2012-11-11 13:20:09 -08:00
Denny 0fd4c93f1c Updated comment. 2012-11-11 11:15:31 -08:00
Denny deb2c4df72 Add comment. 2012-11-11 11:11:49 -08:00
Denny d006109e95 Kafka Stream comments. 2012-11-11 11:06:49 -08:00
Tathagata Das 04e9e9d93c Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix live lock problem in unlimited attempts to contact the master. Also added testcases in the BlockManagerSuite to test BlockManagerMaster methods getPeers and getLocations. 2012-11-11 08:54:21 -08:00
root acf8272324 Fix K-means example a little 2012-11-10 23:07:21 -08:00
Matei Zaharia d0f0fc8c1e Merge pull request #302 from tdas/blockmanager-fix
Blockmanager fix
2012-11-09 20:27:20 -08:00
Tathagata Das 62af376863 Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-09 16:29:11 -08:00
Tathagata Das 355c8e4b17 Fixed deadlock in BlockManager. 2012-11-09 16:28:45 -08:00
Tathagata Das 9915989bfa Incorporated Matei's suggestions. Tested with 5 producer(consumer) threads each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks. 2012-11-09 15:46:15 -08:00
Tathagata Das de00bc63db Fixed deadlock in BlockManager.
1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks with BlockInfo objects used as per-block fine-grained locks.
2. Changed the MemoryStore lock structure by making the block-putting threads lock on a different object (not the memory store), so that putting threads block getting threads as little as possible.
3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.
2012-11-09 14:09:37 -08:00
Denny 2e8f2ee4ad Merge branch 'dev' of github.com:radlab/spark into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-09 12:26:17 -08:00
Denny e5a0936787 Kafka Stream. 2012-11-09 12:23:46 -08:00
Matei Zaharia 6607f546cc Added an option to spread out jobs in the standalone mode. 2012-11-08 23:13:12 -08:00
Matei Zaharia 66cbdee941 Fix for connections not being reused (from Josh Rosen) 2012-11-08 09:53:40 -08:00
tdas 52d21cb682 Removed unnecessary files. 2012-11-08 11:35:40 +00:00
tdas cc2a65f547 Fixed bug in InputStreamsSuite 2012-11-08 11:17:57 +00:00
Imran Rashid 809b2bb1fe fix bug in getting slave id out of mesos 2012-11-08 00:34:28 -08:00
Matei Zaharia bb1bce7924 Various fixes to standalone mode and web UI:
- Don't report a job as finishing multiple times
- Don't show state of workers as LOADING when they're running
- Show start and finish times in web UI
- Sort web UI tables by ID and time by default
2012-11-07 16:49:53 -08:00
Tathagata Das fc3d0b602a Added FailureTestsuite for testing multiple, repeated master failures. 2012-11-06 17:23:31 -08:00
Matei Zaharia e2b8477487 Made Akka timeout and message frame size configurable, and upped the defaults 2012-11-06 15:58:05 -08:00
Denny 485803d740 Merge branch 'dev' of github.com:radlab/spark into kafka 2012-11-06 09:41:45 -08:00
Denny 0c1de43fc7 Working on kafka. 2012-11-06 09:41:42 -08:00
Tathagata Das f8bb719cd2 Added a few more comments to the checkpoint-related functions. 2012-11-05 17:53:56 -08:00
Tathagata Das 395167f2b2 Made more bug fixes for checkpointing. 2012-11-05 16:11:50 -08:00
Tathagata Das 72b2303f99 Fixed major bugs in checkpointing. 2012-11-05 11:41:36 -08:00
Tathagata Das d154238789 Made checkpointing of the DStream graph work with checkpointing of RDDs. For streams that require checkpointing of their RDDs, the default checkpoint interval is set to 10 seconds. 2012-11-04 12:12:06 -08:00
Matei Zaharia dfce7e74a7 Merge pull request #298 from JoshRosen/fix/ec2-existing-cluster-check
Fix check for existing instances during spark-ec2 launch
2012-11-03 18:35:26 -07:00
Josh Rosen 594eed31c4 Fix check for existing instances during EC2 launch. 2012-11-03 17:02:47 -07:00
Tathagata Das 596154eabe Merge branch 'dev-checkpoint' into dev 2012-11-02 17:05:22 -07:00
Tathagata Das 3fb5c9ee24 Fixed serialization bug in countByWindow, added countByKey and countByKeyAndWindow, and added testcases for them. 2012-11-02 12:12:25 -07:00
Matei Zaharia 590e4aa9cb Merge pull request #296 from shivaram/block-manager-fix
Remove unnecessary hash-map put in MemoryStore
2012-11-01 11:54:23 -07:00
Matei Zaharia 4a47d1a476 Merge pull request #297 from JoshRosen/fix/ec2-spot-instances
Cancel spot instance requests when exiting spark-ec2
2012-11-01 11:31:18 -07:00
Shivaram Venkataraman a7d967a1ca Remove unnecessary hash-map put in MemoryStore 2012-11-01 10:46:38 -07:00
Tathagata Das 34e569f40e Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added tests to ensure that jobs running on an RDD on which checkpointing is in progress do not hurt the result of the job. 2012-10-31 00:56:40 -07:00
Josh Rosen 96c9bcfd8d Cancel spot instance requests when exiting spark-ec2. 2012-10-30 23:32:38 -07:00
Tathagata Das 0dcd770fdc Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them. 2012-10-30 16:09:37 -07:00
Denny ceec1a1a6a Nicer storage level format on RDD page 2012-10-29 15:03:01 -07:00
Denny eb95212f4d Code formatting 2012-10-29 14:57:32 -07:00
Denny 531ac136bf BlockManager UI. 2012-10-29 14:53:47 -07:00
Tathagata Das ac12abc17f Modified RDD API to make dependencies a var (so it can be changed to a checkpointed Hadoop RDD) and to make other references to parent RDDs go either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more). 2012-10-29 11:55:27 -07:00
Josh Rosen 2ccf3b6652 Fix PySpark hash partitioning bug.
A Java array's hashCode is based on its object
identity, not its elements, so this was causing
serialized keys to be hashed incorrectly.

This commit adds a PySpark-specific workaround
and adds more tests.
2012-10-28 22:30:28 -07:00
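The behaviour described in the commit above can be illustrated with a small standalone sketch. It is not the PySpark workaround itself, which this log does not show. The class `ArrayKeyHashing` and the method `partitionByContent` are hypothetical names introduced only for illustration. The sketch assumes keys arrive as `byte[]` and picks a partition from the array's contents using the standard `java.util.Arrays.hashCode`.

```java
import java.util.Arrays;

// Hypothetical demo class; not taken from the Spark or PySpark sources.
final class ArrayKeyHashing {

    // Chooses a partition from the array's elements rather than its identity,
    // so two arrays with equal contents always map to the same partition.
    // Math.floorMod keeps the result non-negative for negative hash values.
    static int partitionByContent(byte[] key, int numPartitions) {
        return Math.floorMod(Arrays.hashCode(key), numPartitions);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3}; // same elements, different object

        // hashCode() on an array is inherited from Object and is identity-based,
        // so these two values are not guaranteed to match.
        System.out.println("identity hashes equal: " + (a.hashCode() == b.hashCode()));

        // Arrays.hashCode is computed from the elements, so equal contents agree.
        System.out.println("content hashes equal:  "
                + (Arrays.hashCode(a) == Arrays.hashCode(b)));

        System.out.println("same partition:        "
                + (partitionByContent(a, 8) == partitionByContent(b, 8)));
    }
}
```

Hashing by content is what a hash partitioner needs: records whose serialized keys are byte-for-byte equal must land in the same partition, whichever array object happens to hold them.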
Josh Rosen 7859879aaa Bump required Py4J version and add test for large broadcast variables. 2012-10-28 16:48:25 -07:00
Tathagata Das 1b900183c8 Added save operations to DStreams. 2012-10-27 18:55:50 -07:00
Matei Zaharia 51477e8874 Merge pull request #294 from JoshRosen/docs/quickstart
Fix minor typos in quickstart and Scala programming guides
2012-10-27 16:56:39 -07:00
Josh Rosen 33bea24f8e Fix Spark groupId in Scala Programming Guide. 2012-10-26 15:01:28 -07:00