Matei Zaharia
173e0354c0
Detect correctly when one has disconnected from a standalone cluster.
...
SPARK-617 #resolve
2012-11-11 21:06:57 -08:00
tdas
052d0b800f
Merge branch 'dev' of github.com:radlab/spark into dev
2012-11-11 22:56:14 +00:00
Denny
68e0a88282
Merge branch 'master' into blockmanagerUI
2012-11-11 14:00:02 -08:00
Denny
b829fba749
Merge branch 'master' into blockmanagerUI
...
Conflicts:
core/src/main/twirl/spark/deploy/worker/index.scala.html
2012-11-11 13:59:40 -08:00
Tathagata Das
46222dc56d
Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams.
2012-11-11 13:20:09 -08:00
Denny
0fd4c93f1c
Updated comment.
2012-11-11 11:15:31 -08:00
Denny
deb2c4df72
Add comment.
2012-11-11 11:11:49 -08:00
Denny
d006109e95
Kafka Stream comments.
2012-11-11 11:06:49 -08:00
Tathagata Das
04e9e9d93c
Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix a livelock caused by unlimited attempts to contact the master. Also added test cases in BlockManagerSuite for the BlockManagerMaster methods getPeers and getLocations.
2012-11-11 08:54:21 -08:00
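The livelock fix in the entry above amounts to capping how often a caller retries the master. The block below is a minimal sketch of that idea only, not the code from the commit: `BoundedRetry`, `askWithRetries`, `maxAttempts`, and `waitMillis` are hypothetical names introduced for illustration.

```java
import java.util.function.Supplier;

public class BoundedRetry {
    // Hypothetical helper: run the request up to maxAttempts times instead of looping forever.
    static <T> T askWithRetries(Supplier<T> request, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the wait below
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                    break;
                }
            }
        }
        // The retry budget is spent (or the thread was interrupted): fail loudly.
        throw new IllegalStateException("request failed; giving up", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated master that only answers on the third call.
        String reply = askWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("master unreachable");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(reply + " after " + calls[0] + " calls");
    }
}
```

Failing with an exception once the budget is spent lets the caller decide what to do next, rather than leaving a thread spinning on an unreachable master.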
root
acf8272324
Fix K-means example a little
2012-11-10 23:07:21 -08:00
Matei Zaharia
d0f0fc8c1e
Merge pull request #302 from tdas/blockmanager-fix
...
Blockmanager fix
2012-11-09 20:27:20 -08:00
Tathagata Das
62af376863
Merge branch 'dev' of github.com:radlab/spark into dev
2012-11-09 16:29:11 -08:00
Tathagata Das
355c8e4b17
Fixed deadlock in BlockManager.
2012-11-09 16:28:45 -08:00
Tathagata Das
9915989bfa
Incorporated Matei's suggestions. Tested with 5 producer (consumer) threads, each doing 50k puts (gets); the run took 15 minutes with no errors or deadlocks.
2012-11-09 15:46:15 -08:00
Tathagata Das
de00bc63db
Fixed deadlock in BlockManager.
...
1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks with BlockInfo objects used as per-block fine-grained locks.
2. Changed the MemoryStore lock structure by making the block-putting threads lock on a different object (not the memory store), so that putting threads block the getting threads as little as possible.
3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.
2012-11-09 14:09:37 -08:00
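The per-block locking described in the entry above can be illustrated with a small sketch. This is not the BlockManager code: `PerBlockLockStore` and its nested `BlockInfo` are simplified, hypothetical stand-ins, and the `main` method only loosely mirrors the 5-producer / 5-consumer stress test that the commit mentions.

```java
import java.util.concurrent.ConcurrentHashMap;

public class PerBlockLockStore {
    // Hypothetical per-block metadata object; it doubles as the lock for its block.
    static final class BlockInfo {
        byte[] data; // guarded by synchronized (this BlockInfo)
    }

    private final ConcurrentHashMap<String, BlockInfo> blocks = new ConcurrentHashMap<>();

    public void put(String blockId, byte[] data) {
        BlockInfo info = blocks.computeIfAbsent(blockId, id -> new BlockInfo());
        synchronized (info) { // only callers touching this block contend here
            info.data = data.clone();
        }
    }

    public byte[] get(String blockId) {
        BlockInfo info = blocks.get(blockId);
        if (info == null) {
            return null;
        }
        synchronized (info) {
            return info.data == null ? null : info.data.clone();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PerBlockLockStore store = new PerBlockLockStore();
        Thread[] threads = new Thread[10];
        for (int t = 0; t < 5; t++) {
            final int id = t;
            // Producer: writes 1000 blocks under its own prefix.
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    store.put("block-" + id + "-" + i, new byte[] {(byte) i});
                }
            });
            // Consumer: reads the same ids; a null result just means "not written yet".
            threads[t + 5] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    store.get("block-" + id + "-" + i);
                }
            });
        }
        for (Thread th : threads) {
            th.start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println("finished without deadlock");
    }
}
```

Because each operation holds at most one `BlockInfo` monitor at a time, no lock-ordering cycle can form between threads in this sketch.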
Denny
2e8f2ee4ad
Merge branch 'dev' of github.com:radlab/spark into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-09 12:26:17 -08:00
Denny
e5a0936787
Kafka Stream.
2012-11-09 12:23:46 -08:00
Matei Zaharia
6607f546cc
Added an option to spread out jobs in the standalone mode.
2012-11-08 23:13:12 -08:00
Matei Zaharia
66cbdee941
Fix for connections not being reused (from Josh Rosen)
2012-11-08 09:53:40 -08:00
tdas
52d21cb682
Removed unnecessary files.
2012-11-08 11:35:40 +00:00
tdas
cc2a65f547
Fixed bug in InputStreamsSuite
2012-11-08 11:17:57 +00:00
Imran Rashid
809b2bb1fe
fix bug in getting slave id out of mesos
2012-11-08 00:34:28 -08:00
Matei Zaharia
bb1bce7924
Various fixes to standalone mode and web UI:
...
- Don't report a job as finishing multiple times
- Don't show state of workers as LOADING when they're running
- Show start and finish times in web UI
- Sort web UI tables by ID and time by default
2012-11-07 16:49:53 -08:00
Tathagata Das
fc3d0b602a
Added FailureTestsuite for testing multiple, repeated master failures.
2012-11-06 17:23:31 -08:00
Matei Zaharia
e2b8477487
Made Akka timeout and message frame size configurable, and upped the defaults
2012-11-06 15:58:05 -08:00
Denny
485803d740
Merge branch 'dev' of github.com:radlab/spark into kafka
2012-11-06 09:41:45 -08:00
Denny
0c1de43fc7
Working on kafka.
2012-11-06 09:41:42 -08:00
Tathagata Das
f8bb719cd2
Added a few more comments to the checkpoint-related functions.
2012-11-05 17:53:56 -08:00
Tathagata Das
395167f2b2
Made more bug fixes for checkpointing.
2012-11-05 16:11:50 -08:00
Tathagata Das
72b2303f99
Fixed major bugs in checkpointing.
2012-11-05 11:41:36 -08:00
Tathagata Das
d154238789
Made checkpointing of the DStream graph work with checkpointing of RDDs. For streams that require checkpointing of their RDDs, the default checkpoint interval is set to 10 seconds.
2012-11-04 12:12:06 -08:00
Matei Zaharia
dfce7e74a7
Merge pull request #298 from JoshRosen/fix/ec2-existing-cluster-check
...
Fix check for existing instances during spark-ec2 launch
2012-11-03 18:35:26 -07:00
Josh Rosen
594eed31c4
Fix check for existing instances during EC2 launch.
2012-11-03 17:02:47 -07:00
Tathagata Das
596154eabe
Merge branch 'dev-checkpoint' into dev
2012-11-02 17:05:22 -07:00
Tathagata Das
3fb5c9ee24
Fixed serialization bug in countByWindow, added countByKey and countByKeyAndWindow, and added testcases for them.
2012-11-02 12:12:25 -07:00
Matei Zaharia
590e4aa9cb
Merge pull request #296 from shivaram/block-manager-fix
...
Remove unnecessary hash-map put in MemoryStore
2012-11-01 11:54:23 -07:00
Matei Zaharia
4a47d1a476
Merge pull request #297 from JoshRosen/fix/ec2-spot-instances
...
Cancel spot instance requests when exiting spark-ec2
2012-11-01 11:31:18 -07:00
Shivaram Venkataraman
a7d967a1ca
Remove unnecessary hash-map put in MemoryStore
2012-11-01 10:46:38 -07:00
Tathagata Das
34e569f40e
Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added tests to ensure that a job running on an RDD on which checkpointing is in progress does not hurt the result of the job.
2012-10-31 00:56:40 -07:00
Josh Rosen
96c9bcfd8d
Cancel spot instance requests when exiting spark-ec2.
2012-10-30 23:32:38 -07:00
Tathagata Das
0dcd770fdc
Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them.
2012-10-30 16:09:37 -07:00
Denny
ceec1a1a6a
Nicer storage level format on RDD page
2012-10-29 15:03:01 -07:00
Denny
eb95212f4d
Code formatting.
2012-10-29 14:57:32 -07:00
Denny
531ac136bf
BlockManager UI.
2012-10-29 14:53:47 -07:00
Tathagata Das
ac12abc17f
Modified the RDD API to make dependencies a var (so they can be changed to point at a checkpointed Hadoop RDD) and to make other references to parent RDDs go either through dependencies or through a weak reference (to allow finalization when dependencies no longer refer to them).
2012-10-29 11:55:27 -07:00
Josh Rosen
2ccf3b6652
Fix PySpark hash partitioning bug.
...
A Java array's hashCode is based on its object
identity, not its elements, so this was causing
serialized keys to be hashed incorrectly.
This commit adds a PySpark-specific workaround
and adds more tests.
2012-10-28 22:30:28 -07:00
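The array-hashing behavior described in the entry above can be reproduced in a few lines of Java. This is an illustration of the underlying JVM behavior only, not the PySpark-specific workaround from the commit: `ArrayHashDemo`, `contentHash`, and `partitionFor` are hypothetical names, and `Arrays.hashCode` stands in here for any content-based hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ArrayHashDemo {
    // Content-based hash: arrays with equal elements always hash to the same value.
    static int contentHash(byte[] key) {
        return Arrays.hashCode(key);
    }

    // Hypothetical partitioner helper: non-negative modulo of the content hash.
    static int partitionFor(byte[] key, int numPartitions) {
        return Math.floorMod(contentHash(key), numPartitions);
    }

    public static void main(String[] args) {
        byte[] a = "spark".getBytes(StandardCharsets.UTF_8);
        byte[] b = "spark".getBytes(StandardCharsets.UTF_8);

        // Arrays inherit Object.hashCode(), which ignores the elements,
        // so two arrays with identical contents will usually disagree here.
        System.out.println("identity hashes equal: " + (a.hashCode() == b.hashCode()));

        // Arrays.hashCode() is computed from the elements, so these always agree.
        System.out.println("content hashes equal:  " + (contentHash(a) == contentHash(b)));
        System.out.println("same partition:        " + (partitionFor(a, 8) == partitionFor(b, 8)));
    }
}
```

A partitioner that hashes serialized keys by identity can send equal keys to different partitions, which is the kind of incorrect hashing the commit message describes.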
Josh Rosen
7859879aaa
Bump required Py4J version and add test for large broadcast variables.
2012-10-28 16:48:25 -07:00
Tathagata Das
1b900183c8
Added save operations to DStreams.
2012-10-27 18:55:50 -07:00
Matei Zaharia
51477e8874
Merge pull request #294 from JoshRosen/docs/quickstart
...
Fix minor typos in quickstart and Scala programming guides
2012-10-27 16:56:39 -07:00
Josh Rosen
33bea24f8e
Fix Spark groupId in Scala Programming Guide.
2012-10-26 15:01:28 -07:00