1530 commits

Author SHA1 Message Date
Tathagata Das 477de94894 Minor modifications. 2012-12-01 13:15:06 -08:00
Tathagata Das 62965c5d8e Added ssc.union 2012-12-01 08:26:10 -08:00
Tathagata Das 6fcd09f499 Added TimeStampedHashSet and used that to cleanup the list of registered RDD IDs in CacheTracker. 2012-11-29 02:06:33 -08:00
Tathagata Das c9789751bf Added metadata cleaner to BlockManager to remove old blocks completely. 2012-11-28 23:18:24 -08:00
Tathagata Das 9e9e9e1d89 Renamed CleanupTask to MetadataCleaner. 2012-11-28 18:48:14 -08:00
Tathagata Das e463ae4920 Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing. 2012-11-28 14:05:01 -08:00
Tathagata Das d5e7aad039 Bug fixes 2012-11-28 08:36:55 +00:00
Tathagata Das b18d70870a Modified a bunch of HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata. 2012-11-27 15:08:49 -08:00
Tathagata Das 0fe2fc4d5e Merged branch mesos/master to branch dev. 2012-11-26 13:16:59 -08:00
Tathagata Das fd11d23bb3 Modified StreamingContext API to make constructor accept the batch size (since it is always needed, Patrick's suggestion). Added description to DStream and StreamingContext. 2012-11-19 19:04:39 -08:00
Matei Zaharia cd16eab0db Merge pull request #309 from admobius/use-boto-config-file
Improved use of Boto
2012-11-19 15:51:41 -08:00
Tathagata Das c97ebf6437 Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey. 2012-11-19 23:22:07 +00:00
Peter Sankauskas dc2fb3c4b6 Allow Boto to use the other config options it supports, and gracefully
handle Boto connection exceptions (like AuthFailure)
2012-11-19 14:21:16 -08:00
Matei Zaharia 85ce5f27c1 Merge pull request #308 from admobius/multi-zone
Let EC2 script launch slaves in multiple availability zones
2012-11-19 13:24:09 -08:00
Matei Zaharia 3ff6f4bdee Merge pull request #304 from mbautin/configurable_local_ip
SPARK-624: make the default local IP customizable
2012-11-19 13:23:39 -08:00
mbautin 00f4e3ff9c Addressing Matei's comment: SPARK_LOCAL_IP environment variable 2012-11-19 11:52:10 -08:00
Peter Sankauskas 606d252d26 Adding comment about additional bandwidth charges 2012-11-17 23:09:11 -08:00
Tathagata Das 3fd7b8319b Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-17 17:27:07 -08:00
Tathagata Das 10c1abcb6a Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits. 2012-11-17 17:27:00 -08:00
Matei Zaharia 20a1058dd5 Merge pull request #305 from woggling/exit-on-uncaught
Set default uncaught exception handler to exit
2012-11-16 20:56:09 -08:00
Matei Zaharia fcc0ba7da1 Merge pull request #306 from admobius/master
Delete security groups when destroying cluster
2012-11-16 20:54:28 -08:00
Matei Zaharia 6adc7c965f Doc fix 2012-11-16 20:49:02 -08:00
Patrick Wendell efa93fd0e6 Merge pull request #4 from radlab/streaming-example
A "streaming page view" example.
2012-11-16 20:40:27 -08:00
Charles Reiss 12c24e786c Set default uncaught exception handler to exit.
Among other things, should prevent OutOfMemoryErrors in some daemon threads
(such as the network manager) from causing a spark executor to enter a state
where it cannot make progress but does not report an error.
2012-11-16 20:12:31 -08:00
Peter Sankauskas 32442ee1e1 Giving the Spark EC2 script the ability to launch instances spread
across multiple availability zones in order to make the cluster more
resilient to failure
2012-11-16 17:25:28 -08:00
Peter Sankauskas 6d22f7ccb8 Delete security groups when deleting the cluster. As many operations
are done on instances in specific security groups, this seems like a
reasonable thing to clean up.
2012-11-16 14:02:43 -08:00
Patrick Wendell 720cb0f467 A "streaming page view" example. 2012-11-16 12:11:22 -08:00
mbautin 1f5a7e0e64 SPARK-624: make the default local IP customizable 2012-11-15 13:57:47 -08:00
Matei Zaharia c23a74df0a Use DNS names instead of IP addresses in standalone mode, to allow
matching with data locality hints from storage systems.
2012-11-15 00:10:52 -08:00
Matei Zaharia 59e648c081 Fix Java/Scala home having spaces on Windows 2012-11-14 22:37:05 -08:00
Patrick Wendell 9563f7aba9 Merge pull request #3 from radlab/streaming-docs
Streaming programming guide. STREAMING-2 #resolve
2012-11-14 22:00:48 -08:00
Patrick Wendell d39ac5fbc1 Streaming programming guide. STREAMING-2 #resolve 2012-11-13 21:19:58 -08:00
Tathagata Das 26fec8f0b8 Fixed bug in MappedValuesRDD, and set default graph checkpoint interval to be batch duration. 2012-11-13 11:05:57 -08:00
Tathagata Das c3ccd14cf8 Replaced StateRDD in StateDStream with MapPartitionsRDD. 2012-11-13 02:43:03 -08:00
Tathagata Das 8a25d530ed Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays. 2012-11-13 02:16:28 -08:00
Tathagata Das 564dd8c3f4 Speeded up CheckpointSuite 2012-11-12 14:22:05 -08:00
Tathagata Das b9bfd1456f Changed default level on calling DStream.persist() to be MEMORY_ONLY_SER. Also changed the persist level of StateDStream to be MEMORY_ONLY_SER. 2012-11-12 21:51:42 +00:00
Tathagata Das ae61ebaee6 Fixed bugs in RawNetworkInputDStream and in its examples. Made the ReducedWindowedDStream persist RDDs to MEMORY_ONLY_SER by default. Removed unnecessary examples. Added streaming-env.sh.template to add recommended settings for streaming. 2012-11-12 21:45:16 +00:00
Matei Zaharia 173e0354c0 Detect correctly when one has disconnected from a standalone cluster.
SPARK-617 #resolve
2012-11-11 21:06:57 -08:00
tdas 052d0b800f Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-11 22:56:14 +00:00
Tathagata Das 46222dc56d Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams. 2012-11-11 13:20:09 -08:00
Tathagata Das 04e9e9d93c Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix live lock problem in unlimited attempts to contact the master. Also added testcases in the BlockManagerSuite to test BlockManagerMaster methods getPeers and getLocations. 2012-11-11 08:54:21 -08:00
root acf8272324 Fix K-means example a little 2012-11-10 23:07:21 -08:00
Matei Zaharia d0f0fc8c1e Merge pull request #302 from tdas/blockmanager-fix
Blockmanager fix
2012-11-09 20:27:20 -08:00
Tathagata Das 62af376863 Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-09 16:29:11 -08:00
Tathagata Das 355c8e4b17 Fixed deadlock in BlockManager. 2012-11-09 16:28:45 -08:00
Tathagata Das 9915989bfa Incorporated Matei's suggestions. Tested with 5 producer(consumer) threads each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks. 2012-11-09 15:46:15 -08:00
Tathagata Das de00bc63db Fixed deadlock in BlockManager.
1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks with BlockInfo objects used as per-block fine-grained locks.
2. Changed the MemoryStore lock structure by making the block-putting threads lock on a different object (not the memory store), so that putting threads minimally block the getting threads.
3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.
2012-11-09 14:09:37 -08:00
Matei Zaharia 6607f546cc Added an option to spread out jobs in the standalone mode. 2012-11-08 23:13:12 -08:00
Matei Zaharia 66cbdee941 Fix for connections not being reused (from Josh Rosen) 2012-11-08 09:53:40 -08:00