Commit graph

135 commits

Author SHA1 Message Date
Tathagata Das af8738dfb5 Moved Spark Streaming examples to examples sub-project. 2013-01-06 19:31:54 -08:00
Tathagata Das 02497f0cd4 Updated Streaming Programming Guide. 2013-01-01 12:21:32 -08:00
Tathagata Das 18b9b3b99f More classes made private[streaming] to hide from scala docs. 2012-12-30 20:00:42 -08:00
Tathagata Das 7e0271b438 Refactored a whole lot to push all DStreams into the spark.streaming.dstream package. 2012-12-30 15:19:55 -08:00
Tathagata Das 9e644402c1 Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs. 2012-12-29 18:31:51 -08:00
Tathagata Das 0bc0a60d30 Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests. 2012-12-27 15:37:33 -08:00
Tathagata Das 8512dd3225 Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
Conflicts:
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/test/scala/spark/CheckpointSuite.scala
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das 8e74fac215 Made checkpoint data in RDDs optional to further reduce serialized size. 2012-12-11 15:36:12 -08:00
Patrick Wendell 3e796bdd57 Changes in response to TD's review. 2012-12-07 19:34:05 -08:00
Patrick Wendell 3ff9710265 Adding Flume InputDStream 2012-12-07 16:42:39 -08:00
Denny 556c38ed91 Added kafka JAR 2012-12-05 11:54:42 -08:00
Denny a23462191f Adjust Kafka code to work with new streaming changes. 2012-12-05 10:30:40 -08:00
Denny 15df4b0e52 Merge branch 'dev' into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-05 10:16:56 -08:00
Tathagata Das 21a0852976 Refactored RDD checkpointing to minimize extra fields in RDD class. 2012-12-04 22:10:25 -08:00
Tathagata Das 609e00d599 Minor mods 2012-12-02 02:39:08 +00:00
Tathagata Das b4dba55f78 Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient. 2012-12-02 02:03:05 +00:00
Tathagata Das 477de94894 Minor modifications. 2012-12-01 13:15:06 -08:00
Tathagata Das 62965c5d8e Added ssc.union 2012-12-01 08:26:10 -08:00
Tathagata Das d5e7aad039 Bug fixes 2012-11-28 08:36:55 +00:00
Tathagata Das b18d70870a Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata. 2012-11-27 15:08:49 -08:00
Tathagata Das 0fe2fc4d5e Merged branch mesos/master to branch dev. 2012-11-26 13:16:59 -08:00
Tathagata Das fd11d23bb3 Modified StreamingContext API to make constructor accept the batch size (since it is always needed, Patrick's suggestion). Added description to DStream and StreamingContext. 2012-11-19 19:04:39 -08:00
Tathagata Das c97ebf6437 Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey. 2012-11-19 23:22:07 +00:00
Denny 5e2b0a3bf6 Added Kafka Wordcount producer 2012-11-19 10:17:58 -08:00
Denny 6757ed6a40 Comment out code for fault-tolerance. 2012-11-19 09:42:35 -08:00
Denny f56befa914 Merge branch 'dev' into kafka 2012-11-19 09:29:54 -08:00
Tathagata Das 3fd7b8319b Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-17 17:27:07 -08:00
Tathagata Das 10c1abcb6a Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits. 2012-11-17 17:27:00 -08:00
Patrick Wendell efa93fd0e6 Merge pull request #4 from radlab/streaming-example
A "streaming page view" example.
2012-11-16 20:40:27 -08:00
Patrick Wendell 720cb0f467 A "streaming page view" example. 2012-11-16 12:11:22 -08:00
Denny 2aceae25be Merge branch 'dev' into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-13 13:16:18 -08:00
Denny b6f7ba813e change import for example function 2012-11-13 13:15:32 -08:00
Tathagata Das 26fec8f0b8 Fixed bug in MappedValuesRDD, and set default graph checkpoint interval to be batch duration. 2012-11-13 11:05:57 -08:00
Tathagata Das c3ccd14cf8 Replaced StateRDD in StateDStream with MapPartitionsRDD. 2012-11-13 02:43:03 -08:00
Tathagata Das 8a25d530ed Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays. 2012-11-13 02:16:28 -08:00
Denny 255b3e44c1 Merge branch 'dev' into kafka 2012-11-12 19:39:29 -08:00
Tathagata Das 564dd8c3f4 Speeded up CheckpointSuite 2012-11-12 14:22:05 -08:00
Tathagata Das b9bfd1456f Changed default level on calling DStream.persist() to be MEMORY_ONLY_SER. Also changed the persist level of StateDStream to be MEMORY_ONLY_SER. 2012-11-12 21:51:42 +00:00
Tathagata Das ae61ebaee6 Fixed bugs in RawNetworkInputDStream and in its examples. Made the ReducedWindowedDStream persist RDDs to MEMOERY_SER_ONLY by default. Removed unncessary examples. Added streaming-env.sh.template to add recommended setting for streaming. 2012-11-12 21:45:16 +00:00
tdas 052d0b800f Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-11 22:56:14 +00:00
Tathagata Das 46222dc56d Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams. 2012-11-11 13:20:09 -08:00
Denny 0fd4c93f1c Updated comment. 2012-11-11 11:15:31 -08:00
Denny deb2c4df72 Add comment. 2012-11-11 11:11:49 -08:00
Denny d006109e95 Kafka Stream comments. 2012-11-11 11:06:49 -08:00
Denny 2e8f2ee4ad Merge branch 'dev' of github.com:radlab/spark into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-09 12:26:17 -08:00
Denny e5a0936787 Kafka Stream. 2012-11-09 12:23:46 -08:00
tdas 52d21cb682 Removed unnecessary files. 2012-11-08 11:35:40 +00:00
tdas cc2a65f547 Fixed bug in InputStreamsSuite 2012-11-08 11:17:57 +00:00
Tathagata Das fc3d0b602a Added FailureTestsuite for testing multiple, repeated master failures. 2012-11-06 17:23:31 -08:00
Denny 485803d740 Merge branch 'dev' of github.com:radlab/spark into kafka 2012-11-06 09:41:45 -08:00