Commit graph

364 commits

Author SHA1 Message Date
Patrick Wendell 7e1049d8f1 Squashing a few TODOs 2013-01-14 09:42:36 -08:00
Patrick Wendell 74182010a4 Style cleanup and moving functions 2013-01-14 09:42:36 -08:00
Patrick Wendell 056f5efc55 More pair functions 2013-01-14 09:42:36 -08:00
Patrick Wendell 6e514a8d35 PairDStream and DStreamLike 2013-01-14 09:42:36 -08:00
Patrick Wendell f144e0413a Adding transform and union 2013-01-14 09:42:36 -08:00
Patrick Wendell 0d0bab25bd Reduce tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 22a8c7be9a Adding more tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 91b3d41448 Better equality test (thanks Josh) 2013-01-14 09:42:36 -08:00
Patrick Wendell b0974e6c1d Remving main method from tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 867a7455e2 Adding some initial tests to streaming API. 2013-01-14 09:42:36 -08:00
Patrick Wendell b607c9e916 A very rough, early cut at some Java functionality for Streaming. 2013-01-14 09:42:36 -08:00
Tathagata Das f90f794cde Minor name fix 2013-01-13 21:25:57 -08:00
Tathagata Das 6cc8592f26 Fixed bug 2013-01-13 21:20:49 -08:00
Tathagata Das 0dbd411a56 Added documentation for PairDStreamFunctions. 2013-01-13 21:08:35 -08:00
Tathagata Das 0a2e333341 Removed stream id from the constructor of NetworkReceiver to make it easier for PluggableNetworkInputDStream. 2013-01-13 16:18:39 -08:00
Tathagata Das 365506fb03 Changed variable name form ***Time to ***Duration to keep things consistent. 2013-01-09 14:29:25 -08:00
Tathagata Das 156e8b47ef Split Time to Time (absolute instant of time) and Duration (duration of time). 2013-01-09 12:42:10 -08:00
Tathagata Das 8c1b872512 Moved Twitter example to the where the other examples are. 2013-01-07 17:48:10 -08:00
Tathagata Das 64dceec293 Merge branch 'streaming-merge' into dev-merge 2013-01-07 16:54:35 -08:00
Tathagata Das d808e1026a Merge branch 'dev' into dev-merge 2013-01-07 16:41:11 -08:00
Tathagata Das 4719e6d8fe Changed locations for unit test logs. 2013-01-07 16:06:07 -08:00
Tathagata Das e60514d79e Fixed bug 2013-01-07 15:16:16 -08:00
Tathagata Das 237bac36e9 Renamed examples and added documentation. 2013-01-07 14:37:21 -08:00
Tathagata Das af8738dfb5 Moved Spark Streaming examples to examples sub-project. 2013-01-06 19:31:54 -08:00
Patrick Wendell 2ef993d159 BufferingBlockCreator -> NetworkReceiver.BlockGenerator 2013-01-02 14:19:51 -08:00
Patrick Wendell 96a6ff0b09 Merge branch 'dev-merge' into datahandler-fix
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/DataHandler.scala
2013-01-02 14:08:15 -08:00
Patrick Wendell 493d65ce65 Several code-quality improvements to DataHandler.
- Changed to more accurate name: BufferingBlockCreator
- Docstring now correctly reflects the abstraction
  offered by the class
- Made internal methods private
- Fixed indentation problems
2013-01-02 13:39:18 -08:00
Tathagata Das 02497f0cd4 Updated Streaming Programming Guide. 2013-01-01 12:21:32 -08:00
Tathagata Das 18b9b3b99f More classes made private[streaming] to hide from scala docs. 2012-12-30 20:00:42 -08:00
Tathagata Das 7e0271b438 Refactored a whole lot to push all DStreams into the spark.streaming.dstream package. 2012-12-30 15:19:55 -08:00
Tathagata Das 9e644402c1 Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs. 2012-12-29 18:31:51 -08:00
Patrick Wendell 518111573f Merge pull request #8 from radlab/twitter-example
Adding a Twitter InputDStream with an example
2012-12-29 14:23:01 -08:00
Tathagata Das 0bc0a60d30 Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests. 2012-12-27 15:37:33 -08:00
Patrick Wendell bce84ceabb Minor changes after review and general cleanup.
- Added filters to Twitter example
- Removed un-used import
- Some code clean-up
2012-12-21 20:57:46 -08:00
Patrick Wendell 9ac4cb1c5f Adding a Twitter InputDStream with an example 2012-12-21 17:18:19 -08:00
Tathagata Das 8512dd3225 Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
Conflicts:
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/test/scala/spark/CheckpointSuite.scala
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das 8e74fac215 Made checkpoint data in RDDs optional to further reduce serialized size. 2012-12-11 15:36:12 -08:00
Patrick Wendell 3e796bdd57 Changes in response to TD's review. 2012-12-07 19:34:05 -08:00
Patrick Wendell 3ff9710265 Adding Flume InputDStream 2012-12-07 16:42:39 -08:00
Denny 556c38ed91 Added kafka JAR 2012-12-05 11:54:42 -08:00
Denny a23462191f Adjust Kafka code to work with new streaming changes. 2012-12-05 10:30:40 -08:00
Denny 15df4b0e52 Merge branch 'dev' into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-05 10:16:56 -08:00
Tathagata Das 21a0852976 Refactored RDD checkpointing to minimize extra fields in RDD class. 2012-12-04 22:10:25 -08:00
Tathagata Das 609e00d599 Minor mods 2012-12-02 02:39:08 +00:00
Tathagata Das b4dba55f78 Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient. 2012-12-02 02:03:05 +00:00
Tathagata Das 477de94894 Minor modifications. 2012-12-01 13:15:06 -08:00
Tathagata Das 62965c5d8e Added ssc.union 2012-12-01 08:26:10 -08:00
Tathagata Das d5e7aad039 Bug fixes 2012-11-28 08:36:55 +00:00
Tathagata Das b18d70870a Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata. 2012-11-27 15:08:49 -08:00
Tathagata Das 0fe2fc4d5e Merged branch mesos/master to branch dev. 2012-11-26 13:16:59 -08:00
Tathagata Das fd11d23bb3 Modified StreamingContext API to make constructor accept the batch size (since it is always needed, Patrick's suggestion). Added description to DStream and StreamingContext. 2012-11-19 19:04:39 -08:00
Tathagata Das c97ebf6437 Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey. 2012-11-19 23:22:07 +00:00
Denny 5e2b0a3bf6 Added Kafka Wordcount producer 2012-11-19 10:17:58 -08:00
Denny 6757ed6a40 Comment out code for fault-tolerance. 2012-11-19 09:42:35 -08:00
Denny f56befa914 Merge branch 'dev' into kafka 2012-11-19 09:29:54 -08:00
Tathagata Das 3fd7b8319b Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-17 17:27:07 -08:00
Tathagata Das 10c1abcb6a Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits. 2012-11-17 17:27:00 -08:00
Patrick Wendell efa93fd0e6 Merge pull request #4 from radlab/streaming-example
A "streaming page view" example.
2012-11-16 20:40:27 -08:00
Patrick Wendell 720cb0f467 A "streaming page view" example. 2012-11-16 12:11:22 -08:00
Denny 2aceae25be Merge branch 'dev' into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-13 13:16:18 -08:00
Denny b6f7ba813e change import for example function 2012-11-13 13:15:32 -08:00
Tathagata Das 26fec8f0b8 Fixed bug in MappedValuesRDD, and set default graph checkpoint interval to be batch duration. 2012-11-13 11:05:57 -08:00
Tathagata Das c3ccd14cf8 Replaced StateRDD in StateDStream with MapPartitionsRDD. 2012-11-13 02:43:03 -08:00
Tathagata Das 8a25d530ed Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays. 2012-11-13 02:16:28 -08:00
Denny 255b3e44c1 Merge branch 'dev' into kafka 2012-11-12 19:39:29 -08:00
Tathagata Das 564dd8c3f4 Speeded up CheckpointSuite 2012-11-12 14:22:05 -08:00
Tathagata Das b9bfd1456f Changed default level on calling DStream.persist() to be MEMORY_ONLY_SER. Also changed the persist level of StateDStream to be MEMORY_ONLY_SER. 2012-11-12 21:51:42 +00:00
Tathagata Das ae61ebaee6 Fixed bugs in RawNetworkInputDStream and in its examples. Made the ReducedWindowedDStream persist RDDs to MEMOERY_SER_ONLY by default. Removed unncessary examples. Added streaming-env.sh.template to add recommended setting for streaming. 2012-11-12 21:45:16 +00:00
tdas 052d0b800f Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-11 22:56:14 +00:00
Tathagata Das 46222dc56d Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams. 2012-11-11 13:20:09 -08:00
Denny 0fd4c93f1c Updated comment. 2012-11-11 11:15:31 -08:00
Denny deb2c4df72 Add comment. 2012-11-11 11:11:49 -08:00
Denny d006109e95 Kafka Stream comments. 2012-11-11 11:06:49 -08:00
Denny 2e8f2ee4ad Merge branch 'dev' of github.com:radlab/spark into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-09 12:26:17 -08:00
Denny e5a0936787 Kafka Stream. 2012-11-09 12:23:46 -08:00
tdas 52d21cb682 Removed unnecessary files. 2012-11-08 11:35:40 +00:00
tdas cc2a65f547 Fixed bug in InputStreamsSuite 2012-11-08 11:17:57 +00:00
Tathagata Das fc3d0b602a Added FailureTestsuite for testing multiple, repeated master failures. 2012-11-06 17:23:31 -08:00
Denny 485803d740 Merge branch 'dev' of github.com:radlab/spark into kafka 2012-11-06 09:41:45 -08:00
Denny 0c1de43fc7 Working on kafka. 2012-11-06 09:41:42 -08:00
Tathagata Das f8bb719cd2 Added a few more comments to the checkpoint-related functions. 2012-11-05 17:53:56 -08:00
Tathagata Das 395167f2b2 Made more bug fixes for checkpointing. 2012-11-05 16:11:50 -08:00
Tathagata Das 72b2303f99 Fixed major bugs in checkpointing. 2012-11-05 11:41:36 -08:00
Tathagata Das d154238789 Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds. 2012-11-04 12:12:06 -08:00
Tathagata Das 3fb5c9ee24 Fixed serialization bug in countByWindow, added countByKey and countByKeyAndWindow, and added testcases for them. 2012-11-02 12:12:25 -07:00
Tathagata Das 1b900183c8 Added save operations to DStreams. 2012-10-27 18:55:50 -07:00
Tathagata Das 650d717544 Merge branch 'dev' of github.com:radlab/spark into dev 2012-10-25 13:03:18 -07:00
Matei Zaharia 863a55ae42 Merge remote-tracking branch 'public/master' into dev
Conflicts:
	core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
	core/src/main/scala/spark/KryoSerializer.scala
	core/src/main/scala/spark/MapOutputTracker.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/executor/Executor.scala
	core/src/main/scala/spark/network/Connection.scala
	core/src/main/scala/spark/network/ConnectionManagerTest.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/NewHadoopRDD.scala
	core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/BlockMessage.scala
	core/src/main/scala/spark/storage/BlockStore.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	project/SparkBuild.scala
	run
2012-10-24 23:21:00 -07:00
Tathagata Das 926e05b030 Added tests for the file input stream. 2012-10-24 23:14:37 -07:00
Tathagata Das ed71df46cd Minor fixes. 2012-10-24 16:49:40 -07:00
Tathagata Das 1ef6ea2513 Added tests for testing network input stream. 2012-10-24 14:44:20 -07:00
Tathagata Das 020d643484 Renamed the streaming testsuites. 2012-10-23 16:24:05 -07:00
Tathagata Das 0e5d9be4df Renamed APIs to create queueStream and fileStream. 2012-10-23 15:17:05 -07:00
Tathagata Das c2731dd3ef Updated StateDStream api to use Options instead of nulls. 2012-10-23 15:10:27 -07:00
Tathagata Das 19191d178d Renamed the network input streams. 2012-10-23 14:40:24 -07:00
Tathagata Das a6de5758f1 Modified API of NetworkInputDStreams and got ObjectInputDStream and RawInputDStream working. 2012-10-23 01:41:13 -07:00
Tathagata Das 2c87c853ba Renamed examples 2012-10-22 15:31:19 -07:00
Tathagata Das d85c66636b Added MapValueDStream, FlatMappedValuesDStream and CoGroupedDStream, and therefore DStream operations mapValue, flatMapValues, cogroup, and join. Also, added tests for DStream operations filter, glom, mapPartitions, groupByKey, mapValues, flatMapValues, cogroup, and join. 2012-10-21 17:40:08 -07:00
Tathagata Das c4a2b6f636 Fixed some bugs in tests for forgetting RDDs, and made sure that use of manual clock leads to a zeroTime of 0 in the DStreams (more intuitive). 2012-10-21 10:41:25 -07:00
Tathagata Das 6d5eb4b40c Added functionality to forget RDDs from DStreams. 2012-10-19 12:11:44 -07:00
Tathagata Das b760d6426a Minor modifications. 2012-10-15 12:26:44 -07:00
Tathagata Das 3f1aae5c71 Refactored DStreamSuiteBase to create CheckpointSuite- testsuite for testing checkpointing under different operations. 2012-10-14 21:39:30 -07:00
Tathagata Das b08708e6fc Fixed bugs in the streaming testsuites. 2012-10-13 21:02:24 -07:00
Tathagata Das e95ff45b53 Implemented checkpointing of StreamingContext and DStream graph. 2012-10-13 20:10:49 -07:00
Tathagata Das 6d1fe02685 Merge branch 'dev' of github.com:radlab/spark into dev 2012-09-17 14:26:06 -07:00
Tathagata Das 86d420478f Allowed StreamingContext to be created from existing SparkContext 2012-09-17 14:25:48 -07:00
Tathagata Das 3cbc72ff1d Minor tweaks 2012-09-14 07:00:30 +00:00
Tathagata Das 0269792c17 Merge branch 'dev' of github.com:radlab/spark into dev
Conflicts:
	streaming/src/main/scala/spark/streaming/Scheduler.scala
2012-09-07 20:18:30 +00:00
Tathagata Das b5750726ff Fixed bugs in streaming Scheduler and optimized QueueInputDStream. 2012-09-07 20:16:21 +00:00
haoyuan 381e2c7ac4 add warmup code for TopKWordCountRaw.scala 2012-09-06 20:54:52 -07:00
haoyuan 0681bbc5d9 Merge branch 'dev' of github.com:radlab/spark into dev 2012-09-07 02:18:33 +00:00
haoyuan db08a362aa commit opt for grep scalibility test. 2012-09-07 02:17:52 +00:00
Tathagata Das 4a7bde6865 Fixed bugs and added testcases for naive reduceByKeyAndWindow. 2012-09-06 19:06:59 -07:00
Tathagata Das 203ac8fa8b Merge branch 'dev' of github.com:radlab/spark into dev 2012-09-06 05:29:06 -07:00
Tathagata Das babb7e3ce2 Re-implemented ReducedWindowedDSteam to simplify and fix bugs. Added slice operator to DStream. Also, refactored DStream testsuites and added tests for reduceByKeyAndWindow. 2012-09-06 05:28:29 -07:00
root 019de4562c Less warmup in word count 2012-09-06 02:50:41 +00:00
root 4a5d0d249e Merge branch 'dev' of github.com:radlab/spark into dev 2012-09-05 08:23:09 +00:00
root b7ad291ac5 Tuning Akka for more connections 2012-09-05 07:08:07 +00:00
root fc186dc18a Merge branch 'dev' of github.com:radlab/spark into dev 2012-09-05 05:53:18 +00:00
root 4ea032a142 Some changes to make important log output visible even if we set the logging to WARNING 2012-09-05 05:53:07 +00:00
Tathagata Das 25fd684b89 Merge branch 'dev' of github.com:radlab/spark into dev 2012-09-04 20:44:14 -07:00
Tathagata Das 7c09ad0e04 Changed DStream member access permissions from private to protected. Updated StateDStream to checkpoint RDDs and forget lineage. 2012-09-04 19:11:49 -07:00
haoyuan 96a1f2277d fix the compile error in TopKWordCountRaw.scala 2012-09-04 18:03:34 -07:00
haoyuan 2ff72f60ac add TopKWordCountRaw.scala 2012-09-04 17:55:55 -07:00
Tathagata Das 389a78722c Updated the return types of PairDStreamFunctions to return DStreams instead of ShuffleDStreams for cleaner abstraction. 2012-09-04 15:37:46 -07:00
root 7b892ee66e Merge branch 'dev' of github.com:radlab/spark into dev 2012-09-04 04:27:10 +00:00
root 1878731671 Various test programs 2012-09-04 04:26:53 +00:00
Tathagata Das b8e9e8ea78 Merge branch 'dev' of github.com:radlab/spark into dev 2012-09-02 02:35:32 -07:00
Tathagata Das 7419d2c7ea Added transformRDD DStream operation and TransformedDStream. Added sbt assembly option for streaming project. 2012-09-02 02:35:17 -07:00
root ceabf71257 tweaks 2012-09-01 21:52:42 +00:00
root 6025889be0 More raw network receiver programs 2012-09-01 20:51:07 +00:00
root bf993cda63 Make batch size configurable in RawCount 2012-09-01 19:59:23 +00:00
root 83dad56334 Further fixes to raw text sender, plus an app that uses it 2012-09-01 19:45:25 +00:00
Matei Zaharia f84d2bbe55 Bug fixes to RateLimitedOutputStream 2012-09-01 00:31:15 -07:00
Matei Zaharia 44758aa8e2 First work towards a RawInputDStream and a sender program for it. 2012-09-01 00:17:59 -07:00
Matei Zaharia 51fb13dd16 Bug fix 2012-08-31 15:36:11 -07:00
Matei Zaharia ce42a46375 Bug fix 2012-08-31 15:35:35 -07:00
Matei Zaharia f92d4a6ac1 Better output messages for streaming job duration 2012-08-31 15:33:48 -07:00
Tathagata Das 2d01d38a41 Added StateDStream, corresponding stateful stream operations, and testcases. Also refactored few PairDStreamFunctions methods. 2012-08-31 03:47:34 -07:00
root e1da274a48 WordCount tweaks 2012-08-31 07:16:19 +00:00
root d4d2cb670f Make checkpoint interval configurable in WordCount2 2012-08-31 00:34:57 +00:00
root 1f8085b8d0 Compile fixes 2012-08-29 03:20:56 +00:00
Tathagata Das 43e66146f7 Merge branch 'dev' of github.com/radlab/spark into dev 2012-08-28 13:51:05 -07:00
Tathagata Das b5b93a621c Added capabllity to take streaming input from network. Renamed SparkStreamContext to StreamingContext. 2012-08-28 12:35:19 -07:00
root e2cf197a0a Made WordCount2 even more configurable 2012-08-27 03:34:15 +00:00
root b78c5ae803 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 01:16:39 +00:00
root 9de1c3abf9 Tweaks to WordCount2 2012-08-27 00:57:00 +00:00
Matei Zaharia 57796b183e Code style 2012-08-26 17:25:22 -07:00
Matei Zaharia 22b1a20e61 Made Time and Interval immutable 2012-08-26 17:04:34 -07:00
Matei Zaharia 23a29b6d19 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-26 16:45:37 -07:00
Matei Zaharia b120e24fe0 Add equals and hashCode to Time 2012-08-26 16:45:14 -07:00
root b08ff710af Added sliding word count, and some fixes to reduce window DStream 2012-08-26 23:40:50 +00:00
Matei Zaharia ad6537321e Make Time serializable 2012-08-26 16:27:23 -07:00
Matei Zaharia e7a5cbb543 Reduce log4j verbosity for streaming 2012-08-24 16:45:01 -07:00
Matei Zaharia 091b1438f5 Fix WordCount job name 2012-08-24 16:43:59 -07:00
Tathagata Das cae894ee7a Added new Clock interface that is used by RecurringTimer to scheduler events on system time or manually-configured time. 2012-08-06 14:52:46 -07:00
Matei Zaharia 43b81eb271 Renamed RDS to DStream, plus minor style fixes 2012-08-02 14:05:51 -04:00
Matei Zaharia 29bf44473c Added an RDS that repeatedly returns the same input 2012-08-02 11:43:04 -04:00
Matei Zaharia 650d11817e Added a WordCount for external data and fixed bugs in file streams 2012-08-02 11:09:43 -04:00
Tathagata Das ed897ac5e1 Moved streaming files not immediately necessary to spark.streaming.util. 2012-08-01 22:28:54 -07:00
Tathagata Das 3be54c2a8a 1. Refactored SparkStreamContext, Scheduler, InputRDS, FileInputRDS and a few other files.
2. Modified Time class to represent milliseconds (long) directly, instead of LongTime.
3. Added new files QueueInputRDS, RecurringTimer, etc.
4. Added RDDSuite as the skeleton for testcases.
5. Added two examples in spark.streaming.examples.
6. Removed all past examples and a few unnecessary files. Moved a number of files to spark.streaming.util.
2012-08-01 22:09:27 -07:00
Tathagata Das 5a26ca4a80 Restructured file locations to separate examples and other programs from core programs. 2012-07-30 13:29:13 -07:00
Matei Zaharia fcee4153b9 Renamed stream package to streaming 2012-07-29 13:35:22 -07:00
Matei Zaharia 47b7ebad12 Added the Spark Streaing code, ported to Akka 2 2012-07-28 20:03:26 -07:00