Commit graph

154 commits

Author SHA1 Message Date
Tathagata Das 41285eaae3 Fixed differences in APIs of StreamingContext and JavaStreamingContext. Change rawNetworkStream to rawSocketStream, and added twitter, actor, zeroMQ streams to JavaStreamingContext. Also added them to JavaAPISuite. 2013-02-23 16:25:07 -08:00
Tathagata Das 972fe7714f Merge branch 'mesos-streaming' into streaming
Conflicts:
	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-20 11:06:01 -08:00
Tathagata Das fb9956256d Merge branch 'mesos-master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Patrick Wendell 041c19e5f0 Small changes that were missing in merge 2013-02-19 08:44:20 -08:00
Patrick Wendell 35880de42e Use RDD type for transform operator in Java.
This is an improved implementation of the `transform` operator in Java.
The main difference is that this allows all four possible types of
transform functions

1. JavaRDD -> JavaRDD
2. JavaRDD -> JavaPairRDD
3. JavaPairRDD -> JavaPairRDD
4. JavaPairRDD -> JavaRDD

whereas previously only (1) and (3) were possible.

Conflicts:

	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-19 08:31:58 -08:00
Patrick Wendell 9d49a6b03f Use RDD type for foreach operator in Java. 2013-02-19 08:30:32 -08:00
Tathagata Das 9e82be1503 Merge branch 'streaming' into ScrapCodes-streaming-actor
Conflicts:
	docs/plugin-custom-receiver.md
	streaming/src/main/scala/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
	streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
	streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Tathagata Das 12ea14c211 Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver. 2013-02-18 15:18:34 -08:00
Tathagata Das 8ad561dc7d Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs. 2013-02-18 02:12:41 -08:00
Tathagata Das f98c7da23e Many changes to ensure better 2nd recovery if 2nd failure happens while
recovering from 1st failure
- Made the scheduler to checkpoint after clearing old metadata which
  ensures that a new checkpoint is written as soon as at least one batch
  gets computed  while recovering from a failure. This ensures that if
  there is a 2nd failure while recovering from 1st failure, the system
  start 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
  only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
  in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
  results with expected output in an order-independent manner.
2013-02-17 15:06:41 -08:00
Tathagata Das 2eacf22401 Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites. 2013-02-14 12:21:47 -08:00
Tathagata Das 12b020b668 Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters. 2013-02-13 20:53:50 -08:00
Tathagata Das 39addd3803 Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream. 2013-02-13 12:17:45 -08:00
Patrick Wendell 3f3e77f28b STREAMING-50: Support transform workaround in JavaPairDStream
This ports a useful workaround (the `transform` function) to
JavaPairDStream. It is necessary to do things like sorting which
are not supported yet in the core streaming API.
2013-02-12 14:02:32 -08:00
Patrick Wendell d09c36065c Using tuple swap() 2013-02-11 10:45:45 -08:00
Patrick Wendell 04786d0739 small fix 2013-02-11 10:05:49 -08:00
Patrick Wendell c65988bdc1 Fix for MapPartitions 2013-02-11 10:03:37 -08:00
Patrick Wendell 20cf770545 Fix for flatmap 2013-02-11 10:03:37 -08:00
Patrick Wendell 314d87a038 Indentation fix 2013-02-11 10:03:37 -08:00
Patrick Wendell f0b68c623c Initial cut at replacing K, V in Java files 2013-02-11 10:03:37 -08:00
Tathagata Das fd90daf850 Fixed bugs in FileInputDStream and Scheduler that occasionally failed to reprocess old files after recovering from master failure. Completely modified spark.streaming.FailureTest to test multiple master failures using file input stream. 2013-02-10 19:48:42 -08:00
Tathagata Das 99a5fc498a Added an initial spark job to ensure worker nodes are initialized. 2013-02-09 15:18:05 -08:00
Tathagata Das 4cc223b478 Merge branch 'mesos-master' into streaming 2013-02-07 13:59:31 -08:00
Tathagata Das c6b2f765d3 Merge branch 'mesos-streaming' into streaming 2013-02-07 13:13:53 -08:00
Tathagata Das 915d9931fe Merge pull request #373 from Reinvigorate/sm-updateStateByKey
StateDStream changes to give updateStateByKey consistent behavior
2013-02-07 11:59:19 -08:00
Stephen Haberman 7dfb82a992 Replace old 'master' term with 'driver'. 2013-01-25 11:03:00 -06:00
Tathagata Das 9c8ff1e55f Fixed checkpoint testcases 2013-01-23 07:31:49 -08:00
Tathagata Das 666ce431aa Added support for rescheduling unprocessed batches on master failure. 2013-01-23 03:15:36 -08:00
Tathagata Das fad2b82fc8 Added support for saving input files of FileInputDStream to graph checkpoints. Modified 'file input stream with checkpoint' testcase to test recovery of pre-master-failure input files. 2013-01-22 18:10:00 -08:00
Tathagata Das 364cdb679c Refactored DStreamCheckpointData. 2013-01-22 00:43:31 -08:00
Prashant Sharma d17065c4b5 actor as receiver 2013-01-22 13:28:29 +05:30
Stephen Haberman e5ca241335 Move JavaAPISuite into spark.streaming. 2013-01-21 16:06:58 -06:00
seanm ea739251eb adding updateStateByKey object lifecycle test 2013-01-20 11:29:21 -07:00
Tathagata Das 33bad85bb9 Fixed streaming testsuite bugs 2013-01-20 03:51:11 -08:00
Patrick Wendell c46dd2de78 Moving tests to appropriate directory 2013-01-17 21:43:17 -08:00
Patrick Wendell e0165bf714 Adding queueStream and some slight refactoring 2013-01-17 21:25:49 -08:00
Patrick Wendell ee0314c3b3 Merge branch 'streaming' into streaming-java-api 2013-01-17 18:43:00 -08:00
Patrick Wendell 70ba994d6d Import fixup 2013-01-17 18:41:59 -08:00
Patrick Wendell 82b8707c6b Checkpointing in Streaming java API 2013-01-17 18:41:58 -08:00
Patrick Wendell 61b877c688 Adding flatMap 2013-01-17 18:41:58 -08:00
Patrick Wendell 2a872335c5 Bug fix and test cleanup 2013-01-17 18:41:58 -08:00
Tathagata Das 1638fcb0dc Fixed updateStateByKey to work with primitive types. 2013-01-14 17:18:39 -08:00
Patrick Wendell a0013beb03 Stash 2013-01-14 15:15:02 -08:00
Patrick Wendell 5bcb048167 More work on InputStreams 2013-01-14 09:42:36 -08:00
Patrick Wendell 280b6d0186 Porting to new Duration class 2013-01-14 09:42:36 -08:00
Patrick Wendell c2537057f9 Fixing issue with <Long> types 2013-01-14 09:42:36 -08:00
Patrick Wendell 5004eec37c Import Cleanup 2013-01-14 09:42:36 -08:00
Patrick Wendell 2fe39a4468 Some docs for the JavaTestUtils 2013-01-14 09:42:36 -08:00
Patrick Wendell 560c312c60 Docs, some tests, and work on
StreamingContext
2013-01-14 09:42:36 -08:00
Patrick Wendell 6e514a8d35 PairDStream and DStreamLike 2013-01-14 09:42:36 -08:00
Patrick Wendell f144e0413a Adding transform and union 2013-01-14 09:42:36 -08:00
Patrick Wendell 0d0bab25bd Reduce tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 22a8c7be9a Adding more tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 91b3d41448 Better equality test (thanks Josh) 2013-01-14 09:42:36 -08:00
Patrick Wendell b0974e6c1d Remving main method from tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 867a7455e2 Adding some initial tests to streaming API. 2013-01-14 09:42:36 -08:00
Tathagata Das 6cc8592f26 Fixed bug 2013-01-13 21:20:49 -08:00
Tathagata Das 365506fb03 Changed variable name form ***Time to ***Duration to keep things consistent. 2013-01-09 14:29:25 -08:00
Tathagata Das 4719e6d8fe Changed locations for unit test logs. 2013-01-07 16:06:07 -08:00
Tathagata Das e60514d79e Fixed bug 2013-01-07 15:16:16 -08:00
Tathagata Das 237bac36e9 Renamed examples and added documentation. 2013-01-07 14:37:21 -08:00
Tathagata Das 7e0271b438 Refactored a whole lot to push all DStreams into the spark.streaming.dstream package. 2012-12-30 15:19:55 -08:00
Tathagata Das 9e644402c1 Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs. 2012-12-29 18:31:51 -08:00
Tathagata Das 0bc0a60d30 Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests. 2012-12-27 15:37:33 -08:00
Patrick Wendell 3ff9710265 Adding Flume InputDStream 2012-12-07 16:42:39 -08:00
Denny 15df4b0e52 Merge branch 'dev' into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-05 10:16:56 -08:00
Tathagata Das 0fe2fc4d5e Merged branch mesos/master to branch dev. 2012-11-26 13:16:59 -08:00
Tathagata Das fd11d23bb3 Modified StreamingContext API to make constructor accept the batch size (since it is always needed, Patrick's suggestion). Added description to DStream and StreamingContext. 2012-11-19 19:04:39 -08:00
Denny 2aceae25be Merge branch 'dev' into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-13 13:16:18 -08:00
Tathagata Das 8a25d530ed Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays. 2012-11-13 02:16:28 -08:00
Denny 255b3e44c1 Merge branch 'dev' into kafka 2012-11-12 19:39:29 -08:00
Tathagata Das 564dd8c3f4 Speeded up CheckpointSuite 2012-11-12 14:22:05 -08:00
Tathagata Das 46222dc56d Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams. 2012-11-11 13:20:09 -08:00
Denny d006109e95 Kafka Stream comments. 2012-11-11 11:06:49 -08:00
tdas cc2a65f547 Fixed bug in InputStreamsSuite 2012-11-08 11:17:57 +00:00
Tathagata Das fc3d0b602a Added FailureTestsuite for testing multiple, repeated master failures. 2012-11-06 17:23:31 -08:00
Tathagata Das 395167f2b2 Made more bug fixes for checkpointing. 2012-11-05 16:11:50 -08:00
Tathagata Das 72b2303f99 Fixed major bugs in checkpointing. 2012-11-05 11:41:36 -08:00
Tathagata Das d154238789 Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds. 2012-11-04 12:12:06 -08:00
Tathagata Das 3fb5c9ee24 Fixed serialization bug in countByWindow, added countByKey and countByKeyAndWindow, and added testcases for them. 2012-11-02 12:12:25 -07:00
Tathagata Das 650d717544 Merge branch 'dev' of github.com:radlab/spark into dev 2012-10-25 13:03:18 -07:00
Matei Zaharia 863a55ae42 Merge remote-tracking branch 'public/master' into dev
Conflicts:
	core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
	core/src/main/scala/spark/KryoSerializer.scala
	core/src/main/scala/spark/MapOutputTracker.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/executor/Executor.scala
	core/src/main/scala/spark/network/Connection.scala
	core/src/main/scala/spark/network/ConnectionManagerTest.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/NewHadoopRDD.scala
	core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/BlockMessage.scala
	core/src/main/scala/spark/storage/BlockStore.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	project/SparkBuild.scala
	run
2012-10-24 23:21:00 -07:00
Tathagata Das 926e05b030 Added tests for the file input stream. 2012-10-24 23:14:37 -07:00
Tathagata Das ed71df46cd Minor fixes. 2012-10-24 16:49:40 -07:00
Tathagata Das 1ef6ea2513 Added tests for testing network input stream. 2012-10-24 14:44:20 -07:00
Tathagata Das 020d643484 Renamed the streaming testsuites. 2012-10-23 16:24:05 -07:00
Tathagata Das c2731dd3ef Updated StateDStream api to use Options instead of nulls. 2012-10-23 15:10:27 -07:00
Tathagata Das d85c66636b Added MapValueDStream, FlatMappedValuesDStream and CoGroupedDStream, and therefore DStream operations mapValue, flatMapValues, cogroup, and join. Also, added tests for DStream operations filter, glom, mapPartitions, groupByKey, mapValues, flatMapValues, cogroup, and join. 2012-10-21 17:40:08 -07:00
Tathagata Das c4a2b6f636 Fixed some bugs in tests for forgetting RDDs, and made sure that use of manual clock leads to a zeroTime of 0 in the DStreams (more intuitive). 2012-10-21 10:41:25 -07:00
Tathagata Das 6d5eb4b40c Added functionality to forget RDDs from DStreams. 2012-10-19 12:11:44 -07:00
Tathagata Das b760d6426a Minor modifications. 2012-10-15 12:26:44 -07:00
Tathagata Das 3f1aae5c71 Refactored DStreamSuiteBase to create CheckpointSuite- testsuite for testing checkpointing under different operations. 2012-10-14 21:39:30 -07:00
Tathagata Das b08708e6fc Fixed bugs in the streaming testsuites. 2012-10-13 21:02:24 -07:00
Tathagata Das 4a7bde6865 Fixed bugs and added testcases for naive reduceByKeyAndWindow. 2012-09-06 19:06:59 -07:00
Tathagata Das babb7e3ce2 Re-implemented ReducedWindowedDSteam to simplify and fix bugs. Added slice operator to DStream. Also, refactored DStream testsuites and added tests for reduceByKeyAndWindow. 2012-09-06 05:28:29 -07:00
Tathagata Das 7c09ad0e04 Changed DStream member access permissions from private to protected. Updated StateDStream to checkpoint RDDs and forget lineage. 2012-09-04 19:11:49 -07:00
Tathagata Das 7419d2c7ea Added transformRDD DStream operation and TransformedDStream. Added sbt assembly option for streaming project. 2012-09-02 02:35:17 -07:00
Tathagata Das 2d01d38a41 Added StateDStream, corresponding stateful stream operations, and testcases. Also refactored few PairDStreamFunctions methods. 2012-08-31 03:47:34 -07:00
Tathagata Das 43e66146f7 Merge branch 'dev' of github.com/radlab/spark into dev 2012-08-28 13:51:05 -07:00
Tathagata Das b5b93a621c Added capabllity to take streaming input from network. Renamed SparkStreamContext to StreamingContext. 2012-08-28 12:35:19 -07:00