Commit graph

277 commits

Author SHA1 Message Date
Tathagata Das 1249e9153b Merge pull request #572 from Reinvigorate/sm-block-interval
Adding spark.streaming.blockInterval property
2013-06-24 21:46:33 -07:00
Tathagata Das cfcda95f86 Merge pull request #571 from Reinvigorate/sm-kafka-serializers
Surfacing decoders on KafkaInputDStream
2013-06-24 21:44:50 -07:00
seanm f25282def5 fixing kafkaStream Java API and adding test 2013-05-10 17:34:28 -06:00
seanm 3632980b1b fixing indentation 2013-05-10 15:54:26 -06:00
seanm b95c1bdbba count() now uses a transform instead of ConstantInputDStream 2013-05-10 12:47:24 -06:00
seanm d761e7359d adding kafkaStream API tests 2013-05-10 12:05:10 -06:00
seanm 7e56e99573 Surfacing decoders on KafkaInputDStream 2013-04-16 17:17:16 -06:00
seanm ab0f834dbb adding spark.streaming.blockInterval property 2013-04-16 11:57:05 -06:00
seanm b42d68c8ce fixing Spark Streaming count() so that 0 will be emitted when there is nothing to count 2013-04-15 12:54:55 -06:00
seanm 329ef34c2e fixing autooffset.reset behavior when set to 'largest' 2013-03-26 23:56:15 -06:00
seanm d61978d0ab keeping JavaStreamingContext in sync with StreamingContext + adding comments for better clarity 2013-03-15 23:36:52 -06:00
seanm 33fa1e7e4a removing dependency on ZookeeperConsumerConnector + purging last relic of kafka reliability that never solidified (ie- setOffsets) 2013-03-15 00:10:13 -06:00
seanm d069283211 fixing memory leak in kafka MessageHandler 2013-03-14 23:45:33 -06:00
seanm cfa8e769a8 KafkaInputDStream improvements. Allows more Kafka configurability 2013-03-14 23:45:19 -06:00
Tathagata Das 5ab37be983 Fixed class paths and dependencies based on Matei's comments. 2013-02-24 16:24:52 -08:00
Tathagata Das 28f8b721f6 Added back the initial spark job before starting streaming receivers 2013-02-24 13:01:54 -08:00
Tathagata Das 68c7934b1a Fixed missing dependencies in streaming/pom.xml 2013-02-24 11:51:45 -08:00
Tathagata Das b4eb24de96 Updated streaming programming guide with Java API info, and comments from Patrick. 2013-02-23 23:59:45 -08:00
Tathagata Das 41285eaae3 Fixed differences in APIs of StreamingContext and JavaStreamingContext. Change rawNetworkStream to rawSocketStream, and added twitter, actor, zeroMQ streams to JavaStreamingContext. Also added them to JavaAPISuite. 2013-02-23 16:25:07 -08:00
Tathagata Das d8cee52d52 Merge branch 'mesos-streaming' into streaming 2013-02-22 18:25:34 -08:00
Tathagata Das cfa65ebff1 Merge pull request #480 from MLnick/streaming-eg-algebird
[Streaming] Examples using Twitter's Algebird library
2013-02-22 12:29:04 -08:00
Tathagata Das 688e62718f Merge pull request #479 from ScrapCodes/zeromq-streaming
Zeromq streaming
2013-02-22 12:17:17 -08:00
Tathagata Das 208edaac1b Fixed condition in InputDStream isTimeValid. 2013-02-21 15:22:26 -08:00
Nick Pentreath 16d456742e Merge remote-tracking branch 'upstream/streaming' into streaming-eg-algebird 2013-02-21 09:33:08 +02:00
Tathagata Das 972fe7714f Merge branch 'mesos-streaming' into streaming
Conflicts:
	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-20 11:06:01 -08:00
Tathagata Das fb9956256d Merge branch 'mesos-master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Prashant Sharma 4e5b09664c fixes corresponding to review feedback at pull request #479 2013-02-20 19:14:52 +05:30
Patrick Wendell 041c19e5f0 Small changes that were missing in merge 2013-02-19 08:44:20 -08:00
Patrick Wendell fed1122d74 Use RDD type for slice operator in Java.
This commit uses the RDD type in `slice`, making it available to both normal
and pair RDD's in java. It also updates the signature for `slice` to match
changes in the Scala API.
2013-02-19 08:32:38 -08:00
Patrick Wendell 35880de42e Use RDD type for transform operator in Java.
This is an improved implementation of the `transform` operator in Java.
The main difference is that this allows all four possible types of
transform functions

1. JavaRDD -> JavaRDD
2. JavaRDD -> JavaPairRDD
3. JavaPairRDD -> JavaPairRDD
4. JavaPairRDD -> JavaRDD

whereas previously only (1) and (3) were possible.

Conflicts:

	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-19 08:31:58 -08:00
Patrick Wendell 9d49a6b03f Use RDD type for foreach operator in Java. 2013-02-19 08:30:32 -08:00
Nick Pentreath d8ee184d95 Dependencies and refactoring for streaming HLL example, and using context.twitterStream method 2013-02-19 17:42:57 +02:00
Prashant Sharma 8d44480d84 example for demonstrating ZeroMQ stream 2013-02-19 19:42:14 +05:30
Prashant Sharma f7d3e309cb ZeroMQ stream as receiver 2013-02-19 19:32:52 +05:30
Tathagata Das 9e82be1503 Merge branch 'streaming' into ScrapCodes-streaming-actor
Conflicts:
	docs/plugin-custom-receiver.md
	streaming/src/main/scala/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
	streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
	streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Tathagata Das 12ea14c211 Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver. 2013-02-18 15:18:34 -08:00
Tathagata Das 6a6e6bda57 Merge branch 'streaming' into ScrapCode-streaming
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das 8ad561dc7d Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs. 2013-02-18 02:12:41 -08:00
Matei Zaharia 7151e1e4c8 Rename "jobs" to "applications" in the standalone cluster 2013-02-17 23:23:08 -08:00
Tathagata Das f98c7da23e Many changes to ensure better 2nd recovery if 2nd failure happens while
recovering from 1st failure
- Made the scheduler to checkpoint after clearing old metadata which
  ensures that a new checkpoint is written as soon as at least one batch
  gets computed  while recovering from a failure. This ensures that if
  there is a 2nd failure while recovering from 1st failure, the system
  start 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
  only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
  in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
  results with expected output in an order-independent manner.
2013-02-17 15:06:41 -08:00
Stephen Haberman ae2234687d Make CoGroupedRDDs explicitly have the same key type. 2013-02-16 13:10:31 -06:00
Tathagata Das ddcb976b0d Made MasterFailureTest more robust. 2013-02-15 06:54:47 +00:00
Tathagata Das 4b8402e900 Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down. 2013-02-14 18:10:37 -08:00
Tathagata Das def8126d77 Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags. 2013-02-14 17:49:43 -08:00
Tathagata Das 2eacf22401 Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites. 2013-02-14 12:21:47 -08:00
Tathagata Das 03e8dc6861 Changes functions comments to make them more consistent. 2013-02-13 20:59:29 -08:00
Tathagata Das 12b020b668 Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters. 2013-02-13 20:53:50 -08:00
Tathagata Das 39addd3803 Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream. 2013-02-13 12:17:45 -08:00
Patrick Wendell 3f3e77f28b STREAMING-50: Support transform workaround in JavaPairDStream
This ports a useful workaround (the `transform` function) to
JavaPairDStream. It is necessary to do things like sorting which
are not supported yet in the core streaming API.
2013-02-12 14:02:32 -08:00
Patrick Wendell d09c36065c Using tuple swap() 2013-02-11 10:45:45 -08:00