Commit graph

2180 commits

Author SHA1 Message Date
Prashant Sharma 4e5b09664c fixes corresponding to review feedback at pull request #479 2013-02-20 19:14:52 +05:30
Prashant Sharma 8d44480d84 example for demonstrating ZeroMQ stream 2013-02-19 19:42:14 +05:30
Prashant Sharma f7d3e309cb ZeroMQ stream as receiver 2013-02-19 19:32:52 +05:30
Tathagata Das 8b9c673fce Merge pull request #476 from tdas/streaming
Major modifications to fix driver fault-tolerance with file input stream
2013-02-19 03:07:10 -08:00
Tathagata Das 7e30c46aaf Added comment to the KafkaWordCount, given by Sean McNamara. 2013-02-19 03:05:44 -08:00
Tathagata Das 7851b34e97 Merge branch 'mesos-streaming' into streaming 2013-02-19 03:01:15 -08:00
Tathagata Das 9e82be1503 Merge branch 'streaming' into ScrapCodes-streaming-actor
Conflicts:
	docs/plugin-custom-receiver.md
	streaming/src/main/scala/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
	streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
	streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Tathagata Das 12ea14c211 Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receivers. 2013-02-18 15:18:34 -08:00
Tathagata Das 6a6e6bda57 Merge branch 'streaming' into ScrapCode-streaming
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das 8ad561dc7d Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to be a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs. 2013-02-18 02:12:41 -08:00
Tathagata Das f98c7da23e Many changes to ensure better 2nd recovery if 2nd failure happens while
recovering from 1st failure
- Made the scheduler checkpoint after clearing old metadata, which
  ensures that a new checkpoint is written as soon as at least one batch
  gets computed while recovering from a failure. This ensures that if
  there is a 2nd failure while recovering from 1st failure, the system
  starts 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
  only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
  in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
  results with expected output in an order-independent manner.
2013-02-17 15:06:41 -08:00
Tathagata Das ddcb976b0d Made MasterFailureTest more robust. 2013-02-15 06:54:47 +00:00
Tathagata Das 4b8402e900 Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down. 2013-02-14 18:10:37 -08:00
Tathagata Das def8126d77 Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags. 2013-02-14 17:49:43 -08:00
Tathagata Das 2eacf22401 Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites. 2013-02-14 12:21:47 -08:00
Tathagata Das 03e8dc6861 Changed function comments to make them more consistent. 2013-02-13 20:59:29 -08:00
Tathagata Das 12b020b668 Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters. 2013-02-13 20:53:50 -08:00
Tathagata Das 39addd3803 Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream. 2013-02-13 12:17:45 -08:00
Tathagata Das fd90daf850 Fixed bugs in FileInputDStream and Scheduler that occasionally failed to reprocess old files after recovering from master failure. Completely modified spark.streaming.FailureTest to test multiple master failures using file input stream. 2013-02-10 19:48:42 -08:00
Tathagata Das 16baea62bc Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits. 2013-02-10 19:14:49 -08:00
Tathagata Das 99a5fc498a Added an initial spark job to ensure worker nodes are initialized. 2013-02-09 15:18:05 -08:00
Prashant Sharma 291dd47c7f Taking FeederActor out as a separate program 2013-02-08 14:34:07 +05:30
Tathagata Das bcee3cb2db Merge pull request #455 from tdas/streaming
Merging latest master branch changes to the streaming branch
2013-02-07 15:05:20 -08:00
Tathagata Das 4cc223b478 Merge branch 'mesos-master' into streaming 2013-02-07 13:59:31 -08:00
Tathagata Das d55e3aa467 Updated JavaStreamingContext with updated kafkaStream API. 2013-02-07 13:59:18 -08:00
Tathagata Das c6b2f765d3 Merge branch 'mesos-streaming' into streaming 2013-02-07 13:13:53 -08:00
Tathagata Das 12300758cc Merge pull request #372 from Reinvigorate/sm-kafka
Removing offset management code that is non-existent in kafka 0.7.0+
2013-02-07 12:41:07 -08:00
Tathagata Das 915d9931fe Merge pull request #373 from Reinvigorate/sm-updateStateByKey
StateDStream changes to give updateStateByKey consistent behavior
2013-02-07 11:59:19 -08:00
Matei Zaharia 9cfa068379 Merge pull request #450 from stephenh/inlinemergepair
Inline mergePair to look more like the narrow dep branch.
2013-02-05 18:28:44 -08:00
Matei Zaharia 03eefbb200 Merge pull request #451 from stephenh/fixdeathpactexception
Handle Terminated to avoid endless DeathPactExceptions.
2013-02-05 18:27:54 -08:00
Stephen Haberman 870b2aaf5d Merge branch 'master' into fixdeathpactexception
Conflicts:
	core/src/main/scala/spark/deploy/worker/Worker.scala
2013-02-05 20:27:09 -06:00
Matei Zaharia a4611d66f0 Merge pull request #449 from stephenh/longerdriversuite
Increase DriverSuite timeout.
2013-02-05 17:58:22 -08:00
Stephen Haberman 0e19093fd8 Handle Terminated to avoid endless DeathPactExceptions.
Credit to Roland Kuhn, Akka's tech lead, for pointing out this
very obvious fix, but StandaloneExecutorBackend.preStart's
catch block would never (ever) get hit, because all of the
operations in preStart are async.

So, the System.exit in the catch block was skipped, and instead
Akka was sending Terminated messages which, since we didn't
handle them, turned into DeathPactException, which started
a postRestart/preStart infinite loop.
2013-02-05 18:58:00 -06:00
Stephen Haberman 1ba3393ceb Increase DriverSuite timeout. 2013-02-05 17:56:50 -06:00
Stephen Haberman 8bd0e888f3 Inline mergePair to look more like the narrow dep branch.
No functionality changes, I think this is just more consistent
given mergePair isn't called multiple times/recursive.

Also added a comment to explain the usual case of having two parent RDDs.
2013-02-05 17:50:25 -06:00
Matei Zaharia 2d9eca9fbb Merge pull request #447 from pwendell/streaming-constructor
Streaming constructor which takes JavaSparkContext
2013-02-05 11:45:44 -08:00
Patrick Wendell 7eea64aa4c Streaming constructor which takes JavaSparkContext
It's sometimes helpful to directly pass a JavaSparkContext,
and take advantage of the various constructors available for that.
2013-02-05 11:43:16 -08:00
Matei Zaharia f6ec547ea7 Small fix to test for distinct 2013-02-04 13:14:54 -08:00
Matei Zaharia aa4ee1e9e5 Fix failing test 2013-02-04 11:06:31 -08:00
Matei Zaharia f7b4e428be Merge pull request #445 from JoshRosen/pyspark_fixes
Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()
2013-02-03 21:36:36 -08:00
Josh Rosen e61729113d Remove unnecessary doctest __main__ methods. 2013-02-03 21:29:40 -08:00
Matei Zaharia 3bfaf3ab1d Merge pull request #379 from stephenh/sparkmem
Add spark.executor.memory to differentiate executor memory from spark-shell
2013-02-02 23:58:23 -08:00
Matei Zaharia 88ee6163a1 Merge pull request #422 from squito/blockmanager_info
RDDInfo available from SparkContext
2013-02-02 23:44:13 -08:00
Matei Zaharia cd4ca93679 Merge pull request #436 from stephenh/removeextraloop
Once we find a split with no block, we don't have to look for more.
2013-02-02 23:39:28 -08:00
Matei Zaharia d5daaab381 Merge pull request #442 from stephenh/fixsystemnames
Fix createActorSystem not actually using the systemName parameter.
2013-02-02 23:38:46 -08:00
Matei Zaharia 9163c3705d Formatting 2013-02-02 23:34:47 -08:00
Josh Rosen 8fbd5380b7 Fetch fewer objects in PySpark's take() method. 2013-02-03 06:44:49 +00:00
Josh Rosen 2415c18f48 Fix reporting of PySpark doctest failures. 2013-02-03 06:44:11 +00:00
Matei Zaharia 34a7bcdb3a Formatting 2013-02-02 19:40:30 -08:00
Matei Zaharia 85019d76a4 Merge pull request #427 from woggling/dag-sched-tests
Tests for DAGScheduler
2013-02-02 19:09:59 -08:00