Commit graph

271 commits

Author SHA1 Message Date
Tathagata Das 99a5fc498a Added an initial spark job to ensure worker nodes are initialized. 2013-02-09 15:18:05 -08:00
Tathagata Das 4cc223b478 Merge branch 'mesos-master' into streaming 2013-02-07 13:59:31 -08:00
Tathagata Das d55e3aa467 Updated JavaStreamingContext with updated kafkaStream API. 2013-02-07 13:59:18 -08:00
Tathagata Das c6b2f765d3 Merge branch 'mesos-streaming' into streaming 2013-02-07 13:13:53 -08:00
Tathagata Das 12300758cc Merge pull request #372 from Reinvigorate/sm-kafka
Removing offset management code that is non-existent in kafka 0.7.0+
2013-02-07 12:41:07 -08:00
Tathagata Das 915d9931fe Merge pull request #373 from Reinvigorate/sm-updateStateByKey
StateDStream changes to give updateStateByKey consistent behavior
2013-02-07 11:59:19 -08:00
Patrick Wendell 7eea64aa4c Streaming constructor which takes JavaSparkContext
It's sometimes helpful to directly pass a JavaSparkContext,
and take advantage of the various constructors available for that.
2013-02-05 11:43:16 -08:00
Mikhail Bautin fe3eceab57 Remove activation of profiles by default
See the discussion at https://github.com/mesos/spark/pull/355 for why
default profile activation is a problem.
2013-01-31 13:30:41 -08:00
Matei Zaharia 9ae11603b4 Merge pull request #415 from stephenh/driver
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Matei Zaharia b29599e5cf Fix code that depended on metadata cleaner interval being in minutes 2013-01-28 22:24:47 -08:00
Stephen Haberman 7dfb82a992 Replace old 'master' term with 'driver'. 2013-01-25 11:03:00 -06:00
Tathagata Das 9c8ff1e55f Fixed checkpoint testcases 2013-01-23 07:31:49 -08:00
Tathagata Das 666ce431aa Added support for rescheduling unprocessed batches on master failure. 2013-01-23 03:15:36 -08:00
Tathagata Das fad2b82fc8 Added support for saving input files of FileInputDStream to graph checkpoints. Modified 'file input stream with checkpoint' testcase to test recovery of pre-master-failure input files. 2013-01-22 18:10:00 -08:00
Tathagata Das 364cdb679c Refactored DStreamCheckpointData. 2013-01-22 00:43:31 -08:00
Prashant Sharma d17065c4b5 actor as receiver 2013-01-22 13:28:29 +05:30
Josh Rosen 551a47a620 Refactor daemon thread pool creation. 2013-01-21 23:31:00 -08:00
Stephen Haberman e5ca241335 Move JavaAPISuite into spark.streaming. 2013-01-21 16:06:58 -06:00
Prashant Sharma 43bfd7bb21 Changed method name of createReceiver to getReceiver as it is not intended to be a factory. 2013-01-21 11:39:30 +05:30
Matei Zaharia 6e3754bf47 Add Maven build file for streaming, and fix some issues in SBT file
As part of this, changed our Scala 2.9.2 Kafka library to be available
as a local Maven repository, following the example in
(http://blog.dub.podval.org/2010/01/maven-in-project-repository.html)
2013-01-20 19:22:24 -08:00
seanm c0694291c8 Splitting StreamingContext.queueStream into two methods 2013-01-20 12:09:45 -07:00
seanm ea739251eb adding updateStateByKey object lifecycle test 2013-01-20 11:29:21 -07:00
Tathagata Das 33bad85bb9 Fixed streaming testsuite bugs 2013-01-20 03:51:11 -08:00
Tathagata Das 4f8fe58b25 Merge branch 'mesos-streaming' into streaming
Conflicts:
	core/src/main/scala/spark/api/java/JavaRDDLike.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
	core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Tathagata Das 214345ceac Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29, along with updates to doc comments in SparkContext.checkpoint(). 2013-01-19 23:50:17 -08:00
Prashant Sharma 56b9bd197c Plug in actor as stream receiver API 2013-01-19 22:04:07 +05:30
Prashant Sharma bb6ab92e31 Changed method name of createReceiver to getReceiver as it is not intended to be a factory. 2013-01-19 22:04:07 +05:30
seanm d3064fe707 kafkaStream API cleanup. A quorum of zookeepers can now be specified 2013-01-18 21:34:29 -07:00
seanm 56b7fbafa2 further KafkaInputDStream cleanup (removing unused and commented out code relating to offset management) 2013-01-18 21:15:54 -07:00
Patrick Wendell c46dd2de78 Moving tests to appropriate directory 2013-01-17 21:43:17 -08:00
Patrick Wendell e0165bf714 Adding queueStream and some slight refactoring 2013-01-17 21:25:49 -08:00
Patrick Wendell ee0314c3b3 Merge branch 'streaming' into streaming-java-api 2013-01-17 18:43:00 -08:00
Patrick Wendell 70ba994d6d Import fixup 2013-01-17 18:41:59 -08:00
Patrick Wendell 2261e62ee5 Style cleanup 2013-01-17 18:41:59 -08:00
Patrick Wendell 82b8707c6b Checkpointing in Streaming java API 2013-01-17 18:41:58 -08:00
Patrick Wendell 61b877c688 Adding flatMap 2013-01-17 18:41:58 -08:00
Patrick Wendell 8e6cbbc6c7 Adding other updateState functions 2013-01-17 18:41:58 -08:00
Patrick Wendell 2a872335c5 Bug fix and test cleanup 2013-01-17 18:41:58 -08:00
seanm a4639400ea Merge branch 'streaming' into sm-updateStateByKey 2013-01-15 20:29:01 -07:00
Tathagata Das 1638fcb0dc Fixed updateStateByKey to work with primitive types. 2013-01-14 17:18:39 -08:00
seanm c203a29296 StateDStream changes to give updateStateByKey consistent behavior 2013-01-14 17:22:03 -07:00
seanm b61a4ec773 Removing offset management code that is non-existent in kafka 0.7.0+ 2013-01-14 17:13:10 -07:00
Patrick Wendell a0013beb03 Stash 2013-01-14 15:15:02 -08:00
Patrick Wendell 8ad6220bd3 Bugfix 2013-01-14 14:56:36 -08:00
Patrick Wendell 38d9a3a863 Remove AnyRef constraint in updateState 2013-01-14 13:31:49 -08:00
Patrick Wendell ae5290f4a2 Bug fix 2013-01-14 13:31:32 -08:00
Patrick Wendell 6069446356 Making comments consistent w/ Spark style 2013-01-14 10:42:05 -08:00
Patrick Wendell d182a57cae Two changes:
- Updating countByX() types based on bug fix
- Porting new documentation to Java
2013-01-14 10:03:55 -08:00
Patrick Wendell a292ed8d8a Some style cleanup 2013-01-14 09:42:36 -08:00
Patrick Wendell 3461cd99b7 Flume example and bug fix 2013-01-14 09:42:36 -08:00
Patrick Wendell 5bcb048167 More work on InputStreams 2013-01-14 09:42:36 -08:00
Patrick Wendell 280b6d0186 Porting to new Duration class 2013-01-14 09:42:36 -08:00
Patrick Wendell c2537057f9 Fixing issue with <Long> types 2013-01-14 09:42:36 -08:00
Patrick Wendell b36c4f7cce More work on StreamingContext 2013-01-14 09:42:36 -08:00
Patrick Wendell 5004eec37c Import Cleanup 2013-01-14 09:42:36 -08:00
Patrick Wendell 2fe39a4468 Some docs for the JavaTestUtils 2013-01-14 09:42:36 -08:00
Patrick Wendell 560c312c60 Docs, some tests, and work on
StreamingContext
2013-01-14 09:42:36 -08:00
Patrick Wendell 7e1049d8f1 Squashing a few TODOs 2013-01-14 09:42:36 -08:00
Patrick Wendell 74182010a4 Style cleanup and moving functions 2013-01-14 09:42:36 -08:00
Patrick Wendell 056f5efc55 More pair functions 2013-01-14 09:42:36 -08:00
Patrick Wendell 6e514a8d35 PairDStream and DStreamLike 2013-01-14 09:42:36 -08:00
Patrick Wendell f144e0413a Adding transform and union 2013-01-14 09:42:36 -08:00
Patrick Wendell 0d0bab25bd Reduce tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 22a8c7be9a Adding more tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 91b3d41448 Better equality test (thanks Josh) 2013-01-14 09:42:36 -08:00
Patrick Wendell b0974e6c1d Remving main method from tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 867a7455e2 Adding some initial tests to streaming API. 2013-01-14 09:42:36 -08:00
Patrick Wendell b607c9e916 A very rough, early cut at some Java functionality for Streaming. 2013-01-14 09:42:36 -08:00
Tathagata Das f90f794cde Minor name fix 2013-01-13 21:25:57 -08:00
Tathagata Das 6cc8592f26 Fixed bug 2013-01-13 21:20:49 -08:00
Tathagata Das 0dbd411a56 Added documentation for PairDStreamFunctions. 2013-01-13 21:08:35 -08:00
Tathagata Das 0a2e333341 Removed stream id from the constructor of NetworkReceiver to make it easier for PluggableNetworkInputDStream. 2013-01-13 16:18:39 -08:00
Tathagata Das 365506fb03 Changed variable name form ***Time to ***Duration to keep things consistent. 2013-01-09 14:29:25 -08:00
Tathagata Das 156e8b47ef Split Time to Time (absolute instant of time) and Duration (duration of time). 2013-01-09 12:42:10 -08:00
Tathagata Das 8c1b872512 Moved Twitter example to the where the other examples are. 2013-01-07 17:48:10 -08:00
Tathagata Das 64dceec293 Merge branch 'streaming-merge' into dev-merge 2013-01-07 16:54:35 -08:00
Tathagata Das d808e1026a Merge branch 'dev' into dev-merge 2013-01-07 16:41:11 -08:00
Tathagata Das 4719e6d8fe Changed locations for unit test logs. 2013-01-07 16:06:07 -08:00
Tathagata Das e60514d79e Fixed bug 2013-01-07 15:16:16 -08:00
Tathagata Das 237bac36e9 Renamed examples and added documentation. 2013-01-07 14:37:21 -08:00
Tathagata Das af8738dfb5 Moved Spark Streaming examples to examples sub-project. 2013-01-06 19:31:54 -08:00
Patrick Wendell 2ef993d159 BufferingBlockCreator -> NetworkReceiver.BlockGenerator 2013-01-02 14:19:51 -08:00
Patrick Wendell 96a6ff0b09 Merge branch 'dev-merge' into datahandler-fix
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/DataHandler.scala
2013-01-02 14:08:15 -08:00
Patrick Wendell 493d65ce65 Several code-quality improvements to DataHandler.
- Changed to more accurate name: BufferingBlockCreator
- Docstring now correctly reflects the abstraction
  offered by the class
- Made internal methods private
- Fixed indentation problems
2013-01-02 13:39:18 -08:00
Tathagata Das 02497f0cd4 Updated Streaming Programming Guide. 2013-01-01 12:21:32 -08:00
Tathagata Das 18b9b3b99f More classes made private[streaming] to hide from scala docs. 2012-12-30 20:00:42 -08:00
Tathagata Das 7e0271b438 Refactored a whole lot to push all DStreams into the spark.streaming.dstream package. 2012-12-30 15:19:55 -08:00
Tathagata Das 9e644402c1 Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs. 2012-12-29 18:31:51 -08:00
Patrick Wendell 518111573f Merge pull request #8 from radlab/twitter-example
Adding a Twitter InputDStream with an example
2012-12-29 14:23:01 -08:00
Tathagata Das 0bc0a60d30 Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests. 2012-12-27 15:37:33 -08:00
Patrick Wendell bce84ceabb Minor changes after review and general cleanup.
- Added filters to Twitter example
- Removed un-used import
- Some code clean-up
2012-12-21 20:57:46 -08:00
Patrick Wendell 9ac4cb1c5f Adding a Twitter InputDStream with an example 2012-12-21 17:18:19 -08:00
Tathagata Das 8512dd3225 Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
Conflicts:
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/test/scala/spark/CheckpointSuite.scala
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das 8e74fac215 Made checkpoint data in RDDs optional to further reduce serialized size. 2012-12-11 15:36:12 -08:00
Patrick Wendell 3e796bdd57 Changes in response to TD's review. 2012-12-07 19:34:05 -08:00
Patrick Wendell 3ff9710265 Adding Flume InputDStream 2012-12-07 16:42:39 -08:00
Denny 556c38ed91 Added kafka JAR 2012-12-05 11:54:42 -08:00
Denny a23462191f Adjust Kafka code to work with new streaming changes. 2012-12-05 10:30:40 -08:00
Denny 15df4b0e52 Merge branch 'dev' into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-05 10:16:56 -08:00
Tathagata Das 21a0852976 Refactored RDD checkpointing to minimize extra fields in RDD class. 2012-12-04 22:10:25 -08:00