Commit graph

353 commits

Author SHA1 Message Date
Tathagata Das 972fe7714f Merge branch 'mesos-streaming' into streaming
Conflicts:
	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-20 11:06:01 -08:00
Tathagata Das fb9956256d Merge branch 'mesos-master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Prashant Sharma 4e5b09664c fixes corresponding to review feedback at pull request #479 2013-02-20 19:14:52 +05:30
Patrick Wendell 041c19e5f0 Small changes that were missing in merge 2013-02-19 08:44:20 -08:00
Patrick Wendell fed1122d74 Use RDD type for slice operator in Java.
This commit uses the RDD type in `slice`, making it available to both normal
and pair RDD's in java. It also updates the signature for `slice` to match
changes in the Scala API.
2013-02-19 08:32:38 -08:00
Patrick Wendell 35880de42e Use RDD type for transform operator in Java.
This is an improved implementation of the `transform` operator in Java.
The main difference is that this allows all four possible types of
transform functions

1. JavaRDD -> JavaRDD
2. JavaRDD -> JavaPairRDD
3. JavaPairRDD -> JavaPairRDD
4. JavaPairRDD -> JavaRDD

whereas previously only (1) and (3) were possible.

Conflicts:

	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-19 08:31:58 -08:00
Patrick Wendell 9d49a6b03f Use RDD type for foreach operator in Java. 2013-02-19 08:30:32 -08:00
Nick Pentreath d8ee184d95 Dependencies and refactoring for streaming HLL example, and using context.twitterStream method 2013-02-19 17:42:57 +02:00
Prashant Sharma 8d44480d84 example for demonstrating ZeroMQ stream 2013-02-19 19:42:14 +05:30
Prashant Sharma f7d3e309cb ZeroMQ stream as receiver 2013-02-19 19:32:52 +05:30
Tathagata Das 9e82be1503 Merge branch 'streaming' into ScrapCodes-streaming-actor
Conflicts:
	docs/plugin-custom-receiver.md
	streaming/src/main/scala/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
	streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
	streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Tathagata Das 12ea14c211 Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver. 2013-02-18 15:18:34 -08:00
Tathagata Das 6a6e6bda57 Merge branch 'streaming' into ScrapCode-streaming
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das 8ad561dc7d Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs. 2013-02-18 02:12:41 -08:00
Matei Zaharia 7151e1e4c8 Rename "jobs" to "applications" in the standalone cluster 2013-02-17 23:23:08 -08:00
Tathagata Das f98c7da23e Many changes to ensure better 2nd recovery if 2nd failure happens while
recovering from 1st failure
- Made the scheduler to checkpoint after clearing old metadata which
  ensures that a new checkpoint is written as soon as at least one batch
  gets computed  while recovering from a failure. This ensures that if
  there is a 2nd failure while recovering from 1st failure, the system
  start 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
  only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
  in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
  results with expected output in an order-independent manner.
2013-02-17 15:06:41 -08:00
Stephen Haberman ae2234687d Make CoGroupedRDDs explicitly have the same key type. 2013-02-16 13:10:31 -06:00
Tathagata Das ddcb976b0d Made MasterFailureTest more robust. 2013-02-15 06:54:47 +00:00
Tathagata Das 4b8402e900 Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down. 2013-02-14 18:10:37 -08:00
Tathagata Das def8126d77 Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags. 2013-02-14 17:49:43 -08:00
Tathagata Das 2eacf22401 Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites. 2013-02-14 12:21:47 -08:00
Tathagata Das 03e8dc6861 Changes functions comments to make them more consistent. 2013-02-13 20:59:29 -08:00
Tathagata Das 12b020b668 Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters. 2013-02-13 20:53:50 -08:00
Tathagata Das 39addd3803 Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream. 2013-02-13 12:17:45 -08:00
Patrick Wendell 3f3e77f28b STREAMING-50: Support transform workaround in JavaPairDStream
This ports a useful workaround (the `transform` function) to
JavaPairDStream. It is necessary to do things like sorting which
are not supported yet in the core streaming API.
2013-02-12 14:02:32 -08:00
Patrick Wendell d09c36065c Using tuple swap() 2013-02-11 10:45:45 -08:00
Patrick Wendell 04786d0739 small fix 2013-02-11 10:05:49 -08:00
Patrick Wendell c65988bdc1 Fix for MapPartitions 2013-02-11 10:03:37 -08:00
Patrick Wendell 20cf770545 Fix for flatmap 2013-02-11 10:03:37 -08:00
Patrick Wendell 314d87a038 Indentation fix 2013-02-11 10:03:37 -08:00
Patrick Wendell f0b68c623c Initial cut at replacing K, V in Java files 2013-02-11 10:03:37 -08:00
Tathagata Das fd90daf850 Fixed bugs in FileInputDStream and Scheduler that occasionally failed to reprocess old files after recovering from master failure. Completely modified spark.streaming.FailureTest to test multiple master failures using file input stream. 2013-02-10 19:48:42 -08:00
Tathagata Das 99a5fc498a Added an initial spark job to ensure worker nodes are initialized. 2013-02-09 15:18:05 -08:00
Tathagata Das 4cc223b478 Merge branch 'mesos-master' into streaming 2013-02-07 13:59:31 -08:00
Tathagata Das d55e3aa467 Updated JavaStreamingContext with updated kafkaStream API. 2013-02-07 13:59:18 -08:00
Tathagata Das c6b2f765d3 Merge branch 'mesos-streaming' into streaming 2013-02-07 13:13:53 -08:00
Tathagata Das 12300758cc Merge pull request #372 from Reinvigorate/sm-kafka
Removing offset management code that is non-existent in kafka 0.7.0+
2013-02-07 12:41:07 -08:00
Tathagata Das 915d9931fe Merge pull request #373 from Reinvigorate/sm-updateStateByKey
StateDStream changes to give updateStateByKey consistent behavior
2013-02-07 11:59:19 -08:00
Patrick Wendell 7eea64aa4c Streaming constructor which takes JavaSparkContext
It's sometimes helpful to directly pass a JavaSparkContext,
and take advantage of the various constructors available for that.
2013-02-05 11:43:16 -08:00
Mikhail Bautin fe3eceab57 Remove activation of profiles by default
See the discussion at https://github.com/mesos/spark/pull/355 for why
default profile activation is a problem.
2013-01-31 13:30:41 -08:00
Matei Zaharia 9ae11603b4 Merge pull request #415 from stephenh/driver
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Matei Zaharia b29599e5cf Fix code that depended on metadata cleaner interval being in minutes 2013-01-28 22:24:47 -08:00
Stephen Haberman 7dfb82a992 Replace old 'master' term with 'driver'. 2013-01-25 11:03:00 -06:00
Tathagata Das 9c8ff1e55f Fixed checkpoint testcases 2013-01-23 07:31:49 -08:00
Tathagata Das 666ce431aa Added support for rescheduling unprocessed batches on master failure. 2013-01-23 03:15:36 -08:00
Tathagata Das fad2b82fc8 Added support for saving input files of FileInputDStream to graph checkpoints. Modified 'file input stream with checkpoint' testcase to test recovery of pre-master-failure input files. 2013-01-22 18:10:00 -08:00
Tathagata Das 364cdb679c Refactored DStreamCheckpointData. 2013-01-22 00:43:31 -08:00
Prashant Sharma d17065c4b5 actor as receiver 2013-01-22 13:28:29 +05:30
Josh Rosen 551a47a620 Refactor daemon thread pool creation. 2013-01-21 23:31:00 -08:00
Stephen Haberman e5ca241335 Move JavaAPISuite into spark.streaming. 2013-01-21 16:06:58 -06:00
Prashant Sharma 43bfd7bb21 Changed method name of createReceiver to getReceiver as it is not intended to be a factory. 2013-01-21 11:39:30 +05:30
Matei Zaharia 6e3754bf47 Add Maven build file for streaming, and fix some issues in SBT file
As part of this, changed our Scala 2.9.2 Kafka library to be available
as a local Maven repository, following the example in
(http://blog.dub.podval.org/2010/01/maven-in-project-repository.html)
2013-01-20 19:22:24 -08:00
seanm c0694291c8 Splitting StreamingContext.queueStream into two methods 2013-01-20 12:09:45 -07:00
seanm ea739251eb adding updateStateByKey object lifecycle test 2013-01-20 11:29:21 -07:00
Tathagata Das 33bad85bb9 Fixed streaming testsuite bugs 2013-01-20 03:51:11 -08:00
Tathagata Das 4f8fe58b25 Merge branch 'mesos-streaming' into streaming
Conflicts:
	core/src/main/scala/spark/api/java/JavaRDDLike.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
	core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Tathagata Das 214345ceac Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29, along with updates to doc comments in SparkContext.checkpoint(). 2013-01-19 23:50:17 -08:00
Prashant Sharma 56b9bd197c Plug in actor as stream receiver API 2013-01-19 22:04:07 +05:30
Prashant Sharma bb6ab92e31 Changed method name of createReceiver to getReceiver as it is not intended to be a factory. 2013-01-19 22:04:07 +05:30
seanm d3064fe707 kafkaStream API cleanup. A quorum of zookeepers can now be specified 2013-01-18 21:34:29 -07:00
seanm 56b7fbafa2 further KafkaInputDStream cleanup (removing unused and commented out code relating to offset management) 2013-01-18 21:15:54 -07:00
Patrick Wendell c46dd2de78 Moving tests to appropriate directory 2013-01-17 21:43:17 -08:00
Patrick Wendell e0165bf714 Adding queueStream and some slight refactoring 2013-01-17 21:25:49 -08:00
Patrick Wendell ee0314c3b3 Merge branch 'streaming' into streaming-java-api 2013-01-17 18:43:00 -08:00
Patrick Wendell 70ba994d6d Import fixup 2013-01-17 18:41:59 -08:00
Patrick Wendell 2261e62ee5 Style cleanup 2013-01-17 18:41:59 -08:00
Patrick Wendell 82b8707c6b Checkpointing in Streaming java API 2013-01-17 18:41:58 -08:00
Patrick Wendell 61b877c688 Adding flatMap 2013-01-17 18:41:58 -08:00
Patrick Wendell 8e6cbbc6c7 Adding other updateState functions 2013-01-17 18:41:58 -08:00
Patrick Wendell 2a872335c5 Bug fix and test cleanup 2013-01-17 18:41:58 -08:00
seanm a4639400ea Merge branch 'streaming' into sm-updateStateByKey 2013-01-15 20:29:01 -07:00
Tathagata Das 1638fcb0dc Fixed updateStateByKey to work with primitive types. 2013-01-14 17:18:39 -08:00
seanm c203a29296 StateDStream changes to give updateStateByKey consistent behavior 2013-01-14 17:22:03 -07:00
seanm b61a4ec773 Removing offset management code that is non-existent in kafka 0.7.0+ 2013-01-14 17:13:10 -07:00
Patrick Wendell a0013beb03 Stash 2013-01-14 15:15:02 -08:00
Patrick Wendell 8ad6220bd3 Bugfix 2013-01-14 14:56:36 -08:00
Patrick Wendell 38d9a3a863 Remove AnyRef constraint in updateState 2013-01-14 13:31:49 -08:00
Patrick Wendell ae5290f4a2 Bug fix 2013-01-14 13:31:32 -08:00
Patrick Wendell 6069446356 Making comments consistent w/ Spark style 2013-01-14 10:42:05 -08:00
Patrick Wendell d182a57cae Two changes:
- Updating countByX() types based on bug fix
- Porting new documentation to Java
2013-01-14 10:03:55 -08:00
Patrick Wendell a292ed8d8a Some style cleanup 2013-01-14 09:42:36 -08:00
Patrick Wendell 3461cd99b7 Flume example and bug fix 2013-01-14 09:42:36 -08:00
Patrick Wendell 5bcb048167 More work on InputStreams 2013-01-14 09:42:36 -08:00
Patrick Wendell 280b6d0186 Porting to new Duration class 2013-01-14 09:42:36 -08:00
Patrick Wendell c2537057f9 Fixing issue with <Long> types 2013-01-14 09:42:36 -08:00
Patrick Wendell b36c4f7cce More work on StreamingContext 2013-01-14 09:42:36 -08:00
Patrick Wendell 5004eec37c Import Cleanup 2013-01-14 09:42:36 -08:00
Patrick Wendell 2fe39a4468 Some docs for the JavaTestUtils 2013-01-14 09:42:36 -08:00
Patrick Wendell 560c312c60 Docs, some tests, and work on
StreamingContext
2013-01-14 09:42:36 -08:00
Patrick Wendell 7e1049d8f1 Squashing a few TODOs 2013-01-14 09:42:36 -08:00
Patrick Wendell 74182010a4 Style cleanup and moving functions 2013-01-14 09:42:36 -08:00
Patrick Wendell 056f5efc55 More pair functions 2013-01-14 09:42:36 -08:00
Patrick Wendell 6e514a8d35 PairDStream and DStreamLike 2013-01-14 09:42:36 -08:00
Patrick Wendell f144e0413a Adding transform and union 2013-01-14 09:42:36 -08:00
Patrick Wendell 0d0bab25bd Reduce tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 22a8c7be9a Adding more tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 91b3d41448 Better equality test (thanks Josh) 2013-01-14 09:42:36 -08:00
Patrick Wendell b0974e6c1d Remving main method from tests 2013-01-14 09:42:36 -08:00
Patrick Wendell 867a7455e2 Adding some initial tests to streaming API. 2013-01-14 09:42:36 -08:00
Patrick Wendell b607c9e916 A very rough, early cut at some Java functionality for Streaming. 2013-01-14 09:42:36 -08:00
Tathagata Das f90f794cde Minor name fix 2013-01-13 21:25:57 -08:00
Tathagata Das 6cc8592f26 Fixed bug 2013-01-13 21:20:49 -08:00
Tathagata Das 0dbd411a56 Added documentation for PairDStreamFunctions. 2013-01-13 21:08:35 -08:00
Tathagata Das 0a2e333341 Removed stream id from the constructor of NetworkReceiver to make it easier for PluggableNetworkInputDStream. 2013-01-13 16:18:39 -08:00
Tathagata Das 365506fb03 Changed variable name form ***Time to ***Duration to keep things consistent. 2013-01-09 14:29:25 -08:00
Tathagata Das 156e8b47ef Split Time to Time (absolute instant of time) and Duration (duration of time). 2013-01-09 12:42:10 -08:00
Tathagata Das 8c1b872512 Moved Twitter example to the where the other examples are. 2013-01-07 17:48:10 -08:00
Tathagata Das 64dceec293 Merge branch 'streaming-merge' into dev-merge 2013-01-07 16:54:35 -08:00
Tathagata Das d808e1026a Merge branch 'dev' into dev-merge 2013-01-07 16:41:11 -08:00
Tathagata Das 4719e6d8fe Changed locations for unit test logs. 2013-01-07 16:06:07 -08:00
Tathagata Das e60514d79e Fixed bug 2013-01-07 15:16:16 -08:00
Tathagata Das 237bac36e9 Renamed examples and added documentation. 2013-01-07 14:37:21 -08:00
Tathagata Das af8738dfb5 Moved Spark Streaming examples to examples sub-project. 2013-01-06 19:31:54 -08:00
Patrick Wendell 2ef993d159 BufferingBlockCreator -> NetworkReceiver.BlockGenerator 2013-01-02 14:19:51 -08:00
Patrick Wendell 96a6ff0b09 Merge branch 'dev-merge' into datahandler-fix
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/DataHandler.scala
2013-01-02 14:08:15 -08:00
Patrick Wendell 493d65ce65 Several code-quality improvements to DataHandler.
- Changed to more accurate name: BufferingBlockCreator
- Docstring now correctly reflects the abstraction
  offered by the class
- Made internal methods private
- Fixed indentation problems
2013-01-02 13:39:18 -08:00
Tathagata Das 02497f0cd4 Updated Streaming Programming Guide. 2013-01-01 12:21:32 -08:00
Tathagata Das 18b9b3b99f More classes made private[streaming] to hide from scala docs. 2012-12-30 20:00:42 -08:00
Tathagata Das 7e0271b438 Refactored a whole lot to push all DStreams into the spark.streaming.dstream package. 2012-12-30 15:19:55 -08:00
Tathagata Das 9e644402c1 Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs. 2012-12-29 18:31:51 -08:00
Patrick Wendell 518111573f Merge pull request #8 from radlab/twitter-example
Adding a Twitter InputDStream with an example
2012-12-29 14:23:01 -08:00
Tathagata Das 0bc0a60d30 Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests. 2012-12-27 15:37:33 -08:00
Patrick Wendell bce84ceabb Minor changes after review and general cleanup.
- Added filters to Twitter example
- Removed un-used import
- Some code clean-up
2012-12-21 20:57:46 -08:00
Patrick Wendell 9ac4cb1c5f Adding a Twitter InputDStream with an example 2012-12-21 17:18:19 -08:00
Tathagata Das 8512dd3225 Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
Conflicts:
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/test/scala/spark/CheckpointSuite.scala
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das 8e74fac215 Made checkpoint data in RDDs optional to further reduce serialized size. 2012-12-11 15:36:12 -08:00
Patrick Wendell 3e796bdd57 Changes in response to TD's review. 2012-12-07 19:34:05 -08:00
Patrick Wendell 3ff9710265 Adding Flume InputDStream 2012-12-07 16:42:39 -08:00
Denny 556c38ed91 Added kafka JAR 2012-12-05 11:54:42 -08:00
Denny a23462191f Adjust Kafka code to work with new streaming changes. 2012-12-05 10:30:40 -08:00
Denny 15df4b0e52 Merge branch 'dev' into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-05 10:16:56 -08:00
Tathagata Das 21a0852976 Refactored RDD checkpointing to minimize extra fields in RDD class. 2012-12-04 22:10:25 -08:00
Tathagata Das 609e00d599 Minor mods 2012-12-02 02:39:08 +00:00
Tathagata Das b4dba55f78 Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient. 2012-12-02 02:03:05 +00:00
Tathagata Das 477de94894 Minor modifications. 2012-12-01 13:15:06 -08:00
Tathagata Das 62965c5d8e Added ssc.union 2012-12-01 08:26:10 -08:00
Tathagata Das d5e7aad039 Bug fixes 2012-11-28 08:36:55 +00:00
Tathagata Das b18d70870a Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata. 2012-11-27 15:08:49 -08:00
Tathagata Das 0fe2fc4d5e Merged branch mesos/master to branch dev. 2012-11-26 13:16:59 -08:00
Tathagata Das fd11d23bb3 Modified StreamingContext API to make constructor accept the batch size (since it is always needed, Patrick's suggestion). Added description to DStream and StreamingContext. 2012-11-19 19:04:39 -08:00
Tathagata Das c97ebf6437 Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey. 2012-11-19 23:22:07 +00:00
Denny 5e2b0a3bf6 Added Kafka Wordcount producer 2012-11-19 10:17:58 -08:00
Denny 6757ed6a40 Comment out code for fault-tolerance. 2012-11-19 09:42:35 -08:00
Denny f56befa914 Merge branch 'dev' into kafka 2012-11-19 09:29:54 -08:00
Tathagata Das 3fd7b8319b Merge branch 'dev' of github.com:radlab/spark into dev 2012-11-17 17:27:07 -08:00
Tathagata Das 10c1abcb6a Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits. 2012-11-17 17:27:00 -08:00
Patrick Wendell efa93fd0e6 Merge pull request #4 from radlab/streaming-example
A "streaming page view" example.
2012-11-16 20:40:27 -08:00
Patrick Wendell 720cb0f467 A "streaming page view" example. 2012-11-16 12:11:22 -08:00
Denny 2aceae25be Merge branch 'dev' into kafka
Conflicts:
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-13 13:16:18 -08:00
Denny b6f7ba813e change import for example function 2012-11-13 13:15:32 -08:00