Tathagata Das
def8126d77
Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags.
2013-02-14 17:49:43 -08:00
Tathagata Das
2eacf22401
Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites.
2013-02-14 12:21:47 -08:00
Tathagata Das
03e8dc6861
Changes functions comments to make them more consistent.
2013-02-13 20:59:29 -08:00
Tathagata Das
12b020b668
Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters.
2013-02-13 20:53:50 -08:00
Tathagata Das
39addd3803
Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream.
2013-02-13 12:17:45 -08:00
Patrick Wendell
3f3e77f28b
STREAMING-50: Support transform workaround in JavaPairDStream
...
This ports a useful workaround (the `transform` function) to
JavaPairDStream. It is necessary to do things like sorting which
are not supported yet in the core streaming API.
2013-02-12 14:02:32 -08:00
Patrick Wendell
d09c36065c
Using tuple swap()
2013-02-11 10:45:45 -08:00
Patrick Wendell
04786d0739
small fix
2013-02-11 10:05:49 -08:00
Patrick Wendell
c65988bdc1
Fix for MapPartitions
2013-02-11 10:03:37 -08:00
Patrick Wendell
20cf770545
Fix for flatmap
2013-02-11 10:03:37 -08:00
Patrick Wendell
314d87a038
Indentation fix
2013-02-11 10:03:37 -08:00
Patrick Wendell
f0b68c623c
Initial cut at replacing K, V in Java files
2013-02-11 10:03:37 -08:00
Tathagata Das
fd90daf850
Fixed bugs in FileInputDStream and Scheduler that occasionally failed to reprocess old files after recovering from master failure. Completely modified spark.streaming.FailureTest to test multiple master failures using file input stream.
2013-02-10 19:48:42 -08:00
Tathagata Das
99a5fc498a
Added an initial spark job to ensure worker nodes are initialized.
2013-02-09 15:18:05 -08:00
Tathagata Das
4cc223b478
Merge branch 'mesos-master' into streaming
2013-02-07 13:59:31 -08:00
Tathagata Das
d55e3aa467
Updated JavaStreamingContext with updated kafkaStream API.
2013-02-07 13:59:18 -08:00
Tathagata Das
c6b2f765d3
Merge branch 'mesos-streaming' into streaming
2013-02-07 13:13:53 -08:00
Tathagata Das
12300758cc
Merge pull request #372 from Reinvigorate/sm-kafka
...
Removing offset management code that is non-existent in kafka 0.7.0+
2013-02-07 12:41:07 -08:00
Tathagata Das
915d9931fe
Merge pull request #373 from Reinvigorate/sm-updateStateByKey
...
StateDStream changes to give updateStateByKey consistent behavior
2013-02-07 11:59:19 -08:00
Patrick Wendell
7eea64aa4c
Streaming constructor which takes JavaSparkContext
...
It's sometimes helpful to directly pass a JavaSparkContext,
and take advantage of the various constructors available for that.
2013-02-05 11:43:16 -08:00
Mikhail Bautin
fe3eceab57
Remove activation of profiles by default
...
See the discussion at https://github.com/mesos/spark/pull/355 for why
default profile activation is a problem.
2013-01-31 13:30:41 -08:00
Matei Zaharia
9ae11603b4
Merge pull request #415 from stephenh/driver
...
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Matei Zaharia
b29599e5cf
Fix code that depended on metadata cleaner interval being in minutes
2013-01-28 22:24:47 -08:00
Stephen Haberman
7dfb82a992
Replace old 'master' term with 'driver'.
2013-01-25 11:03:00 -06:00
Tathagata Das
9c8ff1e55f
Fixed checkpoint testcases
2013-01-23 07:31:49 -08:00
Tathagata Das
666ce431aa
Added support for rescheduling unprocessed batches on master failure.
2013-01-23 03:15:36 -08:00
Tathagata Das
fad2b82fc8
Added support for saving input files of FileInputDStream to graph checkpoints. Modified 'file input stream with checkpoint' testcase to test recovery of pre-master-failure input files.
2013-01-22 18:10:00 -08:00
Tathagata Das
364cdb679c
Refactored DStreamCheckpointData.
2013-01-22 00:43:31 -08:00
Prashant Sharma
d17065c4b5
actor as receiver
2013-01-22 13:28:29 +05:30
Josh Rosen
551a47a620
Refactor daemon thread pool creation.
2013-01-21 23:31:00 -08:00
Stephen Haberman
e5ca241335
Move JavaAPISuite into spark.streaming.
2013-01-21 16:06:58 -06:00
Prashant Sharma
43bfd7bb21
Changed method name of createReceiver to getReceiver as it is not intended to be a factory.
2013-01-21 11:39:30 +05:30
Matei Zaharia
6e3754bf47
Add Maven build file for streaming, and fix some issues in SBT file
...
As part of this, changed our Scala 2.9.2 Kafka library to be available
as a local Maven repository, following the example in
(http://blog.dub.podval.org/2010/01/maven-in-project-repository.html )
2013-01-20 19:22:24 -08:00
seanm
c0694291c8
Splitting StreamingContext.queueStream into two methods
2013-01-20 12:09:45 -07:00
seanm
ea739251eb
adding updateStateByKey object lifecycle test
2013-01-20 11:29:21 -07:00
Tathagata Das
33bad85bb9
Fixed streaming testsuite bugs
2013-01-20 03:51:11 -08:00
Tathagata Das
4f8fe58b25
Merge branch 'mesos-streaming' into streaming
...
Conflicts:
core/src/main/scala/spark/api/java/JavaRDDLike.scala
core/src/main/scala/spark/api/java/JavaSparkContext.scala
core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Tathagata Das
214345ceac
Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29 , along with updates to doc comments in SparkContext.checkpoint().
2013-01-19 23:50:17 -08:00
Prashant Sharma
56b9bd197c
Plug in actor as stream receiver API
2013-01-19 22:04:07 +05:30
Prashant Sharma
bb6ab92e31
Changed method name of createReceiver to getReceiver as it is not intended to be a factory.
2013-01-19 22:04:07 +05:30
seanm
d3064fe707
kafkaStream API cleanup. A quorum of zookeepers can now be specified
2013-01-18 21:34:29 -07:00
seanm
56b7fbafa2
further KafkaInputDStream cleanup (removing unused and commented out code relating to offset management)
2013-01-18 21:15:54 -07:00
Patrick Wendell
c46dd2de78
Moving tests to appropriate directory
2013-01-17 21:43:17 -08:00
Patrick Wendell
e0165bf714
Adding queueStream and some slight refactoring
2013-01-17 21:25:49 -08:00
Patrick Wendell
ee0314c3b3
Merge branch 'streaming' into streaming-java-api
2013-01-17 18:43:00 -08:00
Patrick Wendell
70ba994d6d
Import fixup
2013-01-17 18:41:59 -08:00
Patrick Wendell
2261e62ee5
Style cleanup
2013-01-17 18:41:59 -08:00
Patrick Wendell
82b8707c6b
Checkpointing in Streaming java API
2013-01-17 18:41:58 -08:00
Patrick Wendell
61b877c688
Adding flatMap
2013-01-17 18:41:58 -08:00
Patrick Wendell
8e6cbbc6c7
Adding other updateState functions
2013-01-17 18:41:58 -08:00
Patrick Wendell
2a872335c5
Bug fix and test cleanup
2013-01-17 18:41:58 -08:00
seanm
a4639400ea
Merge branch 'streaming' into sm-updateStateByKey
2013-01-15 20:29:01 -07:00
Tathagata Das
1638fcb0dc
Fixed updateStateByKey to work with primitive types.
2013-01-14 17:18:39 -08:00
seanm
c203a29296
StateDStream changes to give updateStateByKey consistent behavior
2013-01-14 17:22:03 -07:00
seanm
b61a4ec773
Removing offset management code that is non-existent in kafka 0.7.0+
2013-01-14 17:13:10 -07:00
Patrick Wendell
a0013beb03
Stash
2013-01-14 15:15:02 -08:00
Patrick Wendell
8ad6220bd3
Bugfix
2013-01-14 14:56:36 -08:00
Patrick Wendell
38d9a3a863
Remove AnyRef constraint in updateState
2013-01-14 13:31:49 -08:00
Patrick Wendell
ae5290f4a2
Bug fix
2013-01-14 13:31:32 -08:00
Patrick Wendell
6069446356
Making comments consistent w/ Spark style
2013-01-14 10:42:05 -08:00
Patrick Wendell
d182a57cae
Two changes:
...
- Updating countByX() types based on bug fix
- Porting new documentation to Java
2013-01-14 10:03:55 -08:00
Patrick Wendell
a292ed8d8a
Some style cleanup
2013-01-14 09:42:36 -08:00
Patrick Wendell
3461cd99b7
Flume example and bug fix
2013-01-14 09:42:36 -08:00
Patrick Wendell
5bcb048167
More work on InputStreams
2013-01-14 09:42:36 -08:00
Patrick Wendell
280b6d0186
Porting to new Duration class
2013-01-14 09:42:36 -08:00
Patrick Wendell
c2537057f9
Fixing issue with <Long> types
2013-01-14 09:42:36 -08:00
Patrick Wendell
b36c4f7cce
More work on StreamingContext
2013-01-14 09:42:36 -08:00
Patrick Wendell
5004eec37c
Import Cleanup
2013-01-14 09:42:36 -08:00
Patrick Wendell
2fe39a4468
Some docs for the JavaTestUtils
2013-01-14 09:42:36 -08:00
Patrick Wendell
560c312c60
Docs, some tests, and work on
...
StreamingContext
2013-01-14 09:42:36 -08:00
Patrick Wendell
7e1049d8f1
Squashing a few TODOs
2013-01-14 09:42:36 -08:00
Patrick Wendell
74182010a4
Style cleanup and moving functions
2013-01-14 09:42:36 -08:00
Patrick Wendell
056f5efc55
More pair functions
2013-01-14 09:42:36 -08:00
Patrick Wendell
6e514a8d35
PairDStream and DStreamLike
2013-01-14 09:42:36 -08:00
Patrick Wendell
f144e0413a
Adding transform and union
2013-01-14 09:42:36 -08:00
Patrick Wendell
0d0bab25bd
Reduce tests
2013-01-14 09:42:36 -08:00
Patrick Wendell
22a8c7be9a
Adding more tests
2013-01-14 09:42:36 -08:00
Patrick Wendell
91b3d41448
Better equality test (thanks Josh)
2013-01-14 09:42:36 -08:00
Patrick Wendell
b0974e6c1d
Remving main method from tests
2013-01-14 09:42:36 -08:00
Patrick Wendell
867a7455e2
Adding some initial tests to streaming API.
2013-01-14 09:42:36 -08:00
Patrick Wendell
b607c9e916
A very rough, early cut at some Java functionality for Streaming.
2013-01-14 09:42:36 -08:00
Tathagata Das
f90f794cde
Minor name fix
2013-01-13 21:25:57 -08:00
Tathagata Das
6cc8592f26
Fixed bug
2013-01-13 21:20:49 -08:00
Tathagata Das
0dbd411a56
Added documentation for PairDStreamFunctions.
2013-01-13 21:08:35 -08:00
Tathagata Das
0a2e333341
Removed stream id from the constructor of NetworkReceiver to make it easier for PluggableNetworkInputDStream.
2013-01-13 16:18:39 -08:00
Tathagata Das
365506fb03
Changed variable name form ***Time to ***Duration to keep things consistent.
2013-01-09 14:29:25 -08:00
Tathagata Das
156e8b47ef
Split Time to Time (absolute instant of time) and Duration (duration of time).
2013-01-09 12:42:10 -08:00
Tathagata Das
8c1b872512
Moved Twitter example to the where the other examples are.
2013-01-07 17:48:10 -08:00
Tathagata Das
64dceec293
Merge branch 'streaming-merge' into dev-merge
2013-01-07 16:54:35 -08:00
Tathagata Das
d808e1026a
Merge branch 'dev' into dev-merge
2013-01-07 16:41:11 -08:00
Tathagata Das
4719e6d8fe
Changed locations for unit test logs.
2013-01-07 16:06:07 -08:00
Tathagata Das
e60514d79e
Fixed bug
2013-01-07 15:16:16 -08:00
Tathagata Das
237bac36e9
Renamed examples and added documentation.
2013-01-07 14:37:21 -08:00
Tathagata Das
af8738dfb5
Moved Spark Streaming examples to examples sub-project.
2013-01-06 19:31:54 -08:00
Patrick Wendell
2ef993d159
BufferingBlockCreator -> NetworkReceiver.BlockGenerator
2013-01-02 14:19:51 -08:00
Patrick Wendell
96a6ff0b09
Merge branch 'dev-merge' into datahandler-fix
...
Conflicts:
streaming/src/main/scala/spark/streaming/dstream/DataHandler.scala
2013-01-02 14:08:15 -08:00
Patrick Wendell
493d65ce65
Several code-quality improvements to DataHandler.
...
- Changed to more accurate name: BufferingBlockCreator
- Docstring now correctly reflects the abstraction
offered by the class
- Made internal methods private
- Fixed indentation problems
2013-01-02 13:39:18 -08:00
Tathagata Das
02497f0cd4
Updated Streaming Programming Guide.
2013-01-01 12:21:32 -08:00
Tathagata Das
18b9b3b99f
More classes made private[streaming] to hide from scala docs.
2012-12-30 20:00:42 -08:00
Tathagata Das
7e0271b438
Refactored a whole lot to push all DStreams into the spark.streaming.dstream package.
2012-12-30 15:19:55 -08:00
Tathagata Das
9e644402c1
Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs.
2012-12-29 18:31:51 -08:00
Patrick Wendell
518111573f
Merge pull request #8 from radlab/twitter-example
...
Adding a Twitter InputDStream with an example
2012-12-29 14:23:01 -08:00
Tathagata Das
0bc0a60d30
Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests.
2012-12-27 15:37:33 -08:00
Patrick Wendell
bce84ceabb
Minor changes after review and general cleanup.
...
- Added filters to Twitter example
- Removed un-used import
- Some code clean-up
2012-12-21 20:57:46 -08:00
Patrick Wendell
9ac4cb1c5f
Adding a Twitter InputDStream with an example
2012-12-21 17:18:19 -08:00
Tathagata Das
8512dd3225
Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
...
Conflicts:
core/src/main/scala/spark/ParallelCollection.scala
core/src/test/scala/spark/CheckpointSuite.scala
streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das
8e74fac215
Made checkpoint data in RDDs optional to further reduce serialized size.
2012-12-11 15:36:12 -08:00
Patrick Wendell
3e796bdd57
Changes in response to TD's review.
2012-12-07 19:34:05 -08:00
Patrick Wendell
3ff9710265
Adding Flume InputDStream
2012-12-07 16:42:39 -08:00
Denny
556c38ed91
Added kafka JAR
2012-12-05 11:54:42 -08:00
Denny
a23462191f
Adjust Kafka code to work with new streaming changes.
2012-12-05 10:30:40 -08:00
Denny
15df4b0e52
Merge branch 'dev' into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-05 10:16:56 -08:00
Tathagata Das
21a0852976
Refactored RDD checkpointing to minimize extra fields in RDD class.
2012-12-04 22:10:25 -08:00
Tathagata Das
609e00d599
Minor mods
2012-12-02 02:39:08 +00:00
Tathagata Das
b4dba55f78
Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient.
2012-12-02 02:03:05 +00:00
Tathagata Das
477de94894
Minor modifications.
2012-12-01 13:15:06 -08:00
Tathagata Das
62965c5d8e
Added ssc.union
2012-12-01 08:26:10 -08:00
Tathagata Das
d5e7aad039
Bug fixes
2012-11-28 08:36:55 +00:00
Tathagata Das
b18d70870a
Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata.
2012-11-27 15:08:49 -08:00
Tathagata Das
0fe2fc4d5e
Merged branch mesos/master to branch dev.
2012-11-26 13:16:59 -08:00
Tathagata Das
fd11d23bb3
Modified StreamingContext API to make constructor accept the batch size (since it is always needed, Patrick's suggestion). Added description to DStream and StreamingContext.
2012-11-19 19:04:39 -08:00
Tathagata Das
c97ebf6437
Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.
2012-11-19 23:22:07 +00:00
Denny
5e2b0a3bf6
Added Kafka Wordcount producer
2012-11-19 10:17:58 -08:00
Denny
6757ed6a40
Comment out code for fault-tolerance.
2012-11-19 09:42:35 -08:00
Denny
f56befa914
Merge branch 'dev' into kafka
2012-11-19 09:29:54 -08:00
Tathagata Das
3fd7b8319b
Merge branch 'dev' of github.com:radlab/spark into dev
2012-11-17 17:27:07 -08:00
Tathagata Das
10c1abcb6a
Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.
2012-11-17 17:27:00 -08:00
Patrick Wendell
efa93fd0e6
Merge pull request #4 from radlab/streaming-example
...
A "streaming page view" example.
2012-11-16 20:40:27 -08:00
Patrick Wendell
720cb0f467
A "streaming page view" example.
2012-11-16 12:11:22 -08:00
Denny
2aceae25be
Merge branch 'dev' into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-13 13:16:18 -08:00
Denny
b6f7ba813e
change import for example function
2012-11-13 13:15:32 -08:00
Tathagata Das
26fec8f0b8
Fixed bug in MappedValuesRDD, and set default graph checkpoint interval to be batch duration.
2012-11-13 11:05:57 -08:00
Tathagata Das
c3ccd14cf8
Replaced StateRDD in StateDStream with MapPartitionsRDD.
2012-11-13 02:43:03 -08:00
Tathagata Das
8a25d530ed
Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.
2012-11-13 02:16:28 -08:00
Denny
255b3e44c1
Merge branch 'dev' into kafka
2012-11-12 19:39:29 -08:00
Tathagata Das
564dd8c3f4
Speeded up CheckpointSuite
2012-11-12 14:22:05 -08:00
Tathagata Das
b9bfd1456f
Changed default level on calling DStream.persist() to be MEMORY_ONLY_SER. Also changed the persist level of StateDStream to be MEMORY_ONLY_SER.
2012-11-12 21:51:42 +00:00
Tathagata Das
ae61ebaee6
Fixed bugs in RawNetworkInputDStream and in its examples. Made the ReducedWindowedDStream persist RDDs to MEMOERY_SER_ONLY by default. Removed unncessary examples. Added streaming-env.sh.template to add recommended setting for streaming.
2012-11-12 21:45:16 +00:00
tdas
052d0b800f
Merge branch 'dev' of github.com:radlab/spark into dev
2012-11-11 22:56:14 +00:00
Tathagata Das
46222dc56d
Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams.
2012-11-11 13:20:09 -08:00
Denny
0fd4c93f1c
Updated comment.
2012-11-11 11:15:31 -08:00
Denny
deb2c4df72
Add comment.
2012-11-11 11:11:49 -08:00
Denny
d006109e95
Kafka Stream comments.
2012-11-11 11:06:49 -08:00
Denny
2e8f2ee4ad
Merge branch 'dev' of github.com:radlab/spark into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-09 12:26:17 -08:00
Denny
e5a0936787
Kafka Stream.
2012-11-09 12:23:46 -08:00
tdas
52d21cb682
Removed unnecessary files.
2012-11-08 11:35:40 +00:00
tdas
cc2a65f547
Fixed bug in InputStreamsSuite
2012-11-08 11:17:57 +00:00
Tathagata Das
fc3d0b602a
Added FailureTestsuite for testing multiple, repeated master failures.
2012-11-06 17:23:31 -08:00
Denny
485803d740
Merge branch 'dev' of github.com:radlab/spark into kafka
2012-11-06 09:41:45 -08:00
Denny
0c1de43fc7
Working on kafka.
2012-11-06 09:41:42 -08:00