Tathagata Das
7e0271b438
Refactored a whole lot to push all DStreams into the spark.streaming.dstream package.
2012-12-30 15:19:55 -08:00
Tathagata Das
9e644402c1
Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs.
2012-12-29 18:31:51 -08:00
Patrick Wendell
518111573f
Merge pull request #8 from radlab/twitter-example
...
Adding a Twitter InputDStream with an example
2012-12-29 14:23:01 -08:00
Tathagata Das
0bc0a60d30
Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests.
2012-12-27 15:37:33 -08:00
Patrick Wendell
bce84ceabb
Minor changes after review and general cleanup.
...
- Added filters to Twitter example
- Removed un-used import
- Some code clean-up
2012-12-21 20:57:46 -08:00
Patrick Wendell
9ac4cb1c5f
Adding a Twitter InputDStream with an example
2012-12-21 17:18:19 -08:00
Tathagata Das
8512dd3225
Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
...
Conflicts:
core/src/main/scala/spark/ParallelCollection.scala
core/src/test/scala/spark/CheckpointSuite.scala
streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das
8e74fac215
Made checkpoint data in RDDs optional to further reduce serialized size.
2012-12-11 15:36:12 -08:00
Patrick Wendell
3e796bdd57
Changes in response to TD's review.
2012-12-07 19:34:05 -08:00
Patrick Wendell
3ff9710265
Adding Flume InputDStream
2012-12-07 16:42:39 -08:00
Denny
556c38ed91
Added kafka JAR
2012-12-05 11:54:42 -08:00
Denny
a23462191f
Adjust Kafka code to work with new streaming changes.
2012-12-05 10:30:40 -08:00
Denny
15df4b0e52
Merge branch 'dev' into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-05 10:16:56 -08:00
Tathagata Das
21a0852976
Refactored RDD checkpointing to minimize extra fields in RDD class.
2012-12-04 22:10:25 -08:00
Tathagata Das
609e00d599
Minor mods
2012-12-02 02:39:08 +00:00
Tathagata Das
b4dba55f78
Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient.
2012-12-02 02:03:05 +00:00
Tathagata Das
477de94894
Minor modifications.
2012-12-01 13:15:06 -08:00
Tathagata Das
62965c5d8e
Added ssc.union
2012-12-01 08:26:10 -08:00
Tathagata Das
d5e7aad039
Bug fixes
2012-11-28 08:36:55 +00:00
Tathagata Das
b18d70870a
Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata.
2012-11-27 15:08:49 -08:00
Tathagata Das
0fe2fc4d5e
Merged branch mesos/master to branch dev.
2012-11-26 13:16:59 -08:00
Tathagata Das
fd11d23bb3
Modified StreamingContext API to make constructor accept the batch size (since it is always needed, Patrick's suggestion). Added description to DStream and StreamingContext.
2012-11-19 19:04:39 -08:00
Tathagata Das
c97ebf6437
Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.
2012-11-19 23:22:07 +00:00
Denny
5e2b0a3bf6
Added Kafka Wordcount producer
2012-11-19 10:17:58 -08:00
Denny
6757ed6a40
Comment out code for fault-tolerance.
2012-11-19 09:42:35 -08:00
Denny
f56befa914
Merge branch 'dev' into kafka
2012-11-19 09:29:54 -08:00
Tathagata Das
3fd7b8319b
Merge branch 'dev' of github.com:radlab/spark into dev
2012-11-17 17:27:07 -08:00
Tathagata Das
10c1abcb6a
Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.
2012-11-17 17:27:00 -08:00
Patrick Wendell
efa93fd0e6
Merge pull request #4 from radlab/streaming-example
...
A "streaming page view" example.
2012-11-16 20:40:27 -08:00
Patrick Wendell
720cb0f467
A "streaming page view" example.
2012-11-16 12:11:22 -08:00
Denny
2aceae25be
Merge branch 'dev' into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-13 13:16:18 -08:00
Denny
b6f7ba813e
change import for example function
2012-11-13 13:15:32 -08:00
Tathagata Das
26fec8f0b8
Fixed bug in MappedValuesRDD, and set default graph checkpoint interval to be batch duration.
2012-11-13 11:05:57 -08:00
Tathagata Das
c3ccd14cf8
Replaced StateRDD in StateDStream with MapPartitionsRDD.
2012-11-13 02:43:03 -08:00
Tathagata Das
8a25d530ed
Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.
2012-11-13 02:16:28 -08:00
Denny
255b3e44c1
Merge branch 'dev' into kafka
2012-11-12 19:39:29 -08:00
Tathagata Das
564dd8c3f4
Speeded up CheckpointSuite
2012-11-12 14:22:05 -08:00
Tathagata Das
b9bfd1456f
Changed default level on calling DStream.persist() to be MEMORY_ONLY_SER. Also changed the persist level of StateDStream to be MEMORY_ONLY_SER.
2012-11-12 21:51:42 +00:00
Tathagata Das
ae61ebaee6
Fixed bugs in RawNetworkInputDStream and in its examples. Made the ReducedWindowedDStream persist RDDs to MEMOERY_SER_ONLY by default. Removed unncessary examples. Added streaming-env.sh.template to add recommended setting for streaming.
2012-11-12 21:45:16 +00:00
tdas
052d0b800f
Merge branch 'dev' of github.com:radlab/spark into dev
2012-11-11 22:56:14 +00:00
Tathagata Das
46222dc56d
Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams.
2012-11-11 13:20:09 -08:00
Denny
0fd4c93f1c
Updated comment.
2012-11-11 11:15:31 -08:00
Denny
deb2c4df72
Add comment.
2012-11-11 11:11:49 -08:00
Denny
d006109e95
Kafka Stream comments.
2012-11-11 11:06:49 -08:00
Denny
2e8f2ee4ad
Merge branch 'dev' of github.com:radlab/spark into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-09 12:26:17 -08:00
Denny
e5a0936787
Kafka Stream.
2012-11-09 12:23:46 -08:00
tdas
52d21cb682
Removed unnecessary files.
2012-11-08 11:35:40 +00:00
tdas
cc2a65f547
Fixed bug in InputStreamsSuite
2012-11-08 11:17:57 +00:00
Tathagata Das
fc3d0b602a
Added FailureTestsuite for testing multiple, repeated master failures.
2012-11-06 17:23:31 -08:00
Denny
485803d740
Merge branch 'dev' of github.com:radlab/spark into kafka
2012-11-06 09:41:45 -08:00
Denny
0c1de43fc7
Working on kafka.
2012-11-06 09:41:42 -08:00
Tathagata Das
f8bb719cd2
Added a few more comments to the checkpoint-related functions.
2012-11-05 17:53:56 -08:00
Tathagata Das
395167f2b2
Made more bug fixes for checkpointing.
2012-11-05 16:11:50 -08:00
Tathagata Das
72b2303f99
Fixed major bugs in checkpointing.
2012-11-05 11:41:36 -08:00
Tathagata Das
d154238789
Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds.
2012-11-04 12:12:06 -08:00
Tathagata Das
3fb5c9ee24
Fixed serialization bug in countByWindow, added countByKey and countByKeyAndWindow, and added testcases for them.
2012-11-02 12:12:25 -07:00
Tathagata Das
1b900183c8
Added save operations to DStreams.
2012-10-27 18:55:50 -07:00
Tathagata Das
650d717544
Merge branch 'dev' of github.com:radlab/spark into dev
2012-10-25 13:03:18 -07:00
Matei Zaharia
863a55ae42
Merge remote-tracking branch 'public/master' into dev
...
Conflicts:
core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
core/src/main/scala/spark/KryoSerializer.scala
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/executor/Executor.scala
core/src/main/scala/spark/network/Connection.scala
core/src/main/scala/spark/network/ConnectionManagerTest.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/NewHadoopRDD.scala
core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockMessage.scala
core/src/main/scala/spark/storage/BlockStore.scala
core/src/main/scala/spark/storage/StorageLevel.scala
core/src/main/scala/spark/util/AkkaUtils.scala
project/SparkBuild.scala
run
2012-10-24 23:21:00 -07:00
Tathagata Das
926e05b030
Added tests for the file input stream.
2012-10-24 23:14:37 -07:00
Tathagata Das
ed71df46cd
Minor fixes.
2012-10-24 16:49:40 -07:00
Tathagata Das
1ef6ea2513
Added tests for testing network input stream.
2012-10-24 14:44:20 -07:00
Tathagata Das
020d643484
Renamed the streaming testsuites.
2012-10-23 16:24:05 -07:00
Tathagata Das
0e5d9be4df
Renamed APIs to create queueStream and fileStream.
2012-10-23 15:17:05 -07:00
Tathagata Das
c2731dd3ef
Updated StateDStream api to use Options instead of nulls.
2012-10-23 15:10:27 -07:00
Tathagata Das
19191d178d
Renamed the network input streams.
2012-10-23 14:40:24 -07:00
Tathagata Das
a6de5758f1
Modified API of NetworkInputDStreams and got ObjectInputDStream and RawInputDStream working.
2012-10-23 01:41:13 -07:00
Tathagata Das
2c87c853ba
Renamed examples
2012-10-22 15:31:19 -07:00
Tathagata Das
d85c66636b
Added MapValueDStream, FlatMappedValuesDStream and CoGroupedDStream, and therefore DStream operations mapValue, flatMapValues, cogroup, and join. Also, added tests for DStream operations filter, glom, mapPartitions, groupByKey, mapValues, flatMapValues, cogroup, and join.
2012-10-21 17:40:08 -07:00
Tathagata Das
c4a2b6f636
Fixed some bugs in tests for forgetting RDDs, and made sure that use of manual clock leads to a zeroTime of 0 in the DStreams (more intuitive).
2012-10-21 10:41:25 -07:00
Tathagata Das
6d5eb4b40c
Added functionality to forget RDDs from DStreams.
2012-10-19 12:11:44 -07:00
Tathagata Das
b760d6426a
Minor modifications.
2012-10-15 12:26:44 -07:00
Tathagata Das
3f1aae5c71
Refactored DStreamSuiteBase to create CheckpointSuite- testsuite for testing checkpointing under different operations.
2012-10-14 21:39:30 -07:00
Tathagata Das
b08708e6fc
Fixed bugs in the streaming testsuites.
2012-10-13 21:02:24 -07:00
Tathagata Das
e95ff45b53
Implemented checkpointing of StreamingContext and DStream graph.
2012-10-13 20:10:49 -07:00
Tathagata Das
6d1fe02685
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-17 14:26:06 -07:00
Tathagata Das
86d420478f
Allowed StreamingContext to be created from existing SparkContext
2012-09-17 14:25:48 -07:00
Tathagata Das
3cbc72ff1d
Minor tweaks
2012-09-14 07:00:30 +00:00
Tathagata Das
0269792c17
Merge branch 'dev' of github.com:radlab/spark into dev
...
Conflicts:
streaming/src/main/scala/spark/streaming/Scheduler.scala
2012-09-07 20:18:30 +00:00
Tathagata Das
b5750726ff
Fixed bugs in streaming Scheduler and optimized QueueInputDStream.
2012-09-07 20:16:21 +00:00
haoyuan
381e2c7ac4
add warmup code for TopKWordCountRaw.scala
2012-09-06 20:54:52 -07:00
haoyuan
0681bbc5d9
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-07 02:18:33 +00:00
haoyuan
db08a362aa
commit opt for grep scalibility test.
2012-09-07 02:17:52 +00:00
Tathagata Das
4a7bde6865
Fixed bugs and added testcases for naive reduceByKeyAndWindow.
2012-09-06 19:06:59 -07:00
Tathagata Das
203ac8fa8b
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-06 05:29:06 -07:00
Tathagata Das
babb7e3ce2
Re-implemented ReducedWindowedDSteam to simplify and fix bugs. Added slice operator to DStream. Also, refactored DStream testsuites and added tests for reduceByKeyAndWindow.
2012-09-06 05:28:29 -07:00
root
019de4562c
Less warmup in word count
2012-09-06 02:50:41 +00:00
root
4a5d0d249e
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 08:23:09 +00:00
root
b7ad291ac5
Tuning Akka for more connections
2012-09-05 07:08:07 +00:00
root
fc186dc18a
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 05:53:18 +00:00
root
4ea032a142
Some changes to make important log output visible even if we set the logging to WARNING
2012-09-05 05:53:07 +00:00
Tathagata Das
25fd684b89
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-04 20:44:14 -07:00
Tathagata Das
7c09ad0e04
Changed DStream member access permissions from private to protected. Updated StateDStream to checkpoint RDDs and forget lineage.
2012-09-04 19:11:49 -07:00
haoyuan
96a1f2277d
fix the compile error in TopKWordCountRaw.scala
2012-09-04 18:03:34 -07:00
haoyuan
2ff72f60ac
add TopKWordCountRaw.scala
2012-09-04 17:55:55 -07:00
Tathagata Das
389a78722c
Updated the return types of PairDStreamFunctions to return DStreams instead of ShuffleDStreams for cleaner abstraction.
2012-09-04 15:37:46 -07:00
root
7b892ee66e
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-04 04:27:10 +00:00
root
1878731671
Various test programs
2012-09-04 04:26:53 +00:00
Tathagata Das
b8e9e8ea78
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-02 02:35:32 -07:00
Tathagata Das
7419d2c7ea
Added transformRDD DStream operation and TransformedDStream. Added sbt assembly option for streaming project.
2012-09-02 02:35:17 -07:00
root
ceabf71257
tweaks
2012-09-01 21:52:42 +00:00
root
6025889be0
More raw network receiver programs
2012-09-01 20:51:07 +00:00
root
bf993cda63
Make batch size configurable in RawCount
2012-09-01 19:59:23 +00:00
root
83dad56334
Further fixes to raw text sender, plus an app that uses it
2012-09-01 19:45:25 +00:00
Matei Zaharia
f84d2bbe55
Bug fixes to RateLimitedOutputStream
2012-09-01 00:31:15 -07:00
Matei Zaharia
44758aa8e2
First work towards a RawInputDStream and a sender program for it.
2012-09-01 00:17:59 -07:00
Matei Zaharia
51fb13dd16
Bug fix
2012-08-31 15:36:11 -07:00
Matei Zaharia
ce42a46375
Bug fix
2012-08-31 15:35:35 -07:00
Matei Zaharia
f92d4a6ac1
Better output messages for streaming job duration
2012-08-31 15:33:48 -07:00
Tathagata Das
2d01d38a41
Added StateDStream, corresponding stateful stream operations, and testcases. Also refactored few PairDStreamFunctions methods.
2012-08-31 03:47:34 -07:00
root
e1da274a48
WordCount tweaks
2012-08-31 07:16:19 +00:00
root
d4d2cb670f
Make checkpoint interval configurable in WordCount2
2012-08-31 00:34:57 +00:00
root
1f8085b8d0
Compile fixes
2012-08-29 03:20:56 +00:00
Tathagata Das
43e66146f7
Merge branch 'dev' of github.com/radlab/spark into dev
2012-08-28 13:51:05 -07:00
Tathagata Das
b5b93a621c
Added capabllity to take streaming input from network. Renamed SparkStreamContext to StreamingContext.
2012-08-28 12:35:19 -07:00
root
e2cf197a0a
Made WordCount2 even more configurable
2012-08-27 03:34:15 +00:00
root
b78c5ae803
Merge branch 'dev' of github.com:radlab/spark into dev
2012-08-27 01:16:39 +00:00
root
9de1c3abf9
Tweaks to WordCount2
2012-08-27 00:57:00 +00:00
Matei Zaharia
57796b183e
Code style
2012-08-26 17:25:22 -07:00
Matei Zaharia
22b1a20e61
Made Time and Interval immutable
2012-08-26 17:04:34 -07:00
Matei Zaharia
23a29b6d19
Merge branch 'dev' of github.com:radlab/spark into dev
2012-08-26 16:45:37 -07:00
Matei Zaharia
b120e24fe0
Add equals and hashCode to Time
2012-08-26 16:45:14 -07:00
root
b08ff710af
Added sliding word count, and some fixes to reduce window DStream
2012-08-26 23:40:50 +00:00
Matei Zaharia
ad6537321e
Make Time serializable
2012-08-26 16:27:23 -07:00
Matei Zaharia
e7a5cbb543
Reduce log4j verbosity for streaming
2012-08-24 16:45:01 -07:00
Matei Zaharia
091b1438f5
Fix WordCount job name
2012-08-24 16:43:59 -07:00
Tathagata Das
cae894ee7a
Added new Clock interface that is used by RecurringTimer to scheduler events on system time or manually-configured time.
2012-08-06 14:52:46 -07:00
Matei Zaharia
43b81eb271
Renamed RDS to DStream, plus minor style fixes
2012-08-02 14:05:51 -04:00
Matei Zaharia
29bf44473c
Added an RDS that repeatedly returns the same input
2012-08-02 11:43:04 -04:00
Matei Zaharia
650d11817e
Added a WordCount for external data and fixed bugs in file streams
2012-08-02 11:09:43 -04:00
Tathagata Das
ed897ac5e1
Moved streaming files not immediately necessary to spark.streaming.util.
2012-08-01 22:28:54 -07:00
Tathagata Das
3be54c2a8a
1. Refactored SparkStreamContext, Scheduler, InputRDS, FileInputRDS and a few other files.
...
2. Modified Time class to represent milliseconds (long) directly, instead of LongTime.
3. Added new files QueueInputRDS, RecurringTimer, etc.
4. Added RDDSuite as the skeleton for testcases.
5. Added two examples in spark.streaming.examples.
6. Removed all past examples and a few unnecessary files. Moved a number of files to spark.streaming.util.
2012-08-01 22:09:27 -07:00
Tathagata Das
5a26ca4a80
Restructured file locations to separate examples and other programs from core programs.
2012-07-30 13:29:13 -07:00
Matei Zaharia
fcee4153b9
Renamed stream package to streaming
2012-07-29 13:35:22 -07:00
Matei Zaharia
47b7ebad12
Added the Spark Streaing code, ported to Akka 2
2012-07-28 20:03:26 -07:00