Tathagata Das
|
b18d70870a
|
Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata.
|
2012-11-27 15:08:49 -08:00 |
|
Tathagata Das
|
fd11d23bb3
|
Modified StreamingContext API to make constructor accept the batch size (since it is always needed, Patrick's suggestion). Added description to DStream and StreamingContext.
|
2012-11-19 19:04:39 -08:00 |
|
Tathagata Das
|
c97ebf6437
|
Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.
|
2012-11-19 23:22:07 +00:00 |
|
Denny
|
5e2b0a3bf6
|
Added Kafka Wordcount producer
|
2012-11-19 10:17:58 -08:00 |
|
Denny
|
6757ed6a40
|
Comment out code for fault-tolerance.
|
2012-11-19 09:42:35 -08:00 |
|
Denny
|
f56befa914
|
Merge branch 'dev' into kafka
|
2012-11-19 09:29:54 -08:00 |
|
Tathagata Das
|
3fd7b8319b
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-11-17 17:27:07 -08:00 |
|
Tathagata Das
|
10c1abcb6a
|
Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.
|
2012-11-17 17:27:00 -08:00 |
|
Patrick Wendell
|
efa93fd0e6
|
Merge pull request #4 from radlab/streaming-example
A "streaming page view" example.
|
2012-11-16 20:40:27 -08:00 |
|
Patrick Wendell
|
720cb0f467
|
A "streaming page view" example.
|
2012-11-16 12:11:22 -08:00 |
|
Denny
|
2aceae25be
|
Merge branch 'dev' into kafka
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
|
2012-11-13 13:16:18 -08:00 |
|
Denny
|
b6f7ba813e
|
change import for example function
|
2012-11-13 13:15:32 -08:00 |
|
Tathagata Das
|
26fec8f0b8
|
Fixed bug in MappedValuesRDD, and set default graph checkpoint interval to be batch duration.
|
2012-11-13 11:05:57 -08:00 |
|
Tathagata Das
|
c3ccd14cf8
|
Replaced StateRDD in StateDStream with MapPartitionsRDD.
|
2012-11-13 02:43:03 -08:00 |
|
Tathagata Das
|
8a25d530ed
|
Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.
|
2012-11-13 02:16:28 -08:00 |
|
Denny
|
255b3e44c1
|
Merge branch 'dev' into kafka
|
2012-11-12 19:39:29 -08:00 |
|
Tathagata Das
|
b9bfd1456f
|
Changed default level on calling DStream.persist() to be MEMORY_ONLY_SER. Also changed the persist level of StateDStream to be MEMORY_ONLY_SER.
|
2012-11-12 21:51:42 +00:00 |
|
Tathagata Das
|
ae61ebaee6
|
Fixed bugs in RawNetworkInputDStream and in its examples. Made the ReducedWindowedDStream persist RDDs to MEMOERY_SER_ONLY by default. Removed unncessary examples. Added streaming-env.sh.template to add recommended setting for streaming.
|
2012-11-12 21:45:16 +00:00 |
|
tdas
|
052d0b800f
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-11-11 22:56:14 +00:00 |
|
Tathagata Das
|
46222dc56d
|
Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams.
|
2012-11-11 13:20:09 -08:00 |
|
Denny
|
0fd4c93f1c
|
Updated comment.
|
2012-11-11 11:15:31 -08:00 |
|
Denny
|
deb2c4df72
|
Add comment.
|
2012-11-11 11:11:49 -08:00 |
|
Denny
|
d006109e95
|
Kafka Stream comments.
|
2012-11-11 11:06:49 -08:00 |
|
Denny
|
2e8f2ee4ad
|
Merge branch 'dev' of github.com:radlab/spark into kafka
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
|
2012-11-09 12:26:17 -08:00 |
|
Denny
|
e5a0936787
|
Kafka Stream.
|
2012-11-09 12:23:46 -08:00 |
|
tdas
|
52d21cb682
|
Removed unnecessary files.
|
2012-11-08 11:35:40 +00:00 |
|
Tathagata Das
|
fc3d0b602a
|
Added FailureTestsuite for testing multiple, repeated master failures.
|
2012-11-06 17:23:31 -08:00 |
|
Denny
|
485803d740
|
Merge branch 'dev' of github.com:radlab/spark into kafka
|
2012-11-06 09:41:45 -08:00 |
|
Denny
|
0c1de43fc7
|
Working on kafka.
|
2012-11-06 09:41:42 -08:00 |
|
Tathagata Das
|
f8bb719cd2
|
Added a few more comments to the checkpoint-related functions.
|
2012-11-05 17:53:56 -08:00 |
|
Tathagata Das
|
395167f2b2
|
Made more bug fixes for checkpointing.
|
2012-11-05 16:11:50 -08:00 |
|
Tathagata Das
|
72b2303f99
|
Fixed major bugs in checkpointing.
|
2012-11-05 11:41:36 -08:00 |
|
Tathagata Das
|
d154238789
|
Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds.
|
2012-11-04 12:12:06 -08:00 |
|
Tathagata Das
|
3fb5c9ee24
|
Fixed serialization bug in countByWindow, added countByKey and countByKeyAndWindow, and added testcases for them.
|
2012-11-02 12:12:25 -07:00 |
|
Tathagata Das
|
1b900183c8
|
Added save operations to DStreams.
|
2012-10-27 18:55:50 -07:00 |
|
Tathagata Das
|
650d717544
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-10-25 13:03:18 -07:00 |
|
Matei Zaharia
|
863a55ae42
|
Merge remote-tracking branch 'public/master' into dev
Conflicts:
core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
core/src/main/scala/spark/KryoSerializer.scala
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/executor/Executor.scala
core/src/main/scala/spark/network/Connection.scala
core/src/main/scala/spark/network/ConnectionManagerTest.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/NewHadoopRDD.scala
core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockMessage.scala
core/src/main/scala/spark/storage/BlockStore.scala
core/src/main/scala/spark/storage/StorageLevel.scala
core/src/main/scala/spark/util/AkkaUtils.scala
project/SparkBuild.scala
run
|
2012-10-24 23:21:00 -07:00 |
|
Tathagata Das
|
926e05b030
|
Added tests for the file input stream.
|
2012-10-24 23:14:37 -07:00 |
|
Tathagata Das
|
ed71df46cd
|
Minor fixes.
|
2012-10-24 16:49:40 -07:00 |
|
Tathagata Das
|
1ef6ea2513
|
Added tests for testing network input stream.
|
2012-10-24 14:44:20 -07:00 |
|
Tathagata Das
|
0e5d9be4df
|
Renamed APIs to create queueStream and fileStream.
|
2012-10-23 15:17:05 -07:00 |
|
Tathagata Das
|
c2731dd3ef
|
Updated StateDStream api to use Options instead of nulls.
|
2012-10-23 15:10:27 -07:00 |
|
Tathagata Das
|
19191d178d
|
Renamed the network input streams.
|
2012-10-23 14:40:24 -07:00 |
|
Tathagata Das
|
a6de5758f1
|
Modified API of NetworkInputDStreams and got ObjectInputDStream and RawInputDStream working.
|
2012-10-23 01:41:13 -07:00 |
|
Tathagata Das
|
2c87c853ba
|
Renamed examples
|
2012-10-22 15:31:19 -07:00 |
|
Tathagata Das
|
d85c66636b
|
Added MapValueDStream, FlatMappedValuesDStream and CoGroupedDStream, and therefore DStream operations mapValue, flatMapValues, cogroup, and join. Also, added tests for DStream operations filter, glom, mapPartitions, groupByKey, mapValues, flatMapValues, cogroup, and join.
|
2012-10-21 17:40:08 -07:00 |
|
Tathagata Das
|
c4a2b6f636
|
Fixed some bugs in tests for forgetting RDDs, and made sure that use of manual clock leads to a zeroTime of 0 in the DStreams (more intuitive).
|
2012-10-21 10:41:25 -07:00 |
|
Tathagata Das
|
6d5eb4b40c
|
Added functionality to forget RDDs from DStreams.
|
2012-10-19 12:11:44 -07:00 |
|
Tathagata Das
|
b760d6426a
|
Minor modifications.
|
2012-10-15 12:26:44 -07:00 |
|
Tathagata Das
|
3f1aae5c71
|
Refactored DStreamSuiteBase to create CheckpointSuite- testsuite for testing checkpointing under different operations.
|
2012-10-14 21:39:30 -07:00 |
|
Tathagata Das
|
e95ff45b53
|
Implemented checkpointing of StreamingContext and DStream graph.
|
2012-10-13 20:10:49 -07:00 |
|
Tathagata Das
|
6d1fe02685
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-09-17 14:26:06 -07:00 |
|
Tathagata Das
|
86d420478f
|
Allowed StreamingContext to be created from existing SparkContext
|
2012-09-17 14:25:48 -07:00 |
|
Tathagata Das
|
3cbc72ff1d
|
Minor tweaks
|
2012-09-14 07:00:30 +00:00 |
|
Tathagata Das
|
0269792c17
|
Merge branch 'dev' of github.com:radlab/spark into dev
Conflicts:
streaming/src/main/scala/spark/streaming/Scheduler.scala
|
2012-09-07 20:18:30 +00:00 |
|
Tathagata Das
|
b5750726ff
|
Fixed bugs in streaming Scheduler and optimized QueueInputDStream.
|
2012-09-07 20:16:21 +00:00 |
|
haoyuan
|
381e2c7ac4
|
add warmup code for TopKWordCountRaw.scala
|
2012-09-06 20:54:52 -07:00 |
|
haoyuan
|
0681bbc5d9
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-09-07 02:18:33 +00:00 |
|
haoyuan
|
db08a362aa
|
commit opt for grep scalibility test.
|
2012-09-07 02:17:52 +00:00 |
|
Tathagata Das
|
4a7bde6865
|
Fixed bugs and added testcases for naive reduceByKeyAndWindow.
|
2012-09-06 19:06:59 -07:00 |
|
Tathagata Das
|
203ac8fa8b
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-09-06 05:29:06 -07:00 |
|
Tathagata Das
|
babb7e3ce2
|
Re-implemented ReducedWindowedDSteam to simplify and fix bugs. Added slice operator to DStream. Also, refactored DStream testsuites and added tests for reduceByKeyAndWindow.
|
2012-09-06 05:28:29 -07:00 |
|
root
|
019de4562c
|
Less warmup in word count
|
2012-09-06 02:50:41 +00:00 |
|
root
|
4a5d0d249e
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-09-05 08:23:09 +00:00 |
|
root
|
b7ad291ac5
|
Tuning Akka for more connections
|
2012-09-05 07:08:07 +00:00 |
|
root
|
fc186dc18a
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-09-05 05:53:18 +00:00 |
|
root
|
4ea032a142
|
Some changes to make important log output visible even if we set the logging to WARNING
|
2012-09-05 05:53:07 +00:00 |
|
Tathagata Das
|
25fd684b89
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-09-04 20:44:14 -07:00 |
|
Tathagata Das
|
7c09ad0e04
|
Changed DStream member access permissions from private to protected. Updated StateDStream to checkpoint RDDs and forget lineage.
|
2012-09-04 19:11:49 -07:00 |
|
haoyuan
|
96a1f2277d
|
fix the compile error in TopKWordCountRaw.scala
|
2012-09-04 18:03:34 -07:00 |
|
haoyuan
|
2ff72f60ac
|
add TopKWordCountRaw.scala
|
2012-09-04 17:55:55 -07:00 |
|
Tathagata Das
|
389a78722c
|
Updated the return types of PairDStreamFunctions to return DStreams instead of ShuffleDStreams for cleaner abstraction.
|
2012-09-04 15:37:46 -07:00 |
|
root
|
7b892ee66e
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-09-04 04:27:10 +00:00 |
|
root
|
1878731671
|
Various test programs
|
2012-09-04 04:26:53 +00:00 |
|
Tathagata Das
|
b8e9e8ea78
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-09-02 02:35:32 -07:00 |
|
Tathagata Das
|
7419d2c7ea
|
Added transformRDD DStream operation and TransformedDStream. Added sbt assembly option for streaming project.
|
2012-09-02 02:35:17 -07:00 |
|
root
|
ceabf71257
|
tweaks
|
2012-09-01 21:52:42 +00:00 |
|
root
|
6025889be0
|
More raw network receiver programs
|
2012-09-01 20:51:07 +00:00 |
|
root
|
bf993cda63
|
Make batch size configurable in RawCount
|
2012-09-01 19:59:23 +00:00 |
|
root
|
83dad56334
|
Further fixes to raw text sender, plus an app that uses it
|
2012-09-01 19:45:25 +00:00 |
|
Matei Zaharia
|
f84d2bbe55
|
Bug fixes to RateLimitedOutputStream
|
2012-09-01 00:31:15 -07:00 |
|
Matei Zaharia
|
44758aa8e2
|
First work towards a RawInputDStream and a sender program for it.
|
2012-09-01 00:17:59 -07:00 |
|
Matei Zaharia
|
51fb13dd16
|
Bug fix
|
2012-08-31 15:36:11 -07:00 |
|
Matei Zaharia
|
ce42a46375
|
Bug fix
|
2012-08-31 15:35:35 -07:00 |
|
Matei Zaharia
|
f92d4a6ac1
|
Better output messages for streaming job duration
|
2012-08-31 15:33:48 -07:00 |
|
Tathagata Das
|
2d01d38a41
|
Added StateDStream, corresponding stateful stream operations, and testcases. Also refactored few PairDStreamFunctions methods.
|
2012-08-31 03:47:34 -07:00 |
|
root
|
e1da274a48
|
WordCount tweaks
|
2012-08-31 07:16:19 +00:00 |
|
root
|
d4d2cb670f
|
Make checkpoint interval configurable in WordCount2
|
2012-08-31 00:34:57 +00:00 |
|
root
|
1f8085b8d0
|
Compile fixes
|
2012-08-29 03:20:56 +00:00 |
|
Tathagata Das
|
43e66146f7
|
Merge branch 'dev' of github.com/radlab/spark into dev
|
2012-08-28 13:51:05 -07:00 |
|
Tathagata Das
|
b5b93a621c
|
Added capabllity to take streaming input from network. Renamed SparkStreamContext to StreamingContext.
|
2012-08-28 12:35:19 -07:00 |
|
root
|
e2cf197a0a
|
Made WordCount2 even more configurable
|
2012-08-27 03:34:15 +00:00 |
|
root
|
b78c5ae803
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-08-27 01:16:39 +00:00 |
|
root
|
9de1c3abf9
|
Tweaks to WordCount2
|
2012-08-27 00:57:00 +00:00 |
|
Matei Zaharia
|
57796b183e
|
Code style
|
2012-08-26 17:25:22 -07:00 |
|
Matei Zaharia
|
22b1a20e61
|
Made Time and Interval immutable
|
2012-08-26 17:04:34 -07:00 |
|
Matei Zaharia
|
23a29b6d19
|
Merge branch 'dev' of github.com:radlab/spark into dev
|
2012-08-26 16:45:37 -07:00 |
|
Matei Zaharia
|
b120e24fe0
|
Add equals and hashCode to Time
|
2012-08-26 16:45:14 -07:00 |
|
root
|
b08ff710af
|
Added sliding word count, and some fixes to reduce window DStream
|
2012-08-26 23:40:50 +00:00 |
|
Matei Zaharia
|
ad6537321e
|
Make Time serializable
|
2012-08-26 16:27:23 -07:00 |
|