Matei Zaharia
642029e7f4
Various fixes to configuration code
...
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Tathagata Das
271e3237f3
Minor changes in comments and strings to address comments in PR 289.
2013-12-27 12:26:57 -08:00
Tathagata Das
6e43039614
Refactored streaming project to separate out the twitter functionality.
2013-12-26 18:02:49 -08:00
Tathagata Das
577c8cc834
Removed unnecessary options from WindowedDStream.
2013-12-26 14:17:16 -08:00
Tathagata Das
3618d70b2a
Added warning if filestream adds files with no data in them (file RDDs have 0 partitions).
2013-12-26 12:45:40 -08:00
Tathagata Das
be64719138
Changed file stream to not catch any exceptions related to finding new files (FileNotFound exception is still caught and ignored).
2013-12-26 12:33:12 -08:00
Tathagata Das
069cb14bdc
Updated groupByKeyAndWindow to be computed incrementally, and added mapSideCombine to combineByKeyAndWindow.
2013-12-26 02:58:29 -08:00
Tathagata Das
bacc65cf28
Removed slack time in file stream and added better handling of failures caused by FileNotFound exceptions.
2013-12-26 10:18:46 +00:00
Tathagata Das
d4dfab503a
Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.
2013-12-24 14:01:13 -08:00
Prashant Sharma
2573add94c
spark-544, introducing SparkConf and related configuration overhaul.
2013-12-25 00:09:36 +05:30
Tathagata Das
e9165d2a39
Merge branch 'scheduler-update' into window-improvement
2013-12-23 17:49:41 -08:00
Tathagata Das
0af7f84c8e
Minor formatting fixes.
2013-12-23 17:47:16 -08:00
Tathagata Das
8ca14a1e51
Updated testsuites to work with the slack time of file stream.
2013-12-23 16:27:00 -08:00
Tathagata Das
b31e91f927
Merge branch 'scheduler-update' into filestream-fix
2013-12-23 15:59:15 -08:00
Tathagata Das
19d1d58b67
Fixed bug in file stream that prevented some files from being read correctly.
2013-12-23 23:48:43 +00:00
Tathagata Das
f9771690a6
Minor formatting fixes.
2013-12-23 11:32:26 -08:00
Tathagata Das
dc3ee6b612
Added comments to BatchInfo and JobSet, based on Patrick's comment on PR 277.
2013-12-23 11:30:42 -08:00
Tathagata Das
e7b62cbfbf
Updated CheckpointWriter and FileInputDStream to be robust against failed FileSystem objects. Refactored JobGenerator to use actor so that all updating of DStream's metadata is single threaded.
2013-12-22 18:49:36 -08:00
Tathagata Das
d91ec6f8ea
Merge branch 'scheduler-update' into filestream-fix
2013-12-22 15:23:35 -08:00
Tathagata Das
3ddbdbfbc7
Minor updates based on comments on PR 277.
2013-12-20 19:51:37 -08:00
Tathagata Das
de41c436a0
Merge branch 'scheduler-update' into window-improvement
...
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala
2013-12-19 12:05:08 -08:00
Tathagata Das
984c582487
Merge branch 'scheduler-update' into filestream-fix
...
Conflicts:
core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
2013-12-19 11:20:48 -08:00
Tathagata Das
ec71b445ad
Minor changes.
2013-12-18 23:39:28 -08:00
Tathagata Das
e93b391d75
Merge branch 'apache-master' into scheduler-update
...
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/dstream/ForEachDStream.scala
2013-12-18 17:51:14 -08:00
Tathagata Das
b80ec05635
Added StatsReportListener to generate processing time statistics across multiple batches.
2013-12-18 15:35:24 -08:00
Tathagata Das
097e120c0c
Refactored streaming scheduler and added listener interface.
...
- Refactored Scheduler + JobManager to JobGenerator + JobScheduler and
added JobSet for cleaner code. Moved scheduler related code to
streaming.scheduler package.
- Added StreamingListener trait (similar to SparkListener) to enable
gathering of streaming stats like processing times and delays.
Added StreamingContext.addListener() to add listeners.
- Deduped some code in streaming tests by modifying TestSuiteBase, and
added StreamingListenerSuite.
2013-12-12 20:48:02 -08:00
Tathagata Das
5e9ce83d68
Fixed multiple file stream and checkpointing bugs.
...
- Made file stream more robust to transient failures.
- Changed Spark.setCheckpointDir API to not have the second
'useExisting' parameter. Spark will always create a unique directory
for checkpointing underneath the directory provided to the function.
- Fixed bug wrt local relative paths as checkpoint directory.
- Made DStream and RDD checkpointing use
SparkContext.hadoopConfiguration, so that more HDFS compatible
filesystems are supported for checkpointing.
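The checkpoint-directory behaviour described in the list above (always create a unique directory underneath the one supplied, and handle relative paths) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper name `unique_checkpoint_dir` and the use of a UUID as the unique component are assumptions.

```python
import os
import tempfile
import uuid


def unique_checkpoint_dir(base_dir: str) -> str:
    """Create and return a fresh, uniquely named directory under base_dir.

    Hypothetical helper; the UUID naming scheme is an assumption.
    """
    # Resolve relative paths up front so the result does not depend on
    # later changes to the working directory.
    base = os.path.abspath(base_dir)
    path = os.path.join(base, str(uuid.uuid4()))
    os.makedirs(path)  # raises if the directory already exists
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = unique_checkpoint_dir(tmp)
        second = unique_checkpoint_dir(tmp)
        # Two calls with the same base directory never share a directory.
        print(first != second)
```

Because every call yields a new subdirectory, there is no longer any need for a flag that says whether an existing directory may be reused.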
2013-12-11 14:01:36 -08:00
Prashant Sharma
17db6a9041
Style fixes and addressed review comments at #221
2013-12-10 11:47:16 +05:30
Tathagata Das
03ef6e8899
Added flag in window operation to use partition aware union.
2013-11-21 11:38:56 -08:00
Tathagata Das
fd031679df
Added partitioner aware union, modified DStream.window.
2013-11-21 11:28:37 -08:00
Tathagata Das
2ec4b2e38d
Added partition aware union to improve reduceByKeyAndWindow
2013-11-20 23:49:30 -08:00
Prashant Sharma
95d8dbce91
Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10-temp
...
Conflicts:
core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
2013-11-21 12:34:46 +05:30
Prashant Sharma
199e9cf02d
Merge branch 'scala210-master' of github.com:colorant/incubator-spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/deploy/client/Client.scala
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
2013-11-21 11:55:48 +05:30
Henry Saputra
10be58f251
Another set of changes to remove unnecessary semicolon (;) from Scala code.
...
Passed the sbt/sbt compile and test
2013-11-19 16:56:23 -08:00
Henry Saputra
9c934b640f
Remove the semicolons at the end of Scala code to make it more pure Scala code.
...
Also remove unused imports as I found them along the way.
Remove return statements when returning value in the Scala code.
Passing compile and tests.
2013-11-19 10:19:03 -08:00
Aaron Davidson
f629ba95b6
Various merge corrections
...
I've diff'd this patch against my own -- since they were both created
independently, this means that two sets of eyes have gone over all the
merge conflicts that were created, so I'm feeling significantly more
confident in the resulting PR.
@rxin has looked at the changes to the repl and is resoundingly
confident that they are correct.
2013-11-14 22:13:09 -08:00
Raymond Liu
a60620b76a
Merge branch 'master' into scala-2.10
2013-11-14 12:44:19 +08:00
Raymond Liu
0f2e3c6e31
Merge branch 'master' into scala-2.10
2013-11-13 16:55:11 +08:00
Tathagata Das
7ccbbdacb9
Made block generator thread safe to fix Kafka bug.
2013-11-12 00:10:45 -08:00
Prashant Sharma
6860b79f6e
Remove deprecated actorFor and use actorSelection everywhere.
2013-11-12 12:43:53 +05:30
Tathagata Das
dc9570782a
Merge branch 'apache-master' into transform
2013-10-25 14:22:23 -07:00
Patrick Wendell
31e92b72e3
Adding Java versions and associated tests
2013-10-24 21:14:56 -07:00
Patrick Wendell
9423532fab
Removing Java for now
2013-10-24 14:31:34 -07:00
Patrick Wendell
08c1a42d7d
Add a repartition operator.
...
This patch adds an operator called repartition with more straightforward
semantics than the current `coalesce` operator. There are a few use cases
where this operator is useful:
1. If a user wants to increase the number of partitions in the RDD. This
is more common now with streaming. E.g. a user is ingesting data on one
node but they want to add more partitions to ensure parallelism of
subsequent operations across threads or the cluster.
Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's
super confusing.
2. If a user has input data where the number of partitions is not known. E.g.
> sc.textFile("some file").coalesce(50)....
This is both vague semantically (am I growing or shrinking this RDD?) and
may not work correctly if the base RDD has fewer than 50 partitions.
The new operator forces shuffles every time, so it will always produce exactly
the number of new partitions. It also throws an exception rather than silently
not-working if a bad input is passed.
I am currently adding streaming tests (requires refactoring some of the test
suite to allow testing at partition granularity), so this is not ready for
merge yet. But feedback is welcome.
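The difference between the two operators can be illustrated with a toy model that treats an RDD as a plain list of partitions. This is only a sketch of the semantics described above, not Spark code: the functions `coalesce` and `repartition` below are stand-ins written for this example, and the round-robin redistribution is an assumed simplification of a real shuffle.

```python
from typing import List, Sequence


def coalesce(parts: Sequence[List[int]], n: int, shuffle: bool = False) -> List[List[int]]:
    """Toy model of coalesce on a list of partitions (not Spark's code)."""
    if n <= 0:
        raise ValueError("number of partitions must be positive")
    if shuffle:
        # Redistribute every element round-robin: always exactly n partitions.
        shuffled: List[List[int]] = [[] for _ in range(n)]
        i = 0
        for part in parts:
            for item in part:
                shuffled[i % n].append(item)
                i += 1
        return shuffled
    if n >= len(parts):
        # Without a shuffle, partitions can only be merged, never split,
        # so asking for more partitions silently changes nothing.
        return [list(p) for p in parts]
    # Merge whole partitions into n groups without moving single elements.
    merged: List[List[int]] = [[] for _ in range(n)]
    for idx, part in enumerate(parts):
        merged[idx * n // len(parts)].extend(part)
    return merged


def repartition(parts: Sequence[List[int]], n: int) -> List[List[int]]:
    """Always shuffles, so the result has exactly n partitions."""
    return coalesce(parts, n, shuffle=True)


if __name__ == "__main__":
    rdd = [[1, 2, 3], [4, 5]]
    print(len(coalesce(rdd, 50)))     # still the original partition count
    print(len(repartition(rdd, 50)))  # exactly the requested count
```

In this model, `coalesce(rdd, 50)` on a two-partition input quietly returns two partitions, which is the confusing case the commit message describes, while `repartition` always yields the requested count and rejects a non-positive argument.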
2013-10-24 14:31:33 -07:00
Tathagata Das
0400aba1c0
Merge branch 'apache-master' into transform
2013-10-24 11:05:00 -07:00
Tathagata Das
bacfe5ebca
Added JavaStreamingContext.transform
2013-10-24 10:56:24 -07:00
Matei Zaharia
dd659642e7
Merge pull request #64 from prabeesh/master
...
MQTT Adapter for Spark Streaming
MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol.
It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it here http://mqtt.org/
Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators, to mobile phones, embedded systems on vehicles, or laptops and full scale computers.
The protocol was invented by Andy Stanford-Clark of IBM, and Arlen Nipper of Cirrus Link Solutions
This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code footprint and network bandwidth are constraints.
MQTT is one of the widely used protocols for the 'Internet of Things'. This protocol is attracting much attention as anything and everything is getting connected to the internet and all of these devices produce data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015.
Plugin/Support for MQTT is available in popular MQs like RabbitMQ, ActiveMQ etc.
Support for MQTT in Spark will help people with Internet of Things (IoT) projects to use Spark Streaming for their real time data processing needs (from sensors and other embedded devices etc).
2013-10-23 15:07:59 -07:00
Tathagata Das
fe8626efd1
Merge branch 'apache-master' into transform
2013-10-22 23:40:40 -07:00
Tathagata Das
72d2e1dd77
Fixed bug in Java transformWith, added more Java testcases for transform and transformWith, added missing variations of Java join and cogroup, updated various Scala and Java API docs.
2013-10-22 23:35:51 -07:00
Matei Zaharia
731c94e91d
Merge pull request #56 from jerryshao/kafka-0.8-dev
...
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
Conflicts:
streaming/pom.xml
2013-10-21 23:31:38 -07:00
Tathagata Das
0666498799
Updated TransformDStream to allow n-ary DStream transform. Added transformWith, leftOuterJoin and rightOuterJoin operations to DStream for Scala and Java APIs. Also added n-ary union and n-ary transform operations to StreamingContext for Scala and Java APIs.
2013-10-21 05:34:09 -07:00
Prabeesh K
d223d38933
Update MQTTInputDStream.scala
2013-10-18 09:09:49 +05:30
prabeesh
890f8fe439
modify code, use Spark Logging Class
2013-10-17 10:00:40 +05:30
prabeesh
2e48b23eae
added mqtt adapter
2013-10-16 13:36:25 +05:30
prabeesh
742ada91e0
mqttinputdstream for mqttstreaming adapter
2013-10-16 13:35:29 +05:30
Matei Zaharia
b5346064d6
Merge pull request #8 from vchekan/checkpoint-ttl-restore
...
Serialize and restore spark.cleaner.ttl to savepoint
In accordance with the conversation on the spark-dev mailing list, preserve the spark.cleaner.ttl parameter when serializing a checkpoint.
2013-10-15 21:25:03 -07:00
Aaron Davidson
a395911138
Refactor BlockId into an actual type
...
This is an unfortunately invasive change which converts all of our BlockId
strings into actual BlockId types. Here are some advantages of doing this now:
+ Type safety
+ Code clarity - it's now obvious what the key of a shuffle or rdd block is,
for instance. Additionally, appearing in tuple/map type signatures is a big
readability bonus. A Seq[(String, BlockStatus)] is not very clear.
Further, we can now use more Scala features, like matching on BlockId types.
+ Explicit usage - we can now formally tell where various BlockIds are being used
(without doing string searches); this makes updating current BlockIds a much
clearer process, and compiler-supported.
(I'm looking at you, shuffle file consolidation.)
+ It will only get harder to make this change as time goes on.
Since this touches a lot of files, it'd be best to either get this patch
in quickly or throw it on the ground to avoid too many secondary merge conflicts.
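The advantages listed above (type safety, clearer signatures, matching on the kind of block) can be sketched in a few lines. This is an illustration of the idea only: the class names, fields, and string layouts below are assumptions made for the example and are not taken from the patch.

```python
from dataclasses import dataclass


class BlockId:
    """Base type for block identifiers (illustrative, not Spark's class)."""

    @property
    def name(self) -> str:
        raise NotImplementedError


@dataclass(frozen=True)
class RDDBlockId(BlockId):
    rdd_id: int
    split_index: int

    @property
    def name(self) -> str:
        # Assumed string layout, kept only for storage and logging.
        return f"rdd_{self.rdd_id}_{self.split_index}"


@dataclass(frozen=True)
class ShuffleBlockId(BlockId):
    shuffle_id: int
    map_id: int
    reduce_id: int

    @property
    def name(self) -> str:
        return f"shuffle_{self.shuffle_id}_{self.map_id}_{self.reduce_id}"


def describe(block_id: BlockId) -> str:
    """Dispatch on the block type instead of parsing a string key."""
    if isinstance(block_id, RDDBlockId):
        return f"partition {block_id.split_index} of RDD {block_id.rdd_id}"
    if isinstance(block_id, ShuffleBlockId):
        return f"shuffle {block_id.shuffle_id} output for reducer {block_id.reduce_id}"
    raise TypeError(f"unknown block id type: {type(block_id).__name__}")


if __name__ == "__main__":
    statuses = {RDDBlockId(3, 1): "cached", ShuffleBlockId(0, 2, 5): "on disk"}
    for block_id, status in statuses.items():
        print(block_id.name, "-", describe(block_id), "-", status)
```

A mapping keyed by `BlockId` says what its keys are, where a mapping keyed by `str` does not, and a search for `ShuffleBlockId` finds every use without string matching.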
2013-10-12 22:44:57 -07:00
jerryshao
c23cd72b4b
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
2013-10-12 20:00:42 +08:00
Martin Weindel
e09f4a9601
fixed some warnings
2013-10-05 23:08:23 +02:00
Prashant Sharma
276c37a51c
Akka 2.2 migration
2013-09-22 08:20:12 +05:30
Vadim Chekan
fbe40c5806
Serialize and restore spark.cleaner.ttl to savepoint
2013-09-20 12:13:48 -07:00
Prashant Sharma
4106ae9fbf
Merged with master
2013-09-06 17:53:01 +05:30
Matei Zaharia
0a8cc30921
Move some classes to more appropriate packages:
...
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia
46eecd110a
Initial work to rename package to org.apache.spark
2013-09-01 14:13:13 -07:00
Matei Zaharia
5a6ac12840
Merge pull request #701 from ScrapCodes/documentation-suggestions
...
Documentation suggestions for spark streaming.
2013-08-22 22:08:03 -07:00
Prashant Sharma
2bc348e92c
Linking custom receiver guide
2013-08-23 09:44:02 +05:30
Prashant Sharma
3049415e24
Corrections in documentation comment
2013-08-23 09:40:28 +05:30
Josh Rosen
d7f78b443b
Change scala.Option to Guava Optional in Java APIs.
2013-08-11 12:05:09 -07:00
Reynold Xin
c61843a69f
Changed other LZF uses to use the compression codec interface.
2013-07-31 10:32:13 -07:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
Prashant Sharma
119c98c1be
code formatting, The warning related to scope exit and enter is not worth fixing as it only affects debugging scopes and nothing else.
2013-07-16 15:01:33 +05:30
Prashant Sharma
55da6e9504
Fixed warning erasure -> runtimeClass
2013-07-16 14:37:08 +05:30
Prashant Sharma
ff14f38f3d
Fixed warning Throwables
2013-07-16 14:34:56 +05:30
Prashant Sharma
63addd93a8
Fixed warning ClassManifest -> ClassTag
2013-07-16 14:09:52 +05:30
Prashant Sharma
e86d5dbaad
Merge branch 'master' into master-merge
...
Conflicts:
README.md
core/pom.xml
core/src/main/scala/spark/deploy/JsonProtocol.scala
core/src/main/scala/spark/deploy/LocalSparkCluster.scala
core/src/main/scala/spark/deploy/master/Master.scala
core/src/main/scala/spark/deploy/master/MasterWebUI.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
core/src/main/scala/spark/storage/BlockManagerUI.scala
core/src/main/scala/spark/util/AkkaUtils.scala
pom.xml
project/SparkBuild.scala
streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
2013-07-12 14:49:16 +05:30
Shivaram Venkataraman
3350ad0d7f
Catch RejectedExecution exception in Checkpoint handler.
2013-07-07 04:09:37 -07:00
Matei Zaharia
1ffadb2d9e
Merge remote-tracking branch 'pwendell/ui-updates'
...
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
core/src/main/scala/spark/util/AkkaUtils.scala
pom.xml
2013-07-06 15:51:41 -07:00
Matei Zaharia
94871e4703
Merge pull request #655 from tgravescs/master
...
Add support for running Spark on Yarn on a secure Hadoop Cluster
2013-07-06 15:26:19 -07:00
Tathagata Das
280418ac45
Reduced the number of Iterator to ArrayBuffer copies in NetworkReceiver.
2013-07-05 21:38:21 -07:00
Prashant Sharma
a5f1f6a907
Merge branch 'master' into master-merge
...
Conflicts:
core/pom.xml
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/RDDCheckpointData.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/api/python/PythonRDD.scala
core/src/main/scala/spark/deploy/client/Client.scala
core/src/main/scala/spark/deploy/master/MasterWebUI.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/ZippedRDD.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/BlockManagerMasterActor.scala
core/src/main/scala/spark/storage/BlockManagerUI.scala
core/src/main/scala/spark/util/AkkaUtils.scala
core/src/test/scala/spark/SizeEstimatorSuite.scala
pom.xml
project/SparkBuild.scala
repl/src/main/scala/spark/repl/SparkILoop.scala
repl/src/test/scala/spark/repl/ReplSuite.scala
streaming/src/main/scala/spark/streaming/StreamingContext.scala
streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala
2013-07-03 11:43:26 +05:30
Y.CORP.YAHOO.COM\tgraves
923cf92900
Rework from pull request. Removed --user option from Spark on Yarn Client, made the use of JAVA_HOME environment
variable conditional on whether it is set, and created addCredentials in each of the SparkHadoopUtil classes
to only add the credentials when the profile is hadoop2-yarn.
2013-07-02 21:18:59 -05:00
Matei Zaharia
4358acfe07
Initialize Twitter4J OAuth from system properties instead of prompting
2013-06-29 15:25:06 -07:00
Matei Zaharia
1667158544
Merge remote-tracking branch 'mrpotes/master'
2013-06-29 14:36:09 -07:00
Patrick Wendell
362d996c81
Handful of changes based on matei's review
...
- Avoid exception when no tasks have finished for a stage
- Adding DOCTYPE so css renders properly
- Adding progress slider
2013-06-27 19:14:28 -07:00
James Phillpotts
366572edca
Include a default OAuth implementation, and update examples and JavaStreamingContext
2013-06-25 22:59:34 +01:00
Tathagata Das
c89af0a7f9
Merge branch 'master' into streaming
...
Conflicts:
.gitignore
2013-06-24 23:57:47 -07:00
Tathagata Das
48c7e373c6
Minor formatting fixes
2013-06-24 23:11:04 -07:00
Tathagata Das
1249e9153b
Merge pull request #572 from Reinvigorate/sm-block-interval
...
Adding spark.streaming.blockInterval property
2013-06-24 21:46:33 -07:00
Tathagata Das
cfcda95f86
Merge pull request #571 from Reinvigorate/sm-kafka-serializers
...
Surfacing decoders on KafkaInputDStream
2013-06-24 21:44:50 -07:00
James Phillpotts
8955787a59
Twitter API v1 is retired - username/password auth no longer possible
2013-06-24 09:15:17 +01:00
James Phillpotts
93a1643405
Allow other twitter authorizations than username/password
2013-06-21 14:21:52 +01:00
Thomas Graves
75d78c7ac9
Add support for Spark on Yarn on a secure Hadoop cluster
2013-06-19 11:18:42 -05:00
seanm
f25282def5
fixing kafkaStream Java API and adding test
2013-05-10 17:34:28 -06:00
seanm
3632980b1b
fixing indentation
2013-05-10 15:54:26 -06:00
seanm
b95c1bdbba
count() now uses a transform instead of ConstantInputDStream
2013-05-10 12:47:24 -06:00
Reynold Xin
90577ada69
Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge
...
Conflicts:
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/DiskStore.scala
project/SparkBuild.scala
2013-05-07 15:56:19 -07:00
Prashant Sharma
24bbf318b3
Fixed other warnings
2013-04-29 19:56:28 +05:30
Prashant Sharma
d3518f57cd
Fixed warning: erasure -> runtimeClass
2013-04-29 18:14:25 +05:30
Prashant Sharma
8f3ac240cb
Fixed Warning: ClassManifest -> ClassTag
2013-04-29 16:39:13 +05:30
Mridul Muralidharan
430c531464
Remove debug statements
2013-04-29 00:24:30 +05:30
Mridul Muralidharan
3a89a76b87
Make log message more descriptive to aid in debugging
2013-04-29 00:04:12 +05:30
Mridul Muralidharan
7fa6978a1e
Allow CheckpointWriter pending tasks to finish
2013-04-28 23:08:10 +05:30
Mridul Muralidharan
afee902443
Attempt to fix streaming test failures after yarn branch merge
2013-04-28 22:26:45 +05:30
Prashant Sharma
ad88f083a6
scala 2.10 and master merge
2013-04-24 18:08:26 +05:30
seanm
7e56e99573
Surfacing decoders on KafkaInputDStream
2013-04-16 17:17:16 -06:00
seanm
ab0f834dbb
adding spark.streaming.blockInterval property
2013-04-16 11:57:05 -06:00
seanm
b42d68c8ce
fixing Spark Streaming count() so that 0 will be emitted when there is nothing to count
2013-04-15 12:54:55 -06:00
shane-huang
df47b40b76
Shuffle Performance fix: Use netty embedded OIO file server instead of ConnectionManager
...
Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages
change reference from io.Source to scala.io.Source to avoid looking into io.netty package
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-07 14:37:12 +08:00
seanm
329ef34c2e
fixing autooffset.reset behavior when set to 'largest'
2013-03-26 23:56:15 -06:00
Holden Karau
1f5381119f
method first in trait IterableLike is deprecated: use `head' instead
2013-03-24 19:19:40 -07:00
seanm
d61978d0ab
keeping JavaStreamingContext in sync with StreamingContext + adding comments for better clarity
2013-03-15 23:36:52 -06:00
seanm
33fa1e7e4a
removing dependency on ZookeeperConsumerConnector + purging last relic of kafka reliability that never solidified (ie- setOffsets)
2013-03-15 00:10:13 -06:00
seanm
d069283211
fixing memory leak in kafka MessageHandler
2013-03-14 23:45:33 -06:00
seanm
cfa8e769a8
KafkaInputDStream improvements. Allows more Kafka configurability
2013-03-14 23:45:19 -06:00
Stephen Haberman
0cf320485d
Forgot equals.
2013-03-12 00:05:35 -05:00
Stephen Haberman
9e68f48625
More quickly call close in HadoopRDD.
...
This also refactors out the common "gotNext" iterator pattern into
a shared utility class.
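A rough sketch of what a shared "gotNext" iterator utility can look like follows. It is not the class added by this commit: the names `NextIterator`, `get_next`, `close_if_needed`, and the example subclass `ListReader` are assumptions chosen for illustration. The point is that the underlying resource is closed as soon as exhaustion is detected, rather than whenever the consumer gets around to it.

```python
class NextIterator:
    """Base class: subclasses implement get_next() and close()."""

    def __init__(self):
        self._got_next = False    # True when _next_value holds an unread element
        self._next_value = None
        self._closed = False
        self.finished = False     # set by get_next() when the source is exhausted

    def get_next(self):
        """Return the next element, or set self.finished = True if none is left."""
        raise NotImplementedError

    def close(self):
        """Release the underlying resource."""
        raise NotImplementedError

    def close_if_needed(self):
        # Guard so close() runs at most once.
        if not self._closed:
            self._closed = True
            self.close()

    def has_next(self):
        if not self.finished and not self._got_next:
            self._next_value = self.get_next()
            if self.finished:
                # Close eagerly, the moment the end of input is seen.
                self.close_if_needed()
            self._got_next = True
        return not self.finished

    def __iter__(self):
        return self

    def __next__(self):
        if not self.has_next():
            raise StopIteration
        self._got_next = False
        return self._next_value


class ListReader(NextIterator):
    """Example subclass that reads from a list and counts close() calls."""

    def __init__(self, items):
        super().__init__()
        self._items = list(items)
        self._pos = 0
        self.close_calls = 0

    def get_next(self):
        if self._pos >= len(self._items):
            self.finished = True
            return None
        value = self._items[self._pos]
        self._pos += 1
        return value

    def close(self):
        self.close_calls += 1


if __name__ == "__main__":
    reader = ListReader(["a", "b"])
    print(list(reader), reader.close_calls)
```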
2013-03-11 23:59:17 -05:00
Matei Zaharia
4223be3aa4
Merge pull request #503 from pwendell/bug-fix
...
createNewSparkContext should use sparkHome/jars/environment.
2013-02-25 19:43:05 -08:00
Patrick Wendell
284ba90958
createNewSparkContext should use sparkHome/jars/environment.
...
This fixes a bug introduced by Matei's recent change.
2013-02-25 19:40:52 -08:00
Matei Zaharia
5d7b591cfe
Pass a code JAR to SparkContext in our examples. Fixes SPARK-594.
2013-02-25 19:34:32 -08:00
Matei Zaharia
4d480ec59e
Fixed something that was reported as a compile error in ScalaDoc.
...
For some reason, ScalaDoc complained about no such constructor for
StreamingContext; it doesn't seem like an actual Scala error but it
prevented sbt publish from working because docs weren't built.
2013-02-25 15:53:43 -08:00
Matei Zaharia
490f056cdd
Allow passing sparkHome and JARs to StreamingContext constructor
...
Also warns if spark.cleaner.ttl is not set in the version where you pass
your own SparkContext.
2013-02-25 15:13:30 -08:00
Tathagata Das
28f8b721f6
Added back the initial spark job before starting streaming receivers
2013-02-24 13:01:54 -08:00
Tathagata Das
b4eb24de96
Updated streaming programming guide with Java API info, and comments from Patrick.
2013-02-23 23:59:45 -08:00
Tathagata Das
41285eaae3
Fixed differences in APIs of StreamingContext and JavaStreamingContext. Change rawNetworkStream to rawSocketStream, and added twitter, actor, zeroMQ streams to JavaStreamingContext. Also added them to JavaAPISuite.
2013-02-23 16:25:07 -08:00
Tathagata Das
d8cee52d52
Merge branch 'mesos-streaming' into streaming
2013-02-22 18:25:34 -08:00
Tathagata Das
688e62718f
Merge pull request #479 from ScrapCodes/zeromq-streaming
...
Zeromq streaming
2013-02-22 12:17:17 -08:00
Tathagata Das
208edaac1b
Fixed condition in InputDStream isTimeValid.
2013-02-21 15:22:26 -08:00
Tathagata Das
972fe7714f
Merge branch 'mesos-streaming' into streaming
...
Conflicts:
streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-20 11:06:01 -08:00
Tathagata Das
fb9956256d
Merge branch 'mesos-master' into streaming
...
Conflicts:
core/src/main/scala/spark/rdd/CheckpointRDD.scala
streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Prashant Sharma
4e5b09664c
fixes corresponding to review feedback at pull request #479
2013-02-20 19:14:52 +05:30
Patrick Wendell
041c19e5f0
Small changes that were missing in merge
2013-02-19 08:44:20 -08:00
Patrick Wendell
fed1122d74
Use RDD type for slice operator in Java.
...
This commit uses the RDD type in `slice`, making it available to both normal
and pair RDD's in java. It also updates the signature for `slice` to match
changes in the Scala API.
2013-02-19 08:32:38 -08:00
Patrick Wendell
35880de42e
Use RDD type for transform operator in Java.
...
This is an improved implementation of the `transform` operator in Java.
The main difference is that this allows all four possible types of
transform functions
1. JavaRDD -> JavaRDD
2. JavaRDD -> JavaPairRDD
3. JavaPairRDD -> JavaPairRDD
4. JavaPairRDD -> JavaRDD
whereas previously only (1) and (3) were possible.
Conflicts:
streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-19 08:31:58 -08:00
Patrick Wendell
9d49a6b03f
Use RDD type for foreach operator in Java.
2013-02-19 08:30:32 -08:00
Prashant Sharma
8d44480d84
example for demonstrating ZeroMQ stream
2013-02-19 19:42:14 +05:30
Prashant Sharma
f7d3e309cb
ZeroMQ stream as receiver
2013-02-19 19:32:52 +05:30
Tathagata Das
9e82be1503
Merge branch 'streaming' into ScrapCodes-streaming-actor
...
Conflicts:
docs/plugin-custom-receiver.md
streaming/src/main/scala/spark/streaming/StreamingContext.scala
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Tathagata Das
12ea14c211
Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver.
2013-02-18 15:18:34 -08:00
Tathagata Das
6a6e6bda57
Merge branch 'streaming' into ScrapCode-streaming
...
Conflicts:
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das
8ad561dc7d
Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs.
2013-02-18 02:12:41 -08:00
Matei Zaharia
7151e1e4c8
Rename "jobs" to "applications" in the standalone cluster
2013-02-17 23:23:08 -08:00
Tathagata Das
f98c7da23e
Many changes to ensure better 2nd recovery if 2nd failure happens while recovering from 1st failure
- Made the scheduler checkpoint after clearing old metadata, which
ensures that a new checkpoint is written as soon as at least one batch
gets computed while recovering from a failure. This ensures that if
there is a 2nd failure while recovering from the 1st failure, the system
starts the 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
results with expected output in an order-independent manner.
2013-02-17 15:06:41 -08:00
Stephen Haberman
ae2234687d
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 13:10:31 -06:00
Tathagata Das
ddcb976b0d
Made MasterFailureTest more robust.
2013-02-15 06:54:47 +00:00
Tathagata Das
4b8402e900
Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down.
2013-02-14 18:10:37 -08:00
Tathagata Das
def8126d77
Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags.
2013-02-14 17:49:43 -08:00
Tathagata Das
2eacf22401
Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites.
2013-02-14 12:21:47 -08:00
Tathagata Das
03e8dc6861
Changed function comments to make them more consistent.
2013-02-13 20:59:29 -08:00
Tathagata Das
12b020b668
Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters.
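The idea behind a windowed reduce "with inverse" plus a filter can be sketched without Spark. The window result is updated incrementally: values from the batch entering the window are folded in with the reduce function, values from the batch leaving it are removed with the inverse function, and the filter then drops keys that no longer carry anything. The class `IncrementalWindow` and its parameters are hypothetical names for this sketch; it models one dictionary per batch and is not the DStream implementation.

```python
import operator
from collections import deque
from typing import Callable, Deque, Dict, Optional

Batch = Dict[str, int]


class IncrementalWindow:
    """Keeps a per-key reduction over the last window_len batches.

    Assumption: filter_func only drops keys whose remaining value is
    "empty" (for example a zero count), so a dropped key never has to be
    inverse-reduced later.
    """

    def __init__(
        self,
        window_len: int,
        reduce_func: Callable[[int, int], int],
        inv_reduce_func: Callable[[int, int], int],
        filter_func: Optional[Callable[[str, int], bool]] = None,
    ):
        self._window_len = window_len
        self._reduce = reduce_func
        self._inv = inv_reduce_func
        self._filter = filter_func
        self._batches: Deque[Batch] = deque()
        self._state: Batch = {}

    def add_batch(self, batch: Batch) -> Batch:
        # Fold in the batch that enters the window.
        for key, value in batch.items():
            if key in self._state:
                self._state[key] = self._reduce(self._state[key], value)
            else:
                self._state[key] = value
        self._batches.append(dict(batch))
        # Remove the batch that leaves the window, if any.
        if len(self._batches) > self._window_len:
            old = self._batches.popleft()
            for key, value in old.items():
                self._state[key] = self._inv(self._state[key], value)
        # Drop keys the filter rejects, e.g. counts that fell to zero.
        if self._filter is not None:
            self._state = {
                k: v for k, v in self._state.items() if self._filter(k, v)
            }
        return dict(self._state)


if __name__ == "__main__":
    window = IncrementalWindow(2, operator.add, operator.sub, lambda k, v: v != 0)
    for batch in [{"a": 1, "b": 2}, {"a": 3}, {"c": 5}, {}]:
        print(window.add_batch(batch))
```

Each update touches only the entering and leaving batches, so the cost per slide does not grow with the window length; without the filter, keys whose count has dropped to zero would stay in the state indefinitely.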
2013-02-13 20:53:50 -08:00
Tathagata Das
39addd3803
Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream.
2013-02-13 12:17:45 -08:00
Patrick Wendell
3f3e77f28b
STREAMING-50: Support transform workaround in JavaPairDStream
...
This ports a useful workaround (the `transform` function) to
JavaPairDStream. It is necessary to do things like sorting which
are not supported yet in the core streaming API.
2013-02-12 14:02:32 -08:00
Patrick Wendell
c65988bdc1
Fix for MapPartitions
2013-02-11 10:03:37 -08:00
Patrick Wendell
20cf770545
Fix for flatmap
2013-02-11 10:03:37 -08:00
Patrick Wendell
f0b68c623c
Initial cut at replacing K, V in Java files
2013-02-11 10:03:37 -08:00
Tathagata Das
fd90daf850
Fixed bugs in FileInputDStream and Scheduler that occasionally failed to reprocess old files after recovering from master failure. Completely modified spark.streaming.FailureTest to test multiple master failures using file input stream.
2013-02-10 19:48:42 -08:00
Tathagata Das
99a5fc498a
Added an initial spark job to ensure worker nodes are initialized.
2013-02-09 15:18:05 -08:00
Tathagata Das
4cc223b478
Merge branch 'mesos-master' into streaming
2013-02-07 13:59:31 -08:00
Tathagata Das
d55e3aa467
Updated JavaStreamingContext with updated kafkaStream API.
2013-02-07 13:59:18 -08:00
Tathagata Das
c6b2f765d3
Merge branch 'mesos-streaming' into streaming
2013-02-07 13:13:53 -08:00
Tathagata Das
12300758cc
Merge pull request #372 from Reinvigorate/sm-kafka
...
Removing offset management code that is non-existent in kafka 0.7.0+
2013-02-07 12:41:07 -08:00
Tathagata Das
915d9931fe
Merge pull request #373 from Reinvigorate/sm-updateStateByKey
...
StateDStream changes to give updateStateByKey consistent behavior
2013-02-07 11:59:19 -08:00
Patrick Wendell
7eea64aa4c
Streaming constructor which takes JavaSparkContext
...
It's sometimes helpful to directly pass a JavaSparkContext,
and take advantage of the various constructors available for that.
2013-02-05 11:43:16 -08:00
Matei Zaharia
9ae11603b4
Merge pull request #415 from stephenh/driver
...
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Matei Zaharia
b29599e5cf
Fix code that depended on metadata cleaner interval being in minutes
2013-01-28 22:24:47 -08:00
Stephen Haberman
7dfb82a992
Replace old 'master' term with 'driver'.
2013-01-25 11:03:00 -06:00
Tathagata Das
666ce431aa
Added support for rescheduling unprocessed batches on master failure.
2013-01-23 03:15:36 -08:00
Tathagata Das
fad2b82fc8
Added support for saving input files of FileInputDStream to graph checkpoints. Modified 'file input stream with checkpoint' testcase to test recovery of pre-master-failure input files.
2013-01-22 18:10:00 -08:00
Tathagata Das
364cdb679c
Refactored DStreamCheckpointData.
2013-01-22 00:43:31 -08:00
Prashant Sharma
d17065c4b5
actor as receiver
2013-01-22 13:28:29 +05:30
Josh Rosen
551a47a620
Refactor daemon thread pool creation.
2013-01-21 23:31:00 -08:00
Prashant Sharma
43bfd7bb21
Changed method name of createReceiver to getReceiver as it is not intended to be a factory.
2013-01-21 11:39:30 +05:30
seanm
c0694291c8
Splitting StreamingContext.queueStream into two methods
2013-01-20 12:09:45 -07:00
Tathagata Das
4f8fe58b25
Merge branch 'mesos-streaming' into streaming
...
Conflicts:
core/src/main/scala/spark/api/java/JavaRDDLike.scala
core/src/main/scala/spark/api/java/JavaSparkContext.scala
core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Tathagata Das
214345ceac
Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29 , along with updates to doc comments in SparkContext.checkpoint().
2013-01-19 23:50:17 -08:00
Prashant Sharma
56b9bd197c
Plug in actor as stream receiver API
2013-01-19 22:04:07 +05:30
Prashant Sharma
bb6ab92e31
Changed method name of createReceiver to getReceiver as it is not intended to be a factory.
2013-01-19 22:04:07 +05:30
seanm
d3064fe707
kafkaStream API cleanup. A quorum of zookeepers can now be specified
2013-01-18 21:34:29 -07:00
seanm
56b7fbafa2
further KafkaInputDStream cleanup (removing unused and commented out code relating to offset management)
2013-01-18 21:15:54 -07:00
Patrick Wendell
e0165bf714
Adding queueStream and some slight refactoring
2013-01-17 21:25:49 -08:00
Patrick Wendell
70ba994d6d
Import fixup
2013-01-17 18:41:59 -08:00
Patrick Wendell
2261e62ee5
Style cleanup
2013-01-17 18:41:59 -08:00
Patrick Wendell
82b8707c6b
Checkpointing in Streaming java API
2013-01-17 18:41:58 -08:00
Patrick Wendell
61b877c688
Adding flatMap
2013-01-17 18:41:58 -08:00
Patrick Wendell
8e6cbbc6c7
Adding other updateState functions
2013-01-17 18:41:58 -08:00
Patrick Wendell
2a872335c5
Bug fix and test cleanup
2013-01-17 18:41:58 -08:00
seanm
a4639400ea
Merge branch 'streaming' into sm-updateStateByKey
2013-01-15 20:29:01 -07:00
Tathagata Das
1638fcb0dc
Fixed updateStateByKey to work with primitive types.
2013-01-14 17:18:39 -08:00
seanm
c203a29296
StateDStream changes to give updateStateByKey consistent behavior
2013-01-14 17:22:03 -07:00
seanm
b61a4ec773
Removing offset management code that is non-existent in kafka 0.7.0+
2013-01-14 17:13:10 -07:00
Patrick Wendell
a0013beb03
Stash
2013-01-14 15:15:02 -08:00
Patrick Wendell
8ad6220bd3
Bugfix
2013-01-14 14:56:36 -08:00
Patrick Wendell
38d9a3a863
Remove AnyRef constraint in updateState
2013-01-14 13:31:49 -08:00
Patrick Wendell
ae5290f4a2
Bug fix
2013-01-14 13:31:32 -08:00
Patrick Wendell
6069446356
Making comments consistent w/ Spark style
2013-01-14 10:42:05 -08:00
Patrick Wendell
d182a57cae
Two changes:
...
- Updating countByX() types based on bug fix
- Porting new documentation to Java
2013-01-14 10:03:55 -08:00
Patrick Wendell
a292ed8d8a
Some style cleanup
2013-01-14 09:42:36 -08:00
Patrick Wendell
3461cd99b7
Flume example and bug fix
2013-01-14 09:42:36 -08:00
Patrick Wendell
5bcb048167
More work on InputStreams
2013-01-14 09:42:36 -08:00
Patrick Wendell
280b6d0186
Porting to new Duration class
2013-01-14 09:42:36 -08:00
Patrick Wendell
c2537057f9
Fixing issue with <Long> types
2013-01-14 09:42:36 -08:00
Patrick Wendell
b36c4f7cce
More work on StreamingContext
2013-01-14 09:42:36 -08:00
Patrick Wendell
5004eec37c
Import Cleanup
2013-01-14 09:42:36 -08:00
Patrick Wendell
560c312c60
Docs, some tests, and work on
...
StreamingContext
2013-01-14 09:42:36 -08:00
Patrick Wendell
7e1049d8f1
Squashing a few TODOs
2013-01-14 09:42:36 -08:00
Patrick Wendell
74182010a4
Style cleanup and moving functions
2013-01-14 09:42:36 -08:00
Patrick Wendell
056f5efc55
More pair functions
2013-01-14 09:42:36 -08:00
Patrick Wendell
6e514a8d35
PairDStream and DStreamLike
2013-01-14 09:42:36 -08:00
Patrick Wendell
f144e0413a
Adding transform and union
2013-01-14 09:42:36 -08:00
Patrick Wendell
22a8c7be9a
Adding more tests
2013-01-14 09:42:36 -08:00
Patrick Wendell
867a7455e2
Adding some initial tests to streaming API.
2013-01-14 09:42:36 -08:00
Patrick Wendell
b607c9e916
A very rough, early cut at some Java functionality for Streaming.
2013-01-14 09:42:36 -08:00
Tathagata Das
f90f794cde
Minor name fix
2013-01-13 21:25:57 -08:00
Tathagata Das
0dbd411a56
Added documentation for PairDStreamFunctions.
2013-01-13 21:08:35 -08:00
Tathagata Das
0a2e333341
Removed stream id from the constructor of NetworkReceiver to make it easier for PluggableNetworkInputDStream.
2013-01-13 16:18:39 -08:00
Tathagata Das
365506fb03
Changed variable names from ***Time to ***Duration to keep things consistent.
2013-01-09 14:29:25 -08:00
Tathagata Das
156e8b47ef
Split Time to Time (absolute instant of time) and Duration (duration of time).
2013-01-09 12:42:10 -08:00
Tathagata Das
8c1b872512
Moved Twitter example to where the other examples are.
2013-01-07 17:48:10 -08:00
Tathagata Das
64dceec293
Merge branch 'streaming-merge' into dev-merge
2013-01-07 16:54:35 -08:00
Tathagata Das
d808e1026a
Merge branch 'dev' into dev-merge
2013-01-07 16:41:11 -08:00
Tathagata Das
237bac36e9
Renamed examples and added documentation.
2013-01-07 14:37:21 -08:00
Tathagata Das
af8738dfb5
Moved Spark Streaming examples to examples sub-project.
2013-01-06 19:31:54 -08:00
Patrick Wendell
2ef993d159
BufferingBlockCreator -> NetworkReceiver.BlockGenerator
2013-01-02 14:19:51 -08:00
Patrick Wendell
96a6ff0b09
Merge branch 'dev-merge' into datahandler-fix
...
Conflicts:
streaming/src/main/scala/spark/streaming/dstream/DataHandler.scala
2013-01-02 14:08:15 -08:00
Patrick Wendell
493d65ce65
Several code-quality improvements to DataHandler.
...
- Changed to more accurate name: BufferingBlockCreator
- Docstring now correctly reflects the abstraction
offered by the class
- Made internal methods private
- Fixed indentation problems
2013-01-02 13:39:18 -08:00
Tathagata Das
02497f0cd4
Updated Streaming Programming Guide.
2013-01-01 12:21:32 -08:00
Tathagata Das
18b9b3b99f
More classes made private[streaming] to hide from scala docs.
2012-12-30 20:00:42 -08:00
Tathagata Das
7e0271b438
Refactored a whole lot to push all DStreams into the spark.streaming.dstream package.
2012-12-30 15:19:55 -08:00
Tathagata Das
9e644402c1
Improved jekyll and scala docs. Made many classes and methods private to remove them from scala docs.
2012-12-29 18:31:51 -08:00
Patrick Wendell
518111573f
Merge pull request #8 from radlab/twitter-example
...
Adding a Twitter InputDStream with an example
2012-12-29 14:23:01 -08:00
Patrick Wendell
bce84ceabb
Minor changes after review and general cleanup.
...
- Added filters to Twitter example
- Removed un-used import
- Some code clean-up
2012-12-21 20:57:46 -08:00
Patrick Wendell
9ac4cb1c5f
Adding a Twitter InputDStream with an example
2012-12-21 17:18:19 -08:00
Tathagata Das
8512dd3225
Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
...
Conflicts:
core/src/main/scala/spark/ParallelCollection.scala
core/src/test/scala/spark/CheckpointSuite.scala
streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das
8e74fac215
Made checkpoint data in RDDs optional to further reduce serialized size.
2012-12-11 15:36:12 -08:00
Patrick Wendell
3e796bdd57
Changes in response to TD's review.
2012-12-07 19:34:05 -08:00
Patrick Wendell
3ff9710265
Adding Flume InputDStream
2012-12-07 16:42:39 -08:00
Denny
a23462191f
Adjust Kafka code to work with new streaming changes.
2012-12-05 10:30:40 -08:00
Denny
15df4b0e52
Merge branch 'dev' into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-05 10:16:56 -08:00
Tathagata Das
21a0852976
Refactored RDD checkpointing to minimize extra fields in RDD class.
2012-12-04 22:10:25 -08:00
Tathagata Das
609e00d599
Minor mods
2012-12-02 02:39:08 +00:00
Tathagata Das
b4dba55f78
Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient.
2012-12-02 02:03:05 +00:00
Tathagata Das
477de94894
Minor modifications.
2012-12-01 13:15:06 -08:00
Tathagata Das
62965c5d8e
Added ssc.union
2012-12-01 08:26:10 -08:00
Tathagata Das
d5e7aad039
Bug fixes
2012-11-28 08:36:55 +00:00
Tathagata Das
b18d70870a
Modified a bunch of HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata.
2012-11-27 15:08:49 -08:00
Tathagata Das
fd11d23bb3
Modified StreamingContext API to make constructor accept the batch size (since it is always needed, Patrick's suggestion). Added description to DStream and StreamingContext.
2012-11-19 19:04:39 -08:00
Tathagata Das
c97ebf6437
Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.
2012-11-19 23:22:07 +00:00
Denny
5e2b0a3bf6
Added Kafka Wordcount producer
2012-11-19 10:17:58 -08:00
Denny
6757ed6a40
Comment out code for fault-tolerance.
2012-11-19 09:42:35 -08:00
Denny
f56befa914
Merge branch 'dev' into kafka
2012-11-19 09:29:54 -08:00
Tathagata Das
3fd7b8319b
Merge branch 'dev' of github.com:radlab/spark into dev
2012-11-17 17:27:07 -08:00
Tathagata Das
10c1abcb6a
Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.
2012-11-17 17:27:00 -08:00
Patrick Wendell
efa93fd0e6
Merge pull request #4 from radlab/streaming-example
...
A "streaming page view" example.
2012-11-16 20:40:27 -08:00
Patrick Wendell
720cb0f467
A "streaming page view" example.
2012-11-16 12:11:22 -08:00
Denny
2aceae25be
Merge branch 'dev' into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-13 13:16:18 -08:00
Denny
b6f7ba813e
change import for example function
2012-11-13 13:15:32 -08:00
Tathagata Das
26fec8f0b8
Fixed bug in MappedValuesRDD, and set default graph checkpoint interval to be batch duration.
2012-11-13 11:05:57 -08:00
Tathagata Das
c3ccd14cf8
Replaced StateRDD in StateDStream with MapPartitionsRDD.
2012-11-13 02:43:03 -08:00
Tathagata Das
8a25d530ed
Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.
2012-11-13 02:16:28 -08:00
Denny
255b3e44c1
Merge branch 'dev' into kafka
2012-11-12 19:39:29 -08:00
Tathagata Das
b9bfd1456f
Changed default level on calling DStream.persist() to be MEMORY_ONLY_SER. Also changed the persist level of StateDStream to be MEMORY_ONLY_SER.
2012-11-12 21:51:42 +00:00
Tathagata Das
ae61ebaee6
Fixed bugs in RawNetworkInputDStream and in its examples. Made the ReducedWindowedDStream persist RDDs to MEMORY_ONLY_SER by default. Removed unnecessary examples. Added streaming-env.sh.template to add recommended settings for streaming.
2012-11-12 21:45:16 +00:00
tdas
052d0b800f
Merge branch 'dev' of github.com:radlab/spark into dev
2012-11-11 22:56:14 +00:00
Tathagata Das
46222dc56d
Fixed bug in FileInputDStream that allowed it to miss new files. Added tests in the InputStreamsSuite to test checkpointing of file and network streams.
2012-11-11 13:20:09 -08:00
Denny
0fd4c93f1c
Updated comment.
2012-11-11 11:15:31 -08:00
Denny
deb2c4df72
Add comment.
2012-11-11 11:11:49 -08:00
Denny
d006109e95
Kafka Stream comments.
2012-11-11 11:06:49 -08:00
Denny
2e8f2ee4ad
Merge branch 'dev' of github.com:radlab/spark into kafka
...
Conflicts:
streaming/src/main/scala/spark/streaming/DStream.scala
2012-11-09 12:26:17 -08:00
Denny
e5a0936787
Kafka Stream.
2012-11-09 12:23:46 -08:00
tdas
52d21cb682
Removed unnecessary files.
2012-11-08 11:35:40 +00:00
Tathagata Das
fc3d0b602a
Added FailureTestsuite for testing multiple, repeated master failures.
2012-11-06 17:23:31 -08:00
Denny
485803d740
Merge branch 'dev' of github.com:radlab/spark into kafka
2012-11-06 09:41:45 -08:00
Denny
0c1de43fc7
Working on kafka.
2012-11-06 09:41:42 -08:00
Tathagata Das
f8bb719cd2
Added a few more comments to the checkpoint-related functions.
2012-11-05 17:53:56 -08:00
Tathagata Das
395167f2b2
Made more bug fixes for checkpointing.
2012-11-05 16:11:50 -08:00
Tathagata Das
72b2303f99
Fixed major bugs in checkpointing.
2012-11-05 11:41:36 -08:00
Tathagata Das
d154238789
Made checkpointing of the dstream graph work with checkpointing of RDDs. For streams requiring checkpointing of their RDDs, the default checkpoint interval is set to 10 seconds.
2012-11-04 12:12:06 -08:00
Tathagata Das
3fb5c9ee24
Fixed serialization bug in countByWindow, added countByKey and countByKeyAndWindow, and added testcases for them.
2012-11-02 12:12:25 -07:00
Tathagata Das
1b900183c8
Added save operations to DStreams.
2012-10-27 18:55:50 -07:00
Tathagata Das
650d717544
Merge branch 'dev' of github.com:radlab/spark into dev
2012-10-25 13:03:18 -07:00
Matei Zaharia
863a55ae42
Merge remote-tracking branch 'public/master' into dev
...
Conflicts:
core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
core/src/main/scala/spark/KryoSerializer.scala
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/executor/Executor.scala
core/src/main/scala/spark/network/Connection.scala
core/src/main/scala/spark/network/ConnectionManagerTest.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/NewHadoopRDD.scala
core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockMessage.scala
core/src/main/scala/spark/storage/BlockStore.scala
core/src/main/scala/spark/storage/StorageLevel.scala
core/src/main/scala/spark/util/AkkaUtils.scala
project/SparkBuild.scala
run
2012-10-24 23:21:00 -07:00
Tathagata Das
926e05b030
Added tests for the file input stream.
2012-10-24 23:14:37 -07:00
Tathagata Das
ed71df46cd
Minor fixes.
2012-10-24 16:49:40 -07:00
Tathagata Das
1ef6ea2513
Added tests for testing network input stream.
2012-10-24 14:44:20 -07:00
Tathagata Das
0e5d9be4df
Renamed APIs to create queueStream and fileStream.
2012-10-23 15:17:05 -07:00
Tathagata Das
c2731dd3ef
Updated StateDStream api to use Options instead of nulls.
2012-10-23 15:10:27 -07:00
Tathagata Das
19191d178d
Renamed the network input streams.
2012-10-23 14:40:24 -07:00
Tathagata Das
a6de5758f1
Modified API of NetworkInputDStreams and got ObjectInputDStream and RawInputDStream working.
2012-10-23 01:41:13 -07:00
Tathagata Das
2c87c853ba
Renamed examples
2012-10-22 15:31:19 -07:00
Tathagata Das
d85c66636b
Added MapValueDStream, FlatMappedValuesDStream and CoGroupedDStream, and therefore DStream operations mapValues, flatMapValues, cogroup, and join. Also, added tests for DStream operations filter, glom, mapPartitions, groupByKey, mapValues, flatMapValues, cogroup, and join.
2012-10-21 17:40:08 -07:00
Tathagata Das
c4a2b6f636
Fixed some bugs in tests for forgetting RDDs, and made sure that use of manual clock leads to a zeroTime of 0 in the DStreams (more intuitive).
2012-10-21 10:41:25 -07:00
Tathagata Das
6d5eb4b40c
Added functionality to forget RDDs from DStreams.
2012-10-19 12:11:44 -07:00
Tathagata Das
b760d6426a
Minor modifications.
2012-10-15 12:26:44 -07:00
Tathagata Das
3f1aae5c71
Refactored DStreamSuiteBase to create CheckpointSuite, a test suite for testing checkpointing under different operations.
2012-10-14 21:39:30 -07:00
Tathagata Das
e95ff45b53
Implemented checkpointing of StreamingContext and DStream graph.
2012-10-13 20:10:49 -07:00
Tathagata Das
6d1fe02685
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-17 14:26:06 -07:00
Tathagata Das
86d420478f
Allowed StreamingContext to be created from existing SparkContext
2012-09-17 14:25:48 -07:00
Tathagata Das
3cbc72ff1d
Minor tweaks
2012-09-14 07:00:30 +00:00
Tathagata Das
0269792c17
Merge branch 'dev' of github.com:radlab/spark into dev
...
Conflicts:
streaming/src/main/scala/spark/streaming/Scheduler.scala
2012-09-07 20:18:30 +00:00
Tathagata Das
b5750726ff
Fixed bugs in streaming Scheduler and optimized QueueInputDStream.
2012-09-07 20:16:21 +00:00
haoyuan
381e2c7ac4
add warmup code for TopKWordCountRaw.scala
2012-09-06 20:54:52 -07:00
haoyuan
0681bbc5d9
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-07 02:18:33 +00:00
haoyuan
db08a362aa
commit opt for grep scalability test.
2012-09-07 02:17:52 +00:00
Tathagata Das
4a7bde6865
Fixed bugs and added testcases for naive reduceByKeyAndWindow.
2012-09-06 19:06:59 -07:00
Tathagata Das
203ac8fa8b
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-06 05:29:06 -07:00
Tathagata Das
babb7e3ce2
Re-implemented ReducedWindowedDStream to simplify and fix bugs. Added slice operator to DStream. Also, refactored DStream testsuites and added tests for reduceByKeyAndWindow.
2012-09-06 05:28:29 -07:00
root
019de4562c
Less warmup in word count
2012-09-06 02:50:41 +00:00
root
4a5d0d249e
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 08:23:09 +00:00
root
b7ad291ac5
Tuning Akka for more connections
2012-09-05 07:08:07 +00:00
root
fc186dc18a
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 05:53:18 +00:00
root
4ea032a142
Some changes to make important log output visible even if we set the logging to WARNING
2012-09-05 05:53:07 +00:00
Tathagata Das
25fd684b89
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-04 20:44:14 -07:00
Tathagata Das
7c09ad0e04
Changed DStream member access permissions from private to protected. Updated StateDStream to checkpoint RDDs and forget lineage.
2012-09-04 19:11:49 -07:00
haoyuan
96a1f2277d
fix the compile error in TopKWordCountRaw.scala
2012-09-04 18:03:34 -07:00
haoyuan
2ff72f60ac
add TopKWordCountRaw.scala
2012-09-04 17:55:55 -07:00
Tathagata Das
389a78722c
Updated the return types of PairDStreamFunctions to return DStreams instead of ShuffleDStreams for cleaner abstraction.
2012-09-04 15:37:46 -07:00
root
7b892ee66e
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-04 04:27:10 +00:00
root
1878731671
Various test programs
2012-09-04 04:26:53 +00:00
Tathagata Das
b8e9e8ea78
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-02 02:35:32 -07:00
Tathagata Das
7419d2c7ea
Added transformRDD DStream operation and TransformedDStream. Added sbt assembly option for streaming project.
2012-09-02 02:35:17 -07:00
root
ceabf71257
tweaks
2012-09-01 21:52:42 +00:00
root
6025889be0
More raw network receiver programs
2012-09-01 20:51:07 +00:00
root
bf993cda63
Make batch size configurable in RawCount
2012-09-01 19:59:23 +00:00
root
83dad56334
Further fixes to raw text sender, plus an app that uses it
2012-09-01 19:45:25 +00:00
Matei Zaharia
f84d2bbe55
Bug fixes to RateLimitedOutputStream
2012-09-01 00:31:15 -07:00
Matei Zaharia
44758aa8e2
First work towards a RawInputDStream and a sender program for it.
2012-09-01 00:17:59 -07:00
Matei Zaharia
51fb13dd16
Bug fix
2012-08-31 15:36:11 -07:00
Matei Zaharia
ce42a46375
Bug fix
2012-08-31 15:35:35 -07:00
Matei Zaharia
f92d4a6ac1
Better output messages for streaming job duration
2012-08-31 15:33:48 -07:00
Tathagata Das
2d01d38a41
Added StateDStream, corresponding stateful stream operations, and testcases. Also refactored a few PairDStreamFunctions methods.
2012-08-31 03:47:34 -07:00
root
e1da274a48
WordCount tweaks
2012-08-31 07:16:19 +00:00
root
d4d2cb670f
Make checkpoint interval configurable in WordCount2
2012-08-31 00:34:57 +00:00
root
1f8085b8d0
Compile fixes
2012-08-29 03:20:56 +00:00
Tathagata Das
43e66146f7
Merge branch 'dev' of github.com/radlab/spark into dev
2012-08-28 13:51:05 -07:00
Tathagata Das
b5b93a621c
Added capability to take streaming input from network. Renamed SparkStreamContext to StreamingContext.
2012-08-28 12:35:19 -07:00
root
e2cf197a0a
Made WordCount2 even more configurable
2012-08-27 03:34:15 +00:00
root
b78c5ae803
Merge branch 'dev' of github.com:radlab/spark into dev
2012-08-27 01:16:39 +00:00
root
9de1c3abf9
Tweaks to WordCount2
2012-08-27 00:57:00 +00:00
Matei Zaharia
57796b183e
Code style
2012-08-26 17:25:22 -07:00
Matei Zaharia
22b1a20e61
Made Time and Interval immutable
2012-08-26 17:04:34 -07:00
Matei Zaharia
23a29b6d19
Merge branch 'dev' of github.com:radlab/spark into dev
2012-08-26 16:45:37 -07:00
Matei Zaharia
b120e24fe0
Add equals and hashCode to Time
2012-08-26 16:45:14 -07:00
root
b08ff710af
Added sliding word count, and some fixes to reduce window DStream
2012-08-26 23:40:50 +00:00
Matei Zaharia
ad6537321e
Make Time serializable
2012-08-26 16:27:23 -07:00
Matei Zaharia
091b1438f5
Fix WordCount job name
2012-08-24 16:43:59 -07:00
Tathagata Das
cae894ee7a
Added new Clock interface that is used by RecurringTimer to schedule events on system time or manually-configured time.
2012-08-06 14:52:46 -07:00
Matei Zaharia
43b81eb271
Renamed RDS to DStream, plus minor style fixes
2012-08-02 14:05:51 -04:00
Matei Zaharia
29bf44473c
Added an RDS that repeatedly returns the same input
2012-08-02 11:43:04 -04:00
Matei Zaharia
650d11817e
Added a WordCount for external data and fixed bugs in file streams
2012-08-02 11:09:43 -04:00
Tathagata Das
ed897ac5e1
Moved streaming files not immediately necessary to spark.streaming.util.
2012-08-01 22:28:54 -07:00
Tathagata Das
3be54c2a8a
1. Refactored SparkStreamContext, Scheduler, InputRDS, FileInputRDS and a few other files.
...
2. Modified Time class to represent milliseconds (long) directly, instead of LongTime.
3. Added new files QueueInputRDS, RecurringTimer, etc.
4. Added RDDSuite as the skeleton for testcases.
5. Added two examples in spark.streaming.examples.
6. Removed all past examples and a few unnecessary files. Moved a number of files to spark.streaming.util.
2012-08-01 22:09:27 -07:00