Tathagata Das
bacfe5ebca
Added JavaStreamingContext.transform
2013-10-24 10:56:24 -07:00
Matei Zaharia
dd659642e7
Merge pull request #64 from prabeesh/master
MQTT Adapter for Spark Streaming
MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol.
It was designed as an extremely lightweight publish/subscribe messaging transport. Read more about it at http://mqtt.org/
Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices such as sensors and actuators to mobile phones, embedded systems on vehicles, or laptops and full-scale computers.
The protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions.
This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code size and network bandwidth are constrained.
MQTT is one of the widely used protocols for the 'Internet of Things'. The protocol is attracting growing attention as more and more devices are connected to the internet and produce data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015.
Plugins or built-in support for MQTT are available in popular message queues such as RabbitMQ and ActiveMQ.
Support for MQTT in Spark will help people with Internet of Things (IoT) projects use Spark Streaming for their real-time data processing needs (for example, data from sensors and other embedded devices).
2013-10-23 15:07:59 -07:00
Tathagata Das
fe8626efd1
Merge branch 'apache-master' into transform
2013-10-22 23:40:40 -07:00
Tathagata Das
72d2e1dd77
Fixed bug in Java transformWith, added more Java testcases for transform and transformWith, added missing variations of Java join and cogroup, updated various Scala and Java API docs.
2013-10-22 23:35:51 -07:00
Matei Zaharia
731c94e91d
Merge pull request #56 from jerryshao/kafka-0.8-dev
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
Conflicts:
streaming/pom.xml
2013-10-21 23:31:38 -07:00
Tathagata Das
0666498799
Updated TransformDStream to allow n-ary DStream transform. Added transformWith, leftOuterJoin and rightOuterJoin operations to DStream for Scala and Java APIs. Also added n-ary union and n-ary transform operations to StreamingContext for Scala and Java APIs.
2013-10-21 05:34:09 -07:00
Reynold Xin
4e44d65b5e
Exclusion rules for Maven build files.
2013-10-19 12:35:55 -07:00
Prabeesh K
d223d38933
Update MQTTInputDStream.scala
2013-10-18 09:09:49 +05:30
prabeesh
890f8fe439
modify code, use Spark Logging Class
2013-10-17 10:00:40 +05:30
prabeesh
9a7575728d
add maven dependencies for mqtt
2013-10-16 13:41:49 +05:30
prabeesh
2e48b23eae
added mqtt adapter
2013-10-16 13:36:25 +05:30
prabeesh
742ada91e0
mqttinputdstream for mqttstreaming adapter
2013-10-16 13:35:29 +05:30
Matei Zaharia
b5346064d6
Merge pull request #8 from vchekan/checkpoint-ttl-restore
Serialize and restore spark.cleaner.ttl to savepoint
In accordance with the conversation on the spark-dev mailing list, preserve the spark.cleaner.ttl parameter when serializing a checkpoint.
2013-10-15 21:25:03 -07:00
Aaron Davidson
a395911138
Refactor BlockId into an actual type
This is an unfortunately invasive change which converts all of our BlockId
strings into actual BlockId types. Here are some advantages of doing this now:
+ Type safety
+ Code clarity - it's now obvious what the key of a shuffle or rdd block is,
for instance. Additionally, appearing in tuple/map type signatures is a big
readability bonus. A Seq[(String, BlockStatus)] is not very clear.
Further, we can now use more Scala features, like matching on BlockId types.
+ Explicit usage - we can now formally tell where various BlockIds are being used
(without doing string searches); this makes updating current BlockIds a much
clearer process, and compiler-supported.
(I'm looking at you, shuffle file consolidation.)
+ It will only get harder to make this change as time goes on.
Since this touches a lot of files, it'd be best to either get this patch
in quickly or throw it on the ground to avoid too many secondary merge conflicts.
2013-10-12 22:44:57 -07:00
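The advantages listed in the BlockId commit above (type safety, matching on block types, explicit usage) can be illustrated with a small Java sketch. This is not Spark's code: Spark's BlockId is a Scala type, and the class names `BlockId`, `RddBlockId`, `ShuffleBlockId`, `BlockIdDemo` and the string formats below are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed block identifiers. Names and string formats are
// illustrative assumptions, not Spark's actual definitions.
abstract class BlockId {
    // String form of the identifier, e.g. for log messages or file names.
    abstract String name();

    @Override public boolean equals(Object other) {
        return other instanceof BlockId
            && getClass() == other.getClass()
            && name().equals(((BlockId) other).name());
    }

    @Override public int hashCode() { return name().hashCode(); }

    @Override public String toString() { return name(); }
}

final class RddBlockId extends BlockId {
    final int rddId;
    final int splitIndex;

    RddBlockId(int rddId, int splitIndex) {
        this.rddId = rddId;
        this.splitIndex = splitIndex;
    }

    @Override String name() { return "rdd_" + rddId + "_" + splitIndex; }
}

final class ShuffleBlockId extends BlockId {
    final int shuffleId;
    final int mapId;
    final int reduceId;

    ShuffleBlockId(int shuffleId, int mapId, int reduceId) {
        this.shuffleId = shuffleId;
        this.mapId = mapId;
        this.reduceId = reduceId;
    }

    @Override String name() { return "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId; }
}

class BlockIdDemo {
    // Dispatch on the type of the identifier instead of parsing a string prefix.
    static String describe(BlockId id) {
        if (id instanceof RddBlockId) {
            RddBlockId r = (RddBlockId) id;
            return "partition " + r.splitIndex + " of RDD " + r.rddId;
        } else if (id instanceof ShuffleBlockId) {
            ShuffleBlockId s = (ShuffleBlockId) id;
            return "shuffle " + s.shuffleId + " output of map " + s.mapId
                + " for reducer " + s.reduceId;
        }
        return "unknown block " + id.name();
    }

    public static void main(String[] args) {
        // A Map<BlockId, Long> says more about its keys than a Map<String, Long>.
        Map<BlockId, Long> sizes = new HashMap<>();
        sizes.put(new RddBlockId(3, 0), 1024L);
        sizes.put(new ShuffleBlockId(1, 2, 0), 512L);
        for (Map.Entry<BlockId, Long> e : sizes.entrySet()) {
            System.out.println(e.getKey() + " (" + describe(e.getKey()) + "): "
                + e.getValue() + " bytes");
        }
    }
}
```

With typed keys, every place that creates or consumes a particular kind of block can be found through the compiler rather than a string search, which is the "explicit usage" point made in the commit message.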
jerryshao
c23cd72b4b
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
2013-10-12 20:00:42 +08:00
Prashant Sharma
26860639c5
Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
project/SparkBuild.scala
2013-10-10 09:42:23 +05:30
Prashant Sharma
7be75682b9
Merge branch 'master' into wip-merge-master
Conflicts:
bagel/pom.xml
core/pom.xml
core/src/test/scala/org/apache/spark/ui/UISuite.scala
examples/pom.xml
mllib/pom.xml
pom.xml
project/SparkBuild.scala
repl/pom.xml
streaming/pom.xml
tools/pom.xml
In Scala 2.10, a shorter representation is used for naming artifacts,
so the artifacts now use the shorter Scala version, which is made a property in the pom.
2013-10-08 11:29:40 +05:30
Patrick Wendell
aa9fb84994
Merging build changes in from 0.8
2013-10-05 22:07:00 -07:00
Martin Weindel
e09f4a9601
fixed some warnings
2013-10-05 23:08:23 +02:00
Prashant Sharma
5829692885
Merge branch 'master' into scala-2.10
Conflicts:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
docs/_config.yml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Prashant Sharma
7ff4c2d399
fixed maven build for scala 2.10
2013-09-26 10:48:24 +05:30
Patrick Wendell
6079721fa1
Update build version in master
2013-09-24 11:41:51 -07:00
Prashant Sharma
276c37a51c
Akka 2.2 migration
2013-09-22 08:20:12 +05:30
Vadim Chekan
fbe40c5806
Serialize and restore spark.cleaner.ttl to savepoint
2013-09-20 12:13:48 -07:00
Prashant Sharma
6fcfefcb27
Few more fixes to tests broken during merge
2013-09-10 10:57:47 +05:30
Prashant Sharma
4106ae9fbf
Merged with master
2013-09-06 17:53:01 +05:30
Matei Zaharia
0a8cc30921
Move some classes to more appropriate packages:
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia
5701eb92c7
Fix some URLs
2013-09-01 14:13:16 -07:00
Matei Zaharia
46eecd110a
Initial work to rename package to org.apache.spark
2013-09-01 14:13:13 -07:00
Matei Zaharia
5a6ac12840
Merge pull request #701 from ScrapCodes/documentation-suggestions
Documentation suggestions for spark streaming.
2013-08-22 22:08:03 -07:00
Prashant Sharma
2bc348e92c
Linking custom receiver guide
2013-08-23 09:44:02 +05:30
Prashant Sharma
3049415e24
Corrections in documentation comment
2013-08-23 09:40:28 +05:30
Jey Kottalam
23f4622aff
Remove redundant dependencies from POMs
2013-08-18 18:53:57 -07:00
Jey Kottalam
ad580b94d5
Maven build now also works with YARN
2013-08-16 13:50:12 -07:00
Jey Kottalam
11b42a84db
Maven build now works with CDH hadoop-2.0.0-mr1
2013-08-16 13:50:12 -07:00
Jey Kottalam
353fab2440
Initial changes to make Maven build agnostic of hadoop version
2013-08-16 13:50:12 -07:00
Josh Rosen
d7f78b443b
Change scala.Option to Guava Optional in Java APIs.
2013-08-11 12:05:09 -07:00
Reynold Xin
c61843a69f
Changed other LZF uses to use the compression codec interface.
2013-07-31 10:32:13 -07:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
Prashant Sharma
119c98c1be
Code formatting. The warning related to scope exit and enter is not worth fixing, as it only affects debugging scopes and nothing else.
2013-07-16 15:01:33 +05:30
Prashant Sharma
55da6e9504
Fixed warning erasure -> runtimeClass
2013-07-16 14:37:08 +05:30
Prashant Sharma
ff14f38f3d
Fixed warning Throwables
2013-07-16 14:34:56 +05:30
Prashant Sharma
63addd93a8
Fixed warning ClassManifest -> ClassTag
2013-07-16 14:09:52 +05:30
Prashant Sharma
e86d5dbaad
Merge branch 'master' into master-merge
Conflicts:
README.md
core/pom.xml
core/src/main/scala/spark/deploy/JsonProtocol.scala
core/src/main/scala/spark/deploy/LocalSparkCluster.scala
core/src/main/scala/spark/deploy/master/Master.scala
core/src/main/scala/spark/deploy/master/MasterWebUI.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
core/src/main/scala/spark/storage/BlockManagerUI.scala
core/src/main/scala/spark/util/AkkaUtils.scala
pom.xml
project/SparkBuild.scala
streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
2013-07-12 14:49:16 +05:30
Matei Zaharia
7dcda9ae74
Merge pull request #688 from markhamstra/scalaDependencies
Fixed SPARK-795 with explicit dependencies
2013-07-08 23:24:23 -07:00
Mark Hamstra
0b39d66f3f
pom cleanup
2013-07-08 16:07:09 -07:00
Mark Hamstra
afdaf430bd
Explicit dependencies for scala-library and scalap to prevent 2.9.2 vs. 2.9.3 problems
2013-07-08 15:40:50 -07:00
Shivaram Venkataraman
3350ad0d7f
Catch RejectedExecution exception in Checkpoint handler.
2013-07-07 04:09:37 -07:00
Matei Zaharia
1ffadb2d9e
Merge remote-tracking branch 'pwendell/ui-updates'
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
core/src/main/scala/spark/util/AkkaUtils.scala
pom.xml
2013-07-06 15:51:41 -07:00
Matei Zaharia
94871e4703
Merge pull request #655 from tgravescs/master
Add support for running Spark on Yarn on a secure Hadoop Cluster
2013-07-06 15:26:19 -07:00
Tathagata Das
280418ac45
Reduced the number of Iterator to ArrayBuffer copies in NetworkReceiver.
2013-07-05 21:38:21 -07:00
Prashant Sharma
a5f1f6a907
Merge branch 'master' into master-merge
Conflicts:
core/pom.xml
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/RDDCheckpointData.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/api/python/PythonRDD.scala
core/src/main/scala/spark/deploy/client/Client.scala
core/src/main/scala/spark/deploy/master/MasterWebUI.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/ZippedRDD.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/BlockManagerMasterActor.scala
core/src/main/scala/spark/storage/BlockManagerUI.scala
core/src/main/scala/spark/util/AkkaUtils.scala
core/src/test/scala/spark/SizeEstimatorSuite.scala
pom.xml
project/SparkBuild.scala
repl/src/main/scala/spark/repl/SparkILoop.scala
repl/src/test/scala/spark/repl/ReplSuite.scala
streaming/src/main/scala/spark/streaming/StreamingContext.scala
streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala
2013-07-03 11:43:26 +05:30
Y.CORP.YAHOO.COM\tgraves
923cf92900
Rework from pull request. Removed the --user option from the Spark on Yarn Client, made the use of the JAVA_HOME environment variable conditional on whether it is set, and created addCredentials in each of the SparkHadoopUtil classes to only add the credentials when the profile is hadoop2-yarn.
2013-07-02 21:18:59 -05:00
Matei Zaharia
4358acfe07
Initialize Twitter4J OAuth from system properties instead of prompting
2013-06-29 15:25:06 -07:00
Matei Zaharia
1667158544
Merge remote-tracking branch 'mrpotes/master'
2013-06-29 14:36:09 -07:00
Patrick Wendell
362d996c81
Handful of changes based on matei's review
- Avoid exception when no tasks have finished for a stage
- Adding DOCTYPE so css renders properly
- Adding progress slider
2013-06-27 19:14:28 -07:00
James Phillpotts
366572edca
Include a default OAuth implementation, and update examples and JavaStreamingContext
2013-06-25 22:59:34 +01:00
Tathagata Das
c89af0a7f9
Merge branch 'master' into streaming
Conflicts:
.gitignore
2013-06-24 23:57:47 -07:00
Tathagata Das
48c7e373c6
Minor formatting fixes
2013-06-24 23:11:04 -07:00
Tathagata Das
1249e9153b
Merge pull request #572 from Reinvigorate/sm-block-interval
Adding spark.streaming.blockInterval property
2013-06-24 21:46:33 -07:00
Tathagata Das
cfcda95f86
Merge pull request #571 from Reinvigorate/sm-kafka-serializers
Surfacing decoders on KafkaInputDStream
2013-06-24 21:44:50 -07:00
James Phillpotts
8955787a59
Twitter API v1 is retired - username/password auth no longer possible
2013-06-24 09:15:17 +01:00
James Phillpotts
93a1643405
Allow other twitter authorizations than username/password
2013-06-21 14:21:52 +01:00
Thomas Graves
75d78c7ac9
Add support for Spark on Yarn on a secure Hadoop cluster
2013-06-19 11:18:42 -05:00
Jey Kottalam
e7982c798e
Exclude old versions of Netty from Maven-based build
2013-05-18 21:24:58 -07:00
seanm
f25282def5
fixing kafkaStream Java API and adding test
2013-05-10 17:34:28 -06:00
seanm
3632980b1b
fixing indentation
2013-05-10 15:54:26 -06:00
seanm
b95c1bdbba
count() now uses a transform instead of ConstantInputDStream
2013-05-10 12:47:24 -06:00
seanm
d761e7359d
adding kafkaStream API tests
2013-05-10 12:05:10 -06:00
Reynold Xin
90577ada69
Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge
Conflicts:
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/DiskStore.scala
project/SparkBuild.scala
2013-05-07 15:56:19 -07:00
Prashant Sharma
4041a2689e
Updated to latest stable scala 2.10.1 and akka 2.1.2
2013-05-01 11:35:35 +05:30
Prashant Sharma
24bbf318b3
Fixed other warnings
2013-04-29 19:56:28 +05:30
Prashant Sharma
d3518f57cd
Fixed warning: erasure -> runtimeClass
2013-04-29 18:14:25 +05:30
Prashant Sharma
8f3ac240cb
Fixed Warning: ClassManifest -> ClassTag
2013-04-29 16:39:13 +05:30
Prashant Sharma
4b4a36ea7d
Fixed pom.xml with updated dependencies.
2013-04-29 12:55:43 +05:30
Mridul Muralidharan
430c531464
Remove debug statements
2013-04-29 00:24:30 +05:30
Mridul Muralidharan
3a89a76b87
Make log message more descriptive to aid in debugging
2013-04-29 00:04:12 +05:30
Mridul Muralidharan
7fa6978a1e
Allow CheckpointWriter pending tasks to finish
2013-04-28 23:08:10 +05:30
Mridul Muralidharan
afee902443
Attempt to fix streaming test failures after yarn branch merge
2013-04-28 22:26:45 +05:30
Prashant Sharma
bb4102b0ee
Fixed breaking tests in streaming checkpoint suite. Changed RichInt to Int as it is final and not serializable
2013-04-25 14:38:01 +05:30
Prashant Sharma
ad88f083a6
scala 2.10 and master merge
2013-04-24 18:08:26 +05:30
Mridul Muralidharan
dd515ca3ee
Attempt at fixing merge conflict
2013-04-24 09:24:17 +05:30
seanm
7e56e99573
Surfacing decoders on KafkaInputDStream
2013-04-16 17:17:16 -06:00
seanm
ab0f834dbb
adding spark.streaming.blockInterval property
2013-04-16 11:57:05 -06:00
seanm
b42d68c8ce
fixing Spark Streaming count() so that 0 will be emitted when there is nothing to count
2013-04-15 12:54:55 -06:00
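The behavior described in the count() fix above can be sketched without Spark. In this minimal sketch each batch is modeled as a plain Java list (an assumption; a real streaming batch is an RDD), and the hypothetical helpers `countByReduce` and `countWithZero` contrast a count that emits nothing for an empty batch with one that always emits a value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical helpers illustrating the difference; not Spark's implementation.
class BatchCount {
    // Counting by mapping each element to 1 and reducing: an empty batch
    // has nothing to reduce, so no count is produced at all.
    static <T> Optional<Long> countByReduce(List<T> batch) {
        return batch.stream().map(x -> 1L).reduce(Long::sum);
    }

    // Always produces a count, falling back to 0 when the batch is empty.
    static <T> long countWithZero(List<T> batch) {
        return countByReduce(batch).orElse(0L);
    }

    public static void main(String[] args) {
        List<String> full = Arrays.asList("a", "b", "c");
        List<String> empty = Collections.emptyList();

        System.out.println("reduce on full batch:  " + countByReduce(full));
        System.out.println("reduce on empty batch: " + countByReduce(empty));
        System.out.println("with zero, full batch:  " + countWithZero(full));
        System.out.println("with zero, empty batch: " + countWithZero(empty));
    }
}
```

A downstream consumer that expects one count per batch interval can only rely on the second form, since the first silently skips intervals with no data.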
Matei Zaharia
65caa8f711
Merge remote-tracking branch 'jey/bump-development-version-to-0.8.0'
Conflicts:
docs/_config.yml
project/SparkBuild.scala
2013-04-08 12:43:17 -04:00
Mridul Muralidharan
6798a09df8
Add support for building against hadoop2-yarn : adding new maven profile for it
2013-04-07 17:47:38 +05:30
shane-huang
df47b40b76
Shuffle Performance fix: Use netty embedded OIO file server instead of ConnectionManager
Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages
change reference from io.Source to scala.io.Source to avoid looking into io.netty package
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-07 14:37:12 +08:00
Jey Kottalam
bc8ba222ff
Bump development version to 0.8.0
2013-03-28 15:42:01 -07:00
Jey Kottalam
b569b3f200
Move streaming test initialization into 'before' blocks
2013-03-28 15:08:41 -07:00
seanm
329ef34c2e
fixing autooffset.reset behavior when set to 'largest'
2013-03-26 23:56:15 -06:00
Holden Karau
1f5381119f
method first in trait IterableLike is deprecated: use `head' instead
2013-03-24 19:19:40 -07:00
seanm
d61978d0ab
keeping JavaStreamingContext in sync with StreamingContext + adding comments for better clarity
2013-03-15 23:36:52 -06:00
Mikhail Bautin
7fd2708eda
Add a log4j compile dependency to fix build in IntelliJ
Also rename parent project to spark-parent (otherwise it shows up as
"parent" in IntelliJ, which is very confusing).
2013-03-15 11:41:51 -07:00
seanm
33fa1e7e4a
removing dependency on ZookeeperConsumerConnector + purging last relic of kafka reliability that never solidified (i.e. setOffsets)
2013-03-15 00:10:13 -06:00
seanm
d069283211
fixing memory leak in kafka MessageHandler
2013-03-14 23:45:33 -06:00
seanm
cfa8e769a8
KafkaInputDStream improvements. Allows more Kafka configurability
2013-03-14 23:45:19 -06:00
Stephen Haberman
0cf320485d
Forgot equals.
2013-03-12 00:05:35 -05:00
Stephen Haberman
9e68f48625
More quickly call close in HadoopRDD.
This also refactors out the common "gotNext" iterator pattern into
a shared utility class.
2013-03-11 23:59:17 -05:00
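The "gotNext" iterator pattern mentioned in the commit above can be sketched as a small Java base class. This is an illustration of the general pattern, not the utility class the commit introduced; the names `NextIterator`, `getNext`, `finished` and `ArraySource` are assumptions. The idea is that a subclass only says how to fetch one element and how to release the resource, while the base class handles look-ahead and closes the source as soon as exhaustion is detected.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical base class sketching the "gotNext" look-ahead pattern.
abstract class NextIterator<T> implements Iterator<T> {
    private boolean gotNext = false;    // true while nextValue holds an unconsumed element
    private boolean closed = false;     // guards against closing twice
    private T nextValue;
    protected boolean finished = false; // subclasses set this when the source is exhausted

    // Fetch the next element, or set finished = true (the return value is then ignored).
    protected abstract T getNext();

    // Release the underlying resource (file, socket, record reader, ...).
    protected abstract void close();

    private void closeIfNeeded() {
        if (!closed) {
            closed = true;
            close();
        }
    }

    @Override public boolean hasNext() {
        if (!finished && !gotNext) {
            nextValue = getNext();
            if (finished) {
                closeIfNeeded(); // close immediately rather than waiting for cleanup
            }
            gotNext = true;
        }
        return !finished;
    }

    @Override public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException("End of stream");
        }
        gotNext = false;
        return nextValue;
    }
}

// Example subclass over an in-memory source, recording whether close() ran.
class ArraySource extends NextIterator<String> {
    private final Iterator<String> lines;
    boolean closeCalled = false;

    ArraySource(String... lines) {
        this.lines = Arrays.asList(lines).iterator();
    }

    @Override protected String getNext() {
        if (lines.hasNext()) {
            return lines.next();
        }
        finished = true;
        return null;
    }

    @Override protected void close() {
        closeCalled = true;
    }
}
```

Reading `new ArraySource("a", "b")` to the end triggers `close()` on the first `hasNext()` call that finds the source exhausted, which matches the "more quickly call close" intent stated in the commit subject.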
Mark Hamstra
b409073102
Instead of failing to bind to a fixed, already-in-use port, let the OS choose an available port for TestServer.
2013-03-01 15:05:07 -08:00
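The technique in the commit above, letting the OS choose an available port, is commonly done by binding to port 0. The following is a minimal Java sketch of that idea using the standard `java.net.ServerSocket`; the class name `EphemeralPortDemo` and the helper `openOnFreePort` are hypothetical, and this is not the TestServer code from the commit.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical demo class; not the TestServer referenced in the commit.
class EphemeralPortDemo {
    // Binding to port 0 asks the operating system to pick any free port,
    // so the bind cannot fail because a fixed port is already in use.
    static ServerSocket openOnFreePort() throws IOException {
        return new ServerSocket(0);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = openOnFreePort()) {
            // The port actually assigned must be read back from the socket.
            System.out.println("Listening on port " + server.getLocalPort());
        }
    }
}
```

A test that starts such a server then passes `getLocalPort()` to the client under test instead of hard-coding a port number.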