Tathagata Das
aa99f226a6
Removed XYZFunctions and added XYZUtils as a common Scala and Java interface for creating XYZ streams.
2014-01-07 01:56:15 -08:00
Sean Owen
4b92a20232
Issue #318 : minor style updates per review from Reynold Xin
2014-01-07 09:38:45 +00:00
prabeesh
a91f14cfdc
spark -> org.apache.spark
2014-01-07 12:21:20 +05:30
Patrick Wendell
c0498f9265
Merge remote-tracking branch 'apache-github/master' into standalone-driver
...
Conflicts:
core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala
core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
core/src/main/scala/org/apache/spark/deploy/master/Master.scala
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
2014-01-06 17:29:21 -08:00
Sean Owen
7379b2915f
Merge remote-tracking branch 'upstream/master'
2014-01-06 15:13:16 +00:00
Tathagata Das
3b4c4c7f4d
Merge remote-tracking branch 'apache/master' into project-refactor
...
Conflicts:
examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
2014-01-06 03:05:52 -08:00
Tathagata Das
d0fd3b9ad2
Changed JavaStreamingContextWith*** to ***Function in streaming.api.java.*** package. Also fixed packages of Flume and MQTT tests.
2014-01-06 01:47:53 -08:00
Patrick Wendell
79f52809c8
Removing SPARK_EXAMPLES_JAR in the code
2014-01-05 11:49:42 -08:00
Reza Zadeh
06c0f7628a
use SparseMatrix everywhere
2014-01-04 14:28:07 -08:00
Reza Zadeh
e9bd6cb51d
new example file
2014-01-04 12:33:22 -08:00
Tathagata Das
a1b8dd53e3
Added StreamingContext.getOrCreate for automatic recovery, and added a RecoverableNetworkWordCount example to use it.
2014-01-02 19:07:22 -08:00
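The getOrCreate recovery pattern in the commit above can be sketched in plain Python (a conceptual simulation, not the Spark API; the checkpoint file layout and `create_fn` factory are invented for illustration):

```python
import json
import os
import tempfile

def get_or_create(checkpoint_path, create_fn):
    """Return state recovered from a checkpoint if one exists,
    otherwise build fresh state with the supplied factory and
    checkpoint it for the next run."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            return json.load(f), True   # recovered from checkpoint
    state = create_fn()
    with open(checkpoint_path, "w") as f:
        json.dump(state, f)             # write checkpoint
    return state, False                 # freshly created

# First call creates the state; a second call recovers the same state.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first, recovered1 = get_or_create(path, lambda: {"word_counts": {}})
second, recovered2 = get_or_create(path, lambda: {"word_counts": {}})
```

The point of the pattern is that restart-after-failure and first launch share one entry point: the caller never has to know which case occurred.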
Sean Owen
66d501276b
Suggested small changes to Java code for slightly more standard style, encapsulation and in some cases performance
2014-01-02 16:17:57 +00:00
Prashant Sharma
94b7a7fe37
run-example -> bin/run-example
2014-01-02 18:41:21 +05:30
Tathagata Das
f4e4066191
Refactored kafka, flume, zeromq, mqtt as separate external projects, with their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects.
2013-12-30 11:13:24 -08:00
Matei Zaharia
b4ceed40d6
Merge remote-tracking branch 'origin/master' into conf2
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
2013-12-29 15:08:08 -05:00
Matei Zaharia
642029e7f4
Various fixes to configuration code
...
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
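The shift described above, from global system properties to an explicit configuration object passed to components, can be illustrated with a minimal Python sketch (names are illustrative, not the actual SparkConf API):

```python
class Conf:
    """A tiny builder-style configuration object: settings are
    carried explicitly by the object instead of living in global
    system properties, so they can be passed to serializers,
    codecs, and executors directly."""

    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        self._settings[key] = value
        return self  # return self to allow chained calls

    def get(self, key, default=None):
        return self._settings.get(key, default)

conf = (Conf()
        .set("app.name", "WordCount")
        .set("executor.memory", "1g"))
```

Because the object is explicit, two contexts with different settings can coexist in one process, which global properties cannot express.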
Tathagata Das
6e43039614
Refactored streaming project to separate out the twitter functionality.
2013-12-26 18:02:49 -08:00
Patrick Wendell
c9c0f745af
Minor style clean-up
2013-12-25 01:19:25 -08:00
Patrick Wendell
760823d393
Adding better option parsing
2013-12-25 01:19:01 -08:00
Prashant Sharma
2573add94c
spark-544, introducing SparkConf and related configuration overhaul.
2013-12-25 00:09:36 +05:30
azuryyu
a8bb86389d
Fixed job name in the java streaming example.
2013-12-24 16:52:20 +08:00
Reynold Xin
6bcac986b2
Merge branch 'master' of github.com:apache/incubator-spark
2013-11-25 15:47:47 +08:00
Prashant Sharma
95d8dbce91
Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10-temp
...
Conflicts:
core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
2013-11-21 12:34:46 +05:30
Prashant Sharma
199e9cf02d
Merge branch 'scala210-master' of github.com:colorant/incubator-spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/deploy/client/Client.scala
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
2013-11-21 11:55:48 +05:30
Henry Saputra
43dfac5132
Merge branch 'master' into removesemicolonscala
2013-11-19 16:57:57 -08:00
Henry Saputra
9c934b640f
Remove the semicolons at the end of Scala statements to make the code more idiomatic Scala.
...
Also remove unused imports as I found them along the way.
Remove return statements when returning value in the Scala code.
Passing compile and tests.
2013-11-19 10:19:03 -08:00
Aaron Davidson
50fd8d98c0
Enable the Broadcast examples to work in a cluster setting
...
Since they rely on println to display results, we need to first collect
those results to the driver to have them actually display locally.
2013-11-18 22:51:35 -08:00
Raymond Liu
0f2e3c6e31
Merge branch 'master' into scala-2.10
2013-11-13 16:55:11 +08:00
Prashant Sharma
6860b79f6e
Remove deprecated actorFor and use actorSelection everywhere.
2013-11-12 12:43:53 +05:30
Reynold Xin
551a43fd3d
Merge branch 'master' of github.com:apache/incubator-spark into mergemerge
...
Conflicts:
README.md
core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
2013-11-04 21:02:36 -08:00
Ankur Dave
5064f9b2d2
Merge remote-tracking branch 'spark-upstream/master'
...
Conflicts:
project/SparkBuild.scala
2013-10-30 15:59:09 -07:00
tgravescs
e5e0ebdb11
fix sparkhdfs lr test
2013-10-29 20:12:45 -05:00
Ali Ghodsi
05a0df2b9e
Makes Spark SIMR ready.
2013-10-24 11:59:51 -07:00
Matei Zaharia
dd659642e7
Merge pull request #64 from prabeesh/master
...
MQTT Adapter for Spark Streaming
MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol.
It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it here http://mqtt.org/
Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators, to mobile phones, embedded systems on vehicles, or laptops and full scale computers.
The protocol was invented by Andy Stanford-Clark of IBM, and Arlen Nipper of Cirrus Link Solutions
This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code footprint and network bandwidth are constrained.
MQTT is one of the most widely used protocols for the 'Internet of Things'. The protocol is attracting attention as more and more devices are connected to the internet, all producing data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015.
Plugins/support for MQTT are available in popular message queues like RabbitMQ, ActiveMQ etc.
Support for MQTT in Spark will help people with Internet of Things (IoT) projects use Spark Streaming for their real-time data processing needs (from sensors and other embedded devices, etc.).
2013-10-23 15:07:59 -07:00
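The publish/subscribe model the PR description refers to can be simulated in a few lines of Python (a toy in-process broker, not the MQTT wire protocol; class and method names are invented for illustration):

```python
from collections import defaultdict

class ToyBroker:
    """Minimal topic-based publish/subscribe: publishers and
    subscribers never reference each other, only named topics."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver to every subscriber of this topic.
        for cb in self._subscribers[topic]:
            cb(message)

broker = ToyBroker()
received = []
broker.subscribe("sensors/temperature", received.append)
broker.publish("sensors/temperature", "21.5")
broker.publish("sensors/humidity", "60")  # no subscriber; message is dropped
```

This decoupling is what makes the model attractive for constrained devices: a sensor only needs to know a topic name, not who consumes its data.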
Matei Zaharia
731c94e91d
Merge pull request #56 from jerryshao/kafka-0.8-dev
...
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
Conflicts:
streaming/pom.xml
2013-10-21 23:31:38 -07:00
Prabeesh K
9ca1bd9530
Update MQTTWordCount.scala
2013-10-22 09:05:57 +05:30
Prabeesh K
dbafa11396
Update MQTTWordCount.scala
2013-10-22 08:50:34 +05:30
Joseph E. Gonzalez
1856b37e9d
Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx
2013-10-18 12:21:19 -07:00
Prabeesh K
6ec39829e9
Update MQTTWordCount.scala
2013-10-18 17:00:28 +05:30
Mosharaf Chowdhury
e96bd0068f
BroadcastTest2 --> BroadcastTest
2013-10-16 21:33:33 -07:00
Mosharaf Chowdhury
feb45d391f
Default blockSize is 4MB.
...
BroadcastTest2 example added for testing broadcasts.
2013-10-16 21:33:33 -07:00
prabeesh
9eaf68fd40
added mqtt adapter wordcount example
2013-10-16 13:40:38 +05:30
Patrick Wendell
35befe07bb
Fixing spark streaming example and a bug in examples build.
...
- Examples assembly included a log4j.properties which clobbered Spark's
- Example had an error where some classes weren't serializable
- Did some other clean-up in this example
2013-10-15 22:55:43 -07:00
Joseph E. Gonzalez
ef7c369092
merged with upstream changes
2013-10-14 22:56:42 -07:00
jerryshao
c23cd72b4b
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
2013-10-12 20:00:42 +08:00
Neal Wiggins
67d4a31f87
Remove unnecessary mutable imports
2013-10-11 09:47:27 -07:00
Martin Weindel
e09f4a9601
fixed some warnings
2013-10-05 23:08:23 +02:00
Prashant Sharma
276c37a51c
Akka 2.2 migration
2013-09-22 08:20:12 +05:30
Joseph E. Gonzalez
8b59fb72c4
Merging latest changes from spark main branch
2013-09-17 20:56:12 -07:00
Prashant Sharma
4106ae9fbf
Merged with master
2013-09-06 17:53:01 +05:30
Matei Zaharia
12b2f1f9c9
Add missing license headers found with RAT
2013-09-02 12:23:03 -07:00
Matei Zaharia
0a8cc30921
Move some classes to more appropriate packages:
...
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia
46eecd110a
Initial work to rename package to org.apache.spark
2013-09-01 14:13:13 -07:00
Matei Zaharia
aab345c463
Fix finding of assembly JAR, as well as some pointers to ./run
2013-08-29 21:19:06 -07:00
Matei Zaharia
53cd50c069
Change build and run instructions to use assemblies
...
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.
As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
Jey Kottalam
4f43fd791a
make SparkHadoopUtil a member of SparkEnv
2013-08-15 16:50:37 -07:00
Evan Sparks
ff9ebfabb4
Merge pull request #762 from shivaram/sgd-cleanup
...
Refactor SGD options into a new class.
2013-08-11 10:52:55 -07:00
Alexander Pivovarov
2d97cc46af
Fixed path to JavaALS.java and JavaKMeans.java, fixed hadoop2-yarn profile
2013-08-10 23:04:50 -07:00
Matei Zaharia
4c4f769187
Optimize Scala PageRank to use reduceByKey
2013-08-10 18:09:54 -07:00
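The reduceByKey optimization mentioned above combines values per key eagerly instead of materializing every value for a key first; the difference between the two aggregation styles can be sketched in plain Python (a conceptual simulation of the semantics, not Spark code):

```python
from collections import defaultdict

# Rank contributions as (page, contribution) pairs, as in PageRank.
contribs = [("a", 0.5), ("b", 0.25), ("a", 0.25), ("b", 0.75)]

# groupByKey style: materialize the full list of values per key,
# then sum. In Spark, all values travel through the shuffle.
grouped = defaultdict(list)
for k, v in contribs:
    grouped[k].append(v)
ranks_grouped = {k: sum(vs) for k, vs in grouped.items()}

# reduceByKey style: fold each value into a running sum immediately,
# so only one number per key is ever kept (and, in Spark, shuffled).
ranks_reduced = defaultdict(float)
for k, v in contribs:
    ranks_reduced[k] += v
```

Both produce the same ranks; the reduce-style version simply never holds more than one partial result per key.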
Matei Zaharia
06e4f2a8f2
Merge pull request #789 from MLnick/master
...
Adding Scala version of PageRank example
2013-08-10 18:06:23 -07:00
Matei Zaharia
cd247ba5bb
Merge pull request #786 from shivaram/mllib-java
...
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Matei Zaharia
06303a62e5
Optimize JavaPageRank to use reduceByKey instead of groupByKey
2013-08-08 18:50:00 -07:00
Shivaram Venkataraman
2812e72200
Add setters for optimizer, gradient in SGD.
...
Also remove java-specific constructor for LabeledPoint.
2013-08-08 16:24:31 -07:00
Shivaram Venkataraman
e1a209f791
Remove Java-specific constructor for Rating.
...
The Scala constructor works for native Java types. Modified examples to match this.
2013-08-08 14:36:02 -07:00
Nick Pentreath
c4eea875ac
Style changes as per Matei's comments
2013-08-08 12:40:37 +02:00
Nick Pentreath
cce758b893
Adding Scala version of PageRank example
2013-08-07 16:38:52 +02:00
Shivaram Venkataraman
338b7a7455
Merge branch 'master' of git://github.com/mesos/spark into sgd-cleanup
...
Conflicts:
mllib/src/main/scala/spark/mllib/util/MLUtils.scala
2013-08-06 21:21:55 -07:00
Shivaram Venkataraman
7db69d56f2
Refactor GLM algorithms and add Java tests
...
This change adds Java examples and unit tests for all GLM algorithms
to make sure the MLLib interface works from Java. Changes include
- Introduce LabeledPoint and avoid using Doubles in train arguments
- Rename train to run in class methods
- Make the optimizer a member variable of GLM to make sure the builder
pattern works
2013-08-06 17:23:22 -07:00
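The builder-pattern point above (making the optimizer a member variable so its setters can be chained before running) can be sketched in Python (class and method names are illustrative, not the MLlib API):

```python
class SGD:
    """Optimizer whose setters return self, enabling chained calls."""

    def __init__(self):
        self.step_size = 1.0
        self.num_iterations = 100

    def set_step_size(self, s):
        self.step_size = s
        return self

    def set_num_iterations(self, n):
        self.num_iterations = n
        return self

class GLM:
    """The optimizer is a member variable, so callers configure it
    in place through chained setters before calling run()."""

    def __init__(self):
        self.optimizer = SGD()

    def run(self, data):
        # Placeholder "training": report the configured hyperparameters
        # and the number of training points seen.
        return (self.optimizer.step_size,
                self.optimizer.num_iterations,
                len(data))

model = GLM()
model.optimizer.set_step_size(0.1).set_num_iterations(50)
result = model.run([(1.0, [0.5]), (0.0, [1.5])])
```

Holding the optimizer as a member also keeps the configuration visible from Java, where chained setters are more natural than long constructor argument lists.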
Shivaram Venkataraman
471fbadd0c
Java examples, tests for KMeans and ALS
...
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
examples project.
Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Workaround a bug where using double[] from Java leads to class cast exception in
KMeans init
2013-08-06 15:43:46 -07:00
stayhf
882baee489
Got rid of unnecessary map function
2013-08-06 21:34:39 +00:00
stayhf
326a7a82e0
changes as reviewer requested
2013-08-06 21:03:24 +00:00
stayhf
98fd62605d
Updated code with reviewer's suggestions
2013-08-05 00:30:28 +00:00
stayhf
a682637301
Simple PageRank algorithm implementation in Java for SPARK-760
2013-08-03 06:01:16 +00:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
Prashant Sharma
a5f1f6a907
Merge branch 'master' into master-merge
...
Conflicts:
core/pom.xml
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/RDDCheckpointData.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/api/python/PythonRDD.scala
core/src/main/scala/spark/deploy/client/Client.scala
core/src/main/scala/spark/deploy/master/MasterWebUI.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/ZippedRDD.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/BlockManagerMasterActor.scala
core/src/main/scala/spark/storage/BlockManagerUI.scala
core/src/main/scala/spark/util/AkkaUtils.scala
core/src/test/scala/spark/SizeEstimatorSuite.scala
pom.xml
project/SparkBuild.scala
repl/src/main/scala/spark/repl/SparkILoop.scala
repl/src/test/scala/spark/repl/ReplSuite.scala
streaming/src/main/scala/spark/streaming/StreamingContext.scala
streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala
2013-07-03 11:43:26 +05:30
Matei Zaharia
ccfe953a4d
Merge pull request #577 from skumargithub/master
...
Example of cumulative counting using updateStateByKey
2013-06-29 17:57:53 -07:00
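Cumulative counting with updateStateByKey folds each batch's new values into per-key running state; a plain-Python simulation of that semantics (not the Spark API; the helper names are invented):

```python
from collections import defaultdict

def update_state(new_values, running_count):
    """The per-key update function: fold this batch's values
    into the count carried over from previous batches."""
    return (running_count or 0) + sum(new_values)

def run_batch(state, batch):
    """Apply the update function to every key seen so far or in this batch."""
    new = defaultdict(list)
    for word in batch:
        new[word].append(1)
    keys = set(state) | set(new)
    return {k: update_state(new.get(k, []), state.get(k)) for k in keys}

state = {}
for batch in [["spark", "mqtt", "spark"], ["spark"]]:
    state = run_batch(state, batch)
```

Keys absent from the current batch still pass through the update function with an empty value list, which is how previously seen counts survive quiet batches.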
Matei Zaharia
1667158544
Merge remote-tracking branch 'mrpotes/master'
2013-06-29 14:36:09 -07:00
James Phillpotts
176193b1e8
Fix usage and parameter extraction
2013-06-25 23:06:15 +01:00
James Phillpotts
366572edca
Include a default OAuth implementation, and update examples and JavaStreamingContext
2013-06-25 22:59:34 +01:00
Tathagata Das
c89af0a7f9
Merge branch 'master' into streaming
...
Conflicts:
.gitignore
2013-06-24 23:57:47 -07:00
Matei Zaharia
dbfab49d2a
Merge remote-tracking branch 'milliondreams/casdemo'
...
Conflicts:
project/SparkBuild.scala
2013-06-18 14:55:31 +02:00
Matei Zaharia
b7794813b1
Fix run script on Windows for Scala 2.10
2013-06-15 09:37:13 -07:00
Rohit Rai
b5b12823fa
Fixing the style as per feedback
2013-06-13 14:05:46 +05:30
Rohit Rai
b104c7f5c7
Example to write the output to cassandra
2013-06-03 15:15:52 +05:30
Rohit Rai
56c64c4033
A better way to read a column value if you are sure the column exists in every row.
2013-06-03 12:48:35 +05:30
Rohit Rai
81c2adc15c
Removing infix call
2013-06-02 12:51:15 +05:30
Rohit Rai
3be7bdcefd
Adding example to make Spark RDD from Cassandra
2013-06-01 19:32:17 +05:30
Ethan Jewett
ee6f6aa6cd
Add hBase example
2013-05-09 18:33:38 -05:00
Reynold Xin
012c9e5ab0
Revert "Merge pull request #596 from esjewett/master" because the
...
dependency on hbase introduces netty-3.2.2 which conflicts with
netty-3.5.3 already in Spark. This caused multiple test failures.
This reverts commit 0f1b7a06e1, reversing changes made to aacca1b8a8.
2013-05-09 14:20:01 -07:00
Ethan Jewett
a3d5f92210
Switch to using SparkContext method to create RDD
2013-05-07 11:43:06 -05:00
unknown
cbf6a5ee1e
Removed unused code, clarified intent of the program, and set batch size to 1 second
2013-05-06 08:05:45 -06:00
Ethan Jewett
7cff7e7897
Fix indents and mention other configuration options
2013-05-04 14:56:55 -05:00
Ethan Jewett
9290f16430
Remove unnecessary column family config
2013-05-04 12:39:14 -05:00
Ethan Jewett
02e8cfa617
HBase example
2013-05-04 12:31:30 -05:00
unknown
1d54401d7e
Modified as per TD's suggestions
2013-04-30 23:01:32 -06:00
Prashant Sharma
8f3ac240cb
Fixed Warning: ClassManifest -> ClassTag
2013-04-29 16:39:13 +05:30
Mridul Muralidharan
dd515ca3ee
Attempt at fixing merge conflict
2013-04-24 09:24:17 +05:30
unknown
0dc1e2d60f
Example of cumulative counting using updateStateByKey
2013-04-22 09:22:45 -06:00
Mridul Muralidharan
7acab3ab45
Fix review comments, add a new API to SparkHadoopUtil to create an appropriate Configuration. Modify an example to show how to use SplitInfo
2013-04-22 08:01:13 +05:30
seanm
7e56e99573
Surfacing decoders on KafkaInputDStream
2013-04-16 17:17:16 -06:00