Henry Saputra
91a563608e
Merge branch 'master' into remove_simpleredundantreturn_scala
2014-01-12 10:34:13 -08:00
Henry Saputra
93a65e5fde
Remove simple redundant return statement for Scala methods/functions:
-) Only change simple return statements at the end of a method
-) Ignore the complex if-else check
-) Ignore the ones inside synchronized
2014-01-12 10:30:04 -08:00
Reza Zadeh
f324d53555
Merge remote-tracking branch 'upstream/master' into sparsesvd
2014-01-11 13:27:15 -08:00
Reza Zadeh
1afdeaeb2f
add dimension parameters to example
2014-01-10 21:30:54 -08:00
Tathagata Das
4f39e79c23
Merge remote-tracking branch 'apache/master' into driver-test
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala
2014-01-10 15:47:01 -08:00
Tathagata Das
e4bb845238
Updated docs based on Patrick's comments in PR 383.
2014-01-10 12:17:09 -08:00
Reza Zadeh
21c8a54c08
Merge remote-tracking branch 'upstream/master' into sparsesvd
Conflicts:
docs/mllib-guide.md
2014-01-09 22:45:32 -08:00
Reza Zadeh
cf5bd4ab2e
fix example
2014-01-09 22:39:41 -08:00
Patrick Wendell
997c830e0b
Merge pull request #363 from pwendell/streaming-logs
Set default logging to WARN for Spark streaming examples.
This programmatically sets the log level to WARN by default for streaming
tests. If the user has already specified a log4j.properties file,
the user's file will take precedence over this default.
2014-01-09 22:22:20 -08:00
Patrick Wendell
7b748b83a1
Minor clean-up
2014-01-09 20:42:48 -08:00
Tathagata Das
f1d206c6b4
Merge branch 'standalone-driver' into driver-test
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
2014-01-09 15:06:24 -08:00
Tathagata Das
6f713e2a3e
Changed the way StreamingContext finds and reads checkpoint files, and added JavaStreamingContext.getOrCreate.
2014-01-09 13:42:04 -08:00
Ankur Dave
3b2e22e2c3
Revert changes to examples/.../PageRankUtils.scala
Reverts to 04d83fc37f9eef89c20331c85291a0a169f75e6d:examples/src/main/scala/org/apache/spark/examples/bagel/PageRankUtils.scala.
2014-01-09 13:27:40 -08:00
Patrick Wendell
35f80da21a
Set default logging to WARN for Spark streaming examples.
This programmatically sets the log level to WARN by default for streaming
tests. If the user has already specified a log4j.properties file,
the user's file will take precedence over this default.
2014-01-09 10:42:58 -08:00
Ankur Dave
91227566bc
Merge remote-tracking branch 'spark-upstream/master' into HEAD
Conflicts:
README.md
core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
pom.xml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2014-01-08 21:19:08 -08:00
Patrick Wendell
bc81ce040d
Merge remote-tracking branch 'apache-github/master' into standalone-driver
Conflicts:
core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
pom.xml
2014-01-08 00:38:31 -08:00
Patrick Wendell
c0f0155eca
Merge pull request #313 from tdas/project-refactor
Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc.
At a high level, these are the changes:
1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of the root and streaming projects/modules.
2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`.
3. Each external project has its own Scala and Java unit tests. Note that the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see the comment inside the pom.xml for more information).
4. Jars of the external projects have been added to the examples project but not to the assembly project.
5. In some files, imports have been rearranged to conform to the Spark coding guidelines.
2014-01-07 22:21:52 -08:00
Patrick Wendell
e688e11206
Add log4j exclusion rule to maven.
To make this work I had to rename the defaults file. Otherwise
maven's pattern matching rules included it when trying to match
other log4j.properties files.
I also fixed a bug in the existing maven build where two
<transformers> tags were present in assembly/pom.xml
such that one overwrote the other.
2014-01-07 12:56:24 -08:00
Tathagata Das
8f02f1c3d4
Fixed examples/pom.xml and run-example based on Patrick's suggestions.
2014-01-07 11:02:29 -08:00
Reynold Xin
15d9534501
Merge pull request #318 from srowen/master
Suggested small changes to Java code for slightly more standard style, encapsulation and in some cases performance
Sorry if this is too abrupt or not a welcome set of changes, but thought I'd see if I could contribute a little. I'm a Java developer and just getting seriously into Spark. So I thought I'd suggest a number of small changes to the couple of Java parts of the code to make it a little tighter, more standard and even a bit faster.
Feel free to take all, some or none of this. Happy to explain any of it.
2014-01-07 08:10:02 -08:00
Tathagata Das
aa99f226a6
Removed XYZFunctions and added XYZUtils as a common Scala and Java interface for creating XYZ streams.
2014-01-07 01:56:15 -08:00
Sean Owen
4b92a20232
Issue #318 : minor style updates per review from Reynold Xin
2014-01-07 09:38:45 +00:00
prabeesh
a91f14cfdc
spark -> org.apache.spark
2014-01-07 12:21:20 +05:30
Patrick Wendell
c0498f9265
Merge remote-tracking branch 'apache-github/master' into standalone-driver
Conflicts:
core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala
core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
core/src/main/scala/org/apache/spark/deploy/master/Master.scala
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
2014-01-06 17:29:21 -08:00
Sean Owen
7379b2915f
Merge remote-tracking branch 'upstream/master'
2014-01-06 15:13:16 +00:00
Tathagata Das
3b4c4c7f4d
Merge remote-tracking branch 'apache/master' into project-refactor
Conflicts:
examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
2014-01-06 03:05:52 -08:00
Tathagata Das
d0fd3b9ad2
Changed JavaStreamingContextWith*** to ***Function in streaming.api.java.*** package. Also fixed packages of Flume and MQTT tests.
2014-01-06 01:47:53 -08:00
Patrick Wendell
79f52809c8
Removing SPARK_EXAMPLES_JAR in the code
2014-01-05 11:49:42 -08:00
Reza Zadeh
06c0f7628a
use SparseMatrix everywhere
2014-01-04 14:28:07 -08:00
Reza Zadeh
e9bd6cb51d
new example file
2014-01-04 12:33:22 -08:00
Tathagata Das
a1b8dd53e3
Added StreamingContext.getOrCreate for automatic recovery, and added RecoverableNetworkWordCount example to use it.
2014-01-02 19:07:22 -08:00
Sean Owen
66d501276b
Suggested small changes to Java code for slightly more standard style, encapsulation and in some cases performance
2014-01-02 16:17:57 +00:00
Prashant Sharma
94b7a7fe37
run-example -> bin/run-example
2014-01-02 18:41:21 +05:30
Tathagata Das
97630849ff
Added pom.xml for external projects and removed unnecessary dependencies and repositories from other poms and sbt.
2013-12-31 00:28:57 -08:00
Tathagata Das
f4e4066191
Refactored kafka, flume, zeromq, mqtt as separate external projects, with their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects.
2013-12-30 11:13:24 -08:00
Matei Zaharia
b4ceed40d6
Merge remote-tracking branch 'origin/master' into conf2
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
2013-12-29 15:08:08 -05:00
Matei Zaharia
642029e7f4
Various fixes to configuration code
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Tathagata Das
6e43039614
Refactored streaming project to separate out the twitter functionality.
2013-12-26 18:02:49 -08:00
Patrick Wendell
c9c0f745af
Minor style clean-up
2013-12-25 01:19:25 -08:00
Patrick Wendell
760823d393
Adding better option parsing
2013-12-25 01:19:01 -08:00
Prashant Sharma
2573add94c
spark-544, introducing SparkConf and related configuration overhaul.
2013-12-25 00:09:36 +05:30
azuryyu
a8bb86389d
Fixed job name in the java streaming example.
2013-12-24 16:52:20 +08:00
Mark Hamstra
09ed7ddfa0
Use scala.binary.version in POMs
2013-12-15 12:39:58 -08:00
Patrick Wendell
6e8a96c7e7
Fix maven build issues in 2.10 branch
2013-12-13 23:14:08 -08:00
Prashant Sharma
17db6a9041
Style fixes and addressed review comments at #221
2013-12-10 11:47:16 +05:30
Prashant Sharma
7ad6921ae0
Incorporated Patrick's feedback comment on #211 and made maven build/dep-resolution at least a bit faster.
2013-12-07 12:45:57 +05:30
Reynold Xin
6bcac986b2
Merge branch 'master' of github.com:apache/incubator-spark
2013-11-25 15:47:47 +08:00
Prashant Sharma
95d8dbce91
Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10-temp
Conflicts:
core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
2013-11-21 12:34:46 +05:30
Prashant Sharma
199e9cf02d
Merge branch 'scala210-master' of github.com:colorant/incubator-spark into scala-2.10
Conflicts:
core/src/main/scala/org/apache/spark/deploy/client/Client.scala
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
2013-11-21 11:55:48 +05:30
Henry Saputra
43dfac5132
Merge branch 'master' into removesemicolonscala
2013-11-19 16:57:57 -08:00
Henry Saputra
9c934b640f
Remove the semicolons at the end of Scala code to make it more idiomatic Scala.
Also remove unused imports as I found them along the way.
Remove return statements when returning a value in the Scala code.
Passes compilation and tests.
2013-11-19 10:19:03 -08:00
Aaron Davidson
50fd8d98c0
Enable the Broadcast examples to work in a cluster setting
Since they rely on println to display results, we need to first collect
those results to the driver to have them actually display locally.
2013-11-18 22:51:35 -08:00
Raymond Liu
0f2e3c6e31
Merge branch 'master' into scala-2.10
2013-11-13 16:55:11 +08:00
Prashant Sharma
6860b79f6e
Remove deprecated actorFor and use actorSelection everywhere.
2013-11-12 12:43:53 +05:30
Reynold Xin
551a43fd3d
Merge branch 'master' of github.com:apache/incubator-spark into mergemerge
Conflicts:
README.md
core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
2013-11-04 21:02:36 -08:00
Ankur Dave
5064f9b2d2
Merge remote-tracking branch 'spark-upstream/master'
Conflicts:
project/SparkBuild.scala
2013-10-30 15:59:09 -07:00
tgravescs
e5e0ebdb11
fix sparkhdfs lr test
2013-10-29 20:12:45 -05:00
Ali Ghodsi
05a0df2b9e
Makes Spark SIMR ready.
2013-10-24 11:59:51 -07:00
Matei Zaharia
dd659642e7
Merge pull request #64 from prabeesh/master
MQTT Adapter for Spark Streaming
MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol.
It was designed as an extremely lightweight publish/subscribe messaging transport. You can read more about it at http://mqtt.org/
Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators, to mobile phones, embedded systems on vehicles, or laptops and full scale computers.
The protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions.
This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code footprint and network bandwidth are constrained.
MQTT is one of the most widely used protocols for the 'Internet of Things'. The protocol is gaining traction as more and more devices get connected to the internet and produce data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015.
Plugins and support for MQTT are available in popular message queues such as RabbitMQ and ActiveMQ.
Support for MQTT in Spark will help people with Internet of Things (IoT) projects to use Spark Streaming for their real time data processing needs (from sensors and other embedded devices etc).
2013-10-23 15:07:59 -07:00
Matei Zaharia
731c94e91d
Merge pull request #56 from jerryshao/kafka-0.8-dev
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
Conflicts:
streaming/pom.xml
2013-10-21 23:31:38 -07:00
Prabeesh K
9ca1bd9530
Update MQTTWordCount.scala
2013-10-22 09:05:57 +05:30
Prabeesh K
dbafa11396
Update MQTTWordCount.scala
2013-10-22 08:50:34 +05:30
Reynold Xin
4e44d65b5e
Exclusion rules for Maven build files.
2013-10-19 12:35:55 -07:00
Joseph E. Gonzalez
1856b37e9d
Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx
2013-10-18 12:21:19 -07:00
Prabeesh K
6ec39829e9
Update MQTTWordCount.scala
2013-10-18 17:00:28 +05:30
Mosharaf Chowdhury
e96bd0068f
BroadcastTest2 --> BroadcastTest
2013-10-16 21:33:33 -07:00
Mosharaf Chowdhury
feb45d391f
Default blockSize is 4MB.
BroadcastTest2 example added for testing broadcasts.
2013-10-16 21:33:33 -07:00
prabeesh
ee4178f144
remove unused dependency
2013-10-17 09:57:48 +05:30
prabeesh
7d36a117c1
add maven dependencies for mqtt
2013-10-16 13:41:26 +05:30
prabeesh
9eaf68fd40
added mqtt adapter wordcount example
2013-10-16 13:40:38 +05:30
Patrick Wendell
35befe07bb
Fixing spark streaming example and a bug in examples build.
- Examples assembly included a log4j.properties which clobbered Spark's
- Example had an error where some classes weren't serializable
- Did some other clean-up in this example
2013-10-15 22:55:43 -07:00
Joseph E. Gonzalez
ef7c369092
merged with upstream changes
2013-10-14 22:56:42 -07:00
jerryshao
c23cd72b4b
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
2013-10-12 20:00:42 +08:00
Neal Wiggins
67d4a31f87
Remove unnecessary mutable imports
2013-10-11 09:47:27 -07:00
Prashant Sharma
26860639c5
Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
project/SparkBuild.scala
2013-10-10 09:42:23 +05:30
Prashant Sharma
7be75682b9
Merge branch 'master' into wip-merge-master
Conflicts:
bagel/pom.xml
core/pom.xml
core/src/test/scala/org/apache/spark/ui/UISuite.scala
examples/pom.xml
mllib/pom.xml
pom.xml
project/SparkBuild.scala
repl/pom.xml
streaming/pom.xml
tools/pom.xml
In Scala 2.10, a shorter representation is used for naming artifacts,
so the artifacts now use the shorter Scala version, which is made a property in the pom.
2013-10-08 11:29:40 +05:30
Patrick Wendell
aa9fb84994
Merging build changes in from 0.8
2013-10-05 22:07:00 -07:00
Martin Weindel
e09f4a9601
fixed some warnings
2013-10-05 23:08:23 +02:00
Prashant Sharma
5829692885
Merge branch 'master' into scala-2.10
Conflicts:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
docs/_config.yml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Prashant Sharma
604dc40996
Sync with master and some build fixes
2013-09-26 11:40:02 +05:30
Prashant Sharma
7ff4c2d399
fixed maven build for scala 2.10
2013-09-26 10:48:24 +05:30
Patrick Wendell
6079721fa1
Update build version in master
2013-09-24 11:41:51 -07:00
Prashant Sharma
276c37a51c
Akka 2.2 migration
2013-09-22 08:20:12 +05:30
Joseph E. Gonzalez
8b59fb72c4
Merging latest changes from spark main branch
2013-09-17 20:56:12 -07:00
Prashant Sharma
383e151fd7
Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Jey Kottalam
30a32c8335
Minor YARN build cleanups
2013-09-06 11:31:16 -07:00
Prashant Sharma
4106ae9fbf
Merged with master
2013-09-06 17:53:01 +05:30
Matei Zaharia
12b2f1f9c9
Add missing license headers found with RAT
2013-09-02 12:23:03 -07:00
Matei Zaharia
0a8cc30921
Move some classes to more appropriate packages:
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia
5701eb92c7
Fix some URLs
2013-09-01 14:13:16 -07:00
Matei Zaharia
46eecd110a
Initial work to rename package to org.apache.spark
2013-09-01 14:13:13 -07:00
Matei Zaharia
666d93c294
Update Maven build to create assemblies expected by new scripts
This includes the following changes:
- The "assembly" package now builds in Maven by default, and creates an
assembly containing both hadoop-client and Spark, unlike the old
BigTop distribution assembly that skipped hadoop-client
- There is now a bigtop-dist package to build the old BigTop assembly
- The repl-bin package is no longer built by default since the scripts
don't rely on it; instead it can be enabled with -Prepl-bin
- Py4J is now included in the assembly/lib folder as a local Maven repo,
so that the Maven package can link to it
- run-example now adds the original Spark classpath as well because the
Maven examples assembly lists spark-core and such as provided
- The various Maven projects add a spark-yarn dependency correctly
2013-08-29 21:19:06 -07:00
Matei Zaharia
aab345c463
Fix finding of assembly JAR, as well as some pointers to ./run
2013-08-29 21:19:06 -07:00
Matei Zaharia
53cd50c069
Change build and run instructions to use assemblies
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.
As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
Jey Kottalam
23f4622aff
Remove redundant dependencies from POMs
2013-08-18 18:53:57 -07:00
Jey Kottalam
c1e547bb7f
Updates to repl and example POMs to match SBT build
2013-08-16 13:50:12 -07:00
Jey Kottalam
ad580b94d5
Maven build now also works with YARN
2013-08-16 13:50:12 -07:00
Jey Kottalam
9dd15fe700
Don't mark hadoop-client as 'provided'
2013-08-16 13:50:12 -07:00
Jey Kottalam
11b42a84db
Maven build now works with CDH hadoop-2.0.0-mr1
2013-08-16 13:50:12 -07:00
Jey Kottalam
353fab2440
Initial changes to make Maven build agnostic of hadoop version
2013-08-16 13:50:12 -07:00
Jey Kottalam
4f43fd791a
make SparkHadoopUtil a member of SparkEnv
2013-08-15 16:50:37 -07:00
Evan Sparks
ff9ebfabb4
Merge pull request #762 from shivaram/sgd-cleanup
Refactor SGD options into a new class.
2013-08-11 10:52:55 -07:00
Alexander Pivovarov
2d97cc46af
Fixed path to JavaALS.java and JavaKMeans.java, fixed hadoop2-yarn profile
2013-08-10 23:04:50 -07:00
Matei Zaharia
4c4f769187
Optimize Scala PageRank to use reduceByKey
2013-08-10 18:09:54 -07:00
Matei Zaharia
06e4f2a8f2
Merge pull request #789 from MLnick/master
Adding Scala version of PageRank example
2013-08-10 18:06:23 -07:00
Matei Zaharia
cd247ba5bb
Merge pull request #786 from shivaram/mllib-java
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Matei Zaharia
06303a62e5
Optimize JavaPageRank to use reduceByKey instead of groupByKey
2013-08-08 18:50:00 -07:00
Shivaram Venkataraman
2812e72200
Add setters for optimizer, gradient in SGD.
Also remove Java-specific constructor for LabeledPoint.
2013-08-08 16:24:31 -07:00
Shivaram Venkataraman
e1a209f791
Remove Java-specific constructor for Rating.
The Scala constructor works for native Java types. Modify examples
to match this.
2013-08-08 14:36:02 -07:00
Nick Pentreath
c4eea875ac
Style changes as per Matei's comments
2013-08-08 12:40:37 +02:00
Nick Pentreath
cce758b893
Adding Scala version of PageRank example
2013-08-07 16:38:52 +02:00
Shivaram Venkataraman
338b7a7455
Merge branch 'master' of git://github.com/mesos/spark into sgd-cleanup
Conflicts:
mllib/src/main/scala/spark/mllib/util/MLUtils.scala
2013-08-06 21:21:55 -07:00
Shivaram Venkataraman
7db69d56f2
Refactor GLM algorithms and add Java tests
This change adds Java examples and unit tests for all GLM algorithms
to make sure the MLLib interface works from Java. Changes include
- Introduce LabeledPoint and avoid using Doubles in train arguments
- Rename train to run in class methods
- Make the optimizer a member variable of GLM to make sure the builder
pattern works
2013-08-06 17:23:22 -07:00
Shivaram Venkataraman
471fbadd0c
Java examples, tests for KMeans and ALS
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
examples project.
Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Work around a bug where using double[] from Java leads to a class cast exception in
KMeans init
2013-08-06 15:43:46 -07:00
stayhf
882baee489
Got rid of unnecessary map function
2013-08-06 21:34:39 +00:00
stayhf
326a7a82e0
changes as reviewer requested
2013-08-06 21:03:24 +00:00
stayhf
98fd62605d
Updated code with reviewer's suggestions
2013-08-05 00:30:28 +00:00
stayhf
a682637301
Simple PageRank algorithm implementation in Java for SPARK-760
2013-08-03 06:01:16 +00:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
Prashant Sharma
e86d5dbaad
Merge branch 'master' into master-merge
Conflicts:
README.md
core/pom.xml
core/src/main/scala/spark/deploy/JsonProtocol.scala
core/src/main/scala/spark/deploy/LocalSparkCluster.scala
core/src/main/scala/spark/deploy/master/Master.scala
core/src/main/scala/spark/deploy/master/MasterWebUI.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
core/src/main/scala/spark/storage/BlockManagerUI.scala
core/src/main/scala/spark/util/AkkaUtils.scala
pom.xml
project/SparkBuild.scala
streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
2013-07-12 14:49:16 +05:30
Mark Hamstra
0b39d66f3f
pom cleanup
2013-07-08 16:07:09 -07:00
Mark Hamstra
afdaf430bd
Explicit dependencies for scala-library and scalap to prevent 2.9.2 vs. 2.9.3 problems
2013-07-08 15:40:50 -07:00
Prashant Sharma
a5f1f6a907
Merge branch 'master' into master-merge
Conflicts:
core/pom.xml
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/RDDCheckpointData.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/api/python/PythonRDD.scala
core/src/main/scala/spark/deploy/client/Client.scala
core/src/main/scala/spark/deploy/master/MasterWebUI.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/ZippedRDD.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/BlockManagerMasterActor.scala
core/src/main/scala/spark/storage/BlockManagerUI.scala
core/src/main/scala/spark/util/AkkaUtils.scala
core/src/test/scala/spark/SizeEstimatorSuite.scala
pom.xml
project/SparkBuild.scala
repl/src/main/scala/spark/repl/SparkILoop.scala
repl/src/test/scala/spark/repl/ReplSuite.scala
streaming/src/main/scala/spark/streaming/StreamingContext.scala
streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala
2013-07-03 11:43:26 +05:30
Konstantin Boudnik
6fdbc68f2c
Fixing missed hbase dependency in examples hadoop2-yarn profile
2013-07-01 17:45:07 -07:00
Matei Zaharia
ccfe953a4d
Merge pull request #577 from skumargithub/master
Example of cumulative counting using updateStateByKey
2013-06-29 17:57:53 -07:00
Matei Zaharia
1667158544
Merge remote-tracking branch 'mrpotes/master'
2013-06-29 14:36:09 -07:00
James Phillpotts
176193b1e8
Fix usage and parameter extraction
2013-06-25 23:06:15 +01:00
James Phillpotts
366572edca
Include a default OAuth implementation, and update examples and JavaStreamingContext
2013-06-25 22:59:34 +01:00
Tathagata Das
c89af0a7f9
Merge branch 'master' into streaming
Conflicts:
.gitignore
2013-06-24 23:57:47 -07:00
Matei Zaharia
dbfab49d2a
Merge remote-tracking branch 'milliondreams/casdemo'
Conflicts:
project/SparkBuild.scala
2013-06-18 14:55:31 +02:00
Matei Zaharia
b7794813b1
Fix run script on Windows for Scala 2.10
2013-06-15 09:37:13 -07:00
Rohit Rai
b5b12823fa
Fixing the style as per feedback
2013-06-13 14:05:46 +05:30
Rohit Rai
b104c7f5c7
Example to write the output to cassandra
2013-06-03 15:15:52 +05:30
Rohit Rai
56c64c4033
A better way to read column value if you are sure the column exists in every row.
2013-06-03 12:48:35 +05:30
Rohit Rai
6d8423fd1b
Adding deps to examples/pom.xml
Fixing exclusion in examples deps in SparkBuild.scala
2013-06-02 13:03:45 +05:30
Rohit Rai
81c2adc15c
Removing infix call
2013-06-02 12:51:15 +05:30
Rohit Rai
3be7bdcefd
Adding example to make Spark RDD from Cassandra
2013-06-01 19:32:17 +05:30
Ethan Jewett
3217d486f7
Add hBase dependency to examples POM
2013-05-20 19:41:38 -05:00
Ethan Jewett
ee6f6aa6cd
Add hBase example
2013-05-09 18:33:38 -05:00
Reynold Xin
012c9e5ab0
Revert "Merge pull request #596 from esjewett/master" because the
dependency on hbase introduces netty-3.2.2 which conflicts with
netty-3.5.3 already in Spark. This caused multiple test failures.
This reverts commit 0f1b7a06e1, reversing
changes made to aacca1b8a8.
2013-05-09 14:20:01 -07:00
Ethan Jewett
a3d5f92210
Switch to using SparkContext method to create RDD
2013-05-07 11:43:06 -05:00
unknown
cbf6a5ee1e
Removed unused code, clarified intent of the program, batch size to 1 second
2013-05-06 08:05:45 -06:00
Ethan Jewett
7cff7e7897
Fix indents and mention other configuration options
2013-05-04 14:56:55 -05:00
Ethan Jewett
9290f16430
Remove unnecessary column family config
2013-05-04 12:39:14 -05:00
Ethan Jewett
02e8cfa617
HBase example
2013-05-04 12:31:30 -05:00
unknown
1d54401d7e
Modified as per TD's suggestions
2013-04-30 23:01:32 -06:00
Prashant Sharma
8f3ac240cb
Fixed Warning: ClassManifest -> ClassTag
2013-04-29 16:39:13 +05:30
Prashant Sharma
4b4a36ea7d
Fixed pom.xml with updated dependencies.
2013-04-29 12:55:43 +05:30
Mridul Muralidharan
dd515ca3ee
Attempt at fixing merge conflict
2013-04-24 09:24:17 +05:30
unknown
0dc1e2d60f
Example of cumulative counting using updateStateByKey
2013-04-22 09:22:45 -06:00
Mridul Muralidharan
7acab3ab45
Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo
2013-04-22 08:01:13 +05:30
seanm
7e56e99573
Surfacing decoders on KafkaInputDStream
2013-04-16 17:17:16 -06:00
Andrew Ash
f1d8871ca1
Uniform whitespace across scala examples
2013-04-09 23:35:13 -04:00
Matei Zaharia
65caa8f711
Merge remote-tracking branch 'jey/bump-development-version-to-0.8.0'
Conflicts:
docs/_config.yml
project/SparkBuild.scala
2013-04-08 12:43:17 -04:00
Matei Zaharia
b362df39ea
Merge pull request #552 from MLnick/master
Bumping version for Twitter Algebird to latest
2013-04-07 17:17:52 -07:00
Mridul Muralidharan
6798a09df8
Add support for building against hadoop2-yarn : adding new maven profile for it
2013-04-07 17:47:38 +05:30
Nick Pentreath
0f54344fd8
Bumping Algebird version in examples now that it supports JDK 1.6
2013-04-03 13:15:34 +02:00
Erik van oosten
b5e60c3253
Corrected order of CountMinSketchMonoid arguments
2013-04-02 15:25:22 +03:00
Jey Kottalam
bc8ba222ff
Bump development version to 0.8.0
2013-03-28 15:42:01 -07:00
Matei Zaharia
ca4d083ec8
Merge pull request #528 from MLnick/java-examples
...
[SPARK-707] Adding Java versions of Pi, LogQuery and K-Means examples
2013-03-20 11:22:36 -07:00
Nick Pentreath
52398cc1a3
Java indentation 4 --> 2 spaces
2013-03-20 09:55:42 +02:00
Nick Pentreath
9fa47a2039
A few cosmetic changes for JavaKMeans
2013-03-19 15:31:03 +02:00
Nick Pentreath
568ddf7330
Adding Java K-Means example
2013-03-19 15:29:22 +02:00
Nick Pentreath
b990caeb80
Changes to more closely match line length limit style
2013-03-17 20:03:27 +02:00
Mikhail Bautin
7fd2708eda
Add a log4j compile dependency to fix build in IntelliJ
...
Also rename parent project to spark-parent (otherwise it shows up as
"parent" in IntelliJ, which is very confusing).
2013-03-15 11:41:51 -07:00
Nick Pentreath
13757b1198
Adding Java versions of Pi and LogQuery
2013-03-15 10:52:01 +02:00
Mark Hamstra
8b06b359da
bump version to 0.7.1-SNAPSHOT in the subproject poms to keep the maven build building.
2013-02-28 23:34:34 -08:00
Matei Zaharia
5d7b591cfe
Pass a code JAR to SparkContext in our examples. Fixes SPARK-594.
2013-02-25 19:34:32 -08:00
Matei Zaharia
6b87ef7c86
Fix compile error
2013-02-25 14:01:16 -08:00
Matei Zaharia
01bd136ba5
Use public method sparkContext instead of protected sc in streaming examples
2013-02-25 13:27:11 -08:00
Tathagata Das
f282bc4960
Changed Algebird from 0.1.9 to 0.1.8
2013-02-24 12:44:12 -08:00
Tathagata Das
c1a040db3a
Fixed bugs in examples.
2013-02-24 11:00:30 -08:00
Tathagata Das
41285eaae3
Fixed differences in APIs of StreamingContext and JavaStreamingContext. Change rawNetworkStream to rawSocketStream, and added twitter, actor, zeroMQ streams to JavaStreamingContext. Also added them to JavaAPISuite.
2013-02-23 16:25:07 -08:00
Tathagata Das
cfa65ebff1
Merge pull request #480 from MLnick/streaming-eg-algebird
...
[Streaming] Examples using Twitter's Algebird library
2013-02-22 12:29:04 -08:00
Tathagata Das
688e62718f
Merge pull request #479 from ScrapCodes/zeromq-streaming
...
Zeromq streaming
2013-02-22 12:17:17 -08:00
Nick Pentreath
d9bdae8cc2
Adding documentation for HLL and CMS examples. More efficient and clear use of the monoids.
2013-02-21 12:31:31 +02:00
Nick Pentreath
718474b9c6
Bumping Algebird to 0.1.9
2013-02-21 12:11:31 +02:00
Nick Pentreath
16d456742e
Merge remote-tracking branch 'upstream/streaming' into streaming-eg-algebird
2013-02-21 09:33:08 +02:00
Tathagata Das
972fe7714f
Merge branch 'mesos-streaming' into streaming
...
Conflicts:
streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-20 11:06:01 -08:00
Tathagata Das
fb9956256d
Merge branch 'mesos-master' into streaming
...
Conflicts:
core/src/main/scala/spark/rdd/CheckpointRDD.scala
streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Prashant Sharma
4e5b09664c
fixes corresponding to review feedback at pull request #479
2013-02-20 19:14:52 +05:30
Prashant Sharma
05dc385649
A bug fix post merge, following changes to AkkaUtils
2013-02-20 15:28:12 +05:30
Nick Pentreath
8a281399f9
Streaming example using Twitter Algebird's Count Min Sketch monoid
2013-02-19 17:56:02 +02:00
Nick Pentreath
d8ee184d95
Dependencies and refactoring for streaming HLL example, and using context.twitterStream method
2013-02-19 17:42:57 +02:00
Prashant Sharma
8d44480d84
example for demonstrating ZeroMQ stream
2013-02-19 19:42:14 +05:30
Nick Pentreath
315ea069e8
Merge remote-tracking branch 'upstream/streaming' into streaming-eg-algebird
...
Conflicts:
project/SparkBuild.scala
2013-02-19 13:58:05 +02:00
Nick Pentreath
015893f0e8
Adding streaming HyperLogLog example using Algebird
2013-02-19 13:21:33 +02:00
Tathagata Das
7e30c46aaf
Added comment to the KafkaWordCount, given by Sean McNamara.
2013-02-19 03:05:44 -08:00
Tathagata Das
9e82be1503
Merge branch 'streaming' into ScrapCodes-streaming-actor
...
Conflicts:
docs/plugin-custom-receiver.md
streaming/src/main/scala/spark/streaming/StreamingContext.scala
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Tathagata Das
12ea14c211
Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver.
2013-02-18 15:18:34 -08:00
Tathagata Das
6a6e6bda57
Merge branch 'streaming' into ScrapCode-streaming
...
Conflicts:
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das
4b8402e900
Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down.
2013-02-14 18:10:37 -08:00
Tathagata Das
def8126d77
Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags.
2013-02-14 17:49:43 -08:00
Tathagata Das
2eacf22401
Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites.
2013-02-14 12:21:47 -08:00
Prashant Sharma
291dd47c7f
Taking FeederActor out as separate program
2013-02-08 14:34:07 +05:30
Tathagata Das
4cc223b478
Merge branch 'mesos-master' into streaming
2013-02-07 13:59:31 -08:00
Tathagata Das
12300758cc
Merge pull request #372 from Reinvigorate/sm-kafka
...
Removing offset management code that is non-existent in kafka 0.7.0+
2013-02-07 12:41:07 -08:00
Patrick Wendell
dab81a8511
Fixing to match Spark styleguide
2013-02-05 20:57:04 -08:00
Patrick Wendell
cc37601ecb
Adding an example with an OLAP roll-up
2013-02-04 14:18:11 -08:00
Mikhail Bautin
fe3eceab57
Remove activation of profiles by default
...
See the discussion at https://github.com/mesos/spark/pull/355 for why
default profile activation is a problem.
2013-01-31 13:30:41 -08:00
Prashant Sharma
4496bf197b
Improved document comment in example
2013-01-25 14:34:38 +05:30
Prashant Sharma
d17065c4b5
actor as receiver
2013-01-22 13:28:29 +05:30
Prashant Sharma
43bfd7bb21
Changed method name of createReceiver to getReceiver as it is not intended to be a factory.
2013-01-21 11:39:30 +05:30
Matei Zaharia
6e3754bf47
Add Maven build file for streaming, and fix some issues in SBT file
...
As part of this, changed our Scala 2.9.2 Kafka library to be available
as a local Maven repository, following the example in
(http://blog.dub.podval.org/2010/01/maven-in-project-repository.html )
2013-01-20 19:22:24 -08:00
Matei Zaharia
86057ec7c8
Merge branch 'master' into streaming
...
Conflicts:
core/src/main/scala/spark/api/python/PythonRDD.scala
2013-01-20 12:47:55 -08:00
Matei Zaharia
2a8c2a6790
Minor formatting fixes
2013-01-20 10:24:53 -08:00
Tathagata Das
4f8fe58b25
Merge branch 'mesos-streaming' into streaming
...
Conflicts:
core/src/main/scala/spark/api/java/JavaRDDLike.scala
core/src/main/scala/spark/api/java/JavaSparkContext.scala
core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Prashant Sharma
56b9bd197c
Plug in actor as stream receiver API
2013-01-19 22:04:07 +05:30
Prashant Sharma
bb6ab92e31
Changed method name of createReceiver to getReceiver as it is not intended to be a factory.
2013-01-19 22:04:07 +05:30
seanm
d3064fe707
kafkaStream API cleanup. A quorum of zookeepers can now be specified
2013-01-18 21:34:29 -07:00
Patrick Wendell
12b72b3e73
NetworkWordCount example
2013-01-17 22:37:56 -08:00
Patrick Wendell
e0165bf714
Adding queueStream and some slight refactoring
2013-01-17 21:25:49 -08:00
Patrick Wendell
6fba7683c2
Small doc fix
2013-01-17 18:46:24 -08:00
Nick Pentreath
a5ba7a9f32
Use only one update function and pass in transpose of ratings matrix where appropriate
2013-01-17 16:21:00 +02:00
Nick Pentreath
a512df551f
Fixed index error missing first argument
2013-01-17 16:05:27 +02:00
Nick Pentreath
42fbef3c2a
Adding default command line args to SparkALS
2013-01-17 15:54:59 +02:00
Tathagata Das
cd1521cfdb
Merge branch 'master' into streaming
...
Conflicts:
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/FilteredRDD.scala
docs/_layouts/global.html
docs/index.md
run
2013-01-15 12:08:51 -08:00
Patrick Wendell
d182a57cae
Two changes:
...
- Updating countByX() types based on bug fix
- Porting new documentation to Java
2013-01-14 10:03:55 -08:00
Patrick Wendell
3461cd99b7
Flume example and bug fix
2013-01-14 09:42:36 -08:00
Tathagata Das
0a2e333341
Removed stream id from the constructor of NetworkReceiver to make it easier for PluggableNetworkInputDStream.
2013-01-13 16:18:39 -08:00
Eric Zhang
ba06e9c97c
Update examples/src/main/scala/spark/examples/LocalLR.scala
...
fix spelling mistake
2013-01-13 15:33:11 +08:00
Shivaram Venkataraman
bbc56d85ed
Rename environment variable for hadoop profiles to hadoopVersion
2013-01-12 15:24:13 -08:00
Shivaram Venkataraman
9262522306
Activate hadoop2 profile in pom.xml with -Dhadoop=2
2013-01-10 22:07:34 -08:00
Shivaram Venkataraman
f7adb382ac
Activate hadoop1 if property hadoop is missing. hadoop2 can be activated now
...
by using -Dhadoop -Phadoop2.
2013-01-08 03:19:43 -08:00
Patrick Wendell
6c502e3793
Making the Twitter example distributed.
...
This adds a distributed (receiver-based) implementation of the
Twitter dstream. It also changes the example to perform a
distributed sort rather than collecting the dataset at one node.
2013-01-07 22:01:11 -08:00
Tathagata Das
8c1b872512
Moved Twitter example to where the other examples are.
2013-01-07 17:48:10 -08:00
Shivaram Venkataraman
4bbe07e5ec
Activate hadoop1 profile by default for maven builds
2013-01-07 17:46:22 -08:00
Tathagata Das
237bac36e9
Renamed examples and added documentation.
2013-01-07 14:37:21 -08:00
Tathagata Das
af8738dfb5
Moved Spark Streaming examples to examples sub-project.
2013-01-06 19:31:54 -08:00
Thomas Dudziak
02d64f9662
Mark hadoop dependencies provided in all library artifacts
2012-12-10 21:27:54 -08:00
Matei Zaharia
ccff0a089a
Use the same output directories that SBT had in subprojects
...
This will make it easier to make the "run" script work with a Maven build
2012-12-10 10:58:56 -08:00
Thomas Dudziak
3b643e86bc
Updated versions in the pom.xml files to match current master
2012-11-27 17:50:42 -08:00
Thomas Dudziak
69297c64be
Addressed code review comments
2012-11-27 15:45:16 -08:00
Thomas Dudziak
811a32257b
Added maven and debian build files
2012-11-20 16:19:51 -08:00
root
acf8272324
Fix K-means example a little
2012-11-10 23:07:21 -08:00
Matei Zaharia
8d7b77bcb5
Some doc and usability improvements:
...
- Added a StorageLevels class for easy access to StorageLevel constants
in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
2012-10-12 17:53:20 -07:00
Mosharaf Chowdhury
119e50c7b9
Conflict fixed
2012-10-02 22:25:39 -07:00
Matei Zaharia
56c90485fd
More updates to documentation
2012-09-25 19:31:07 -07:00
Mosharaf Chowdhury
3883532545
Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations.
2012-08-30 21:43:00 -07:00
Josh Rosen
566feafe1d
Cache points in SparkLR example.
2012-08-26 15:24:43 -07:00
Matei Zaharia
6ae3c375a9
Renamed apply() to call() in Java API and allowed it to throw Exceptions
2012-08-12 23:10:19 +02:00
Imran Rashid
edc6972f8e
move Vector class into core and spark.util package
2012-07-28 20:15:42 -07:00
Josh Rosen
2a60c998cc
Remove StringOps.split() from Java WordCount.
2012-07-25 10:13:06 -07:00
Josh Rosen
6a78e88237
Minor cleanup and optimizations in Java API.
...
- Add override keywords.
- Cache RDDs and counts in TC example.
- Clean up JavaRDDLike's abstract methods.
2012-07-24 09:47:00 -07:00
Josh Rosen
460da878fc
Improve Java API examples
...
- Replace JavaLR example with JavaHdfsLR example.
- Use anonymous classes in JavaWordCount; add options.
- Remove @Override annotations.
2012-07-22 14:40:39 -07:00
Josh Rosen
01dce3f569
Add Java API
...
Add distinct() method to RDD.
Fix bug in DoubleRDDFunctions.
2012-07-18 17:34:29 -07:00
Matei Zaharia
28fed4ce3b
Add System.exit(0) at the end of all the example programs.
2012-06-05 23:31:28 -07:00
haoyuan
651932e703
Format the code as coding style agreed by Matei/TD/Haoyuan
2012-02-09 13:26:23 -08:00
Matei Zaharia
100e800782
Some fixes to the examples (mostly to use functional API)
2012-01-31 00:33:18 -08:00
Matei Zaharia
fabcc82528
Merge pull request #103 from edisontung/master
...
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Matei Zaharia
3034fc0d91
Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16'
2011-12-14 18:19:43 +01:00
Matei Zaharia
72c4839c5f
Fixed LocalFileLR to deal with a change in Scala IO sources
...
(you can no longer iterate over a Source multiple times).
2011-12-01 13:52:12 -08:00
Edison Tung
42f8847a21
Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD
2011-12-01 13:43:25 -08:00
Edison Tung
e1c814be4c
Renamed SparkLocalKMeans to SparkKMeans
2011-12-01 13:34:03 -08:00
Edison Tung
3b9d9de583
Added KMeans examples
...
LocalKMeans runs locally with a randomly generated dataset.
SparkLocalKMeans takes an input file and runs KMeans on it.
2011-11-21 16:37:58 -08:00
Ankur Dave
35b6358a7c
Report errors in tasks to the driver via a Mesos status update
...
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.
Here's what the reporting currently looks like:
# ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
[...]
11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
[...]
11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
2011-11-14 01:54:53 +00:00
Matei Zaharia
d4c8e69dc7
K-means example
2011-11-01 19:25:58 -07:00
Ismael Juma
0fba22b3d2
Fix issue #65 : Change @serializable to extends Serializable in 2.9 branch
...
Note that we use scala.Serializable introduced in Scala 2.9 instead of
java.io.Serializable. Also, case classes inherit from scala.Serializable by
default.
2011-08-02 10:16:33 +01:00
Matei Zaharia
c4dd68ae21
Merge branch 'mos-bt'
...
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.
Conflicts:
core/src/main/scala/spark/BitTorrentBroadcast.scala
core/src/main/scala/spark/ChainedBroadcast.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Ismael Juma
26000af4fa
Replace deprecated fromFunction with either tabulate or fill.
...
tabulate is used if the index is used by the function, and fill otherwise.
2011-05-26 22:12:11 +01:00
Ismael Juma
0b6a862b68
Use math instead of Math as the latter is deprecated.
2011-05-26 22:06:36 +01:00
Mosharaf Chowdhury
9d78779257
Merge branch 'mos-shuffle-tracked' into mos-bt
...
Conflicts:
core/src/main/scala/spark/Broadcast.scala
2011-04-27 20:47:07 -07:00
Mosharaf Chowdhury
ac7e066383
Merge branch 'master' into mos-shuffle-tracked
...
Conflicts:
.gitignore
core/src/main/scala/spark/LocalFileShuffle.scala
src/scala/spark/BasicLocalFileShuffle.scala
src/scala/spark/Broadcast.scala
src/scala/spark/LocalFileShuffle.scala
2011-04-27 14:35:03 -07:00
Matei Zaharia
309367c477
Initial work towards new RDD design
2011-02-26 23:15:33 -08:00
Mosharaf Chowdhury
1a73c0d265
Merged with master. Using sbt.
2011-02-09 10:48:48 -08:00
Mosharaf Chowdhury
495b38658e
Merge branch 'master' into mos-bt
2011-02-09 10:40:23 -08:00
Matei Zaharia
a11fe23017
Moved examples to spark.examples package
2011-02-02 16:30:27 -08:00
Matei Zaharia
e5c4cd8a5e
Made examples and core subprojects
2011-02-01 15:11:08 -08:00