Commit graph

198 commits

Author SHA1 Message Date
Henry Saputra 43dfac5132 Merge branch 'master' into removesemicolonscala 2013-11-19 16:57:57 -08:00
Henry Saputra 9c934b640f Remove the semicolons at the end of Scala code to make it more pure Scala code.
Also remove unused imports as I found them along the way.
Remove return statements when returning value in the Scala code.

Passing compile and tests.
2013-11-19 10:19:03 -08:00
Aaron Davidson 50fd8d98c0 Enable the Broadcast examples to work in a cluster setting
Since they rely on println to display results, we need to first collect
those results to the driver to have them actually display locally.
2013-11-18 22:51:35 -08:00
tgravescs e5e0ebdb11 fix sparkhdfs lr test 2013-10-29 20:12:45 -05:00
Ali Ghodsi 05a0df2b9e Makes Spark SIMR ready. 2013-10-24 11:59:51 -07:00
Matei Zaharia dd659642e7 Merge pull request #64 from prabeesh/master
MQTT Adapter for Spark Streaming

MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol.
It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it here http://mqtt.org/

Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators, to mobile phones, embedded systems on vehicles, or laptops and full scale computers.

The protocol was invented by Andy Stanford-Clark of IBM, and Arlen Nipper of Cirrus Link Solutions

This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where line of code and network bandwidth is a constraint.

MQTT is one of the widely used protocol for 'Internet of Things'. This protocol is getting much attraction as anything and everything is getting connected to internet and they all produce data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015.

Plugin/Support for MQTT is available in popular MQs like RabbitMQ, ActiveMQ etc.

Support for MQTT in Spark will help people with Internet of Things (IoT) projects to use Spark Streaming for their real time data processing needs (from sensors and other embedded devices etc).
2013-10-23 15:07:59 -07:00
Matei Zaharia 731c94e91d Merge pull request #56 from jerryshao/kafka-0.8-dev
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming

Conflicts:
	streaming/pom.xml
2013-10-21 23:31:38 -07:00
Prabeesh K 9ca1bd9530 Update MQTTWordCount.scala 2013-10-22 09:05:57 +05:30
Prabeesh K dbafa11396 Update MQTTWordCount.scala 2013-10-22 08:50:34 +05:30
Reynold Xin 4e44d65b5e Exclusion rules for Maven build files. 2013-10-19 12:35:55 -07:00
Prabeesh K 6ec39829e9 Update MQTTWordCount.scala 2013-10-18 17:00:28 +05:30
Mosharaf Chowdhury e96bd0068f BroadcastTest2 --> BroadcastTest 2013-10-16 21:33:33 -07:00
Mosharaf Chowdhury feb45d391f Default blockSize is 4MB.
BroadcastTest2 example added for testing broadcasts.
2013-10-16 21:33:33 -07:00
prabeesh ee4178f144 remove unused dependency 2013-10-17 09:57:48 +05:30
prabeesh 7d36a117c1 add maven dependencies for mqtt 2013-10-16 13:41:26 +05:30
prabeesh 9eaf68fd40 added mqtt adapter wordcount example 2013-10-16 13:40:38 +05:30
Patrick Wendell 35befe07bb Fixing spark streaming example and a bug in examples build.
- Examples assembly included a log4j.properties which clobbered Spark's
- Example had an error where some classes weren't serializable
- Did some other clean-up in this example
2013-10-15 22:55:43 -07:00
jerryshao c23cd72b4b Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming 2013-10-12 20:00:42 +08:00
Neal Wiggins 67d4a31f87 Remove unnecessary mutable imports 2013-10-11 09:47:27 -07:00
Patrick Wendell aa9fb84994 Merging build changes in from 0.8 2013-10-05 22:07:00 -07:00
Patrick Wendell 6079721fa1 Update build version in master 2013-09-24 11:41:51 -07:00
Jey Kottalam 30a32c8335 Minor YARN build cleanups 2013-09-06 11:31:16 -07:00
Matei Zaharia 12b2f1f9c9 Add missing license headers found with RAT 2013-09-02 12:23:03 -07:00
Matei Zaharia 0a8cc30921 Move some classes to more appropriate packages:
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia 5701eb92c7 Fix some URLs 2013-09-01 14:13:16 -07:00
Matei Zaharia 46eecd110a Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
Matei Zaharia 666d93c294 Update Maven build to create assemblies expected by new scripts
This includes the following changes:
- The "assembly" package now builds in Maven by default, and creates an
  assembly containing both hadoop-client and Spark, unlike the old
  BigTop distribution assembly that skipped hadoop-client
- There is now a bigtop-dist package to build the old BigTop assembly
- The repl-bin package is no longer built by default since the scripts
  don't reply on it; instead it can be enabled with -Prepl-bin
- Py4J is now included in the assembly/lib folder as a local Maven repo,
  so that the Maven package can link to it
- run-example now adds the original Spark classpath as well because the
  Maven examples assembly lists spark-core and such as provided
- The various Maven projects add a spark-yarn dependency correctly
2013-08-29 21:19:06 -07:00
Matei Zaharia aab345c463 Fix finding of assembly JAR, as well as some pointers to ./run 2013-08-29 21:19:06 -07:00
Matei Zaharia 53cd50c069 Change build and run instructions to use assemblies
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.

As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
Jey Kottalam 23f4622aff Remove redundant dependencies from POMs 2013-08-18 18:53:57 -07:00
Jey Kottalam c1e547bb7f Updates to repl and example POMs to match SBT build 2013-08-16 13:50:12 -07:00
Jey Kottalam ad580b94d5 Maven build now also works with YARN 2013-08-16 13:50:12 -07:00
Jey Kottalam 9dd15fe700 Don't mark hadoop-client as 'provided' 2013-08-16 13:50:12 -07:00
Jey Kottalam 11b42a84db Maven build now works with CDH hadoop-2.0.0-mr1 2013-08-16 13:50:12 -07:00
Jey Kottalam 353fab2440 Initial changes to make Maven build agnostic of hadoop version 2013-08-16 13:50:12 -07:00
Jey Kottalam 4f43fd791a make SparkHadoopUtil a member of SparkEnv 2013-08-15 16:50:37 -07:00
Evan Sparks ff9ebfabb4 Merge pull request #762 from shivaram/sgd-cleanup
Refactor SGD options into a new class.
2013-08-11 10:52:55 -07:00
Alexander Pivovarov 2d97cc46af Fixed path to JavaALS.java and JavaKMeans.java, fixed hadoop2-yarn profile 2013-08-10 23:04:50 -07:00
Matei Zaharia 4c4f769187 Optimize Scala PageRank to use reduceByKey 2013-08-10 18:09:54 -07:00
Matei Zaharia 06e4f2a8f2 Merge pull request #789 from MLnick/master
Adding Scala version of PageRank example
2013-08-10 18:06:23 -07:00
Matei Zaharia cd247ba5bb Merge pull request #786 from shivaram/mllib-java
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Matei Zaharia 06303a62e5 Optimize JavaPageRank to use reduceByKey instead of groupByKey 2013-08-08 18:50:00 -07:00
Shivaram Venkataraman 2812e72200 Add setters for optimizer, gradient in SGD.
Also remove java-specific constructor for LabeledPoint.
2013-08-08 16:24:31 -07:00
Shivaram Venkataraman e1a209f791 Remove Java-specific constructor for Rating.
The scala constructor works for native type java types. Modify examples
to match this.
2013-08-08 14:36:02 -07:00
Nick Pentreath c4eea875ac Style changes as per Matei's comments 2013-08-08 12:40:37 +02:00
Nick Pentreath cce758b893 Adding Scala version of PageRank example 2013-08-07 16:38:52 +02:00
Shivaram Venkataraman 338b7a7455 Merge branch 'master' of git://github.com/mesos/spark into sgd-cleanup
Conflicts:
	mllib/src/main/scala/spark/mllib/util/MLUtils.scala
2013-08-06 21:21:55 -07:00
Shivaram Venkataraman 7db69d56f2 Refactor GLM algorithms and add Java tests
This change adds Java examples and unit tests for all GLM algorithms
to make sure the MLLib interface works from Java. Changes include
- Introduce LabeledPoint and avoid using Doubles in train arguments
- Rename train to run in class methods
- Make the optimizer a member variable of GLM to make sure the builder
  pattern works
2013-08-06 17:23:22 -07:00
Shivaram Venkataraman 471fbadd0c Java examples, tests for KMeans and ALS
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
  easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
  called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
  examples project.

Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Workaround a bug where using double[] from Java leads to class cast exception in
  KMeans init
2013-08-06 15:43:46 -07:00
stayhf 882baee489 Got rid of unnecessary map function 2013-08-06 21:34:39 +00:00