Commit graph

372 commits

Author SHA1 Message Date
tgravescs 17bb9a27b2 Add mockito to the sbt build 2013-11-11 10:01:23 -06:00
Josh Rosen a37ff0f1db Add spark-tools assembly to spark-class classpath.
This allows the JavaAPICompletenessChecker to be
run with Spark 0.8+.
2013-11-09 13:42:45 -08:00
Russell Cardullo ef85a51f85 Add graphite sink for metrics
This adds a metrics sink for graphite.  The sink must
be configured with the host and port of a graphite node
and optionally may be configured with a prefix that will
be prepended to all metrics that are sent to graphite.
2013-11-08 16:36:03 -08:00
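A minimal sketch of such a configuration, assuming the conf/metrics.properties mechanism and the GraphiteSink class path documented for Spark's metrics system; the exact key names at this commit may differ:

```
# conf/metrics.properties -- hypothetical host, port, and prefix values
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
# optional prefix prepended to every metric name sent to graphite
*.sink.graphite.prefix=spark-prod
```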
Patrick Wendell af4a529f6e Exclude jopt from kafka dependency.
Kafka uses an older version of jopt that causes bad conflicts with the version
used by spark-perf. It's not easy to remove this downstream because of the way
that spark-perf uses Spark (by including a spark assembly as an unmanaged jar).
This fixes the problem at its source by just never including it.
2013-10-25 09:20:30 -07:00
Prashant Sharma c77ca1fed9 Updating to latest akka 2.2.3, which fixes our only failing Driver Suite 2013-10-24 16:11:40 +05:30
Matei Zaharia dadfc63b03 Fix Maven build to use MQTT repository 2013-10-23 15:29:22 -07:00
Matei Zaharia dd659642e7 Merge pull request #64 from prabeesh/master
MQTT Adapter for Spark Streaming

MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol.
It was designed as an extremely lightweight publish/subscribe messaging transport. You can read more about it at http://mqtt.org/.

Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators, to mobile phones, embedded systems on vehicles, or laptops and full scale computers.

The protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions.

This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code footprint and network bandwidth are constrained.

MQTT is one of the most widely used protocols for the 'Internet of Things'. It is gaining traction as more and more devices get connected to the internet, all of them producing data. Researchers and companies predict that some 25 billion devices will be connected to the internet by 2015.

Plugins/support for MQTT are available in popular message queues such as RabbitMQ and ActiveMQ.

Support for MQTT in Spark will help people with Internet of Things (IoT) projects use Spark Streaming for their real-time data processing needs (from sensors and other embedded devices, etc.).
2013-10-23 15:07:59 -07:00
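A short Scala sketch of consuming an MQTT topic from Spark Streaming. The MQTTUtils.createStream entry point, broker URL, and topic below are assumptions based on how the adapter is exposed in later Spark releases; the API surface at this commit may differ:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.mqtt.MQTTUtils

object MqttWordCount {
  def main(args: Array[String]): Unit = {
    // hypothetical broker URL and topic, with a 1-second batch interval
    val ssc = new StreamingContext("local[2]", "MqttWordCount", Seconds(1))
    val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883", "sensors/temperature")
    // count words arriving on the MQTT topic in each batch
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```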
Matei Zaharia 731c94e91d Merge pull request #56 from jerryshao/kafka-0.8-dev
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming

Conflicts:
	streaming/pom.xml
2013-10-21 23:31:38 -07:00
Matei Zaharia 8de9706b86 Merge pull request #66 from shivaram/sbt-assembly-deps
Add SBT target to assemble dependencies

This pull request is an attempt to address the long assembly build times during development. Instead of rebuilding the assembly jar for every Spark change, this pull request adds a new SBT target `spark` that packages all the Spark modules and builds an assembly of the dependencies.

So the workflow now looks something like this:

```
./sbt/sbt spark # Doing this once should suffice
## Make changes
./sbt/sbt compile
./sbt/sbt test or ./spark-shell
```
2013-10-18 20:32:39 -07:00
prabeesh 29245605bf remove unused dependency 2013-10-17 09:57:30 +05:30
Shivaram Venkataraman 0a4b76fcc2 Rename SBT target to assemble-deps. 2013-10-16 17:05:46 -07:00
prabeesh 06de3d516d added mqtt adapter library dependencies 2013-10-16 13:38:37 +05:30
Patrick Wendell 35befe07bb Fixing spark streaming example and a bug in examples build.
- Examples assembly included a log4j.properties which clobbered Spark's
- Example had an error where some classes weren't serializable
- Did some other clean-up in this example
2013-10-15 22:55:43 -07:00
Shivaram Venkataraman 051cd960d9 Merge branch 'master' of https://github.com/apache/incubator-spark into sbt-assembly-deps 2013-10-15 13:26:40 -07:00
jerryshao c23cd72b4b Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming 2013-10-12 20:00:42 +08:00
Shivaram Venkataraman c441904bce Add a comment and exclude tools 2013-10-11 18:23:15 -07:00
Matei Zaharia c71499b779 Merge pull request #19 from aarondav/master-zk
Standalone Scheduler fault tolerance using ZooKeeper

This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we keep the behavior from d5a96fe.

Additionally, places where a Master could be specified by a spark:// URL can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.
2013-10-10 17:16:42 -07:00
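A minimal Scala sketch of the comma-delimited master list described above, using the 0.8-era SparkContext constructor; the host names are hypothetical:

```scala
import org.apache.spark.SparkContext

object MultiMasterClient {
  def main(args: Array[String]): Unit = {
    // The comma-delimited list is only used to locate and register with the
    // current leader; once registered, the client follows whichever master
    // is subsequently elected.
    val sc = new SparkContext("spark://master1:7077,master2:7077", "FaultTolerantApp")
    println(sc.parallelize(1 to 10).count())
    sc.stop()
  }
}
```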
Prashant Sharma 26860639c5 Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
	project/SparkBuild.scala
2013-10-10 09:42:23 +05:30
Shivaram Venkataraman 484166d520 Add new SBT target for dependency assembly 2013-10-09 04:24:34 -07:00
Prashant Sharma 7be75682b9 Merge branch 'master' into wip-merge-master
Conflicts:
	bagel/pom.xml
	core/pom.xml
	core/src/test/scala/org/apache/spark/ui/UISuite.scala
	examples/pom.xml
	mllib/pom.xml
	pom.xml
	project/SparkBuild.scala
	repl/pom.xml
	streaming/pom.xml
	tools/pom.xml

In Scala 2.10 a shorter representation is used for naming artifacts,
 so the artifacts now use the shorter Scala version, which is made a property in the POM.
2013-10-08 11:29:40 +05:30
Reynold Xin 213b70a2db Merge pull request #31 from sundeepn/branch-0.8
Resolving package conflicts with hadoop 0.23.9

Hadoop 0.23.9 has a package conflict with easymock's dependencies.

(cherry picked from commit 023e3fdf00)
Signed-off-by: Reynold Xin <rxin@apache.org>
2013-10-07 10:54:22 -07:00
Martin Weindel 9b0c9c893d Scala 2.10 requires Java 1.6,
using Scala 2.10.3,
resolved maven-scala-plugin warning
2013-10-05 21:41:09 +02:00
Prashant Sharma c810ee0690 Merge branch 'master' into scala-2.10
Conflicts:
	core/src/test/scala/org/apache/spark/DistributedSuite.scala
	project/SparkBuild.scala
2013-10-05 15:52:57 +05:30
Du Li 9fd6bba60d ask ivy/sbt to check local maven repo under ~/.m2 2013-10-01 15:46:51 -07:00
Prashant Sharma 5829692885 Merge branch 'master' into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
	docs/_config.yml
	project/SparkBuild.scala
	repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Aaron Davidson f549ea33d3 Standalone Scheduler fault tolerance using ZooKeeper
This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we keep the behavior from 194ba4b8.

Additionally, places where a Master could be specified by a spark:// URL can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.

Forthcoming:
Documentation, tests (! - only ad hoc testing has been performed so far)
I do not intend for this commit to be merged until tests are added, but this patch should
still be mostly reviewable until then.
2013-09-26 15:04:23 -07:00
Reynold Xin 3f283278b0 Removed scala -optimize flag. 2013-09-26 13:58:10 -07:00
Reynold Xin c514cd1587 Merge pull request #930 from holdenk/master
Add mapPartitionsWithIndex
2013-09-26 13:48:20 -07:00
Prashant Sharma 604dc40996 Sync with master and some build fixes 2013-09-26 11:40:02 +05:30
Prashant Sharma 7ff4c2d399 fixed maven build for scala 2.10 2013-09-26 10:48:24 +05:30
Patrick Wendell 6079721fa1 Update build version in master 2013-09-24 11:41:51 -07:00
Prashant Sharma 276c37a51c Akka 2.2 migration 2013-09-22 08:20:12 +05:30
Patrick Wendell c856860c5b Bumping Mesos version to 0.13.0 2013-09-15 12:46:26 -07:00
Prashant Sharma 383e151fd7 Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
	project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Prashant Sharma 20c65bc334 Fixed repl suite 2013-09-15 10:43:06 +05:30
Holden Karau 68068977b8 Fix build on ubuntu 2013-09-14 20:51:11 -07:00
Patrick Wendell 91a59e6b10 Merge pull request #919 from mateiz/jets3t
Add explicit jets3t dependency, which is excluded in hadoop-client
2013-09-11 10:21:48 -07:00
Patrick Wendell 0c1985b153 Fix HDFS access bug with assembly build.
Due to this change in HDFS:
https://issues.apache.org/jira/browse/HADOOP-7549

there is a bug when using the new assembly builds. The symptom is that any HDFS access
results in an exception saying "No filesystem for scheme 'hdfs'". This adds a merge
strategy in the assembly build which fixes the problem.
2013-09-10 22:05:13 -07:00
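The fix boils down to merging the META-INF/services registrations (where HDFS's FileSystem implementations are declared) instead of letting one jar's copy win. A fragment-style sketch of what such an sbt-assembly rule looks like; the exact cases in SparkBuild.scala may differ:

```scala
// sketch of an sbt-assembly merge strategy in an sbt 0.12-era build definition
import sbtassembly.Plugin._
import AssemblyKeys._

mergeStrategy in assembly := {
  // combine the service registrations (e.g. org.apache.hadoop.fs.FileSystem)
  // from hadoop-common and hadoop-hdfs instead of keeping only one of them
  case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
  case m if m.toLowerCase.endsWith("manifest.mf")          => MergeStrategy.discard
  case _                                                   => MergeStrategy.first
}
```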
Matei Zaharia f117dc6d0d Add explicit jets3t dependency, which is excluded in hadoop-client 2013-09-10 06:39:25 +00:00
Patrick Wendell f68848d95d Merge pull request #906 from pwendell/ganglia-sink
Clean-up of Metrics Code/Docs and Add Ganglia Sink
2013-09-08 18:32:16 -07:00
Matei Zaharia 0b957997ad Merge pull request #908 from pwendell/master
Fix target JVM version in scala build
2013-09-08 15:30:16 -07:00
Patrick Wendell 27bd74c8ad Fix target JVM version in scala build 2013-09-08 14:37:45 -07:00
Patrick Wendell 8de8ee5d3c Ganglia sink 2013-09-08 10:08:18 -07:00
Patrick Wendell a8e376ec0f Merge pull request #904 from pwendell/master
Adding Apache license to two files
2013-09-07 21:16:01 -07:00
Patrick Wendell 6d2198643c Adding Apache license to two files 2013-09-07 20:46:58 -07:00
Jey Kottalam 30a32c8335 Minor YARN build cleanups 2013-09-06 11:31:16 -07:00
Prashant Sharma 4106ae9fbf Merged with master 2013-09-06 17:53:01 +05:30
Matei Zaharia 59218bdd49 Add Apache parent POM 2013-09-02 18:34:03 -07:00
Matei Zaharia 5701eb92c7 Fix some URLs 2013-09-01 14:13:16 -07:00
Matei Zaharia 46eecd110a Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
Matei Zaharia 666d93c294 Update Maven build to create assemblies expected by new scripts
This includes the following changes:
- The "assembly" package now builds in Maven by default, and creates an
  assembly containing both hadoop-client and Spark, unlike the old
  BigTop distribution assembly that skipped hadoop-client
- There is now a bigtop-dist package to build the old BigTop assembly
- The repl-bin package is no longer built by default since the scripts
  don't rely on it; instead it can be enabled with -Prepl-bin
- Py4J is now included in the assembly/lib folder as a local Maven repo,
  so that the Maven package can link to it
- run-example now adds the original Spark classpath as well because the
  Maven examples assembly lists spark-core and such as provided
- The various Maven projects add a spark-yarn dependency correctly
2013-08-29 21:19:06 -07:00
Matei Zaharia 8d81358a05 Provide more memory for tests 2013-08-29 21:19:06 -07:00
Matei Zaharia 53cd50c069 Change build and run instructions to use assemblies
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.

As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
Reynold Xin 9db1e50344 Revert "Merge pull request #841 from rxin/json"
This reverts commit 1fb1b09928, reversing
changes made to c69c48947d.
2013-08-26 11:05:14 -07:00
Jey Kottalam a9db1b7b6e Upgrade SBT IDE project generators 2013-08-23 10:27:18 -07:00
Jey Kottalam b7f9e6374a Fix SBT generation of IDE project files 2013-08-23 10:26:37 -07:00
Jey Kottalam 281b6c5f28 Re-add removed dependency on 'commons-daemon'
Fixes SBT build under Hadoop 0.23.9 and 2.0.4
2013-08-22 15:45:45 -07:00
Matei Zaharia ae8ba83ef2 Merge pull request #855 from jey/update-build-docs
Update build docs
2013-08-22 10:14:54 -07:00
Matei Zaharia 8a36fd09dd Merge pull request #854 from markhamstra/pomUpdate
Synced sbt and maven builds to use the same dependencies, etc.
2013-08-22 10:13:35 -07:00
Jey Kottalam f9cc1fbf27 Remove references to unsupported Hadoop versions 2013-08-21 17:14:36 -07:00
Mark Hamstra ff6f1b0500 Synced sbt and maven builds 2013-08-21 13:50:24 -07:00
Reynold Xin af602ba9d3 Downgraded default build hadoop version to 1.0.4. 2013-08-21 11:38:24 -07:00
Matei Zaharia aa2b89d98d Merge remote-tracking branch 'jey/hadoop-agnostic'
Conflicts:
	core/src/main/scala/spark/PairRDDFunctions.scala
2013-08-20 10:14:15 -07:00
Jey Kottalam 6f6944c807 Update SBT build to use simpler fix for Hadoop 0.23.9 2013-08-19 12:33:13 -07:00
Jey Kottalam 67b593607c Rename YARN build flag to SPARK_WITH_YARN 2013-08-16 14:00:05 -07:00
Jey Kottalam b1d99744a8 Fix SBT build under Hadoop 0.23.x 2013-08-16 13:50:12 -07:00
Jey Kottalam 8add2d7a59 Fix repl/assembly when YARN enabled 2013-08-16 13:50:12 -07:00
Jey Kottalam 3f98eff63a Allow make-distribution.sh to specify Hadoop version used 2013-08-16 13:50:09 -07:00
Reynold Xin c961c19b7b Use the JSON formatter from Scala library and removed dependency on lift-json.
It made the JSON creation slightly more complicated, but removes one external dependency. The Scala library also properly escapes "/" (which lift-json doesn't).
2013-08-15 18:23:01 -07:00
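A small illustration of the escaping difference, assuming the scala.util.parsing.json classes shipped with the Scala 2.9/2.10 standard library; exact output spacing may vary:

```scala
import scala.util.parsing.json.JSONObject

object JsonEscapeDemo extends App {
  val json = JSONObject(Map("name" -> "stage 1", "details" -> "/stages/stage/1"))
  // the standard-library formatter escapes '/' as '\/', which lift-json leaves alone
  println(json.toString())  // e.g. {"name" : "stage 1", "details" : "\/stages\/stage\/1"}
}
```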
Jey Kottalam a0f0848463 Update default version of Hadoop to 1.2.1 2013-08-15 16:50:37 -07:00
Jey Kottalam cb4ef19214 yarn support 2013-08-15 16:50:37 -07:00
Jey Kottalam 273b499b9a yarn sbt 2013-08-15 16:50:37 -07:00
Jey Kottalam 69c3bbf688 dynamically detect hadoop version 2013-08-15 16:50:37 -07:00
Matei Zaharia d9588183fa Update to Mesos 0.12.1 2013-08-13 18:51:35 -07:00
jerryshao 320e87e7ab Add MetricsServlet for Spark metrics system 2013-08-12 13:23:23 +08:00
Matei Zaharia dce5e47435 Merge pull request #800 from dlyubimov/HBASE_VERSION
Pull HBASE_VERSION in the head of sbt build
2013-08-09 21:53:45 -07:00
Matei Zaharia cd247ba5bb Merge pull request #786 from shivaram/mllib-java
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Dmitriy Lyubimov 27f674f82b fewer words 2013-08-09 13:54:41 -07:00
Dmitriy Lyubimov ae95b57469 Pull HBASE_VERSION in the head of sbt build 2013-08-09 12:45:18 -07:00
Matei Zaharia 5a4003c1ac Update to Chill 0.3.1 2013-08-08 13:30:27 -07:00
Shivaram Venkataraman 471fbadd0c Java examples, tests for KMeans and ALS
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
  easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
  called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
  examples project.

Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert a Scala RDD to a Java RDD easily
- Work around a bug where using double[] from Java leads to a class cast exception in
  KMeans init
2013-08-06 15:43:46 -07:00
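A short Scala sketch of the RDD[Rating]-based entry point described above; the exact ALS.train parameter list and the input format are assumptions for illustration:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "AlsSketch")
    // hypothetical input file with "user,product,rating" lines
    val ratings = sc.textFile("ratings.csv").map { line =>
      val Array(user, product, rating) = line.split(',')
      Rating(user.toInt, product.toInt, rating.toDouble)
    }
    // static-style method on the ALS object, taking RDD[Rating] directly
    val model = ALS.train(ratings, 10, 20, 0.01)  // rank, iterations, lambda
    println(model.predict(1, 42))
    sc.stop()
  }
}
```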
Matei Zaharia e466a55a6b Revert Mesos version to 0.9 since the 0.12 artifact has target Java 7 2013-08-01 15:45:21 -07:00
Matei Zaharia b2b86c2575 Merge pull request #753 from shivaram/glm-refactor
Build changes for ML lib
2013-07-31 15:51:39 -07:00
Matei Zaharia 14bf2fe039 Merge pull request #749 from benh/spark-executor-uri
Added property 'spark.executor.uri' for launching on Mesos.
2013-07-31 14:18:16 -07:00
Shivaram Venkataraman 15fd0d619d Add mllib, bagel to repl dependencies
Also don't build an assembly jar for them
2013-07-30 18:31:11 -07:00
Reynold Xin 3b1ced83fb Exclude older version of Snappy in streaming and examples. 2013-07-30 17:25:36 -07:00
Reynold Xin 368c58eac5 Merge branch 'lazy_file_open' of github.com:lyogavin/spark into compression
Conflicts:
	project/SparkBuild.scala
2013-07-30 16:04:18 -07:00
Shivaram Venkataraman 48851d4dd9 Add bagel, mllib to SBT assembly.
Also add jblas dependency to mllib pom.xml
2013-07-30 14:03:15 -07:00
Benjamin Hindman f6f46455eb Added property 'spark.executor.uri' for launching on Mesos without
requiring Spark to be installed. Using 'make-distribution.sh', a user
can put a Spark distribution at a URI supported by Mesos (e.g.,
'hdfs://...') and then set that when launching their job. Also added
SPARK_EXECUTOR_URI for the REPL.
2013-07-29 23:32:52 -07:00
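A minimal Scala sketch of using the new property; the URI, master address, and the System.setProperty style of configuration are assumptions reflecting the 0.8-era setup:

```scala
import org.apache.spark.SparkContext

object MesosUriExample {
  def main(args: Array[String]): Unit = {
    // point Mesos executors at a pre-built Spark distribution instead of a local install
    System.setProperty("spark.executor.uri", "hdfs://namenode:9000/dist/spark-0.8.0.tgz")
    val sc = new SparkContext("mesos://mesos-master:5050", "MesosUriExample")
    println(sc.parallelize(1 to 100).count())
    sc.stop()
  }
}
```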
ryanlecompte 8e0939f5a9 refactor Kryo serializer support to use chill/chill-java 2013-07-24 20:43:57 -07:00
jerryshao 5730193e0c Fix some typos 2013-07-24 14:57:47 +08:00
jerryshao 576528f0f9 Add dependency of Codahale's metrics library 2013-07-24 14:57:46 +08:00
Josh Rosen c83680434b Add JavaAPICompletenessChecker.
This is used to find methods in the Scala API that
need to be ported to the Java API.  To use it:

  ./run spark.tools.JavaAPICompletenessChecker
Conflicts:
	project/SparkBuild.scala
	run
	run2.cmd
2013-07-22 16:11:49 -07:00
Liang-Chi Hsieh d1738d72ba also exclude asm for hadoop2; hadoop1 does not appear to need that. 2013-07-20 00:37:24 +08:00
Liang-Chi Hsieh 3aad452653 fix a bug in the build process that pulls in two versions of ASM. 2013-07-19 02:29:46 +08:00
Matei Zaharia cad48edb70 Merge pull request #708 from ScrapCodes/dependencies-upgrade
Dependency upgrade Akka 2.0.3 -> 2.0.5
2013-07-16 21:41:28 -07:00
Matei Zaharia af3c9d5042 Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
Prashant Sharma 2748e73eb9 Dependency upgrade Akka 2.0.3 -> 2.0.5 2013-07-16 16:08:46 +05:30
Prashant Sharma 9d7781c4e1 Adding commons io as dependency 2013-07-15 12:03:48 +05:30
Prashant Sharma a3494d405d Merge branch 'master' of github.com:mesos/spark into scala-2.10
Conflicts:
	core/src/main/scala/spark/Utils.scala
	core/src/test/scala/spark/ui/UISuite.scala
	project/SparkBuild.scala
	run
2013-07-15 11:15:55 +05:30