Commit graph

294 commits

Author SHA1 Message Date
Patrick Wendell af4a529f6e Exclude jopt from kafka dependency.
Kafka uses an older version of jopt that causes bad conflicts with the version
used by spark-perf. It's not easy to remove this downstream because of the way
that spark-perf uses Spark (by including a spark assembly as an unmanaged jar).
This fixes the problem at its source by just never including it.
2013-10-25 09:20:30 -07:00
Matei Zaharia dadfc63b03 Fix Maven build to use MQTT repository 2013-10-23 15:29:22 -07:00
Matei Zaharia dd659642e7 Merge pull request #64 from prabeesh/master
MQTT Adapter for Spark Streaming

MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol.
It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it here http://mqtt.org/

Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators, to mobile phones, embedded systems on vehicles, or laptops and full scale computers.

The protocol was invented by Andy Stanford-Clark of IBM, and Arlen Nipper of Cirrus Link Solutions

This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where line of code and network bandwidth is a constraint.

MQTT is one of the widely used protocol for 'Internet of Things'. This protocol is getting much attraction as anything and everything is getting connected to internet and they all produce data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015.

Plugin/Support for MQTT is available in popular MQs like RabbitMQ, ActiveMQ etc.

Support for MQTT in Spark will help people with Internet of Things (IoT) projects to use Spark Streaming for their real time data processing needs (from sensors and other embedded devices etc).
2013-10-23 15:07:59 -07:00
Matei Zaharia 731c94e91d Merge pull request #56 from jerryshao/kafka-0.8-dev
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming

Conflicts:
	streaming/pom.xml
2013-10-21 23:31:38 -07:00
Matei Zaharia 8de9706b86 Merge pull request #66 from shivaram/sbt-assembly-deps
Add SBT target to assemble dependencies

This pull request is an attempt to address the long assembly build times during development. Instead of rebuilding the assembly jar for every Spark change, this pull request adds a new SBT target `spark` that packages all the Spark modules and builds an assembly of the dependencies.

So the work flow that should work now would be something like

```
./sbt/sbt spark # Doing this once should suffice
## Make changes
./sbt/sbt compile
./sbt/sbt test or ./spark-shell
```
2013-10-18 20:32:39 -07:00
prabeesh 29245605bf remove unused dependency 2013-10-17 09:57:30 +05:30
Shivaram Venkataraman 0a4b76fcc2 Rename SBT target to assemble-deps. 2013-10-16 17:05:46 -07:00
prabeesh 06de3d516d added mqtt adapter library dependencies 2013-10-16 13:38:37 +05:30
Patrick Wendell 35befe07bb Fixing spark streaming example and a bug in examples build.
- Examples assembly included a log4j.properties which clobbered Spark's
- Example had an error where some classes weren't serializable
- Did some other clean-up in this example
2013-10-15 22:55:43 -07:00
Shivaram Venkataraman 051cd960d9 Merge branch 'master' of https://github.com/apache/incubator-spark into sbt-assembly-deps 2013-10-15 13:26:40 -07:00
jerryshao c23cd72b4b Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming 2013-10-12 20:00:42 +08:00
Shivaram Venkataraman c441904bce Add a comment and exclude tools 2013-10-11 18:23:15 -07:00
Matei Zaharia c71499b779 Merge pull request #19 from aarondav/master-zk
Standalone Scheduler fault tolerance using ZooKeeper

This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we will keep the behavior of from d5a96fe.

Additionally, places where a Master could be specificied by a spark:// url can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.
2013-10-10 17:16:42 -07:00
Shivaram Venkataraman 484166d520 Add new SBT target for dependency assembly 2013-10-09 04:24:34 -07:00
Reynold Xin 213b70a2db Merge pull request #31 from sundeepn/branch-0.8
Resolving package conflicts with hadoop 0.23.9

Hadoop 0.23.9 is having a package conflict with easymock's dependencies.

(cherry picked from commit 023e3fdf00)
Signed-off-by: Reynold Xin <rxin@apache.org>
2013-10-07 10:54:22 -07:00
Du Li 9fd6bba60d ask ivy/sbt to check local maven repo under ~/.m2 2013-10-01 15:46:51 -07:00
Aaron Davidson f549ea33d3 Standalone Scheduler fault tolerance using ZooKeeper
This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we will keep the behavior of from 194ba4b8.

Additionally, places where a Master could be specificied by a spark:// url can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.

Forthcoming:
Documentation, tests (! - only ad hoc testing has been performed so far)
I do not intend for this commit to be merged until tests are added, but this patch should
still be mostly reviewable until then.
2013-09-26 15:04:23 -07:00
Reynold Xin 3f283278b0 Removed scala -optimize flag. 2013-09-26 13:58:10 -07:00
Reynold Xin c514cd1587 Merge pull request #930 from holdenk/master
Add mapPartitionsWithIndex
2013-09-26 13:48:20 -07:00
Patrick Wendell 6079721fa1 Update build version in master 2013-09-24 11:41:51 -07:00
Patrick Wendell c856860c5b Bumping Mesos version to 0.13.0 2013-09-15 12:46:26 -07:00
Holden Karau 68068977b8 Fix build on ubuntu 2013-09-14 20:51:11 -07:00
Patrick Wendell 91a59e6b10 Merge pull request #919 from mateiz/jets3t
Add explicit jets3t dependency, which is excluded in hadoop-client
2013-09-11 10:21:48 -07:00
Patrick Wendell 0c1985b153 Fix HDFS access bug with assembly build.
Due to this change in HDFS:
https://issues.apache.org/jira/browse/HADOOP-7549

there is a bug when using the new assembly builds. The symptom is that any HDFS access
results in an exception saying "No filesystem for scheme 'hdfs'". This adds a merge
strategy in the assembly build which fixes the problem.
2013-09-10 22:05:13 -07:00
Matei Zaharia f117dc6d0d Add explicit jets3t dependency, which is excluded in hadoop-client 2013-09-10 06:39:25 +00:00
Patrick Wendell f68848d95d Merge pull request #906 from pwendell/ganglia-sink
Clean-up of Metrics Code/Docs and Add Ganglia Sink
2013-09-08 18:32:16 -07:00
Matei Zaharia 0b957997ad Merge pull request #908 from pwendell/master
Fix target JVM version in scala build
2013-09-08 15:30:16 -07:00
Patrick Wendell 27bd74c8ad Fix target JVM version in scala build 2013-09-08 14:37:45 -07:00
Patrick Wendell 8de8ee5d3c Ganglia sink 2013-09-08 10:08:18 -07:00
Patrick Wendell a8e376ec0f Merge pull request #904 from pwendell/master
Adding Apache license to two files
2013-09-07 21:16:01 -07:00
Patrick Wendell 6d2198643c Adding Apache license to two files 2013-09-07 20:46:58 -07:00
Jey Kottalam 30a32c8335 Minor YARN build cleanups 2013-09-06 11:31:16 -07:00
Matei Zaharia 59218bdd49 Add Apache parent POM 2013-09-02 18:34:03 -07:00
Matei Zaharia 5701eb92c7 Fix some URLs 2013-09-01 14:13:16 -07:00
Matei Zaharia 46eecd110a Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
Matei Zaharia 666d93c294 Update Maven build to create assemblies expected by new scripts
This includes the following changes:
- The "assembly" package now builds in Maven by default, and creates an
  assembly containing both hadoop-client and Spark, unlike the old
  BigTop distribution assembly that skipped hadoop-client
- There is now a bigtop-dist package to build the old BigTop assembly
- The repl-bin package is no longer built by default since the scripts
  don't reply on it; instead it can be enabled with -Prepl-bin
- Py4J is now included in the assembly/lib folder as a local Maven repo,
  so that the Maven package can link to it
- run-example now adds the original Spark classpath as well because the
  Maven examples assembly lists spark-core and such as provided
- The various Maven projects add a spark-yarn dependency correctly
2013-08-29 21:19:06 -07:00
Matei Zaharia 8d81358a05 Provide more memory for tests 2013-08-29 21:19:06 -07:00
Matei Zaharia 53cd50c069 Change build and run instructions to use assemblies
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.

As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
Reynold Xin 9db1e50344 Revert "Merge pull request #841 from rxin/json"
This reverts commit 1fb1b09928, reversing
changes made to c69c48947d.
2013-08-26 11:05:14 -07:00
Jey Kottalam a9db1b7b6e Upgrade SBT IDE project generators 2013-08-23 10:27:18 -07:00
Jey Kottalam b7f9e6374a Fix SBT generation of IDE project files 2013-08-23 10:26:37 -07:00
Jey Kottalam 281b6c5f28 Re-add removed dependency on 'commons-daemon'
Fixes SBT build under Hadoop 0.23.9 and 2.0.4
2013-08-22 15:45:45 -07:00
Matei Zaharia ae8ba83ef2 Merge pull request #855 from jey/update-build-docs
Update build docs
2013-08-22 10:14:54 -07:00
Matei Zaharia 8a36fd09dd Merge pull request #854 from markhamstra/pomUpdate
Synced sbt and maven builds to use the same dependencies, etc.
2013-08-22 10:13:35 -07:00
Jey Kottalam f9cc1fbf27 Remove references to unsupported Hadoop versions 2013-08-21 17:14:36 -07:00
Mark Hamstra ff6f1b0500 Synced sbt and maven builds 2013-08-21 13:50:24 -07:00
Reynold Xin af602ba9d3 Downgraded default build hadoop version to 1.0.4. 2013-08-21 11:38:24 -07:00
Matei Zaharia aa2b89d98d Merge remote-tracking branch 'jey/hadoop-agnostic'
Conflicts:
	core/src/main/scala/spark/PairRDDFunctions.scala
2013-08-20 10:14:15 -07:00
Jey Kottalam 6f6944c807 Update SBT build to use simpler fix for Hadoop 0.23.9 2013-08-19 12:33:13 -07:00
Jey Kottalam 67b593607c Rename YARN build flag to SPARK_WITH_YARN 2013-08-16 14:00:05 -07:00