Commit graph

97 commits

Author SHA1 Message Date
Matei Zaharia dadfc63b03 Fix Maven build to use MQTT repository 2013-10-23 15:29:22 -07:00
Reynold Xin 4e44d65b5e Exclusion rules for Maven build files. 2013-10-19 12:35:55 -07:00
Henry Saputra 3fed3e2283 Update pom.xml to use version 13 of the ASF parent pom and add mailingLists element. 2013-10-14 23:10:54 -07:00
Matei Zaharia c71499b779 Merge pull request #19 from aarondav/master-zk
Standalone Scheduler fault tolerance using ZooKeeper

This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we will keep the behavior of from d5a96fe.

Additionally, places where a Master could be specificied by a spark:// url can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.
2013-10-10 17:16:42 -07:00
Patrick Wendell aa9fb84994 Merging build changes in from 0.8 2013-10-05 22:07:00 -07:00
Aaron Davidson f549ea33d3 Standalone Scheduler fault tolerance using ZooKeeper
This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we will keep the behavior of from 194ba4b8.

Additionally, places where a Master could be specificied by a spark:// url can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.

Forthcoming:
Documentation, tests (! - only ad hoc testing has been performed so far)
I do not intend for this commit to be merged until tests are added, but this patch should
still be mostly reviewable until then.
2013-09-26 15:04:23 -07:00
Reynold Xin 3f283278b0 Removed scala -optimize flag. 2013-09-26 13:58:10 -07:00
Patrick Wendell 6079721fa1 Update build version in master 2013-09-24 11:41:51 -07:00
Patrick Wendell c856860c5b Bumping Mesos version to 0.13.0 2013-09-15 12:46:26 -07:00
Patrick Wendell e9eba8c3ce Use different Hadoop version for YARN artifacts.
This uses a seperate Hadoop version for YARN artifact. This means when people link against
spark-yarn, things will resolve correctly.
2013-09-13 15:34:57 -07:00
Patrick Wendell 45ec1cc90e Add git scm url for publishing 2013-09-12 13:47:31 -07:00
Matei Zaharia f117dc6d0d Add explicit jets3t dependency, which is excluded in hadoop-client 2013-09-10 06:39:25 +00:00
Matei Zaharia 0456384939 Merge pull request #911 from pwendell/ganglia-sink
Adding Manen dependency for Ganglia
2013-09-09 09:57:54 -07:00
Patrick Wendell 528fdbae97 Adding Manen dependency 2013-09-09 09:32:18 -07:00
Jey Kottalam 70661246fd Fix YARN assembly generation under Maven 2013-09-06 11:31:16 -07:00
Matei Zaharia 59218bdd49 Add Apache parent POM 2013-09-02 18:34:03 -07:00
Matei Zaharia 5701eb92c7 Fix some URLs 2013-09-01 14:13:16 -07:00
Matei Zaharia 46eecd110a Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
Matei Zaharia 666d93c294 Update Maven build to create assemblies expected by new scripts
This includes the following changes:
- The "assembly" package now builds in Maven by default, and creates an
  assembly containing both hadoop-client and Spark, unlike the old
  BigTop distribution assembly that skipped hadoop-client
- There is now a bigtop-dist package to build the old BigTop assembly
- The repl-bin package is no longer built by default since the scripts
  don't reply on it; instead it can be enabled with -Prepl-bin
- Py4J is now included in the assembly/lib folder as a local Maven repo,
  so that the Maven package can link to it
- run-example now adds the original Spark classpath as well because the
  Maven examples assembly lists spark-core and such as provided
- The various Maven projects add a spark-yarn dependency correctly
2013-08-29 21:19:06 -07:00
Matei Zaharia 8d81358a05 Provide more memory for tests 2013-08-29 21:19:06 -07:00
Reynold Xin 9db1e50344 Revert "Merge pull request #841 from rxin/json"
This reverts commit 1fb1b09928, reversing
changes made to c69c48947d.
2013-08-26 11:05:14 -07:00
Matei Zaharia ae8ba83ef2 Merge pull request #855 from jey/update-build-docs
Update build docs
2013-08-22 10:14:54 -07:00
Mark Hamstra ff6f1b0500 Synced sbt and maven builds 2013-08-21 13:50:24 -07:00
Jey Kottalam 31644a011c Use "hadoop.version" property when specifying Hadoop YARN version too 2013-08-21 13:24:28 -07:00
Reynold Xin af602ba9d3 Downgraded default build hadoop version to 1.0.4. 2013-08-21 11:38:24 -07:00
Matei Zaharia aa2b89d98d Merge remote-tracking branch 'jey/hadoop-agnostic'
Conflicts:
	core/src/main/scala/spark/PairRDDFunctions.scala
2013-08-20 10:14:15 -07:00
Jey Kottalam bdd861c6c3 Fix Maven build with Hadoop 0.23.9 2013-08-18 18:28:57 -07:00
Jey Kottalam ad580b94d5 Maven build now also works with YARN 2013-08-16 13:50:12 -07:00
Jey Kottalam 11b42a84db Maven build now works with CDH hadoop-2.0.0-mr1 2013-08-16 13:50:12 -07:00
Jey Kottalam 353fab2440 Initial changes to make Maven build agnostic of hadoop version 2013-08-16 13:50:12 -07:00
Reynold Xin c961c19b7b Use the JSON formatter from Scala library and removed dependency on lift-json.
It made the JSON creation slightly more complicated, but reduces one external dependency. The scala library also properly escape "/" (which lift-json doesn't).
2013-08-15 18:23:01 -07:00
Jey Kottalam a0f0848463 Update default version of Hadoop to 1.2.1 2013-08-15 16:50:37 -07:00
Matei Zaharia d9588183fa Update to Mesos 0.12.1 2013-08-13 18:51:35 -07:00
Patrick Wendell a0133bfbad Merge pull request #784 from jerryshao/dev-metrics-servlet
Add MetricsServlet for Spark metrics system
2013-08-13 09:28:18 -07:00
jerryshao 320e87e7ab Add MetricsServlet for Spark metrics system 2013-08-12 13:23:23 +08:00
Alexander Pivovarov ca28f2e639 Changed yarn.version to 2.0.5 in pom.xml 2013-08-10 22:50:04 -07:00
Matei Zaharia 5a4003c1ac Update to Chill 0.3.1 2013-08-08 13:30:27 -07:00
Matei Zaharia b2b86c2575 Merge pull request #753 from shivaram/glm-refactor
Build changes for ML lib
2013-07-31 15:51:39 -07:00
Reynold Xin 311aae76a2 Added Snappy dependency to Maven build files. 2013-07-30 17:25:42 -07:00
Shivaram Venkataraman 48851d4dd9 Add bagel, mllib to SBT assembly.
Also add jblas dependency to mllib pom.xml
2013-07-30 14:03:15 -07:00
Matei Zaharia 8eb8b52997 Fix Chill version in Maven 2013-07-25 08:58:02 -07:00
ryanlecompte 30a369a808 update pom.xml 2013-07-24 20:55:48 -07:00
Matei Zaharia c258718606 Fix Maven build errors after previous commits 2013-07-24 16:12:32 -07:00
Matei Zaharia 5584ebcbd3 Merge pull request #675 from c0s/assembly
Building spark assembly for further consumption of the Spark project with a deployed cluster
2013-07-24 11:46:46 -07:00
jerryshao a79f6077f0 Add Maven metrics library dependency and code changes 2013-07-24 14:57:47 +08:00
Josh Rosen c83680434b Add JavaAPICompletenessChecker.
This is used to find methods in the Scala API that
need to be ported to the Java API.  To use it:

  ./run spark.tools.JavaAPICompletenessChecker
Conflicts:
	project/SparkBuild.scala
	run
	run2.cmd
2013-07-22 16:11:49 -07:00
Konstantin Boudnik f4d514810e Building spark assembly for further consumption of the Spark project with a deployed cluster 2013-07-21 11:47:29 -07:00
Matei Zaharia af3c9d5042 Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
Matei Zaharia 00a14deb6d Update to latest Scala Maven plugin and allow Zinc external compiler 2013-07-16 11:52:20 -07:00
Mark Hamstra 0b39d66f3f pom cleanup 2013-07-08 16:07:09 -07:00