ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	c71499b779	Merge pull request #19 from aarondav/master-zk Standalone Scheduler fault tolerance using ZooKeeper This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch `d5a96fe`), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from `d5a96fe`. Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.	2013-10-10 17:16:42 -07:00
Reynold Xin	213b70a2db	Merge pull request #31 from sundeepn/branch-0.8 Resolving package conflicts with hadoop 0.23.9 Hadoop 0.23.9 is having a package conflict with easymock's dependencies. (cherry picked from commit `023e3fdf00`) Signed-off-by: Reynold Xin <rxin@apache.org>	2013-10-07 10:54:22 -07:00
Du Li	9fd6bba60d	ask ivy/sbt to check local maven repo under ~/.m2	2013-10-01 15:46:51 -07:00
Aaron Davidson	f549ea33d3	Standalone Scheduler fault tolerance using ZooKeeper This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from 194ba4b8. Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again. Forthcoming: Documentation, tests (! - only ad hoc testing has been performed so far) I do not intend for this commit to be merged until tests are added, but this patch should still be mostly reviewable until then.	2013-09-26 15:04:23 -07:00
Reynold Xin	3f283278b0	Removed scala -optimize flag.	2013-09-26 13:58:10 -07:00
Reynold Xin	c514cd1587	Merge pull request #930 from holdenk/master Add mapPartitionsWithIndex	2013-09-26 13:48:20 -07:00
Patrick Wendell	6079721fa1	Update build version in master	2013-09-24 11:41:51 -07:00
Patrick Wendell	c856860c5b	Bumping Mesos version to 0.13.0	2013-09-15 12:46:26 -07:00
Holden Karau	68068977b8	Fix build on ubuntu	2013-09-14 20:51:11 -07:00
Patrick Wendell	91a59e6b10	Merge pull request #919 from mateiz/jets3t Add explicit jets3t dependency, which is excluded in hadoop-client	2013-09-11 10:21:48 -07:00
Patrick Wendell	0c1985b153	Fix HDFS access bug with assembly build. Due to this change in HDFS: https://issues.apache.org/jira/browse/HADOOP-7549 there is a bug when using the new assembly builds. The symptom is that any HDFS access results in an exception saying "No filesystem for scheme 'hdfs'". This adds a merge strategy in the assembly build which fixes the problem.	2013-09-10 22:05:13 -07:00
Matei Zaharia	f117dc6d0d	Add explicit jets3t dependency, which is excluded in hadoop-client	2013-09-10 06:39:25 +00:00
Patrick Wendell	f68848d95d	Merge pull request #906 from pwendell/ganglia-sink Clean-up of Metrics Code/Docs and Add Ganglia Sink	2013-09-08 18:32:16 -07:00
Matei Zaharia	0b957997ad	Merge pull request #908 from pwendell/master Fix target JVM version in scala build	2013-09-08 15:30:16 -07:00
Patrick Wendell	27bd74c8ad	Fix target JVM version in scala build	2013-09-08 14:37:45 -07:00
Patrick Wendell	8de8ee5d3c	Ganglia sink	2013-09-08 10:08:18 -07:00
Patrick Wendell	a8e376ec0f	Merge pull request #904 from pwendell/master Adding Apache license to two files	2013-09-07 21:16:01 -07:00
Patrick Wendell	6d2198643c	Adding Apache license to two files	2013-09-07 20:46:58 -07:00
Jey Kottalam	30a32c8335	Minor YARN build cleanups	2013-09-06 11:31:16 -07:00
Matei Zaharia	59218bdd49	Add Apache parent POM	2013-09-02 18:34:03 -07:00
Matei Zaharia	5701eb92c7	Fix some URLs	2013-09-01 14:13:16 -07:00
Matei Zaharia	46eecd110a	Initial work to rename package to org.apache.spark	2013-09-01 14:13:13 -07:00
Matei Zaharia	666d93c294	Update Maven build to create assemblies expected by new scripts This includes the following changes: - The "assembly" package now builds in Maven by default, and creates an assembly containing both hadoop-client and Spark, unlike the old BigTop distribution assembly that skipped hadoop-client - There is now a bigtop-dist package to build the old BigTop assembly - The repl-bin package is no longer built by default since the scripts don't rely on it; instead it can be enabled with -Prepl-bin - Py4J is now included in the assembly/lib folder as a local Maven repo, so that the Maven package can link to it - run-example now adds the original Spark classpath as well because the Maven examples assembly lists spark-core and such as provided - The various Maven projects add a spark-yarn dependency correctly	2013-08-29 21:19:06 -07:00
Matei Zaharia	8d81358a05	Provide more memory for tests	2013-08-29 21:19:06 -07:00
Matei Zaharia	53cd50c069	Change build and run instructions to use assemblies This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.	2013-08-29 21:19:04 -07:00
Reynold Xin	9db1e50344	Revert "Merge pull request #841 from rxin/json" This reverts commit `1fb1b09928`, reversing changes made to `c69c48947d`.	2013-08-26 11:05:14 -07:00
Jey Kottalam	a9db1b7b6e	Upgrade SBT IDE project generators	2013-08-23 10:27:18 -07:00
Jey Kottalam	b7f9e6374a	Fix SBT generation of IDE project files	2013-08-23 10:26:37 -07:00
Jey Kottalam	281b6c5f28	Re-add removed dependency on 'commons-daemon' Fixes SBT build under Hadoop 0.23.9 and 2.0.4	2013-08-22 15:45:45 -07:00
Matei Zaharia	ae8ba83ef2	Merge pull request #855 from jey/update-build-docs Update build docs	2013-08-22 10:14:54 -07:00
Matei Zaharia	8a36fd09dd	Merge pull request #854 from markhamstra/pomUpdate Synced sbt and maven builds to use the same dependencies, etc.	2013-08-22 10:13:35 -07:00
Jey Kottalam	f9cc1fbf27	Remove references to unsupported Hadoop versions	2013-08-21 17:14:36 -07:00
Mark Hamstra	ff6f1b0500	Synced sbt and maven builds	2013-08-21 13:50:24 -07:00
Reynold Xin	af602ba9d3	Downgraded default build hadoop version to 1.0.4.	2013-08-21 11:38:24 -07:00
Matei Zaharia	aa2b89d98d	Merge remote-tracking branch 'jey/hadoop-agnostic' Conflicts: core/src/main/scala/spark/PairRDDFunctions.scala	2013-08-20 10:14:15 -07:00
Jey Kottalam	6f6944c807	Update SBT build to use simpler fix for Hadoop 0.23.9	2013-08-19 12:33:13 -07:00
Jey Kottalam	67b593607c	Rename YARN build flag to SPARK_WITH_YARN	2013-08-16 14:00:05 -07:00
Jey Kottalam	b1d99744a8	Fix SBT build under Hadoop 0.23.x	2013-08-16 13:50:12 -07:00
Jey Kottalam	8add2d7a59	Fix repl/assembly when YARN enabled	2013-08-16 13:50:12 -07:00
Jey Kottalam	3f98eff63a	Allow make-distribution.sh to specify Hadoop version used	2013-08-16 13:50:09 -07:00
Reynold Xin	c961c19b7b	Use the JSON formatter from the Scala library and remove the dependency on lift-json. It made the JSON creation slightly more complicated, but reduces one external dependency. The Scala library also properly escapes "/" (which lift-json doesn't).	2013-08-15 18:23:01 -07:00
Jey Kottalam	a0f0848463	Update default version of Hadoop to 1.2.1	2013-08-15 16:50:37 -07:00
Jey Kottalam	cb4ef19214	yarn support	2013-08-15 16:50:37 -07:00
Jey Kottalam	273b499b9a	yarn sbt	2013-08-15 16:50:37 -07:00
Jey Kottalam	69c3bbf688	dynamically detect hadoop version	2013-08-15 16:50:37 -07:00
Matei Zaharia	d9588183fa	Update to Mesos 0.12.1	2013-08-13 18:51:35 -07:00
jerryshao	320e87e7ab	Add MetricsServlet for Spark metrics system	2013-08-12 13:23:23 +08:00
Matei Zaharia	dce5e47435	Merge pull request #800 from dlyubimov/HBASE_VERSION Pull HBASE_VERSION in the head of sbt build	2013-08-09 21:53:45 -07:00
Matei Zaharia	cd247ba5bb	Merge pull request #786 from shivaram/mllib-java Java fixes, tests and examples for ALS, KMeans	2013-08-09 20:41:13 -07:00
Dmitriy Lyubimov	27f674f82b	fewer words	2013-08-09 13:54:41 -07:00


281 commits
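
The recovery configuration described in commit f549ea33d3 (merged as c71499b779) can be illustrated with a minimal sketch. The property names and mode values come from the commit message; the helper names `check_recovery_config` and `split_master_urls`, the validation rules, and the assumption that backup masters are listed as `spark://host1:port1,host2:port2` are hypothetical and are not taken from the Spark source.

```python
from typing import Dict, List, Optional

# Mode -> companion property that the commit message says must also be set.
# The names are from the commit message; this validation is only a sketch.
REQUIRED_KEY: Dict[str, Optional[str]] = {
    "NONE": None,
    "ZOOKEEPER": "spark.deploy.zookeeper.url",
    "FILESYSTEM": "spark.deploy.recoveryDirectory",
}

PREFIX = "spark://"


def check_recovery_config(props: Dict[str, str]) -> str:
    """Return the recovery mode; raise ValueError if its companion property is missing."""
    # The commit message states that NONE is the default.
    mode = props.get("spark.deploy.recoveryMode", "NONE")
    if mode not in REQUIRED_KEY:
        raise ValueError("unknown recovery mode: " + mode)
    key = REQUIRED_KEY[mode]
    if key is not None and not props.get(key):
        raise ValueError(mode + " mode requires " + key)
    return mode


def split_master_urls(url: str) -> List[str]:
    """Split a comma-delimited master URL into one spark:// URL per master.

    Assumption: the scheme is written once or repeated per entry;
    both forms are accepted here.
    """
    if not url.startswith(PREFIX):
        raise ValueError("expected a spark:// URL, got: " + url)
    masters = []
    for part in url.split(","):
        part = part.strip()
        if part.startswith(PREFIX):
            part = part[len(PREFIX):]
        if part:
            masters.append(PREFIX + part)
    return masters
```

For example, `check_recovery_config({"spark.deploy.recoveryMode": "FILESYSTEM"})` raises `ValueError` because no recovery directory is given, while `split_master_urls("spark://m1:7077,m2:7077")` yields one URL per candidate master.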