Commit graph

519 commits

Author SHA1 Message Date
Ankur Dave 3eb83191cb Generate GraphX docs 2014-01-10 11:37:28 -08:00
Ankur Dave 6bd9a78e78 Add back Bagel links to docs, but mark them superseded 2014-01-10 11:37:10 -08:00
Joseph E. Gonzalez b1eeefb401 WIP. Updating figures and cleaning up initial skeleton for GraphX Programming guide. 2014-01-10 00:39:08 -08:00
Patrick Wendell dd03cea02a Merge pull request #378 from pwendell/consolidate_on
Enable shuffle consolidation by default.

Bump this to be enabled for 0.9.0.
2014-01-09 23:38:03 -08:00
Reza Zadeh 21c8a54c08 Merge remote-tracking branch 'upstream/master' into sparsesvd
Conflicts:
	docs/mllib-guide.md
2014-01-09 22:45:32 -08:00
Patrick Wendell 460f655cc6 Enable shuffle consolidation by default.
Bump this to be enabled for 0.9.0.
2014-01-09 22:42:50 -08:00
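For illustration, a minimal Scala sketch of turning the new default back off from an application; the property name spark.shuffle.consolidateFiles is an assumption based on the shuffle-consolidation option of that era, not something stated in the commit.

```
// Hedged sketch: assumes the flag behind "shuffle consolidation" is
// spark.shuffle.consolidateFiles; an application could opt back out of
// the new 0.9.0 default like this.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ShuffleConsolidationExample")
  .setMaster("local[2]")
  .set("spark.shuffle.consolidateFiles", "false") // revert to the pre-0.9.0 behavior

val sc = new SparkContext(conf)
```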
Patrick Wendell 300eaa994c Merge pull request #353 from pwendell/ipython-simplify
Simplify and fix pyspark script.

This patch removes compatibility for IPython < 1.0 but fixes the launch
script and makes it much simpler.

I tested this using the three commands in the PySpark documentation page:

1. IPYTHON=1 ./pyspark
2. IPYTHON_OPTS="notebook" ./pyspark
3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark

There are two changes:
- We rely on PYTHONSTARTUP env var to start PySpark
- Removed the quotes around $IPYTHON_OPTS... quoting gloms the options
  together into a single argument passed to `exec`, which seemed to cause
  IPython to fail (it expects them as multiple arguments instead).
2014-01-09 20:29:51 -08:00
Patrick Wendell d86a85e9ca Merge pull request #293 from pwendell/standalone-driver
SPARK-998: Support Launching Driver Inside of Standalone Mode

[NOTE: I need to bring the tests up to date with new changes, so for now they will fail]

This patch provides support for launching driver programs inside of a standalone cluster manager. It also supports monitoring and re-launching of driver programs, which is useful for long-running, recoverable applications such as Spark Streaming jobs. For those jobs, this patch allows a deployment mode which is resilient to the failure of any worker node, failure of a master node (provided a multi-master setup), and even failures of the application itself, provided they are recoverable on a restart. Driver information, such as the status and logs from a driver, is displayed in the UI.

There are a few small TODOs here, but the code is generally feature-complete. They are:
- Bring tests up to date and add test coverage
- Restarting on failure should be optional and maybe off by default.
- See if we can re-use akka connections to facilitate clients behind a firewall

A sensible place to start for review would be to look at the `DriverClient` class, which gives users the ability to launch their driver program. I've also added an example program (`DriverSubmissionTest`) that allows you to test this locally and play around with killing workers, etc. Most of the code is devoted to persisting driver state in the cluster manager, exposing it in the UI, and dealing correctly with various types of failures.

Instructions to test locally:
- `sbt/sbt assembly/assembly examples/assembly`
- start a local version of the standalone cluster manager

```
./spark-class org.apache.spark.deploy.client.DriverClient \
  -j -Dspark.test.property=something \
  -e SPARK_TEST_KEY=SOMEVALUE \
  launch spark://10.99.1.14:7077 \
  ../path-to-examples-assembly-jar \
  org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13
```
- Go to the UI and make sure it started correctly; look at the output, etc.
- Kill workers, the driver program, masters, etc.
2014-01-09 18:37:52 -08:00
Ankur Dave b5b0de2de5 Start fixing formatting of graphx-programming-guide 2014-01-09 13:24:25 -08:00
Ankur Dave e4483582fc Add docs/graphx-programming-guide.md from 7210257ba3038d5e22d4b60fe9c3113dc45c3dff:README.md 2014-01-09 10:24:43 -08:00
Thomas Graves c617083e47 yarn-client addJar fix and misc other changes 2014-01-09 10:24:35 -06:00
Ankur Dave 91227566bc Merge remote-tracking branch 'spark-upstream/master' into HEAD
Conflicts:
	README.md
	core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
	core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
	pom.xml
	project/SparkBuild.scala
	repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2014-01-08 21:19:08 -08:00
Patrick Wendell 112c0a1776 Fixing config option "retained_stages" => "retainedStages".
This is a very esoteric option and it's out of sync with the style we use.
So it seems fitting to fix it for 0.9.0.
2014-01-08 21:16:16 -08:00
Thomas Graves 6eef78d769 Merge pull request #345 from colorant/yarn
support distributing extra files to workers for yarn client mode

So that the user doesn't need to package every dependency into one assembly jar as the Spark app jar
2014-01-08 08:49:20 -06:00
Patrick Wendell bc81ce040d Merge remote-tracking branch 'apache-github/master' into standalone-driver
Conflicts:
	core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
	pom.xml
2014-01-08 00:38:31 -08:00
Patrick Wendell c78b381e91 Fixes 2014-01-08 00:09:12 -08:00
Patrick Wendell bb6a39a687 Merge pull request #322 from falaki/MLLibDocumentationImprovement
SPARK-1009 Updated MLlib docs to show how to use it in Python

In addition, added detailed examples for regression, clustering, and recommendation algorithms in a separate Scala section. Fixed a few minor issues with the existing documentation.
2014-01-07 22:32:18 -08:00
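A hedged Scala sketch of the kind of clustering example the updated MLlib docs describe; the input file name is a placeholder, and it assumes the pre-1.0 API in which KMeans.train takes an RDD of Array[Double].

```
// Sketch of a k-means example in the spirit of the MLlib guide updates;
// assumes the pre-1.0 MLlib API (KMeans.train on RDD[Array[Double]]).
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans

val sc = new SparkContext(new SparkConf().setAppName("KMeansDocExample").setMaster("local[2]"))

// Each line of the (hypothetical) input file holds space-separated feature values.
val data = sc.textFile("kmeans_data.txt")
  .map(line => line.split(' ').map(_.toDouble))

val model = KMeans.train(data, 2, 20) // k = 2 clusters, 20 iterations
println("Cluster centers: " + model.clusterCenters.map(_.mkString("[", ",", "]")).mkString(", "))
```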
Hossein Falaki 46cb980a5f Fixed merge conflict 2014-01-07 21:28:26 -08:00
Patrick Wendell 82a1d38aea Simplify and fix pyspark script.
This patch removes compatibility for IPython < 1.0 but fixes the launch
script and makes it much simpler.

I tested this using the three commands in the PySpark documentation page:

1. IPYTHON=1 ./pyspark
2. IPYTHON_OPTS="notebook" ./pyspark
3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark

There are two changes:
- We rely on PYTHONSTARTUP env var to start PySpark
- Removed the quotes around $IPYTHON_OPTS... quoting gloms the options
  together into a single argument passed to `exec`, which seemed to cause
  IPython to fail (it expects them as multiple arguments instead).
2014-01-07 17:55:25 -08:00
Reza Zadeh 4f38b6fab5 documentation for SparseMatrix 2014-01-07 17:19:28 -08:00
Matei Zaharia 2c421749ea Address review comments 2014-01-07 19:30:23 -05:00
Matei Zaharia d8bcc8e9a0 Add way to limit default # of cores used by applications on standalone mode
Also documents the spark.deploy.spreadOut option.
2014-01-07 14:35:52 -05:00
Patrick Wendell c3cf0475e8 Merge pull request #339 from ScrapCodes/conf-improvements
Conf improvements

There are two new features.

1. Allow users to set arbitrary akka configurations via spark conf.

2. Allow configuration to be printed in logs for diagnosis.
2014-01-07 00:54:25 -08:00
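A hedged Scala sketch of the two features described above; the exact keys (an Akka setting passed through the Spark conf, and spark.logConf for printing the configuration) are assumptions, not spelled out in the commit message.

```
// Hedged sketch of the conf improvements: Akka settings via the Spark conf,
// plus logging the resolved configuration for diagnosis. Key names assumed.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ConfImprovementsExample")
  .setMaster("local[2]")
  .set("spark.akka.frameSize", "64") // illustrative Akka setting passed via Spark conf
  .set("spark.logConf", "true")      // print the resolved configuration in the logs

val sc = new SparkContext(conf)
```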
Reynold Xin a862cafacf Merge pull request #331 from holdenk/master
Add a script to download sbt if not present on the system

As per the discussion on the dev mailing list, this script will use the system sbt if present, or otherwise attempt to install the sbt launcher. The fallback error message, in the event it fails, instructs the user to install sbt. While the URLs it fetches from aren't controlled by the Spark project directly, they are stable and the current authoritative sources.
2014-01-07 00:18:20 -08:00
Prashant Sharma c729fa7c8e formatting-related fixes suggested by Patrick. 2014-01-07 13:08:16 +05:30
Prashant Sharma b84dc780d3 Allow configuration to be printed in logs for diagnosis. 2014-01-07 13:01:43 +05:30
Prashant Sharma b3018811e1 Allow users to set arbitrary akka configurations via spark conf. 2014-01-07 13:01:43 +05:30
Patrick Wendell b72cceba27 Some doc fixes 2014-01-06 22:05:53 -08:00
Raymond Liu 67af803136 Export --file for YarnClient mode to support sending extra files to workers on the yarn cluster 2014-01-07 10:24:11 +08:00
Patrick Wendell c0498f9265 Merge remote-tracking branch 'apache-github/master' into standalone-driver
Conflicts:
	core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala
	core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
	core/src/main/scala/org/apache/spark/deploy/master/Master.scala
	core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
2014-01-06 17:29:21 -08:00
Hossein Falaki 150089dae1 Added proper evaluation example for collaborative filtering and fixed typo 2014-01-06 12:43:17 -08:00
Andrew Ash 2dd4fb5698 Clarify spark.cores.max
It controls the count of cores across the cluster, not on a per-machine basis.
2014-01-06 09:01:46 -08:00
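A minimal sketch of setting spark.cores.max from an application, reflecting the cluster-wide (not per-machine) semantics clarified above; the master URL and the value 16 are placeholders.

```
// spark.cores.max caps the total cores an application takes across the
// whole cluster, not per machine. Master URL and value are placeholders.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("CoresMaxExample")
  .setMaster("spark://master:7077") // hypothetical standalone master URL
  .set("spark.cores.max", "16")     // at most 16 cores cluster-wide for this app

val sc = new SparkContext(conf)
```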
Holden Karau d86dc74d79 Code review feedback 2014-01-05 22:05:30 -08:00
Reza Zadeh 746148bc18 fix docs to use SparseMatrix 2014-01-05 18:03:57 -08:00
Reza Zadeh 73daa700bd add k parameter 2014-01-04 01:52:28 -08:00
Patrick Wendell 604fad9c39 Merge remote-tracking branch 'apache-github/master' into remove-binaries
Conflicts:
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	docs/python-programming-guide.md
2014-01-03 21:29:33 -08:00
Hossein Falaki 8b5be06752 Added table of contents and minor fixes 2014-01-03 16:38:33 -08:00
Patrick Wendell 4ae101ff38 Merge pull request #317 from ScrapCodes/spark-915-segregate-scripts
Spark-915 segregate scripts
2014-01-03 11:24:35 -08:00
Prashant Sharma 74ba97fcf7 sbin/spark-class* -> bin/spark-class* 2014-01-03 15:08:01 +05:30
Prashant Sharma 94f2fffa23 fixed review comments 2014-01-03 14:43:37 +05:30
Prashant Sharma b4bb80002b Merge branch 'master' into spark-1002-remove-jars 2014-01-03 12:12:04 +05:30
Raymond Liu f442afc22e fix docs for yarn 2014-01-03 14:14:35 +08:00
Raymond Liu ebdfa6bb97 Using name yarn-alpha/yarn instead of yarn-2.0/yarn-2.2 2014-01-03 12:14:38 +08:00
Raymond Liu 7815a3ace9 Update maven build documentation 2014-01-03 12:12:38 +08:00
Raymond Liu be343d2a56 Fix yarn/README.md and update docs/running-on-yarn.md 2014-01-03 12:12:38 +08:00
Hossein Falaki 81989e2664 Commented out the last part of the collaborative filtering examples that led to errors 2014-01-02 16:22:13 -08:00
Hossein Falaki c189c8362c Added Scala and Python examples for mllib 2014-01-02 15:22:20 -08:00
Prashant Sharma 59e8009b8d a few leftover document changes 2014-01-02 21:48:44 +05:30
Prashant Sharma a3f90a2ecf pyspark -> bin/pyspark 2014-01-02 18:50:12 +05:30
Prashant Sharma 94b7a7fe37 run-example -> bin/run-example 2014-01-02 18:41:21 +05:30
Prashant Sharma b810a85cdd spark-shell -> bin/spark-shell 2014-01-02 18:37:40 +05:30
Prashant Sharma 980afd280a Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts
Conflicts:
	bin/spark-shell
	core/pom.xml
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
	core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	python/run-tests
	sbin/compute-classpath.sh
	sbin/spark-class
	sbin/stop-slaves.sh
2014-01-02 17:55:21 +05:30
Reza Zadeh 61405785bc Merge remote-tracking branch 'upstream/master' into sparsesvd 2014-01-02 01:50:30 -08:00
Prashant Sharma 6be4c11194 Removed sbt folder and changed docs accordingly 2014-01-02 14:09:37 +05:30
Reza Zadeh 53ccf65362 doc tweaks 2014-01-01 20:03:47 -08:00
Reza Zadeh 97dc527849 doc tweak 2014-01-01 20:02:37 -08:00
Reza Zadeh b941b6f7b0 doc tweaks 2014-01-01 20:01:13 -08:00
Reza Zadeh dd0d3f008b New documentation 2014-01-01 19:53:04 -08:00
Matei Zaharia 0fa5809768 Updated docs for SparkConf and handled review comments 2013-12-30 22:17:28 -05:00
Patrick Wendell 6ffa9bb226 Documentation and adding supervise option 2013-12-29 11:26:56 -08:00
Reynold Xin 72a17b69f5 Revert "Merge pull request #310 from jyunfan/master"
This reverts commit 79b20e4dbe, reversing
changes made to 7375047d51.
2013-12-28 21:25:40 -10:00
Jyun-Fan Tsai 17f6620a71 Fix typo in the Accumulators section
val => var
2013-12-29 11:30:02 +08:00
fengdong ad8ce0148a changed the example links in the scala-programming-guide 2013-12-18 19:03:32 +08:00
fengdong ddebaf8280 Fixed the example link. 2013-12-18 11:00:36 +08:00
Reynold Xin 7db9165961 Merge pull request #251 from pwendell/master
Fix list rendering in YARN markdown docs.

This is some minor clean-up which makes the list render correctly.
2013-12-14 14:16:34 -08:00
Prashant Sharma d3090b79a5 A few corrections to documentation. 2013-12-12 10:12:06 +05:30
Prashant Sharma 603af51bb5 Merge branch 'master' into akka-bug-fix
Conflicts:
	core/pom.xml
	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
	pom.xml
	project/SparkBuild.scala
	streaming/pom.xml
	yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
2013-12-11 10:21:53 +05:30
Patrick Wendell 1291dd4dce Fix list rendering in YARN markdown docs. 2013-12-10 16:38:33 -08:00
Patrick Wendell 0428145ed4 Small fix 2013-12-07 22:33:11 -08:00
Patrick Wendell b3e87c0f51 Adding HDP 2.0 version 2013-12-07 22:31:46 -08:00
Patrick Wendell 41c60b337a Various broken links in documentation 2013-12-07 22:31:44 -08:00
Patrick Wendell 6494d62fe4 Merge pull request #240 from pwendell/master
SPARK-917 Improve API links in nav bar
2013-12-07 11:56:16 -08:00
Patrick Wendell dd331a6b26 SPARK-917 Improve API links in nav bar 2013-12-07 11:49:49 -08:00
Aaron Davidson cb6ac8aafb Correct spelling error in configuration.md 2013-12-07 01:40:01 -08:00
Patrick Wendell 7a1d1c93b8 Minor formatting fix in config file 2013-12-06 20:28:22 -08:00
Patrick Wendell 1b38f5f277 Merge pull request #236 from pwendell/shuffle-docs
Adding disclaimer for shuffle file consolidation
2013-12-06 20:16:15 -08:00
Patrick Wendell b9451acdf4 Adding disclaimer for shuffle file consolidation 2013-12-06 19:25:28 -08:00
Patrick Wendell bb6e25c663 Minor doc fixes and updating README 2013-12-06 17:42:28 -08:00
Ali Ghodsi e2c2914faa more docs 2013-12-06 16:54:06 -08:00
Ali Ghodsi f2fb4b4228 Updated documentation about the YARN v2.2 build process 2013-12-06 16:31:26 -08:00
Patrick Wendell 5d460253d6 Merge pull request #228 from pwendell/master
Document missing configs and set shuffle consolidation to false.
2013-12-05 12:31:24 -08:00
Patrick Wendell 1450b8ef87 Small changes from Matei review 2013-12-04 18:49:32 -08:00
Patrick Wendell b1c6fa1584 Document missing configs and set shuffle consolidation to false. 2013-12-04 18:39:34 -08:00
Andrew Ash 0c5af38b86 Typo: applicaton 2013-12-04 12:30:25 -08:00
Prashant Sharma 17987778da Merge branch 'master' into wip-scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
	core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
	core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala
	core/src/main/scala/org/apache/spark/rdd/RDD.scala
	python/pyspark/rdd.py
2013-11-27 14:44:12 +05:30
Prashant Sharma 54862af5ee Improvements from the review comments and followed Boy Scout Rule. 2013-11-27 14:26:28 +05:30
Prashant Sharma dca946ff67 Documenting the newly added spark properties. 2013-11-26 20:47:38 +05:30
Andrew Ash 08afef37a0 Update tuning.md
Clarify when serializer is used based on recent user@ mailing list discussion.
2013-11-25 17:08:52 -08:00
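A hedged sketch of switching to Kryo, the serializer choice that tuning.md discusses; spark.serializer governs how shuffled and serialized-cached data are encoded, and the settings shown are illustrative.

```
// Sketch of selecting the Kryo serializer discussed in tuning.md.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("KryoTuningExample")
  .setMaster("local[2]")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)
```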
Matei Zaharia eb4296c8f7 Merge pull request #101 from colorant/yarn-client-scheduler
For SPARK-527, Support spark-shell when running on YARN

sync to trunk and resubmit here

In the current YARN mode, the application runs inside the Application Master as a user program, so the whole SparkContext lives on the remote side.

That approach cannot support applications that involve local interaction and need to run where they are launched.

So in this pull request I have added a YarnClientClusterScheduler and backend.

With this scheduler, the user application is launched locally, while the executors are launched by YARN on remote nodes with a thin AM that only launches the executors and monitors the Driver Actor status, so that when the client app is done it can finish the YARN application as well.

This enables spark-shell to run on YARN.

This also enables other Spark applications to run their SparkContext locally with the master URL "yarn-client". Thus e.g. SparkPi can print its result on the local console instead of in the log of the remote machine where the AM runs.

Docs also updated to show how to use this yarn-client mode.
2013-11-25 15:25:29 -08:00
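A hedged Scala sketch of the "yarn-client" master URL described above; YARN-specific setup (application jar location, executor sizing, etc.) is omitted, and the app name is a placeholder.

```
// yarn-client mode: the driver (and SparkContext) runs locally,
// executors are launched by YARN on remote nodes.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("SparkPiOnYarnClient")
  .setMaster("yarn-client") // driver local, executors on YARN

val sc = new SparkContext(conf)
println(sc.parallelize(1 to 1000).count()) // result prints on the local console
```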
Prashant Sharma 44fd30d3fb Merge branch 'master' into scala-2.10-wip
Conflicts:
	core/src/main/scala/org/apache/spark/rdd/RDD.scala
	project/SparkBuild.scala
2013-11-25 18:10:54 +05:30
Reynold Xin 6bcac986b2 Merge branch 'master' of github.com:apache/incubator-spark 2013-11-25 15:47:47 +08:00
Matei Zaharia 859d62dc2a Merge pull request #151 from russellcardullo/add-graphite-sink
Add graphite sink for metrics

This adds a metrics sink for graphite.  The sink must
be configured with the host and port of a graphite node
and optionally may be configured with a prefix that will
be prepended to all metrics that are sent to graphite.
2013-11-24 16:19:51 -08:00
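A hedged illustration, kept in Scala for consistency with the other sketches, of the knobs the sink description mentions; in practice these belong in Spark's metrics configuration file rather than application code, and the exact key names are assumptions.

```
// Illustration only: the Graphite sink is configured through the metrics
// properties, not application code. Key names below are assumptions based
// on the description (sink class, required host/port, optional prefix).
val graphiteSinkSettings = Map(
  "*.sink.graphite.class"  -> "org.apache.spark.metrics.sink.GraphiteSink",
  "*.sink.graphite.host"   -> "graphite.example.com", // graphite node host (required)
  "*.sink.graphite.port"   -> "2003",                 // graphite node port (required)
  "*.sink.graphite.prefix" -> "spark"                 // prepended to all metric names (optional)
)
graphiteSinkSettings.foreach { case (k, v) => println(s"$k=$v") }
```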
Raymond Liu ab3cefde53 Add YarnClientClusterScheduler and Backend.
With this scheduler, the user application is launched locally,
while the executors are launched by YARN on remote nodes.

This enables spark-shell to run on YARN.
2013-11-22 09:23:27 +08:00
Prashant Sharma 95d8dbce91 Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10-temp
Conflicts:
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala
	streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
2013-11-21 12:34:46 +05:30
Neal Wiggins 21b5478ed6 Fix Kryo Serializer buffer inconsistency
The documentation here is inconsistent with the coded default and other documentation.
2013-11-20 16:19:25 -08:00
tgravescs 4093e9393a Improve Spark on Yarn error handling 2013-11-19 12:44:00 -06:00
Aaron Davidson f629ba95b6 Various merge corrections
I've diff'd this patch against my own -- since they were both created
independently, this means that two sets of eyes have gone over all the
merge conflicts that were created, so I'm feeling significantly more
confident in the resulting PR.

@rxin has looked at the changes to the repl and is resoundingly
confident that they are correct.
2013-11-14 22:13:09 -08:00
RIA-pierre-borckmans bef398e572 Fixed typos in the CDH4 distributions' version codes. 2013-11-14 11:33:48 +01:00
Raymond Liu a60620b76a Merge branch 'master' into scala-2.10 2013-11-14 12:44:19 +08:00
Raymond Liu 0f2e3c6e31 Merge branch 'master' into scala-2.10 2013-11-13 16:55:11 +08:00
Russell Cardullo ef85a51f85 Add graphite sink for metrics
This adds a metrics sink for graphite.  The sink must
be configured with the host and port of a graphite node
and optionally may be configured with a prefix that will
be prepended to all metrics that are sent to graphite.
2013-11-08 16:36:03 -08:00
Reynold Xin 551a43fd3d Merge branch 'master' of github.com:apache/incubator-spark into mergemerge
Conflicts:
	README.md
	core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
	core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
2013-11-04 21:02:36 -08:00
tgravescs a35472e1dd Allow spark on yarn to be run from HDFS. Allows the spark.jar, app.jar, and log4j.properties to be put into hdfs. 2013-11-04 16:16:28 -06:00
Fabrizio (Misto) Milo 3f89354c45 fix persistent-hdfs 2013-11-01 17:47:37 -07:00
Evan Chan e54a37fe15 Document all the URIs for addJar/addFile 2013-11-01 10:58:11 -07:00
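A hedged sketch of addJar/addFile with a few of the URI schemes the documentation change covers; the paths and hosts are placeholders.

```
// Sketch of addJar/addFile with different URI schemes (placeholders).
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("AddJarAddFileExample").setMaster("local[2]"))

sc.addJar("/local/path/dependency.jar")          // local file, shipped to executors
sc.addJar("hdfs://namenode:8020/jars/dep.jar")   // jar already on HDFS
sc.addFile("http://example.com/data/lookup.txt") // file fetched over HTTP
```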
Ankur Dave 5064f9b2d2 Merge remote-tracking branch 'spark-upstream/master'
Conflicts:
	project/SparkBuild.scala
2013-10-30 15:59:09 -07:00
Joseph E. Gonzalez 41b3122120 Starting to improve README. 2013-10-29 20:57:55 -07:00
Patrick Wendell 08c1a42d7d Add a repartition operator.
This patch adds an operator called repartition with more straightforward
semantics than the current `coalesce` operator. There are a few use cases
where this operator is useful:

1. If a user wants to increase the number of partitions in the RDD. This
is more common now with streaming. E.g. a user is ingesting data on one
node but they want to add more partitions to ensure parallelism of
subsequent operations across threads or the cluster.

Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's
super confusing.

2. If a user has input data where the number of partitions is not known. E.g.

> sc.textFile("some file").coalesce(50)....

This is both semantically vague (am I growing or shrinking this RDD?) and
may not work correctly if the base RDD has fewer than 50 partitions.

The new operator forces a shuffle every time, so it will always produce exactly
the requested number of partitions. It also throws an exception rather than
silently misbehaving if a bad input is passed.

I am currently adding streaming tests (requires refactoring some of the test
suite to allow testing at partition granularity), so this is not ready for
merge yet. But feedback is welcome.
2013-10-24 14:31:33 -07:00
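A minimal Scala sketch contrasting the new operator with the old coalesce form described in the commit; the input path and partition counts are illustrative.

```
// Contrast: coalesce(n, shuffle = true) vs the new repartition operator.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("RepartitionExample").setMaster("local[2]"))

val rdd = sc.textFile("some file")

// Old, confusing form: coalesce with shuffle=true to *increase* partitions.
val viaCoalesce = rdd.coalesce(50, shuffle = true)

// New operator: always shuffles, always yields exactly the requested count.
val viaRepartition = rdd.repartition(50)
println(viaRepartition.partitions.length)
```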
Matei Zaharia 452aa36d67 Merge pull request #97 from ewencp/pyspark-system-properties
Add classmethod to SparkContext to set system properties.

Add a new classmethod to SparkContext to set system properties, as is
possible in Scala/Java. Unlike the Java/Scala implementations, there's
no access to System until the JVM bridge is created. Since
SparkContext handles that, move the initialization of the JVM
connection to a separate classmethod that can safely be called
repeatedly as long as the same instance (or no instance) is provided.
2013-10-22 23:15:33 -07:00
Ewen Cheslack-Postava c8748c25eb Add notes to python documentation about using SparkContext.setSystemProperty. 2013-10-22 11:49:52 -07:00
Aaron Davidson 962bec97ee Docs: Fix links to RDD API documentation 2013-10-22 09:39:36 -07:00
Reynold Xin f628804c02 Merge pull request #76 from pwendell/master
Clarify compression property.

Clarifies that this governs compression of internal data, not input
data or output data.
2013-10-18 23:19:42 -07:00
Patrick Wendell 6b62836285 Clarify compression property.
Clarifies that this governs compression of internal data, not input
data or output data.
2013-10-18 23:08:44 -07:00
Mosharaf Chowdhury 35b2415fb3 Code styling. Updated doc. 2013-10-17 13:14:12 -07:00
Matei Zaharia 8f11c36fe1 Merge remote-tracking branch 'tgravescs/sparkYarnDistCache'
Closes #11

Conflicts:
	docs/running-on-yarn.md
	yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
2013-10-10 19:34:33 -07:00
Matei Zaharia c71499b779 Merge pull request #19 from aarondav/master-zk
Standalone Scheduler fault tolerance using ZooKeeper

This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we keep the behavior from d5a96fe.

Additionally, places where a Master could be specified by a spark:// URL can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.
2013-10-10 17:16:42 -07:00
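A hedged sketch of the comma-delimited master list mentioned above, from the application side; the host names are placeholders, and the recovery-mode properties themselves are configured on the masters rather than in application code.

```
// Client side of a multi-master standalone cluster: list all masters so
// registration reaches whichever one is currently the leader.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MultiMasterClientExample")
  .setMaster("spark://host1:7077,host2:7077") // placeholders for backup masters

val sc = new SparkContext(conf)
```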
Aaron Davidson 66c20635fa Minor clarification and cleanup to spark-standalone.md 2013-10-10 14:45:12 -07:00
Aaron Davidson 42d8b8efe6 Address Matei's comments on documentation
Updates to the documentation and changing some logError()s to logWarning()s.
2013-10-10 00:33:47 -07:00
Prashant Sharma 026ab75661 Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10 2013-10-10 09:42:55 +05:30
Matei Zaharia 478b2b7edc Fix PySpark docs and an overly long line of code after fdbae41e 2013-10-09 12:08:04 -07:00
Aaron Davidson 4ea8ee468f Add docs for standalone scheduler fault tolerance
Also fix a couple HTML/Markdown issues in other files.
2013-10-08 14:18:31 -07:00
Prashant Sharma 7be75682b9 Merge branch 'master' into wip-merge-master
Conflicts:
	bagel/pom.xml
	core/pom.xml
	core/src/test/scala/org/apache/spark/ui/UISuite.scala
	examples/pom.xml
	mllib/pom.xml
	pom.xml
	project/SparkBuild.scala
	repl/pom.xml
	streaming/pom.xml
	tools/pom.xml

In Scala 2.10 a shorter representation is used for naming artifacts,
 so I changed to the shorter Scala version for artifacts and made it a property in the pom.
2013-10-08 11:29:40 +05:30
Nick Pentreath a5e58b8f98 Merge branch 'master' into implicit-als 2013-10-07 11:46:17 +02:00
Patrick Wendell aa9fb84994 Merging build changes in from 0.8 2013-10-05 22:07:00 -07:00
Prashant Sharma c810ee0690 Merge branch 'master' into scala-2.10
Conflicts:
	core/src/test/scala/org/apache/spark/DistributedSuite.scala
	project/SparkBuild.scala
2013-10-05 15:52:57 +05:30
Nick Pentreath 93b96b44d7 Adding implicit feedback ALS to MLlib user guide 2013-10-04 14:39:44 +02:00
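A hedged Scala sketch of the implicit-feedback ALS usage the guide documents; the input path and the rank/lambda/alpha values are illustrative, and the trainImplicit overload is assumed from the MLlib API of that era.

```
// Sketch of implicit-feedback ALS as covered in the MLlib user guide.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

val sc = new SparkContext(new SparkConf().setAppName("ImplicitALSExample").setMaster("local[2]"))

// Each (hypothetical) input line: userId,productId,confidenceValue
val ratings = sc.textFile("implicit_feedback.csv").map { line =>
  val Array(user, product, value) = line.split(',')
  Rating(user.toInt, product.toInt, value.toDouble)
}

val model = ALS.trainImplicit(ratings, 10, 20, 0.01, 1.0) // rank, iterations, lambda, alpha
```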
tgravescs 0fff4ee852 Adding in the --addJars option to make SparkContext.addJar work on yarn and clean up
the classpaths
2013-10-03 11:52:16 -05:00
tgravescs bc3b20abdc Allow users to set the application name for Spark on Yarn 2013-10-02 12:54:17 -05:00
Prashant Sharma 5829692885 Merge branch 'master' into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
	docs/_config.yml
	project/SparkBuild.scala
	repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
shane-huang 84849baf88 Merge branch 'reorgscripts' into scripts-reorg 2013-09-27 09:28:33 +08:00
Prashant Sharma 604dc40996 Sync with master and some build fixes 2013-09-26 11:40:02 +05:30
Patrick Wendell 6079721fa1 Update build version in master 2013-09-24 11:41:51 -07:00
Y.CORP.YAHOO.COM\tgraves 9d4246863a Support distributed cache files and archives on spark on yarn and attempt to clean up the staging directory on exit 2013-09-23 09:09:59 -05:00
shane-huang fcfe4f9204 add admin scripts to sbin
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-09-23 12:42:34 +08:00
shane-huang dfbdc9ddb7 added spark-class and spark-executor to sbin
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-09-23 11:28:58 +08:00
Jey Kottalam ac0dd99394 Fix typo in Maven build docs 2013-09-15 13:29:22 -07:00
Patrick Wendell dbd2c4fd94 Merge pull request #932 from pwendell/mesos-version
Bumping Mesos version to 0.13.0
2013-09-15 13:20:41 -07:00
Patrick Wendell c856860c5b Bumping Mesos version to 0.13.0 2013-09-15 12:46:26 -07:00
Patrick Wendell 362ea0c051 Explain yarn.version in Maven build docs 2013-09-15 12:40:49 -07:00
Prashant Sharma a90e0eff59 version changed 2.9.3 -> 2.10 in shell script. 2013-09-15 12:47:20 +05:30
Benjamin Hindman 8e2602dd70 More updates to Spark on Mesos documentation. 2013-09-11 16:08:54 -07:00
Benjamin Hindman a0f0c1bed2 Updated Spark on Mesos documentation. 2013-09-11 16:05:25 -07:00
Patrick Wendell bddf135670 Change port from 3030 to 4040 2013-09-11 10:01:38 -07:00
Matei Zaharia 2425eb85ca Update Python API features 2013-09-10 11:12:59 -07:00
Patrick Wendell cefee1ed1a Document Fortran dependency for MLBase 2013-09-09 21:45:04 -07:00
Matei Zaharia 7a5c4b647b Small tweaks to MLlib docs 2013-09-08 21:47:24 -07:00
Matei Zaharia 7d3204b056 Merge pull request #905 from mateiz/docs2
Job scheduling and cluster mode docs
2013-09-08 21:39:12 -07:00
Matei Zaharia b458854977 Fix some review comments 2013-09-08 21:25:49 -07:00
Ameet Talwalkar 81a8bd46ac response to PR comments 2013-09-08 19:21:30 -07:00
Ameet Talwalkar bf280c8b0f Merge remote-tracking branch 'upstream/master' 2013-09-08 18:41:38 -07:00