Commit graph

330 commits

Author SHA1 Message Date
Patrick Wendell 0428145ed4 Small fix 2013-12-07 22:33:11 -08:00
Patrick Wendell b3e87c0f51 Adding HDP 2.0 version 2013-12-07 22:31:46 -08:00
Patrick Wendell 41c60b337a Various broken links in documentation 2013-12-07 22:31:44 -08:00
Patrick Wendell 6494d62fe4 Merge pull request #240 from pwendell/master
SPARK-917 Improve API links in nav bar
2013-12-07 11:56:16 -08:00
Patrick Wendell dd331a6b26 SPARK-917 Improve API links in nav bar 2013-12-07 11:49:49 -08:00
Aaron Davidson cb6ac8aafb Correct spelling error in configuration.md 2013-12-07 01:40:01 -08:00
Patrick Wendell 7a1d1c93b8 Minor formatting fix in config file 2013-12-06 20:28:22 -08:00
Patrick Wendell 1b38f5f277 Merge pull request #236 from pwendell/shuffle-docs
Adding disclaimer for shuffle file consolidation
2013-12-06 20:16:15 -08:00
Patrick Wendell b9451acdf4 Adding disclaimer for shuffle file consolidation 2013-12-06 19:25:28 -08:00
Patrick Wendell bb6e25c663 Minor doc fixes and updating README 2013-12-06 17:42:28 -08:00
Ali Ghodsi e2c2914faa more docs 2013-12-06 16:54:06 -08:00
Ali Ghodsi f2fb4b4228 Updated documentation about the YARN v2.2 build process 2013-12-06 16:31:26 -08:00
Patrick Wendell 5d460253d6 Merge pull request #228 from pwendell/master
Document missing configs and set shuffle consolidation to false.
2013-12-05 12:31:24 -08:00
Patrick Wendell 1450b8ef87 Small changes from Matei review 2013-12-04 18:49:32 -08:00
Patrick Wendell b1c6fa1584 Document missing configs and set shuffle consolidation to false. 2013-12-04 18:39:34 -08:00
Andrew Ash 0c5af38b86 Typo: applicaton 2013-12-04 12:30:25 -08:00
Andrew Ash 08afef37a0 Update tuning.md
Clarify when serializer is used based on recent user@ mailing list discussion.
2013-11-25 17:08:52 -08:00
Matei Zaharia eb4296c8f7 Merge pull request #101 from colorant/yarn-client-scheduler
For SPARK-527, Support spark-shell when running on YARN

sync to trunk and resubmit here

In the current YARN mode, the application runs inside the Application Master as a user program, so the whole Spark context lives on the remote cluster.

This approach cannot support applications that involve local interaction and need to run where they are launched.

So in this pull request I add a YarnClientClusterScheduler and its backend.

With this scheduler, the user application is launched locally, while the executors are launched by YARN on remote nodes through a thin AM that only starts the executors and monitors the Driver Actor status, so that when the client app finishes it can finish the YARN application as well.

This enables spark-shell to run on YARN.

It also lets other Spark applications run their Spark context locally with the master URL "yarn-client". For example, SparkPi can print its result on the local console instead of writing it to the log on the remote machine where the AM runs.

The docs are also updated to show how to use this yarn-client mode.
2013-11-25 15:25:29 -08:00
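A minimal sketch of the yarn-client mode described above, as it might be typed into spark-shell; the "yarn-client" master URL and the SparkPi example come from the description, while the sampling code and count are illustrative:

    import org.apache.spark.SparkContext

    // The driver and SparkContext run locally; executors are launched by YARN via the thin AM.
    val sc = new SparkContext("yarn-client", "SparkPi")
    val n = 100000
    val count = sc.parallelize(1 to n).filter { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      x * x + y * y < 1
    }.count()
    println("Pi is roughly " + 4.0 * count / n)  // the result prints on the local console
    sc.stop()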
Matei Zaharia 859d62dc2a Merge pull request #151 from russellcardullo/add-graphite-sink
Add graphite sink for metrics

This adds a metrics sink for Graphite. The sink must
be configured with the host and port of a Graphite node,
and may optionally be configured with a prefix that will
be prepended to all metrics sent to Graphite.
2013-11-24 16:19:51 -08:00
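A hedged sketch of how such a sink might be enabled in conf/metrics.properties; the host, port, and prefix settings follow the description above, while the exact property names and sink class path are assumptions, not taken from this log:

    # conf/metrics.properties (assumed key names and sink class; host/port/prefix are placeholders)
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    # optional prefix prepended to every metric name sent to Graphite
    *.sink.graphite.prefix=spark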
Raymond Liu ab3cefde53 Add YarnClientClusterScheduler and Backend.
With this scheduler, the user application is launched locally,
while the executors are launched by YARN on remote nodes.

This enables spark-shell to run on YARN.
2013-11-22 09:23:27 +08:00
Neal Wiggins 21b5478ed6 Fix Kryo Serializer buffer inconsistency
The documentation here is inconsistent with the coded default and other documentation.
2013-11-20 16:19:25 -08:00
tgravescs 4093e9393a Improve Spark on Yarn error handling 2013-11-19 12:44:00 -06:00
RIA-pierre-borckmans bef398e572 Fixed typos in the CDH4 distributions version codes. 2013-11-14 11:33:48 +01:00
Russell Cardullo ef85a51f85 Add graphite sink for metrics
This adds a metrics sink for Graphite. The sink must
be configured with the host and port of a Graphite node,
and may optionally be configured with a prefix that will
be prepended to all metrics sent to Graphite.
2013-11-08 16:36:03 -08:00
tgravescs a35472e1dd Allow spark on yarn to be run from HDFS. Allows the spark.jar, app.jar, and log4j.properties to be put into hdfs. 2013-11-04 16:16:28 -06:00
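A sketch of what this might look like, assuming the SPARK_JAR and SPARK_YARN_APP_JAR environment variables are how these paths are supplied; the variable names and HDFS paths are assumptions, not taken from this log:

    # Assumed environment variables and HDFS paths, for illustration only
    export SPARK_JAR=hdfs://namenode:9000/user/spark/spark-assembly.jar
    export SPARK_YARN_APP_JAR=hdfs://namenode:9000/user/me/app.jar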
Fabrizio (Misto) Milo 3f89354c45 fix persistent-hdfs 2013-11-01 17:47:37 -07:00
Evan Chan e54a37fe15 Document all the URIs for addJar/addFile 2013-11-01 10:58:11 -07:00
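A brief sketch of addJar/addFile with different URI schemes; the paths and hosts are placeholders, and the reading of the local: scheme is an assumption:

    // Paths and hosts are placeholders; the scheme choices are illustrative
    sc.addJar("hdfs://namenode:9000/libs/extra.jar")
    sc.addJar("http://repo.example.com/libs/extra.jar")
    sc.addFile("file:///opt/conf/lookup.txt")
    sc.addJar("local:/opt/libs/preinstalled.jar")  // assumed: the jar is already present on every node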
Patrick Wendell 08c1a42d7d Add a repartition operator.
This patch adds an operator called repartition with more straightforward
semantics than the current `coalesce` operator. There are a few use cases
where this operator is useful:

1. If a user wants to increase the number of partitions in the RDD. This
is more common now with streaming. E.g. a user is ingesting data on one
node but they want to add more partitions to ensure parallelism of
subsequent operations across threads or the cluster.

Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's
super confusing.

2. If a user has input data where the number of partitions is not known. E.g.

> sc.textFile("some file").coalesce(50)....

This is both semantically vague (am I growing or shrinking this RDD?) and
may not work correctly if the base RDD has fewer than 50 partitions.

The new operator forces a shuffle every time, so it will always produce exactly
the requested number of partitions. It also throws an exception rather than
silently doing nothing if a bad input is passed.

I am currently adding streaming tests (requires refactoring some of the test
suite to allow testing at partition granularity), so this is not ready for
merge yet. But feedback is welcome.
2013-10-24 14:31:33 -07:00
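A short sketch contrasting the old and new calls from the description above; the file path and partition counts are placeholders:

    // Growing the partition count today: works, but the intent is unclear
    val grown = sc.textFile("some file").coalesce(50, shuffle = true)

    // Plain coalesce(50) will not grow an RDD that has fewer than 50 partitions;
    // the new operator always shuffles and yields exactly the requested count
    val repartitioned = sc.textFile("some file").repartition(50)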
Matei Zaharia 452aa36d67 Merge pull request #97 from ewencp/pyspark-system-properties
Add classmethod to SparkContext to set system properties.

Add a new classmethod to SparkContext to set system properties, as is
possible in Scala/Java. Unlike the Java/Scala implementations, there's
no access to System until the JVM bridge is created. Since
SparkContext handles that, move the initialization of the JVM
connection to a separate classmethod that can safely be called
repeatedly as long as the same instance (or no instance) is provided.
2013-10-22 23:15:33 -07:00
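A brief PySpark sketch of the classmethod described above; the property name and value are placeholders, and the key point is that it is called before any SparkContext is constructed:

    from pyspark import SparkContext

    # Set a JVM system property before any SparkContext exists; the classmethod
    # initializes the JVM bridge on demand, so calling it repeatedly is safe.
    SparkContext.setSystemProperty("spark.executor.memory", "2g")

    sc = SparkContext("local", "SystemPropertyExample")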
Ewen Cheslack-Postava c8748c25eb Add notes to python documentation about using SparkContext.setSystemProperty. 2013-10-22 11:49:52 -07:00
Aaron Davidson 962bec97ee Docs: Fix links to RDD API documentation 2013-10-22 09:39:36 -07:00
Reynold Xin f628804c02 Merge pull request #76 from pwendell/master
Clarify compression property.

Clarifies that this governs compression of internal data, not input
data or output data.
2013-10-18 23:19:42 -07:00
Patrick Wendell 6b62836285 Clarify compression property.
Clarifies that this governs compression of internal data, not input
data or output data.
2013-10-18 23:08:44 -07:00
Mosharaf Chowdhury 35b2415fb3 Code styling. Updated doc. 2013-10-17 13:14:12 -07:00
Matei Zaharia 8f11c36fe1 Merge remote-tracking branch 'tgravescs/sparkYarnDistCache'
Closes #11

Conflicts:
	docs/running-on-yarn.md
	yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
2013-10-10 19:34:33 -07:00
Matei Zaharia c71499b779 Merge pull request #19 from aarondav/master-zk
Standalone Scheduler fault tolerance using ZooKeeper

This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we keep the behavior from d5a96fe.

Additionally, places where a Master could be specified by a spark:// URL can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.
2013-10-10 17:16:42 -07:00
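A hedged sketch of the ZooKeeper configuration described above; the property names come from the description, while passing them through SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh and the ZooKeeper and Master host lists are assumptions:

    # conf/spark-env.sh on each standalone Master (assumed mechanism; hosts are placeholders)
    SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181"

    # New Workers and application clients can list every Master in a comma-delimited spark:// URL
    MASTER=spark://master1:7077,master2:7077 ./spark-shell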
Aaron Davidson 66c20635fa Minor clarification and cleanup to spark-standalone.md 2013-10-10 14:45:12 -07:00
Aaron Davidson 42d8b8efe6 Address Matei's comments on documentation
Updates to the documentation and changing some logError()s to logWarning()s.
2013-10-10 00:33:47 -07:00
Matei Zaharia 478b2b7edc Fix PySpark docs and an overly long line of code after fdbae41e 2013-10-09 12:08:04 -07:00
Aaron Davidson 4ea8ee468f Add docs for standalone scheduler fault tolerance
Also fix a couple HTML/Markdown issues in other files.
2013-10-08 14:18:31 -07:00
Nick Pentreath a5e58b8f98 Merge branch 'master' into implicit-als 2013-10-07 11:46:17 +02:00
Patrick Wendell aa9fb84994 Merging build changes in from 0.8 2013-10-05 22:07:00 -07:00
Nick Pentreath 93b96b44d7 Adding implicit feedback ALS to MLlib user guide 2013-10-04 14:39:44 +02:00
tgravescs 0fff4ee852 Adding the --addJars option to make SparkContext.addJar work on yarn and cleaning up
the classpaths
2013-10-03 11:52:16 -05:00
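A hedged sketch of the option named in this commit; only the --addJars flag itself comes from the commit message, and the surrounding yarn Client invocation, class, and jar names are assumptions for illustration:

    # Assumed launch command shape; flags other than --addJars are illustrative
    ./spark-class org.apache.spark.deploy.yarn.Client \
      --jar my-app.jar \
      --class com.example.MyApp \
      --addJars local-dep1.jar,local-dep2.jar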
tgravescs bc3b20abdc Allow users to set the application name for Spark on Yarn 2013-10-02 12:54:17 -05:00
Patrick Wendell 6079721fa1 Update build version in master 2013-09-24 11:41:51 -07:00
Y.CORP.YAHOO.COM\tgraves 9d4246863a Support distributed cache files and archives on spark on yarn and attempt to clean up the staging directory on exit 2013-09-23 09:09:59 -05:00
Jey Kottalam ac0dd99394 Fix typo in Maven build docs 2013-09-15 13:29:22 -07:00
Patrick Wendell dbd2c4fd94 Merge pull request #932 from pwendell/mesos-version
Bumping Mesos version to 0.13.0
2013-09-15 13:20:41 -07:00
Patrick Wendell c856860c5b Bumping Mesos version to 0.13.0 2013-09-15 12:46:26 -07:00