ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Patrick Wendell	300eaa994c	Merge pull request #353 from pwendell/ipython-simplify Simplify and fix pyspark script. This patch removes compatibility for IPython < 1.0 but fixes the launch script and makes it much simpler. I tested this using the three commands in the PySpark documentation page: 1. IPYTHON=1 ./pyspark 2. IPYTHON_OPTS="notebook" ./pyspark 3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark There are two changes: - We rely on PYTHONSTARTUP env var to start PySpark - Removed the quotes around $IPYTHON_OPTS... having quotes gloms them together as a single argument passed to `exec` which seemed to cause ipython to fail (it instead expects them as multiple arguments).	2014-01-09 20:29:51 -08:00
Patrick Wendell	77ca9e1ba8	Small fix suggested by josh	2014-01-09 18:41:00 -08:00
Ankur Dave	91227566bc	Merge remote-tracking branch 'spark-upstream/master' into HEAD Conflicts: README.md core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala pom.xml project/SparkBuild.scala repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala	2014-01-08 21:19:08 -08:00
Patrick Wendell	c0f0155eca	Merge pull request #313 from tdas/project-refactor Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc. At a high level, these are the following changes. 1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules. 2. To avail the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._` . For Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`. 3. Each external project has its own scala and java unit tests. Note the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in the SparkBuild.scala . In the streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see comment inside the pom.xml for more information). 4. Jars of the external projects have been added to examples project but not to the assembly project. 5. In some files, imports have been rearranged to conform to the Spark coding guidelines.	2014-01-07 22:21:52 -08:00
Patrick Wendell	82a1d38aea	Simplify and fix pyspark script. This patch removes compatibility for IPython < 1.0 but fixes the launch script and makes it much simpler. I tested this using the three commands in the PySpark documentation page: 1. IPYTHON=1 ./pyspark 2. IPYTHON_OPTS="notebook" ./pyspark 3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark There are two changes: - We rely on PYTHONSTARTUP env var to start PySpark - Removed the quotes around $IPYTHON_OPTS... having quotes gloms them together as a single argument passed to `exec` which seemed to cause ipython to fail (it instead expects them as multiple arguments).	2014-01-07 17:55:25 -08:00
Tathagata Das	8f02f1c3d4	Fixed examples/pom.xml and run-example based on Patrick's suggestions.	2014-01-07 11:02:29 -08:00
Holden Karau	2dc83de72e	CR feedback (sbt -> sbt/sbt and correct JAR path in script) :)	2014-01-05 23:29:26 -08:00
Holden Karau	7d0094bb56	Finish documentation changes	2014-01-05 22:12:47 -08:00
Patrick Wendell	604fad9c39	Merge remote-tracking branch 'apache-github/master' into remove-binaries Conflicts: core/src/test/scala/org/apache/spark/DriverSuite.scala docs/python-programming-guide.md	2014-01-03 21:29:33 -08:00
Prashant Sharma	9ae382c363	sbin/compute-classpath* -> bin/compute-classpath*	2014-01-03 15:12:29 +05:30
Prashant Sharma	74ba97fcf7	sbin/spark-class* -> bin/spark-class*	2014-01-03 15:08:01 +05:30
Prashant Sharma	94b7a7fe37	run-example -> bin/run-example	2014-01-02 18:41:21 +05:30
Prashant Sharma	980afd280a	Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts Conflicts: bin/spark-shell core/pom.xml core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala core/src/test/scala/org/apache/spark/DriverSuite.scala python/run-tests sbin/compute-classpath.sh sbin/spark-class sbin/stop-slaves.sh	2014-01-02 17:55:21 +05:30
Raymond Liu	0f2e3c6e31	Merge branch 'master' into scala-2.10	2013-11-13 16:55:11 +08:00
Reynold Xin	95f1f5315e	Added GraphX to classpath.	2013-11-07 16:22:05 -08:00
Ankur Dave	5064f9b2d2	Merge remote-tracking branch 'spark-upstream/master' Conflicts: project/SparkBuild.scala	2013-10-30 15:59:09 -07:00
Matei Zaharia	8de9706b86	Merge pull request #66 from shivaram/sbt-assembly-deps Add SBT target to assemble dependencies This pull request is an attempt to address the long assembly build times during development. Instead of rebuilding the assembly jar for every Spark change, this pull request adds a new SBT target `spark` that packages all the Spark modules and builds an assembly of the dependencies. So the work flow that should work now would be something like ``` ./sbt/sbt spark # Doing this once should suffice ## Make changes ./sbt/sbt compile ./sbt/sbt test or ./spark-shell ```	2013-10-18 20:32:39 -07:00
Joseph E. Gonzalez	1856b37e9d	Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx	2013-10-18 12:21:19 -07:00
KarthikTunga	8537f19268	SPARK-627 , Implementing --config arguments in the scripts	2013-10-16 23:00:33 -07:00
KarthikTunga	ff4fb1f7ee	SPARK-627 , Implementing --config arguments in the scripts	2013-10-16 22:55:15 -07:00
KarthikTunga	a32aa6b351	Implementing --config argument in the scripts	2013-10-16 22:51:09 -07:00
Shivaram Venkataraman	1dcded45e2	Exclude assembly jar from classpath if using deps	2013-10-16 13:43:41 -07:00
Shivaram Venkataraman	051cd960d9	Merge branch 'master' of https://github.com/apache/incubator-spark into sbt-assembly-deps	2013-10-15 13:26:40 -07:00
KarthikTunga	6c6b146fc2	Merge branch 'master' of https://github.com/apache/incubator-spark Updating local branch	2013-10-15 00:46:35 -07:00
KarthikTunga	d2c86e7188	SPARK-627 - reading --config argument	2013-10-15 00:35:44 -07:00
Joseph E. Gonzalez	ef7c369092	merged with upstream changes	2013-10-14 22:56:42 -07:00
Andrew xia	52ccf4f859	deprecate "spark" script and SPARK_CLASSPATH environment variable	2013-10-12 14:34:14 +08:00
Shivaram Venkataraman	484166d520	Add new SBT target for dependency assembly	2013-10-09 04:24:34 -07:00
Aaron Davidson	0f070279e7	Address Matei's comments	2013-10-05 15:15:29 -07:00
Andrew xia	cc37b3151c	refactor $FWD variable	2013-09-29 22:00:19 +08:00
Aaron Davidson	d5a96feccb	Standalone Scheduler fault recovery Implements a basic form of Standalone Scheduler fault recovery. In particular, this allows faults to be manually recovered from by means of restarting the Master process on the same machine. This is the majority of the code necessary for general fault tolerance, which will first elect a leader and then recover the Master state. In order to enable fault recovery, the Master will persist a small amount of state related to the registration of Workers and Applications to disk. If the Master is started and sees that this state is still around, it will enter Recovery mode, during which time it will not schedule any new Executors on Workers (but it does accept the registration of new Clients and Workers). At this point, the Master attempts to reconnect to all Workers and Client applications that were registered at the time of failure. After confirming either the existence or nonexistence of all such nodes (within a certain timeout), the Master will exit Recovery mode and resume normal scheduling.	2013-09-26 14:59:35 -07:00
shane-huang	3a5aa920fc	rm bin/spark.cmd as we don't have windows test environment. Will add it later if needed Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-09-26 17:10:08 +08:00
shane-huang	e8b1ee04fc	fix paths and change spark to use APP_MEM as application driver memory instead of SPARK_MEM, user should add application jars to SPARK_CLASSPATH Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-09-26 17:08:47 +08:00
shane-huang	1d53792a0a	add scripts in bin Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-09-23 16:13:46 +08:00
shane-huang	1d1a625800	moved user scripts to bin folder Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-09-23 12:46:48 +08:00
shane-huang	fcfe4f9204	add admin scripts to sbin Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-09-23 12:42:34 +08:00
shane-huang	dfbdc9ddb7	added spark-class and spark-executor to sbin Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-09-23 11:28:58 +08:00
Joseph E. Gonzalez	8b59fb72c4	Merging latest changes from spark main branch	2013-09-17 20:56:12 -07:00
Prashant Sharma	a90e0eff59	version changed 2.9.3 -> 2.10 in shell script.	2013-09-15 12:47:20 +05:30
Prashant Sharma	4106ae9fbf	Merged with master	2013-09-06 17:53:01 +05:30
Matei Zaharia	3db404a43a	Run script fixes for Windows after package & assembly change	2013-09-01 23:45:57 +00:00
Matei Zaharia	46eecd110a	Initial work to rename package to org.apache.spark	2013-09-01 14:13:13 -07:00
Matei Zaharia	2ee6a7e32a	Print output from spark-daemon only when it fails to launch	2013-08-31 17:31:07 -07:00
Matei Zaharia	89a20b83e9	Delete some code that was added back in a merge and print less info in spark-daemon	2013-08-31 16:55:25 -07:00
Matei Zaharia	aab345c463	Fix finding of assembly JAR, as well as some pointers to ./run	2013-08-29 21:19:06 -07:00
Matei Zaharia	53cd50c069	Change build and run instructions to use assemblies This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.	2013-08-29 21:19:04 -07:00
Jey Kottalam	47a7c4338a	Don't assume spark-examples JAR always exists	2013-08-18 16:59:02 -07:00
Jey Kottalam	ad580b94d5	Maven build now also works with YARN	2013-08-16 13:50:12 -07:00
Jey Kottalam	cb4ef19214	yarn support	2013-08-15 16:50:37 -07:00
Joseph E. Gonzalez	499a0d8383	Merged graphx from @rxin into master	2013-08-06 12:28:29 -07:00
Patrick Wendell	b4905c383b	Log the launch command for Spark daemons For debugging and analysis purposes, it's nice to have the exact command used to launch Spark contained within the logs. This adds the necessary hooks to make that possible.	2013-08-02 16:58:19 -07:00
Jey Kottalam	1d10192806	Fix setting of SPARK_EXAMPLES_JAR	2013-07-24 14:04:17 -07:00
Josh Rosen	c83680434b	Add JavaAPICompletenessChecker. This is used to find methods in the Scala API that need to be ported to the Java API. To use it: ./run spark.tools.JavaAPICompletenessChecker Conflicts: project/SparkBuild.scala run run2.cmd	2013-07-22 16:11:49 -07:00
Ubuntu	88a0823c58	Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable (JIRA Ticket SPARK-817)	2013-07-18 00:51:18 +00:00
Matei Zaharia	4ff494de20	Some missing license headers	2013-07-16 17:26:48 -07:00
Matei Zaharia	af3c9d5042	Add Apache license headers and LICENSE and NOTICE files	2013-07-16 17:21:33 -07:00
Prashant Sharma	a3494d405d	Merge branch 'master' of github.com:mesos/spark into scala-2.10 Conflicts: core/src/main/scala/spark/Utils.scala core/src/test/scala/spark/ui/UISuite.scala project/SparkBuild.scala run	2013-07-15 11:15:55 +05:30
Matei Zaharia	cd28d9c147	Merge remote-tracking branch 'origin/pr/662' Conflicts: bin/compute-classpath.sh	2013-07-13 19:10:00 -07:00
Prashant Sharma	e86d5dbaad	Merge branch 'master' into master-merge Conflicts: README.md core/pom.xml core/src/main/scala/spark/deploy/JsonProtocol.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/deploy/master/Master.scala core/src/main/scala/spark/deploy/master/MasterWebUI.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala core/src/main/scala/spark/storage/BlockManagerUI.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml project/SparkBuild.scala streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala	2013-07-12 14:49:16 +05:30
Prashant Sharma	69ae7ea227	Removed some unnecessary code and fixed dependencies	2013-07-11 18:30:18 +05:30
Matei Zaharia	43b24635ee	Renamed ML package to MLlib and added it to classpath	2013-07-05 11:38:53 -07:00
Reynold Xin	ce7e270bb4	Added graph package to the classpath.	2013-06-29 21:28:22 -07:00
Evan Chan	1107b4d55b	Merge branch 'master' into 2013-06/assembly-jar-deploy Conflicts: run Previous changes that I made to run and set-dev-classpath.sh instead have been folded into compute-classpath.sh	2013-06-28 17:18:35 -07:00
Matei Zaharia	03906f7f0a	Fixes to compute-classpath on Windows	2013-06-26 17:40:22 -07:00
Matei Zaharia	6c8d1b2ca6	Fix computation of classpath when we launch java directly The previous version assumed that a CLASSPATH environment variable was set by the "run" script when launching the process that starts the ExecutorRunner, but unfortunately this is not true in tests. Instead, we factor the classpath calculation into an external script and call that. NOTE: This includes a Windows version but hasn't yet been tested there.	2013-06-25 18:21:00 -04:00
Evan Chan	4cda8f865a	Add simple usage to start-slave script	2013-06-24 15:14:48 -07:00
Matei Zaharia	dc4073654b	Revert "Fix start-slave not passing instance number to spark-daemon." This reverts commit `a674d67c0a`.	2013-06-11 00:08:02 -04:00
Stephen Haberman	a674d67c0a	Fix start-slave not passing instance number to spark-daemon.	2013-05-28 16:24:19 -05:00
Josh Rosen	cda2b15041	Use ec2-metadata in start-slave.sh. PR #419 applied the same change, but only to start-master.sh, so some workers were still starting their web UI's using internal addresses. This should finally fix SPARK-613.	2013-05-24 13:05:06 -07:00
kalpit	aa9134f72a	spark instance number must be present in log filename to prevent multiple workers from overriding each other's logs	2013-03-26 17:49:30 -07:00
kalpit	f08db010d3	added SPARK_WORKER_INSTANCES : allows spawning multiple worker instances/processes on every slave machine	2013-03-26 17:49:30 -07:00
Shivaram Venkataraman	717b221cca	Detect whether we run on EC2 using ec2-metadata as well	2013-01-26 23:03:11 -08:00
Josh Rosen	1948f46093	Use spark-env.sh to configure standalone master. See SPARK-638. Also fixed a typo in the standalone mode documentation.	2012-12-14 01:20:00 +00:00
Josh Rosen	cdaa0fad51	Use external addresses in standalone WebUI on EC2.	2012-12-01 18:19:13 -08:00
Matei Zaharia	59c0a9ad16	Use hostname instead of IP in deploy scripts to let Akka connect properly	2012-11-27 21:00:04 -08:00
Reynold Xin	f67bcbed07	Use SPARK_MASTER_IP if it is set in start-slaves.sh.	2012-10-19 01:08:23 -07:00
Matei Zaharia	30362a21e7	Update license info on deploy scripts	2012-09-25 14:43:47 -07:00
Denny	8fb955fd40	Add Apache license to non-trivial scripts taken from Hadoop.	2012-08-04 17:04:33 -07:00
Denny	c90c9ec208	Read config variables before to get the master port	2012-08-02 16:12:40 -07:00
Denny	53008c2d8a	Settings variables and bugfix for stop script.	2012-08-02 15:59:39 -07:00
Denny	0ee44c225e	Spark standalone mode cluster scripts. Heavily inspired by Hadoop cluster scripts ;-)	2012-08-01 20:38:52 -07:00

131 commits