Commit graph

6058 commits

Author SHA1 Message Date
Ankur Dave 7309a29c75 Removed Kryo dependency and graphx-shell 2014-01-09 00:13:23 -08:00
Patrick Wendell 49cbf48bcc Small typo fix 2014-01-09 00:12:34 -08:00
Matei Zaharia a01f3401e3 Use typed getters for configuration settings 2014-01-09 00:07:29 -08:00
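
For illustration, "typed getters" here refers to reading settings through SparkConf's type-specific accessors (getInt, getBoolean) instead of parsing raw string values by hand; the keys and defaults below are purely illustrative.

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
  // Typed accessors parse the value and fall back to a default when the key is unset.
  val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
  val speculation = conf.getBoolean("spark.speculation", false)
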
Patrick Wendell 4d2e388e6a Don't delegate to the user's sbt.
This changes our `sbt/sbt` script to not delegate to the user's `sbt`
even if it is present. If users already have sbt installed and they
want to use their own sbt, we'd expect them to just call sbt directly
from within Spark. We no longer set any environment variables or anything
from this script, so they should just launch sbt directly on their own.

There are a number of hard-to-debug issues which can come from the
current approach. One is if the user is unaware of an existing sbt
installation and now without explanation their build breaks because
they haven't configured options correctly (such as permgen size)
within their sbt. Another is if the user has a much older version
of sbt hanging around, in which case some of the older versions
don't actually work well when newer versions of sbt are specified
in the build file (reported by @marmbrus). A third is if the user
has done some other modification to their sbt script, such as
setting it to delegate to sbt/sbt in Spark, and this causes
that to break (also reported by @marmbrus).

So to keep things simple let's just avoid this path and
remove it. Any user who already has sbt and wants to build
Spark with it should easily be able to understand how to do it.
2014-01-08 23:56:53 -08:00
Patrick Wendell dceedb4660 Merge pull request #364 from pwendell/fix
Fixing config option "retained_stages" => "retainedStages".

This is a very esoteric option and it's out of sync with the style we use.
So it seems fitting to fix it for 0.9.0.
2014-01-08 23:19:28 -08:00
Prashant Sharma 59b03e015d Fixes corresponding to Reynold's feedback comments 2014-01-09 12:26:30 +05:30
Ankur Dave 22374559a2 Remove GraphX README 2014-01-08 22:48:54 -08:00
Ankur Dave 74fdfac112 Fix AbstractMethodError by inlining zip{Edge,Vertex}Partitions
The zip{Edge,Vertex}Partitions methods created doubly-nested closures
and passed them to zipPartitions. For some reason this caused an
AbstractMethodError when zipPartitions tried to invoke the closure. This
commit works around the problem by inlining these methods wherever they
are called, eliminating the doubly-nested closure.
2014-01-08 21:19:14 -08:00
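
A simplified Scala sketch of the pattern described above; the method shape and bodies are hypothetical stand-ins, not the actual GraphX code.

  import scala.reflect.ClassTag
  import org.apache.spark.rdd.RDD

  // Before: a generic helper wraps the caller's closure `f` in another closure and hands it
  // to zipPartitions -- the doubly-nested closure that triggered the AbstractMethodError.
  def zipVertexPartitions[A: ClassTag, B: ClassTag, C: ClassTag](
      a: RDD[A], b: RDD[B])(f: (Iterator[A], Iterator[B]) => Iterator[C]): RDD[C] =
    a.zipPartitions(b) { (aIter, bIter) => f(aIter, bIter) }

  // After: each call site calls zipPartitions directly with a single closure, e.g.
  //   a.zipPartitions(b) { (aIter, bIter) => /* merge logic inlined here */ Iterator.empty }
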
Ankur Dave ab861d8450 Take SparkConf in constructor of Serializer subclasses 2014-01-08 21:19:14 -08:00
Ankur Dave 0ad75cdfb0 Manifest -> Tag in variable names 2014-01-08 21:19:14 -08:00
Ankur Dave ac536345f8 ClassManifest -> ClassTag 2014-01-08 21:19:14 -08:00
Ankur Dave 78d6b13ac8 Fix mis-merge in 44fd30d3fb 2014-01-08 21:19:14 -08:00
Ankur Dave 91227566bc Merge remote-tracking branch 'spark-upstream/master' into HEAD
Conflicts:
	README.md
	core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
	core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
	pom.xml
	project/SparkBuild.scala
	repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2014-01-08 21:19:08 -08:00
Patrick Wendell 112c0a1776 Fixing config option "retained_stages" => "retainedStages".
This is a very esoteric option and it's out of sync with the style we use.
So it seems fitting to fix it for 0.9.0.
2014-01-08 21:16:16 -08:00
Patrick Wendell 0f9d2ace6b Adding polling to driver submission client. 2014-01-08 16:56:26 -08:00
Reynold Xin 46f6a3b6aa Minor style cleanup. Mostly on indenting & line width changes. 2014-01-08 14:55:04 -08:00
Reynold Xin 04d83fc37f Merge pull request #360 from witgo/master
fix make-distribution.sh show version: command not found
2014-01-08 11:55:37 -08:00
Reynold Xin 56ebfeaa52 Merge pull request #357 from hsaputra/set_boolean_paramname
Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID

Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID to make it clear which param is being set.
2014-01-08 11:50:06 -08:00
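
To illustrate the style being applied (the method below is a hypothetical stand-in for the shape of SparkHadoopMapReduceUtil.newTaskAttemptID, not its real implementation), naming the boolean at the call site makes its meaning obvious:

  // Hypothetical stand-in with a boolean parameter in the middle of the signature.
  def newTaskAttemptID(jtIdentifier: String, jobId: Int, isMap: Boolean, taskId: Int, attemptId: Int): String = {
    val taskType = if (isMap) "m" else "r"
    s"attempt_${jtIdentifier}_${jobId}_${taskType}_${taskId}_${attemptId}"
  }

  newTaskAttemptID("201401081150", 1, true, 3, 0)                              // a bare `true` tells the reader nothing
  newTaskAttemptID("201401081150", 1, isMap = true, taskId = 3, attemptId = 0) // the param being set is explicit
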
Patrick Wendell bdeaeafbda Merge pull request #358 from pwendell/add-cdh
Add CDH Repository to Maven Build

At some point this was removed from the Maven build... so I'm adding it back. It's needed for the Hadoop2 tests we run on Jenkins and it's also included in the SBT build.
2014-01-08 11:48:39 -08:00
Reynold Xin 5cae05f59e Merge pull request #356 from hsaputra/remove_deprecated_cleanup_method
Remove calls to deprecated mapred's OutputCommitter.cleanupJob

Since Hadoop 1.0.4, the mapred OutputCommitter.commitJob should do the cleanup job via a call to OutputCommitter.cleanupJob.

Also remove SparkHadoopWriter.cleanup since it is used only by PairRDDFunctions.

In fact the implementation of mapred OutputCommitter.commitJob looks like this:

  public void commitJob(JobContext jobContext) throws IOException {
    cleanupJob(jobContext);
  }
2014-01-08 11:47:28 -08:00
walker d942f95d7e Merge remote branch 'upstream/master' 2014-01-09 01:22:26 +08:00
liguoqiang cf4aaf92d6 fix make-distribution.sh show version: command not found 2014-01-09 00:34:53 +08:00
Thomas Graves 6eef78d769 Merge pull request #345 from colorant/yarn
support distributing extra files to worker for yarn client mode

So that the user doesn't need to package every dependency into one assembly jar to use as the Spark app jar
2014-01-08 08:49:20 -06:00
Tathagata Das a17cc602ac More bug fixes. 2014-01-08 04:12:05 -08:00
Tathagata Das 0b7a132d03 Modified checkpoint file clearing policy. 2014-01-08 03:22:06 -08:00
Prashant Sharma 277b4a36c5 We clone Hadoop keys and values by default and reuse objects only if specified. 2014-01-08 16:32:55 +05:30
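
A hedged sketch of the behavior described (the helper name and flag are illustrative, not the actual patch): Hadoop record readers reuse a single Writable instance, so records have to be cloned before being cached or collected unless the caller explicitly opts into reuse.

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.io.{Writable, WritableUtils}

  // Clone by default; hand back the reader's mutable instance only when reuse is requested.
  def maybeClone[T <: Writable](record: T, conf: Configuration, reuseRecords: Boolean): T =
    if (reuseRecords) record else WritableUtils.clone(record, conf)
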
Patrick Wendell 3209a86f39 Add CDH Repository to Maven Build 2014-01-08 01:21:17 -08:00
Patrick Wendell 62b08faac5 Adding mockito to maven build 2014-01-08 00:45:41 -08:00
Patrick Wendell bc81ce040d Merge remote-tracking branch 'apache-github/master' into standalone-driver
Conflicts:
	core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
	pom.xml
2014-01-08 00:38:31 -08:00
Henry Saputra aa56585d21 Resolve PR review comments about lines over 100 chars 2014-01-08 00:38:29 -08:00
Patrick Wendell 3ec21f2eee Show more helpful information in UI 2014-01-08 00:30:10 -08:00
Patrick Wendell c78b381e91 Fixes 2014-01-08 00:09:12 -08:00
Patrick Wendell d0533f7046 Rename to Client 2014-01-07 23:38:51 -08:00
Patrick Wendell 3d939e5fe8 Adding --verbose option to DriverClient 2014-01-07 23:27:18 -08:00
Henry Saputra f6b6f88367 Set boolean param name in two files' calls to SparkHadoopMapReduceUtil.newTaskAttemptID to make
it clear which param is being set.
2014-01-07 23:23:17 -08:00
Henry Saputra 4517326ec6 Remove calls to the deprecated mapred OutputCommitter.cleanupJob because since Hadoop 1.0.4
the mapred OutputCommitter.commitJob should do the cleanup job.

In fact the implementation of mapred OutputCommitter.commitJob looks like this:

  public void commitJob(JobContext jobContext) throws IOException {
    cleanupJob(jobContext);
  }

(The jobContext input argument is type of org.apache.hadoop.mapred.JobContext)
2014-01-07 22:55:56 -08:00
Patrick Wendell bb6a39a687 Merge pull request #322 from falaki/MLLibDocumentationImprovement
SPARK-1009 Updated MLlib docs to show how to use it in Python

In addition added detailed examples for regression, clustering and recommendation algorithms in a separate Scala section. Fixed a few minor issues with existing documentation.
2014-01-07 22:32:18 -08:00
Patrick Wendell cb1b927399 Merge pull request #355 from ScrapCodes/patch-1
Update README.md

The link does not work otherwise.
2014-01-07 22:26:28 -08:00
Patrick Wendell c0f0155eca Merge pull request #313 from tdas/project-refactor
Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc.

At a high level, these are the following changes.

1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules.

2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`; for the Java API, `TwitterUtils.createStream(javaStreamingContext, ...)` (a short usage sketch follows this entry).

3. Each external project has its own Scala and Java unit tests. Note that the unit tests of each external library use classes from the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see the comment inside the pom.xml for more information).

4. Jars of the external projects have been added to the examples project but not to the assembly project.

5. In some files, imports have been rearranged to conform to the Spark coding guidelines.
2014-01-07 22:21:52 -08:00
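
A minimal Scala usage sketch of the refactored API from point 2 above, assuming the spark-streaming-twitter artifact is on the classpath and Twitter OAuth credentials are supplied via Twitter4J system properties:

  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.twitter._

  val ssc = new StreamingContext("local[2]", "TwitterExample", Seconds(10))
  // Passing None lets Twitter4J pick up credentials from system properties.
  val tweets = TwitterUtils.createStream(ssc, None)
  tweets.map(status => status.getText).print()
  ssc.start()
  ssc.awaitTermination()
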
Prashant Sharma d1f2805712 Update README.md
The link does not work otherwise.
2014-01-08 11:36:26 +05:30
Patrick Wendell f5f12dc282 Merge pull request #336 from liancheng/akka-remote-lookup
Get rid of `Either[ActorRef, ActorSelection]`

In this pull request, instead of returning an `Either[ActorRef, ActorSelection]`, `registerOrLookup` blocks to resolve the remote actor and obtain an `ActorRef`, or throws an exception if the remote actor doesn't exist or the lookup times out (configured by `spark.akka.lookupTimeout`). This function is only called when a `SparkEnv` is constructed (instantiating the driver or an executor), so the blocking call is considered acceptable. Executor-side `ActorSelection`s/`ActorRef`s to the driver-side `MapOutputTrackerMasterActor` and `BlockManagerMasterActor` are affected by this pull request.

`ActorSelection` is dangerous and should be used with care. It's only absolutely safe to send messages via an `ActorSelection` when the remote actor is stateless, so that actor incarnation is irrelevant. But as pointed out by @ScrapCodes in the comments below, since the executor exits immediately once the connection to the driver is lost, `ActorSelection`s are not harmful in this scenario. So this pull request is mostly a code style patch.
2014-01-07 21:56:35 -08:00
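
Roughly, the blocking lookup amounts to something like the sketch below (not the actual Spark code; it assumes an Akka version that provides ActorSelection.resolveOne):

  import scala.concurrent.Await
  import scala.concurrent.duration._
  import akka.actor.{ActorRef, ActorSystem}

  // Resolve the remote actor up front; fail fast if it does not exist or the lookup times out.
  def lookupActor(system: ActorSystem, url: String, timeout: FiniteDuration): ActorRef =
    Await.result(system.actorSelection(url).resolveOne(timeout), timeout)
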
Matei Zaharia 11891e68c3 Merge pull request #327 from lucarosellini/master
Added ‘-i’ command line option to Spark REPL

We had to create a new implementation of both scala.tools.nsc.CompilerCommand and scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring in other options (-howtorun, -save and -execute) which don’t make sense in Spark.
Any new Spark-specific command line option can now be added to the org.apache.spark.repl.SparkRunnerSettings class.

Since the behavior of loading a script from the command line should be the same as loading it using the “:load” command inside the shell, the script should be loaded when the SparkContext is available; that’s why we had to move the call to ‘loadfiles(settings)’ _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.
2014-01-08 00:32:18 -05:00
Matei Zaharia 7d0aac917b Merge pull request #354 from hsaputra/addasfheadertosbt
Add ASF header to the new sbt script.

Add ASF header to the new sbt script.
2014-01-08 00:30:45 -05:00
Matei Zaharia d75dc428da Merge pull request #350 from mateiz/standalone-limit
Add way to limit default # of cores used by apps in standalone mode

Also documents the spark.deploy.spreadOut option, and fixes a config option that had a dash in its name.
2014-01-08 00:30:03 -05:00
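
A hedged illustration of the knobs involved: an application can cap its own usage with spark.cores.max (shown below), while the cluster-wide default this PR adds applies to apps that leave it unset (the exact master-side option name is assumed, not quoted from the patch).

  import org.apache.spark.SparkConf

  // An application that explicitly caps its core usage on a standalone cluster.
  val conf = new SparkConf()
    .setMaster("spark://master:7077")
    .setAppName("CoreLimitExample")
    .set("spark.cores.max", "8")  // without this, the master's configured default applies
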
Hossein Falaki 46cb980a5f Fixed merge conflict 2014-01-07 21:28:26 -08:00
Henry Saputra 226b58ada2 Add ASF header to the new sbt script. 2014-01-07 21:07:27 -08:00
Patrick Wendell 61674bcadf Merge pull request #352 from markhamstra/oldArch
Don't leave os.arch unset after BlockManagerSuite

Recent SparkConf changes meant that BlockManagerSuite was now leaving the os.arch system property unset. That's a problem for any subsequent tests that rely upon having a valid os.arch. This is true for CompressionCodecSuite in the usual Maven build test order, even though it isn't usually true for the sbt build.
2014-01-07 18:32:13 -08:00
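
A sketch of the save-and-restore pattern the fix implies, with a hypothetical ScalaTest suite standing in for the actual BlockManagerSuite change:

  import org.scalatest.{BeforeAndAfter, FunSuite}

  class ArchSensitiveSuite extends FunSuite with BeforeAndAfter {
    private var oldArch: String = _

    before {
      // System.setProperty returns the previous value (possibly null).
      oldArch = System.setProperty("os.arch", "amd64")
    }

    after {
      // Restore whatever os.arch held before, rather than leaving it unset for later suites.
      if (oldArch != null) System.setProperty("os.arch", oldArch)
      else System.clearProperty("os.arch")
    }
  }
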
Patrick Wendell 82a1d38aea Simplify and fix pyspark script.
This patch removes compatibility for IPython < 1.0 but fixes the launch
script and makes it much simpler.

I tested this using the three commands in the PySpark documentation page:

1. IPYTHON=1 ./pyspark
2. IPYTHON_OPTS="notebook" ./pyspark
3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark

There are two changes:
- We rely on PYTHONSTARTUP env var to start PySpark
- Removed the quotes around $IPYTHON_OPTS... having quotes
  gloms them together as a single argument passed to `exec` which
  seemed to cause ipython to fail (it instead expects them as
  multiple arguments).
2014-01-07 17:55:25 -08:00
Patrick Wendell b2e690f839 Merge pull request #328 from falaki/MatrixFactorizationModel-fix
SPARK-1012: DAGScheduler Exception Fix

Added a predict method to MatrixFactorizationModel to enable bulk prediction. This method takes an RDD[(Int, Int)] of users and products and returns an RDD with a Rating element for each element in the input RDD.

Also added Python bindings for the new bulk prediction methods to address the SPARK-1011 issue.

This is ready to be merged now.
2014-01-07 16:57:08 -08:00
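
A small Scala sketch of the bulk-prediction call described above (toy data; the training parameters are arbitrary):

  import org.apache.spark.SparkContext
  import org.apache.spark.mllib.recommendation.{ALS, Rating}

  val sc = new SparkContext("local", "BulkPredictExample")
  val ratings = sc.parallelize(Seq(Rating(1, 10, 4.0), Rating(1, 20, 1.0), Rating(2, 10, 5.0)))
  val model = ALS.train(ratings, 5, 10)  // rank = 5, 10 iterations (toy values)

  // Bulk prediction: one Rating per (user, product) pair in the input RDD.
  val userProducts = sc.parallelize(Seq((1, 20), (2, 20)))
  val predictions = model.predict(userProducts)
  predictions.collect().foreach(println)
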
Mark Hamstra 86ed1ad252 Fix BlockManagerSuite#after 2014-01-07 16:39:37 -08:00