ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Henry Saputra	4517326ec6	Remove calls to the deprecated mapred OutputCommitter.cleanupJob, because since Hadoop 1.0.4 the mapred OutputCommitter.commitJob performs the cleanup job itself. In fact, the implementation of mapred OutputCommitter.commitJob looks like this: `public void commitJob(JobContext jobContext) throws IOException { cleanupJob(jobContext); }` (the jobContext argument is of type org.apache.hadoop.mapred.JobContext).	2014-01-07 22:55:56 -08:00
Patrick Wendell	bb6a39a687	Merge pull request #322 from falaki/MLLibDocumentationImprovement SPARK-1009 Updated MLlib docs to show how to use it in Python In addition added detailed examples for regression, clustering and recommendation algorithms in a separate Scala section. Fixed a few minor issues with existing documentation.	2014-01-07 22:32:18 -08:00
Patrick Wendell	cb1b927399	Merge pull request #355 from ScrapCodes/patch-1 Update README.md The link does not work otherwise.	2014-01-07 22:26:28 -08:00
Patrick Wendell	c0f0155eca	Merge pull request #313 from tdas/project-refactor Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc. At a high level, these are the following changes. 1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules. 2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`. 3. Each external project has its own Scala and Java unit tests. Note that the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see the comment inside the pom.xml for more information). 4. Jars of the external projects have been added to the examples project but not to the assembly project. 5. In some files, imports have been rearranged to conform to the Spark coding guidelines.	2014-01-07 22:21:52 -08:00
Prashant Sharma	d1f2805712	Update README.md The link does not work otherwise.	2014-01-08 11:36:26 +05:30
Patrick Wendell	f5f12dc282	Merge pull request #336 from liancheng/akka-remote-lookup Get rid of `Either[ActorRef, ActorSelection]` In this pull request, instead of returning an `Either[ActorRef, ActorSelection]`, `registerOrLookup` identifies the remote actor with a blocking call to obtain an `ActorRef`, or throws an exception if the remote actor doesn't exist or the lookup times out (configured by `spark.akka.lookupTimeout`). This function is only called when a `SparkEnv` is constructed (instantiating a driver or executor), so the blocking call is considered acceptable. Executor-side `ActorSelection`s/`ActorRef`s to the driver-side `MapOutputTrackerMasterActor` and `BlockManagerMasterActor` are affected by this pull request. `ActorSelection` is dangerous and should be used with care. It's only absolutely safe to send messages via an `ActorSelection` when the remote actor is stateless, so that actor incarnation is irrelevant. But as pointed out by @ScrapCodes in the comments below, the executor exits immediately once the connection to the driver is lost, so `ActorSelection`s are not harmful in this scenario. So this pull request is mostly a code style patch.	2014-01-07 21:56:35 -08:00
Matei Zaharia	11891e68c3	Merge pull request #327 from lucarosellini/master Added ‘-i’ command line option to Spark REPL We had to create a new implementation of both scala.tools.nsc.CompilerCommand and scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring in other options (-howtorun, -save and -execute) which don’t make sense in Spark. Any new Spark specific command line option could now be added to org.apache.spark.repl.SparkRunnerSettings class. Since the behavior of loading a script from the command line should be the same as loading it using the “:load” command inside the shell, the script should be loaded when the SparkContext is available, that’s why we had to move the call to ‘loadfiles(settings)’ _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.	2014-01-08 00:32:18 -05:00
Matei Zaharia	7d0aac917b	Merge pull request #354 from hsaputra/addasfheadertosbt Add ASF header to the new sbt script. Add ASF header to the new sbt script.	2014-01-08 00:30:45 -05:00
Matei Zaharia	d75dc428da	Merge pull request #350 from mateiz/standalone-limit Add way to limit default # of cores used by apps in standalone mode Also documents the spark.deploy.spreadOut option, and fixes a config option that had a dash in its name.	2014-01-08 00:30:03 -05:00
Hossein Falaki	46cb980a5f	Fixed merge conflict	2014-01-07 21:28:26 -08:00
Henry Saputra	226b58ada2	Add ASF header to the new sbt script.	2014-01-07 21:07:27 -08:00
Patrick Wendell	61674bcadf	Merge pull request #352 from markhamstra/oldArch Don't leave os.arch unset after BlockManagerSuite Recent SparkConf changes meant that BlockManagerSuite was now leaving the os.arch System.property unset. That's a problem for any subsequent tests that rely upon having a valid os.arch. This is true for CompressionCodecSuite in the usual maven build test order, even though it isn't usually true for the sbt build.	2014-01-07 18:32:13 -08:00
Patrick Wendell	82a1d38aea	Simplify and fix pyspark script. This patch removes compatibility for IPython < 1.0 but fixes the launch script and makes it much simpler. I tested this using the three commands in the PySpark documentation page: 1. IPYTHON=1 ./pyspark 2. IPYTHON_OPTS="notebook" ./pyspark 3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark There are two changes: - We rely on PYTHONSTARTUP env var to start PySpark - Removed the quotes around $IPYTHON_OPTS... having quotes gloms them together as a single argument passed to `exec` which seemed to cause ipython to fail (it instead expects them as multiple arguments).	2014-01-07 17:55:25 -08:00
Patrick Wendell	b2e690f839	Merge pull request #328 from falaki/MatrixFactorizationModel-fix SPARK-1012: DAGScheduler Exception Fix Added a predict method to MatrixFactorizationModel to enable bulk prediction. This method takes an RDD[(Int, Int)] of users and products and returns an RDD with one Rating element for each element in the input RDD. Also added Python bindings for the new bulk prediction methods to address the SPARK-1011 issue. This is ready to be merged now.	2014-01-07 16:57:08 -08:00
Mark Hamstra	86ed1ad252	Fix BlockManagerSuite#after	2014-01-07 16:39:37 -08:00
Matei Zaharia	2c421749ea	Address review comments	2014-01-07 19:30:23 -05:00
Patrick Wendell	6ccf8ce705	Merge pull request #351 from pwendell/maven-fix Add log4j exclusion rule to maven. To make this work I had to rename the defaults file. Otherwise maven's pattern matching rules included it when trying to match other log4j.properties files. I also fixed a bug in the existing maven build where two <transformers> tags were present in assembly/pom.xml such that one overwrote the other.	2014-01-07 15:49:14 -08:00
Patrick Wendell	e21a707a13	Adding unit tests and some refactoring to promote testability.	2014-01-07 15:39:47 -08:00
Hossein Falaki	3a8beb46cb	Merge branch 'master' into MatrixFactorizationModel-fix	2014-01-07 15:22:42 -08:00
Matei Zaharia	044c8ad3a4	Fix unit test compilation	2014-01-07 16:12:20 -05:00
Patrick Wendell	e688e11206	Add log4j exclusion rule to maven. To make this work I had to rename the defaults file. Otherwise maven's pattern matching rules included it when trying to match other log4j.properties files. I also fixed a bug in the existing maven build where two <transformers> tags were present in assembly/pom.xml such that one overwrote the other.	2014-01-07 12:56:24 -08:00
Andrew Or	80ba9f8ba0	Get SparkConf from SparkEnv, rather than creating new ones	2014-01-07 12:44:22 -08:00
Matei Zaharia	d8bcc8e9a0	Add way to limit default # of cores used by applications on standalone mode Also documents the spark.deploy.spreadOut option.	2014-01-07 14:35:52 -05:00
Reynold Xin	7d5fa175ca	Merge pull request #337 from yinxusen/mllib-16-bugfix Mllib 16 bugfix Bug fix: https://spark-project.atlassian.net/browse/MLLIB-16 Hi, I fixed the bug and added a test suite for `GradientDescent`. There are 2 checks in the test case. First, the final loss must be lower than the initial one. Second, the loss sequence should trend downward, i.e., at least 80% of iterations have lower losses than their preceding iterations. Thanks!	2014-01-07 11:31:34 -08:00
Reynold Xin	71fc113574	Merge pull request #349 from CodingCat/support-worker_dir add the comments about SPARK_WORKER_DIR. This env variable seems to be forgotten, but in many cases we need to set it; e.g. in EC2, we have to move the large application log files from the EBS to the ephemeral storage.	2014-01-07 11:30:35 -08:00
Tathagata Das	8f02f1c3d4	Fixed examples/pom.xml and run-example based on Patrick's suggestions.	2014-01-07 11:02:29 -08:00
CodingCat	3633172e30	add the comments about SPARK_WORKER_DIR this env variable seems to be forgotten …	2014-01-07 12:53:04 -05:00
Reynold Xin	15d9534501	Merge pull request #318 from srowen/master Suggested small changes to Java code for slightly more standard style, encapsulation and in some cases performance Sorry if this is too abrupt or not a welcome set of changes, but thought I'd see if I could contribute a little. I'm a Java developer and just getting seriously into Spark. So I thought I'd suggest a number of small changes to the couple Java parts of the code to make it a little tighter, more standard and even a bit faster. Feel free to take all, some or none of this. Happy to explain any of it.	2014-01-07 08:10:02 -08:00
Reynold Xin	468af0fa03	Merge pull request #348 from prabeesh/master spark -> org.apache.spark Changed package name spark to org.apache.spark which was missing in some of the files	2014-01-07 08:09:01 -08:00
Tathagata Das	aa99f226a6	Removed XYZFunctions and added XYZUtils as a common Scala and Java interface for creating XYZ streams.	2014-01-07 01:56:15 -08:00
Sean Owen	4b92a20232	Issue #318 : minor style updates per review from Reynold Xin	2014-01-07 09:38:45 +00:00
Patrick Wendell	c3cf0475e8	Merge pull request #339 from ScrapCodes/conf-improvements Conf improvements There are two new features. 1. Allow users to set arbitrary akka configurations via spark conf. 2. Allow configuration to be printed in logs for diagnosis.	2014-01-07 00:54:25 -08:00
Luca Rosellini	4689ce29fd	Added license header and removed @author tag	2014-01-07 09:44:24 +01:00
Reynold Xin	a862cafacf	Merge pull request #331 from holdenk/master Add a script to download sbt if not present on the system As per the discussion on the dev mailing list this script will use the system sbt if present or otherwise attempt to install the sbt launcher. The fall back error message in the event it fails instructs the user to install sbt. While the URLs it fetches from aren't controlled by the spark project directly, they are stable and the current authoritative sources.	2014-01-07 00:18:20 -08:00
Holden Karau	60a7a6b31a	Use awk to extract the version	2014-01-06 23:45:27 -08:00
Prashant Sharma	c729fa7c8e	formatting related fixes suggested by Patrick.	2014-01-07 13:08:16 +05:30
Prashant Sharma	b84dc780d3	Allow configuration to be printed in logs for diagnosis.	2014-01-07 13:01:43 +05:30
Prashant Sharma	b3018811e1	Allow users to set arbitrary akka configurations via spark conf.	2014-01-07 13:01:43 +05:30
Holden Karau	b590adb2ad	Put quotes around arguments passed down to system sbt	2014-01-06 23:31:39 -08:00
prabeesh	a91f14cfdc	spark -> org.apache.spark	2014-01-07 12:21:20 +05:30
Patrick Wendell	b72cceba27	Some doc fixes	2014-01-06 22:05:53 -08:00
Patrick Wendell	b97ef218f3	Merge pull request #346 from sproblvem/patch-1 Update stop-slaves.sh The most recent version changed the directory structure, but the script "sbin/stop-all.sh" was not updated accordingly. As a result, "sbin/stop-all.sh" can't stop the slave nodes.	2014-01-06 20:12:57 -08:00
Patrick Wendell	6a3daead2d	Fixes after merge	2014-01-06 20:12:45 -08:00
sproblvem	dea4ba9d80	Update stop-slaves.sh The most recent version changed the directory structure, but the script "sbin/stop-all.sh" was not updated accordingly. As a result, "sbin/stop-all.sh" can't stop the slave nodes.	2014-01-07 11:11:59 +08:00
Raymond Liu	67af803136	Export --file for YarnClient mode to support sending extra files to worker on yarn cluster	2014-01-07 10:24:11 +08:00
Raymond Liu	da4694a0d8	Minor typo fix for yarn client	2014-01-07 10:24:10 +08:00
Patrick Wendell	c0498f9265	Merge remote-tracking branch 'apache-github/master' into standalone-driver Conflicts: core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala core/src/main/scala/org/apache/spark/deploy/master/Master.scala core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala	2014-01-06 17:29:21 -08:00
Patrick Wendell	f236ddd1a2	Changes based on review feedback.	2014-01-06 17:15:52 -08:00
Patrick Wendell	e4d6057b66	Merge pull request #343 from pwendell/build-fix Fix test breaking downstream builds This wasn't detected in the pull-request-builder because it manually sets SPARK_HOME. I'm going to change that (it shouldn't do this) to make it like the other builds.	2014-01-06 14:56:54 -08:00
Patrick Wendell	9272a004af	Fix test breaking downstream builds	2014-01-06 13:03:19 -08:00


5434 commits