ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Mridul Muralidharan	968c0187a1	SPARK-1586 Windows build fixes Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues. Author: Mridul Muralidharan <mridulm80@apache.org> This patch had conflicts when merged, resolved by Committer: Matei Zaharia <matei@databricks.com> Closes #505 from mridulm/windows_fixes and squashes the following commits: ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy apparently cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch 3267f4b [Mridul Muralidharan] Fix build failures 35b277a [Mridul Muralidharan] Fix Scalastyle failures bc69d14 [Mridul Muralidharan] Change from hardcoded path separator 10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes 1337abd [Mridul Muralidharan] fix classpath while running in windows	2014-04-24 20:48:33 -07:00
Sandeep	a03ac222d8	Fix Scala Style Any comments are welcome Author: Sandeep <sandeep@techaddict.me> Closes #531 from techaddict/stylefix-1 and squashes the following commits: 7492730 [Sandeep] Pass 4 98b2428 [Sandeep] fix rxin suggestions b5e2e6f [Sandeep] Pass 3 05932d7 [Sandeep] fix if else styling 2 08690e5 [Sandeep] fix if else styling	2014-04-24 15:07:23 -07:00
Michael Armbrust	3a390bfd80	REPL cleanup. Author: Michael Armbrust <michael@databricks.com> Closes #451 from marmbrus/replCleanup and squashes the following commits: 088526a [Michael Armbrust] REPL cleanup.	2014-04-19 17:33:37 -07:00
Patrick Wendell	4bc07eebbf	SPARK-1480: Clean up use of classloaders The Spark codebase is a bit fast-and-loose when accessing classloaders and this has caused a few bugs to surface in master. This patch defines some utility methods for accessing classloaders. This makes the intention when accessing a classloader much more explicit in the code and fixes a few cases where the wrong one was chosen. case (a) -> We want the classloader that loaded Spark case (b) -> We want the context class loader, or if not present, we want (a) This patch provides a better fix for SPARK-1403 (https://issues.apache.org/jira/browse/SPARK-1403) than the current work around, which it reverts. It also fixes a previously unreported bug that the `./spark-submit` script did not work for running with `local` master. It didn't work because the executor classloader did not properly delegate to the context class loader (if it is defined) and in local mode the context class loader is set by the `./spark-submit` script. A unit test is added for that case. Author: Patrick Wendell <pwendell@gmail.com> Closes #398 from pwendell/class-loaders and squashes the following commits: b4a1a58 [Patrick Wendell] Minor clean up 14f1272 [Patrick Wendell] SPARK-1480: Clean up use of classloaders	2014-04-13 08:58:37 -07:00
Sandeep	930b70f052	Remove Unnecessary Whitespace's stack these together in a commit else they show up chunk by chunk in different commits. Author: Sandeep <sandeep@techaddict.me> Closes #380 from techaddict/white_space and squashes the following commits: b58f294 [Sandeep] Remove Unnecessary Whitespace's	2014-04-10 15:04:13 -07:00
Andrew Or	79820fe825	[SPARK-1276] Add a HistoryServer to render persisted UI The new feature of event logging, introduced in #42, allows the user to persist the details of his/her Spark application to storage, and later replay these events to reconstruct an after-the-fact SparkUI. Currently, however, a persisted UI can only be rendered through the standalone Master. This greatly limits the use case of this new feature as many people also run Spark on Yarn / Mesos. This PR introduces a new entity called the HistoryServer, which, given a log directory, keeps track of all completed applications independently of a Spark Master. Unlike Master, the HistoryServer needs not be running while the application is still running. It is relatively light-weight in that it only maintains static information of applications and performs no scheduling. To quickly test it out, generate event logs with ```spark.eventLog.enabled=true``` and run ```sbin/start-history-server.sh <log-dir-path>```. Your HistoryServer awaits on port 18080. Comments and feedback are most welcome. --- A few other changes introduced in this PR include refactoring the WebUI interface, which is beginning to have a lot of duplicate code now that we have added more functionality to it. Two new SparkListenerEvents have been introduced (SparkListenerApplicationStart/End) to keep track of application name and start/finish times. This PR also clarifies the semantics of the ReplayListenerBus introduced in #42. A potential TODO in the future (not part of this PR) is to render live applications in addition to just completed applications. This is useful when applications fail, a condition that our current HistoryServer does not handle unless the user manually signals application completion (by creating the APPLICATION_COMPLETION file). Handling live applications becomes significantly more challenging, however, because it is now necessary to render the same SparkUI multiple times. 
To avoid reading the entire log every time, which is inefficient, we must handle reading the log from where we previously left off, but this becomes fairly complicated because we must deal with the arbitrary behavior of each input stream. Author: Andrew Or <andrewor14@gmail.com> Closes #204 from andrewor14/master and squashes the following commits: 7b7234c [Andrew Or] Finished -> Completed b158d98 [Andrew Or] Address Patrick's comments 69d1b41 [Andrew Or] Do not block on posting SparkListenerApplicationEnd 19d5dd0 [Andrew Or] Merge github.com:apache/spark f7f5bf0 [Andrew Or] Make history server's web UI port a Spark configuration 2dfb494 [Andrew Or] Decouple checking for application completion from replaying d02dbaa [Andrew Or] Expose Spark version and include it in event logs 2282300 [Andrew Or] Add documentation for the HistoryServer 567474a [Andrew Or] Merge github.com:apache/spark 6edf052 [Andrew Or] Merge github.com:apache/spark 19e1fb4 [Andrew Or] Address Thomas' comments 248cb3d [Andrew Or] Limit number of live applications + add configurability a3598de [Andrew Or] Do not close file system with ReplayBus + fix bind address bc46fc8 [Andrew Or] Merge github.com:apache/spark e2f4ff9 [Andrew Or] Merge github.com:apache/spark 050419e [Andrew Or] Merge github.com:apache/spark 81b568b [Andrew Or] Fix strange error messages... 0670743 [Andrew Or] Decouple page rendering from loading files from disk 1b2f391 [Andrew Or] Minor changes a9eae7e [Andrew Or] Merge branch 'master' of github.com:apache/spark d5154da [Andrew Or] Styling and comments 5dbfbb4 [Andrew Or] Merge branch 'master' of github.com:apache/spark 60bc6d5 [Andrew Or] First complete implementation of HistoryServer (only for finished apps) 7584418 [Andrew Or] Report application start/end times to HistoryServer 8aac163 [Andrew Or] Add basic application table c086bd5 [Andrew Or] Add HistoryServer and scripts ++ Refactor WebUI interface	2014-04-10 10:39:34 -07:00
Holden Karau	fa0524fd02	Spark-939: allow user jars to take precedence over spark jars I still need to do a small bit of re-factoring [mostly the one Java file I'll switch it back to a Scala file and use it in both the close loaders], but comments on other things I should do would be great. Author: Holden Karau <holden@pigscanfly.ca> Closes #217 from holdenk/spark-939-allow-user-jars-to-take-precedence-over-spark-jars and squashes the following commits: cf0cac9 [Holden Karau] Fix the executorclassloader 1955232 [Holden Karau] Fix long line in TestUtils 8f89965 [Holden Karau] Fix tests for new class name 7546549 [Holden Karau] CR feedback, merge some of the testutils methods down, rename the classloader 644719f [Holden Karau] User the class generator for the repl class loader tests too f0b7114 [Holden Karau] Fix the core/src/test/scala/org/apache/spark/executor/ExecutorURLClassLoaderSuite.scala tests 204b199 [Holden Karau] Fix the generated classes 9f68f10 [Holden Karau] Start rewriting the ExecutorURLClassLoaderSuite to not use the hard coded classes 858aba2 [Holden Karau] Remove a bunch of test junk 261aaee [Holden Karau] simplify executorurlclassloader a bit 7a7bf5f [Holden Karau] CR feedback d4ae848 [Holden Karau] rewrite component into scala aa95083 [Holden Karau] CR feedback 7752594 [Holden Karau] re-add https comment a0ef85a [Holden Karau] Fix style issues 125ea7f [Holden Karau] Easier to just remove those files, we don't need them bb8d179 [Holden Karau] Fix issues with the repl class loader 241b03d [Holden Karau] fix my rat excludes a343350 [Holden Karau] Update rat-excludes and remove a useless file d90d217 [Holden Karau] Fix fall back with custom class loader and add a test for it 4919bf9 [Holden Karau] Fix parent calling class loader issue 8a67302 [Holden Karau] Test are good 9e2d236 [Holden Karau] It works comrade 691ee00 [Holden Karau] It works ish dc4fe44 [Holden Karau] Does not depend on being in my home directory 47046ff [Holden Karau] Remove bad 
import' 22d83cb [Holden Karau] Add a test suite for the executor url class loader suite 7ef4628 [Holden Karau] Clean up 792d961 [Holden Karau] Almost works 16aecd1 [Holden Karau] Doesn't quite work 8d2241e [Holden Karau] Adda FakeClass for testing ClassLoader precedence options 648b559 [Holden Karau] Both class loaders compile. Now for testing e1d9f71 [Holden Karau] One loader workers.	2014-04-08 22:30:03 -07:00
Aaron Davidson	0307db0f55	SPARK-1099: Introduce local[*] mode to infer number of cores This is the default mode for running spark-shell and pyspark, intended to allow users running spark for the first time to see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core. Author: Aaron Davidson <aaron@databricks.com> Closes #182 from aarondav/110 and squashes the following commits: a88294c [Aaron Davidson] Rebased changes for new spark-shell a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores	2014-04-07 13:06:30 -07:00
Aaron Davidson	7ce52c4a7a	SPARK-1349: spark-shell gets its own command history Currently, spark-shell shares its command history with scala repl. This fix is simply a modification of the default FileBackedHistory file setting: https://github.com/scala/scala/blob/master/src/repl/scala/tools/nsc/interpreter/session/FileBackedHistory.scala#L77 Author: Aaron Davidson <aaron@databricks.com> Closes #267 from aarondav/repl and squashes the following commits: f9c62d2 [Aaron Davidson] SPARK-1349: spark-shell gets its own command history separate from scala repl	2014-04-06 17:43:44 -07:00
Prashant Sharma	60abc25254	SPARK-1096, a space after comment start style checker. Author: Prashant Sharma <prashant.s@imaginea.com> Closes #124 from ScrapCodes/SPARK-1096/scalastyle-comment-check and squashes the following commits: 214135a [Prashant Sharma] Review feedback. 5eba88c [Prashant Sharma] Fixed style checks for ///+ comments. e54b2f8 [Prashant Sharma] improved message, work around. 83e7144 [Prashant Sharma] removed dependency on scalastyle in plugin, since scalastyle sbt plugin already depends on the right version. Incase we update the plugin we will have to adjust our spark-style project to depend on right scalastyle version. 810a1d6 [Prashant Sharma] SPARK-1096, a space after comment style checker. ba33193 [Prashant Sharma] scala style as a project	2014-03-28 00:21:49 -07:00
Takuya UESHIN	3d89043b7e	[SPARK-1210] Prevent ContextClassLoader of Actor from becoming ClassLoader of Executo... ...r. Constructor of `org.apache.spark.executor.Executor` should not set context class loader of current thread, which is backend Actor's thread. Run the following code in local-mode REPL. ``` scala> case class Foo(i: Int) scala> val ret = sc.parallelize((1 to 100).map(Foo), 10).collect ``` This causes errors as follows: ``` ERROR actor.OneForOneStrategy: [L$line5.$read$$iwC$$iwC$$iwC$$iwC$Foo; java.lang.ArrayStoreException: [L$line5.$read$$iwC$$iwC$$iwC$$iwC$Foo; at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88) at org.apache.spark.SparkContext$$anonfun$runJob$3.apply(SparkContext.scala:870) at org.apache.spark.SparkContext$$anonfun$runJob$3.apply(SparkContext.scala:870) at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:859) at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:616) at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ``` This is because the class loaders to deserialize result `Foo` instances might be different from backend Actor's, and the Actor's class loader should be the same as Driver's. 
Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #15 from ueshin/wip/wrongcontextclassloader and squashes the following commits: d79e8c0 [Takuya UESHIN] Change a parent class loader of ExecutorURLClassLoader. c6c09b6 [Takuya UESHIN] Add a test to collect objects of class defined in repl. 43e0feb [Takuya UESHIN] Prevent ContextClassLoader of Actor from becoming ClassLoader of Executor.	2014-03-27 22:17:15 -07:00
Sean Owen	1fa48d9422	SPARK-1325. The maven build error for Spark Tools This is just a slight variation on https://github.com/apache/spark/pull/234 and alternative suggestion for SPARK-1325. `scala-actors` is not necessary. `SparkBuild.scala` should be updated to reflect the direct dependency on `scala-reflect` and `scala-compiler`. And the `repl` build, which has the same dependencies, should also be consistent between Maven / SBT. Author: Sean Owen <sowen@cloudera.com> Author: witgo <witgo@qq.com> Closes #240 from srowen/SPARK-1325 and squashes the following commits: 25bd7db [Sean Owen] Add necessary dependencies scala-reflect and scala-compiler to tools. Update repl dependencies, which are similar, to be consistent between Maven / SBT in this regard too.	2014-03-26 18:32:14 -07:00
Patrick Wendell	b9be160951	SPARK-782 Clean up for ASM dependency. This makes two changes. 1) Spark uses the shaded version of asm that is (conveniently) published with Kryo. 2) Existing exclude rules around asm are updated to reflect the new groupId of `org.ow2.asm`. This made all of the old rules not work with newer Hadoop versions that pull in new asm versions. Author: Patrick Wendell <pwendell@gmail.com> Closes #100 from pwendell/asm and squashes the following commits: 9235f3f [Patrick Wendell] SPARK-782 Clean up for ASM dependency.	2014-03-09 13:17:07 -07:00
Sandy Ryza	a99fb3747a	SPARK-1193. Fix indentation in pom.xmls Author: Sandy Ryza <sandy@cloudera.com> Closes #91 from sryza/sandy-spark-1193 and squashes the following commits: a878124 [Sandy Ryza] SPARK-1193. Fix indentation in pom.xmls	2014-03-07 23:10:35 -08:00
Thomas Graves	7edbea41b4	SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets resubmit pull request. was https://github.com/apache/incubator-spark/pull/332. Author: Thomas Graves <tgraves@apache.org> Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits: dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser 05eebed [Thomas Graves] Fix dependency lost in upmerge d1040ec [Thomas Graves] Fix up various imports 05ff5e0 [Thomas Graves] Fix up imports after upmerging to master ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase 13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config whereever possible. Added ConnectionManagerSuite unit tests. 4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets 2f77147 [Thomas Graves] Rework from comments 50dd9f2 [Thomas Graves] fix header in SecurityManager ecbfb65 [Thomas Graves] Fix spacing and formatting b514bec [Thomas Graves] Fix reference to config ed3d1c1 [Thomas Graves] Add security.md 6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments 2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework 5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets	2014-03-06 18:27:50 -06:00
Patrick Wendell	c3f5e07533	SPARK-1121: Include avro for yarn-alpha builds This lets us explicitly include Avro based on a profile for 0.23.X builds. It makes me sad how convoluted it is to express this logic in Maven. @tgraves and @sryza curious if this works for you. I'm also considering just reverting to how it was before. The only real problem was that Spark advertised a dependency on Avro even though it only really depends transitively on Avro through other deps. Author: Patrick Wendell <pwendell@gmail.com> Closes #49 from pwendell/avro-build-fix and squashes the following commits: 8d6ee92 [Patrick Wendell] SPARK-1121: Add avro to yarn-alpha profile	2014-03-02 15:18:19 -08:00
Patrick Wendell	1fd2bfd3dd	Remove remaining references to incubation This removes some loose ends not caught by the other (incubating -> tlp) patches. @markhamstra this updates the version as you mentioned earlier. Author: Patrick Wendell <pwendell@gmail.com> Closes #51 from pwendell/tlp and squashes the following commits: d553b1b [Patrick Wendell] Remove remaining references to incubation	2014-03-02 01:00:16 -08:00
Sean Owen	12bbca2065	SPARK 1084.1 (resubmitted) (Ported from https://github.com/apache/incubator-spark/pull/637 ) Author: Sean Owen <sowen@cloudera.com> Closes #31 from srowen/SPARK-1084.1 and squashes the following commits: 6c4a32c [Sean Owen] Suppress warnings about legitimate unchecked array creations, or change code to avoid it f35b833 [Sean Owen] Fix two misc javadoc problems 254e8ef [Sean Owen] Fix one new style error introduced in scaladoc warning commit 5b2fce2 [Sean Owen] Fix scaladoc invocation warning, and enable javac warnings properly, with plugin config updates 007762b [Sean Owen] Remove dead scaladoc links b8ff8cb [Sean Owen] Replace deprecated Ant <tasks> with <target>	2014-02-27 11:12:21 -08:00
CodingCat	345df5f4a9	[SPARK-1089] fix the regression problem on ADD_JARS in 0.9 https://spark-project.atlassian.net/browse/SPARK-1089 copied from JIRA, reported by @ash211 "Using the ADD_JARS environment variable with spark-shell used to add the jar to both the shell and the various workers. Now it only adds to the workers and importing a custom class in the shell is broken. The workaround is to add custom jars to both ADD_JARS and SPARK_CLASSPATH. We should fix ADD_JARS so it works properly again. See various threads on the user list: https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201402.mbox/%3CCAJbo4neMLiTrnm1XbyqomWmp0m+EUcg4yE-txuRGSVKOb5KLeA@mail.gmail.com%3E (another one that doesn't appear in the archives yet titled "ADD_JARS not working on 0.9")" The reason of this bug is two-folds in the current implementation of SparkILoop.scala, the settings.classpath is not set properly when the process() method is invoked the weird behaviour of Scala 2.10, (I personally thought it is a bug) if we simply set value of a PathSettings object (like settings.classpath), the isDefault is not set to true (this is a flag showing if the variable is modified), so it makes the PathResolver loads the default CLASSPATH environment variable value to calculated the path (see https://github.com/scala/scala/blob/2.10.x/src/compiler/scala/tools/util/PathResolver.scala#L215) what we have to do is to manually make this flag set, (`e3991d97dd/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala (L884)`) Author: CodingCat <zhunansjtu@gmail.com> Closes #13 from CodingCat/SPARK-1089 and squashes the following commits: 8af81e7 [CodingCat] impose non-null settings 9aa2125 [CodingCat] code cleaning ce36676 [CodingCat] code cleaning e045582 [CodingCat] fix the regression problem on ADD_JARS in 0.9	2014-02-26 23:42:15 -08:00
Sean Owen	c0ef3afa82	SPARK-1071: Tidy logging strategy and use of log4j Prompted by a recent thread on the mailing list, I tried and failed to see if Spark can be made independent of log4j. There are a few cases where control of the underlying logging is pretty useful, and to do that, you have to bind to a specific logger. Instead I propose some tidying that leaves Spark's use of log4j, but gets rid of warnings and should still enable downstream users to switch. The idea is to pipe everything (except log4j) through SLF4J, and have Spark use SLF4J directly when logging, and where Spark needs to output info (REPL and tests), bind from SLF4J to log4j. This leaves the same behavior in Spark. It means that downstream users who want to use something except log4j should: - Exclude dependencies on log4j, slf4j-log4j12 from Spark - Include dependency on log4j-over-slf4j - Include dependency on another logger X, and another slf4j-X - Recreate any log config that Spark does, that is needed, in the other logger's config That sounds about right. Here are the key changes: - Include the jcl-over-slf4j shim everywhere by depending on it in core. - Exclude dependencies on commons-logging from third-party libraries. - Include the jul-to-slf4j shim everywhere by depending on it in core. - Exclude slf4j-* dependencies from third-party libraries to prevent collision or warnings - Added missing slf4j-log4j12 binding to GraphX, Bagel module tests And minor/incidental changes: - Update to SLF4J 1.7.5, which happily matches Hadoop 2’s version and is a recommended update over 1.7.2 - (Remove a duplicate HBase dependency declaration in SparkBuild.scala) - (Remove a duplicate mockito dependency declaration that was causing warnings and bugging me) Author: Sean Owen <sowen@cloudera.com> Closes #570 from srowen/SPARK-1071 and squashes the following commits: 52eac9f [Sean Owen] Add slf4j-over-log4j12 dependency to core (non-test) and remove it from things that depend on core. 
77a7fa9 [Sean Owen] SPARK-1071: Tidy logging strategy and use of log4j	2014-02-23 11:40:55 -08:00
CodingCat	e0d49ad229	[SPARK-1090] improvement on spark_shell (help information, configure memory) https://spark-project.atlassian.net/browse/SPARK-1090 spark-shell should print help information about parameters and should allow the user to configure executor memory. There is no documentation about how to set --cores/-c in spark-shell, and users should also be able to set executor memory through command line options. In this PR I also check the format of the options passed by the user. Author: CodingCat <zhunansjtu@gmail.com> Closes #599 from CodingCat/spark_shell_improve and squashes the following commits: de5aa38 [CodingCat] add parameter to set driver memory 915cbf8 [CodingCat] improvement on spark_shell (help information, configure memory)	2014-02-17 15:12:52 -08:00
Patrick Wendell	b69f8b2a01	Merge pull request #557 from ScrapCodes/style. Closes #557 . SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Author: Patrick Wendell <pwendell@gmail.com> Author: Prashant Sharma <scrapcodes@gmail.com> == Merge branch commits == commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4 Author: Prashant Sharma <scrapcodes@gmail.com> Date: Sun Feb 9 17:39:07 2014 +0530 scala style fixes commit f91709887a8e0b608c5c2b282db19b8a44d53a43 Author: Patrick Wendell <pwendell@gmail.com> Date: Fri Jan 24 11:22:53 2014 -0800 Adding scalastyle snapshot	2014-02-09 10:09:19 -08:00
Mark Hamstra	c2341c92bb	Merge pull request #542 from markhamstra/versionBump. Closes #542 . Version number to 1.0.0-SNAPSHOT Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore. @pwendell Author: Mark Hamstra <markhamstra@gmail.com> == Merge branch commits == commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71 Author: Mark Hamstra <markhamstra@gmail.com> Date: Wed Feb 5 09:30:32 2014 -0800 Version number to 1.0.0-SNAPSHOT	2014-02-08 16:00:43 -08:00
Patrick Wendell	23034798d7	Add missing header files	2014-01-14 01:17:13 -08:00
Patrick Wendell	0bb33076e2	Removing mentions in tests	2014-01-12 16:53:58 -08:00
Matei Zaharia	11891e68c3	Merge pull request #327 from lucarosellini/master Added ‘-i’ command line option to Spark REPL We had to create a new implementation of both scala.tools.nsc.CompilerCommand and scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring in other options (-howtorun, -save and -execute) which don’t make sense in Spark. Any new Spark specific command line option could now be added to org.apache.spark.repl.SparkRunnerSettings class. Since the behavior of loading a script from the command line should be the same as loading it using the “:load” command inside the shell, the script should be loaded when the SparkContext is available, that’s why we had to move the call to ‘loadfiles(settings)’ _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.	2014-01-08 00:32:18 -05:00
Luca Rosellini	4689ce29fd	Added license header and removed @author tag	2014-01-07 09:44:24 +01:00
Patrick Wendell	604fad9c39	Merge remote-tracking branch 'apache-github/master' into remove-binaries Conflicts: core/src/test/scala/org/apache/spark/DriverSuite.scala docs/python-programming-guide.md	2014-01-03 21:29:33 -08:00
Luca Rosellini	0b6db8c186	Added ‘-i’ command line option to spark REPL. We had to create a new implementation of both scala.tools.nsc.CompilerCommand and scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring in other options (-howtorun, -save and -execute) which don’t make sense in Spark. Any new Spark specific command line option could now be added to org.apache.spark.repl.SparkRunnerSettings class. Since the behavior of loading a script from the command line should be the same as loading it using the “:load” command inside the shell, the script should be loaded when the SparkContext is available, that’s why we had to move the call to ‘loadfiles(settings)’ _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.	2014-01-03 12:57:06 +01:00
Prashant Sharma	94f2fffa23	fixed review comments	2014-01-03 14:43:37 +05:30
Prashant Sharma	980afd280a	Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts Conflicts: bin/spark-shell core/pom.xml core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala core/src/test/scala/org/apache/spark/DriverSuite.scala python/run-tests sbin/compute-classpath.sh sbin/spark-class sbin/stop-slaves.sh	2014-01-02 17:55:21 +05:30
Matei Zaharia	e2c68642c6	Miscellaneous fixes from code review. Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc to make code that uses this simpler later on.	2014-01-01 22:03:39 -05:00
Matei Zaharia	642029e7f4	Various fixes to configuration code - Got rid of global SparkContext.globalConf - Pass SparkConf to serializers and compression codecs - Made SparkConf public instead of private[spark] - Improved API of SparkContext and SparkConf - Switched executor environment vars to be passed through SparkConf - Fixed some places that were still using system properties - Fixed some tests, though others are still failing This still fails several tests in core, repl and streaming, likely due to properties not being set or cleared correctly (some of the tests run fine in isolation).	2013-12-28 17:13:15 -05:00
Prashant Sharma	2573add94c	spark-544, introducing SparkConf and related configuration overhaul.	2013-12-25 00:09:36 +05:30
Mark Hamstra	09ed7ddfa0	Use scala.binary.version in POMs	2013-12-15 12:39:58 -08:00
Prashant Sharma	a854cc536d	Review comments on the PR for scala 2.10 migration.	2013-12-13 15:19:51 +05:30
Prashant Sharma	17db6a9041	Style fixes and addressed review comments at #221	2013-12-10 11:47:16 +05:30
Prashant Sharma	7ad6921ae0	Incorporated Patrick's feedback comment on #211 and made maven build/dep-resolution at least a bit faster.	2013-12-07 12:45:57 +05:30
Prashant Sharma	d092a8cc6a	Fixed compile time warnings and formatting post merge.	2013-11-26 15:21:50 +05:30
Aaron Davidson	f629ba95b6	Various merge corrections I've diff'd this patch against my own -- since they were both created independently, this means that two sets of eyes have gone over all the merge conflicts that were created, so I'm feeling significantly more confident in the resulting PR. @rxin has looked at the changes to the repl and is resoundingly confident that they are correct.	2013-11-14 22:13:09 -08:00
Raymond Liu	a60620b76a	Merge branch 'master' into scala-2.10	2013-11-14 12:44:19 +08:00
Raymond Liu	0f2e3c6e31	Merge branch 'master' into scala-2.10	2013-11-13 16:55:11 +08:00
Reynold Xin	319299941d	Propagate the SparkContext local property from the thread that calls the spark-repl to the actual execution thread.	2013-11-09 00:32:14 -08:00
Ali Ghodsi	05a0df2b9e	Makes Spark SIMR ready.	2013-10-24 11:59:51 -07:00
Aaron Davidson	74737264c4	Spark shell exits if it cannot create SparkContext Mainly, this occurs if you provide a messed up MASTER url (one that doesn't match one of our regexes). Previously, we would default to Mesos, fail, and then start the shell anyway, except that any Spark command would fail.	2013-10-17 18:51:19 -07:00
Andrew xia	52ccf4f859	deprecate "spark" script and SPARK_CLASSPATH environment variable	2013-10-12 14:34:14 +08:00
Prashant Sharma	7be75682b9	Merge branch 'master' into wip-merge-master Conflicts: bagel/pom.xml core/pom.xml core/src/test/scala/org/apache/spark/ui/UISuite.scala examples/pom.xml mllib/pom.xml pom.xml project/SparkBuild.scala repl/pom.xml streaming/pom.xml tools/pom.xml In scala 2.10, a shorter representation is used for naming artifacts so changed to shorter scala version for artifacts and made it a property in pom.	2013-10-08 11:29:40 +05:30
Patrick Wendell	aa9fb84994	Merging build changes in from 0.8	2013-10-05 22:07:00 -07:00
Prashant Sharma	5829692885	Merge branch 'master' into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala docs/_config.yml project/SparkBuild.scala repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala	2013-10-01 11:57:24 +05:30
Patrick Wendell	e2ff59af72	Bug fix in master build	2013-09-26 13:06:51 -07:00
Prashant Sharma	7ff4c2d399	fixed maven build for scala 2.10	2013-09-26 10:48:24 +05:30
Patrick Wendell	6079721fa1	Update build version in master	2013-09-24 11:41:51 -07:00
Prashant Sharma	69fd42aee3	ported repl improvements from master	2013-09-15 15:51:02 +05:30
Prashant Sharma	383e151fd7	Merge branch 'master' of git://github.com/mesos/spark into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala project/SparkBuild.scala	2013-09-15 10:55:12 +05:30
Prashant Sharma	20c65bc334	Fixed repl suite	2013-09-15 10:43:06 +05:30
Prashant Sharma	6fcfefcb27	Few more fixes to tests broken during merge	2013-09-10 10:57:47 +05:30
Jey Kottalam	30a32c8335	Minor YARN build cleanups	2013-09-06 11:31:16 -07:00
Prashant Sharma	4106ae9fbf	Merged with master	2013-09-06 17:53:01 +05:30
Matei Zaharia	f586c8ef38	Updated LICENSE with third-party licenses	2013-09-02 16:43:06 -07:00
Matei Zaharia	0a8cc30921	Move some classes to more appropriate packages: * RDD, RDDFunctions -> org.apache.spark.rdd Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util * JavaSerializer, KryoSerializer -> org.apache.spark.serializer	2013-09-01 14:13:16 -07:00
Matei Zaharia	5701eb92c7	Fix some URLs	2013-09-01 14:13:16 -07:00
Matei Zaharia	46eecd110a	Initial work to rename package to org.apache.spark	2013-09-01 14:13:13 -07:00
Mark Hamstra	ff6f1b0500	Synced sbt and maven builds	2013-08-21 13:50:24 -07:00
Jey Kottalam	23f4622aff	Remove redundant dependencies from POMs	2013-08-18 18:53:57 -07:00
Jey Kottalam	c1e547bb7f	Updates to repl and example POMs to match SBT build	2013-08-16 13:50:12 -07:00
Jey Kottalam	ad580b94d5	Maven build now also works with YARN	2013-08-16 13:50:12 -07:00
Jey Kottalam	11b42a84db	Maven build now works with CDH hadoop-2.0.0-mr1	2013-08-16 13:50:12 -07:00
Jey Kottalam	353fab2440	Initial changes to make Maven build agnostic of hadoop version	2013-08-16 13:50:12 -07:00
Shivaram Venkataraman	a1227708e9	Set SPARK_CLASSPATH for maven repl tests	2013-08-13 20:06:47 -07:00
Benjamin Hindman	7bdafa918a	Format cleanup.	2013-07-30 17:01:00 -07:00
Benjamin Hindman	f6f46455eb	Added property 'spark.executor.uri' for launching on Mesos without requiring Spark to be installed. Using 'make_distribution.sh' a user can put a Spark distribution at a URI supported by Mesos (e.g., 'hdfs://...') and then set that when launching their job. Also added SPARK_EXECUTOR_URI for the REPL.	2013-07-29 23:32:52 -07:00
Matei Zaharia	af3c9d5042	Add Apache license headers and LICENSE and NOTICE files	2013-07-16 17:21:33 -07:00
Prashant Sharma	6e6d94ffdf	Added add jars functionality to new repl, which was dropped while merging with old.	2013-07-12 11:55:16 +05:30
Prashant Sharma	ca249eea50	Removed an unnecessary test case	2013-07-11 18:31:36 +05:30
Prashant Sharma	a5f1f6a907	Merge branch 'master' into master-merge Conflicts: core/pom.xml core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/RDDCheckpointData.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/Utils.scala core/src/main/scala/spark/api/python/PythonRDD.scala core/src/main/scala/spark/deploy/client/Client.scala core/src/main/scala/spark/deploy/master/MasterWebUI.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/ZippedRDD.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/BlockManagerMasterActor.scala core/src/main/scala/spark/storage/BlockManagerUI.scala core/src/main/scala/spark/util/AkkaUtils.scala core/src/test/scala/spark/SizeEstimatorSuite.scala pom.xml project/SparkBuild.scala repl/src/main/scala/spark/repl/SparkILoop.scala repl/src/test/scala/spark/repl/ReplSuite.scala streaming/src/main/scala/spark/streaming/StreamingContext.scala streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala	2013-07-03 11:43:26 +05:30
Matei Zaharia	2bd04c3513	Formatting	2013-06-25 18:37:14 -04:00
Matei Zaharia	f2263350ed	Added a local-cluster mode test to ReplSuite	2013-06-25 18:35:35 -04:00
Matei Zaharia	0e0f9d3069	Fix search path for REPL class loader to really find added JARs	2013-06-22 17:44:04 -07:00
Matei Zaharia	b5df1cd668	ADD_JARS environment variable for spark-shell	2013-06-22 17:14:44 -07:00
Matei Zaharia	7902baddc7	Update ASM to version 4.0	2013-06-19 13:34:30 +02:00
Matei Zaharia	1044a95c9f	Merge pull request #652 from ScrapCodes/scala-2.10 Fixed maven build without netty fix	2013-06-14 20:04:24 -07:00
Prashant Sharma	6f28067f8d	Fixed maven build without netty fix	2013-06-14 21:03:21 +05:30
Matei Zaharia	96c895f653	Fix StandaloneClusterReplSuite to allow running multiple tests	2013-06-08 14:21:41 -07:00
Jey Kottalam	207afe4088	Remove spark-repl's extraneous dependency on spark-streaming	2013-05-01 16:57:31 -07:00
Prashant Sharma	24bbf318b3	Fixed other warnings	2013-04-29 19:56:28 +05:30
Mridul Muralidharan	afee902443	Attempt to fix streaming test failures after yarn branch merge	2013-04-28 22:26:45 +05:30
Prashant Sharma	ad88f083a6	scala 2.10 and master merge	2013-04-24 18:08:26 +05:30
Mridul Muralidharan	dd515ca3ee	Attempt at fixing merge conflict	2013-04-24 09:24:17 +05:30
Prashant Sharma	185bb9525a	Manually merged scala-2.10 and master	2013-04-22 14:14:03 +05:30
Prashant Sharma	bf5fc07379	Added more tests	2013-04-19 13:51:16 +05:30
Prashant Sharma	36ccb35371	Changed spark context as lazy val to become just val.	2013-04-16 11:13:51 +05:30
Prashant Sharma	19b0256ae4	Added standalone cluster repl suite	2013-04-15 19:49:40 +05:30
Prashant Sharma	f31e41c270	Added class wrappers instead of object and incorporated most of matei comments	2013-04-10 15:08:01 +05:30
Prashant Sharma	b67c638b85	Made shell loading synchronous as async loading confuses with out of order status messages	2013-04-10 14:48:49 +05:30
Matei Zaharia	65caa8f711	Merge remote-tracking branch 'jey/bump-development-version-to-0.8.0' Conflicts: docs/_config.yml project/SparkBuild.scala	2013-04-08 12:43:17 -04:00
Mridul Muralidharan	6798a09df8	Add support for building against hadoop2-yarn : adding new maven profile for it	2013-04-07 17:47:38 +05:30
Jey Kottalam	bc8ba222ff	Bump development version to 0.8.0	2013-03-28 15:42:01 -07:00
Prashant Sharma	5a080acdb8	Fixed broken tests by last commit for repl.	2013-03-25 10:43:45 +05:30
Prashant Sharma	d9f34e505d	Ctrl-D hang bug fixed!	2013-03-20 00:18:04 +05:30
prashant	432a227320	fixed autocompletion apparent hang due to logging	2013-03-19 12:29:08 +05:30
Prashant Sharma	15530c2b23	porting of repl to scala-2.10	2013-03-17 10:47:17 +05:30
Mikhail Bautin	7fd2708eda	Add a log4j compile dependency to fix build in IntelliJ Also rename parent project to spark-parent (otherwise it shows up as "parent" in IntelliJ, which is very confusing).	2013-03-15 11:41:51 -07:00
Mark Hamstra	8b06b359da	bump version to 0.7.1-SNAPSHOT in the subproject poms to keep the maven build building.	2013-02-28 23:34:34 -08:00
Matei Zaharia	db9b90fdbd	Change version to 0.7.1-SNAPSHOT for development branch	2013-02-27 09:15:26 -08:00
Mikhail Bautin	fe3eceab57	Remove activation of profiles by default See the discussion at https://github.com/mesos/spark/pull/355 for why default profile activation is a problem.	2013-01-31 13:30:41 -08:00
Stephen Haberman	7dfb82a992	Replace old 'master' term with 'driver'.	2013-01-25 11:03:00 -06:00
Mikhail Bautin	325297e5c3	Add an Avro dependency to REPL to make it compile with Hadoop 2	2013-01-22 18:11:51 -08:00
Matei Zaharia	6e3754bf47	Add Maven build file for streaming, and fix some issues in SBT file As part of this, changed our Scala 2.9.2 Kafka library to be available as a local Maven repository, following the example in (http://blog.dub.podval.org/2010/01/maven-in-project-repository.html)	2013-01-20 19:22:24 -08:00
Tathagata Das	cd1521cfdb	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala docs/_layouts/global.html docs/index.md run	2013-01-15 12:08:51 -08:00
Shivaram Venkataraman	bbc56d85ed	Rename environment variable for hadoop profiles to hadoopVersion	2013-01-12 15:24:13 -08:00
Shivaram Venkataraman	9262522306	Activate hadoop2 profile in pom.xml with -Dhadoop=2	2013-01-10 22:07:34 -08:00
Shivaram Venkataraman	f7adb382ac	Activate hadoop1 if property hadoop is missing. hadoop2 can be activated now by using -Dhadoop -Phadoop2.	2013-01-08 03:19:43 -08:00
Shivaram Venkataraman	4bbe07e5ec	Activate hadoop1 profile by default for maven builds	2013-01-07 17:46:22 -08:00
Tathagata Das	4719e6d8fe	Changed locations for unit test logs.	2013-01-07 16:06:07 -08:00
Thomas Dudziak	4af6cad37a	Fixed repl maven build to produce artifacts with the appropriate hadoop classifier and extracted repl fat-jar and debian packaging into a separate project to make Maven happy	2012-12-18 12:08:19 -08:00
Thomas Dudziak	c1d15ae3d5	Shaded repl jar for hadoop1 profile needs to include hadoop classes	2012-12-10 15:06:28 -08:00
Matei Zaharia	a9ea14d6e7	Merge pull request #318 from tomdz/master Minor tweaks to the debian build	2012-12-10 10:59:41 -08:00
Matei Zaharia	ccff0a089a	Use the same output directories that SBT had in subprojects This will make it easier to make the "run" script work with a Maven build	2012-12-10 10:58:56 -08:00
Thomas Dudziak	0e5b1f7981	Minor tweaks to the debian build	2012-12-10 10:30:30 -08:00
Thomas Dudziak	3b643e86bc	Updated versions in the pom.xml files to match current master	2012-11-27 17:50:42 -08:00
Thomas Dudziak	69297c64be	Addressed code review comments	2012-11-27 15:45:16 -08:00
Thomas Dudziak	24e1e425cd	Include the configuration templates in the debian package	2012-11-20 16:19:56 -08:00
Thomas Dudziak	811a32257b	Added maven and debian build files	2012-11-20 16:19:51 -08:00
Matei Zaharia	0967e71a00	Bump up version to 0.7.0-SNAPSHOT for master branch	2012-10-22 11:49:42 -07:00
Matei Zaharia	902a608187	Update version to 0.6.1-SNAPSHOT to show this is in development	2012-10-22 11:43:57 -07:00
Matei Zaharia	ce6b5a3ee5	Uncomment Maven publishing stuff and set version to 0.6.0	2012-10-13 15:55:39 -07:00
Matei Zaharia	eca570f66a	Removed the need to sleep in tests due to waiting for Akka to shut down	2012-10-07 00:17:59 -07:00
Matei Zaharia	74a9244255	Write all unit test output to a file	2012-10-01 15:07:42 -07:00
Matei Zaharia	83143f9a5f	Fixed several bugs that caused weird behavior with files in spark-shell: - SizeEstimator was following through a ClassLoader field of Hadoop JobConfs, which referenced the whole interpreter, Scala compiler, etc. Chaos ensued, giving an estimated size in the tens of gigabytes. - Broadcast variables in local mode were only stored as MEMORY_ONLY and never made accessible over a server, so they fell out of the cache when they were deemed too large and couldn't be reloaded.	2012-09-30 21:19:39 -07:00
Matei Zaharia	2c16ae36d7	Set log level in tests to WARN	2012-08-23 20:38:14 -07:00
Matei Zaharia	e72afdb817	Some refactoring to make cluster scheduler pluggable.	2012-07-06 15:23:26 -07:00
Matei Zaharia	08579ffa11	Update version number for dev branch	2012-06-15 23:55:43 -07:00
Matei Zaharia	4449eb9783	Changed version in master branch to 0.5.1-SNAPSHOT for further development.	2012-06-13 22:26:14 -04:00
Matei Zaharia	4971e0f547	Updated version number to 0.5.0	2012-06-12 13:41:57 -04:00
Matei Zaharia	dbc3c86ae3	Merge branch 'master' into mesos-0.9 Conflicts: core/src/main/scala/spark/Executor.scala	2012-06-03 17:44:04 -07:00
Reynold Xin	d176422586	Make spark.repl.Main.interp_ publicly accessible (so Shark can get rid of a weird file dedicated to accessing this variable).	2012-05-30 18:40:10 -07:00
Matei Zaharia	08cda89e8a	Further fixes to how Mesos is found and used	2012-03-17 13:39:14 -07:00
Matei Zaharia	63da22c025	Update REPL code to use our own version of JLineReader, which fixes #89 . I'm not entirely sure why this broke in the jump from Scala 2.9.0.1 to 2.9.1 -- maybe something about name resolution changed?	2011-11-07 20:16:25 -08:00
Ismael Juma	483f724d62	Upgrade to Scala 2.9.1. Interestingly, the version in Maven is 2.9.1, but SBT outputs file to the 2.9.1.final directory inside target. A couple of small changes in SparkIMain were also required. All tests pass and ./spark-shell launches successfully.	2011-08-31 10:43:05 +01:00
Ismael Juma	065043a14f	Use process instead of main as the latter is deprecated.	2011-08-02 10:26:03 +01:00
Matei Zaharia	cf8f5de61b	Merge branch 'master' into scala-2.9 Conflicts: project/build.properties repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala	2011-07-14 17:48:56 -04:00
Matei Zaharia	02678724a4	Update version number to 0.4-SNAPSHOT	2011-07-14 17:47:39 -04:00
Matei Zaharia	b187675b68	Print version number 0.3 in REPL	2011-06-26 18:27:01 -07:00
Matei Zaharia	b49d1be65b	Ensure logging is initialized before any Spark threads run in the REPL	2011-05-31 23:54:48 -07:00
Matei Zaharia	90f924202b	Another fix ported forward for the REPL	2011-05-31 23:11:49 -07:00
Matei Zaharia	73975d7491	Further fixes to interpreter (adding in some code generation changes I missed before and setting SparkEnv properly on the threads that execute each line in the 2.9 interpreter).	2011-05-31 22:05:24 -07:00
Matei Zaharia	d52660c969	Ported code generation changes from 2.8 interpreter (to use a class for each line's object rather than a singleton object so that we can ship these classes to worker nodes). This is pretty hairy stuff, which would be nice to avoid in the future by integrating with the interpreter some other way.	2011-05-31 19:23:15 -07:00
Matei Zaharia	bcce6e8d01	Various work to use the 2.9 interpreter	2011-05-31 17:31:51 -07:00
Ismael Juma	1396678baa	Move REPL classes to separate module.	2011-05-27 11:22:50 +01:00

249 commits