ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Sandy Ryza	2409af9dcf	SPARK-1064 This reopens PR 649 from incubator-spark against the new repo Author: Sandy Ryza <sandy@cloudera.com> Closes #102 from sryza/sandy-spark-1064 and squashes the following commits: 270e490 [Sandy Ryza] Handle different application classpath variables in different versions 88b04e0 [Sandy Ryza] SPARK-1064. Make it possible to run on YARN without bundling Hadoop jars in Spark assembly	2014-03-11 22:39:17 -07:00
Patrick Wendell	16788a6542	SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues... This patch removes Ganglia integration from the default build. It allows users willing to link against LGPL code to use Ganglia by adding build flags or linking against a new Spark artifact called spark-ganglia-lgpl. This brings Spark in line with the Apache policy on LGPL code enumerated here: https://www.apache.org/legal/3party.html#options-optional Author: Patrick Wendell <pwendell@gmail.com> Closes #108 from pwendell/ganglia and squashes the following commits: 326712a [Patrick Wendell] Responding to review feedback 5f28ee4 [Patrick Wendell] SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues.	2014-03-11 11:16:59 -07:00
Patrick Wendell	b9be160951	SPARK-782 Clean up for ASM dependency. This makes two changes. 1) Spark uses the shaded version of asm that is (conveniently) published with Kryo. 2) Existing exclude rules around asm are updated to reflect the new groupId of `org.ow2.asm`. This made all of the old rules not work with newer Hadoop versions that pull in new asm versions. Author: Patrick Wendell <pwendell@gmail.com> Closes #100 from pwendell/asm and squashes the following commits: 9235f3f [Patrick Wendell] SPARK-782 Clean up for ASM dependency.	2014-03-09 13:17:07 -07:00
Thomas Graves	7edbea41b4	SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets resubmit pull request. was https://github.com/apache/incubator-spark/pull/332. Author: Thomas Graves <tgraves@apache.org> Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits: dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser 05eebed [Thomas Graves] Fix dependency lost in upmerge d1040ec [Thomas Graves] Fix up various imports 05ff5e0 [Thomas Graves] Fix up imports after upmerging to master ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase 13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config wherever possible. Added ConnectionManagerSuite unit tests. 4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets 2f77147 [Thomas Graves] Rework from comments 50dd9f2 [Thomas Graves] fix header in SecurityManager ecbfb65 [Thomas Graves] Fix spacing and formatting b514bec [Thomas Graves] Fix reference to config ed3d1c1 [Thomas Graves] Add security.md 6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments 2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework 5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets	2014-03-06 18:27:50 -06:00
Prashant Sharma	181ec50307	[java8API] SPARK-964 Investigate the potential for using JDK 8 lambda expressions for the Java/Scala APIs Author: Prashant Sharma <prashant.s@imaginea.com> Author: Patrick Wendell <pwendell@gmail.com> Closes #17 from ScrapCodes/java8-lambdas and squashes the following commits: 95850e6 [Patrick Wendell] Some doc improvements and build changes to the Java 8 patch. 85a954e [Prashant Sharma] Nit. import orderings. 673f7ac [Prashant Sharma] Added support for -java-home as well 80a13e8 [Prashant Sharma] Used fake class tag syntax 26eb3f6 [Prashant Sharma] Patrick's comments on PR. 35d8d79 [Prashant Sharma] Specified java 8 building in the docs 31d4cd6 [Prashant Sharma] Maven build to support -Pjava8-tests flag. 4ab87d3 [Prashant Sharma] Review feedback on the pr c33dc2c [Prashant Sharma] SPARK-964, Java 8 API Support.	2014-03-03 22:31:30 -08:00
Patrick Wendell	c3f5e07533	SPARK-1121: Include avro for yarn-alpha builds This lets us explicitly include Avro based on a profile for 0.23.X builds. It makes me sad how convoluted it is to express this logic in Maven. @tgraves and @sryza curious if this works for you. I'm also considering just reverting to how it was before. The only real problem was that Spark advertised a dependency on Avro even though it only really depends transitively on Avro through other deps. Author: Patrick Wendell <pwendell@gmail.com> Closes #49 from pwendell/avro-build-fix and squashes the following commits: 8d6ee92 [Patrick Wendell] SPARK-1121: Add avro to yarn-alpha profile	2014-03-02 15:18:19 -08:00
Sean Owen	fd31adbf27	SPARK-1084.2 (resubmitted) (Ported from https://github.com/apache/incubator-spark/pull/650 ) This adds one more change though, to fix the scala version warning introduced by json4s recently. Author: Sean Owen <sowen@cloudera.com> Closes #32 from srowen/SPARK-1084.2 and squashes the following commits: 9240abd [Sean Owen] Avoid scala version conflict in scalap induced by json4s dependency 1561cec [Sean Owen] Remove "exclude *" dependencies that are causing Maven warnings, and that are apparently unneeded anyway	2014-03-02 14:27:53 -08:00
Patrick Wendell	1fd2bfd3dd	Remove remaining references to incubation This removes some loose ends not caught by the other (incubating -> tlp) patches. @markhamstra this updates the version as you mentioned earlier. Author: Patrick Wendell <pwendell@gmail.com> Closes #51 from pwendell/tlp and squashes the following commits: d553b1b [Patrick Wendell] Remove remaining references to incubation	2014-03-02 01:00:16 -08:00
Binh Nguyen	b70823c91d	Update io.netty from 4.0.13.Final to 4.0.17.Final This update contains a lot of bug fixes and some new perf improvements. It is also binary compatible with the current 4.0.13.Final. For more information: http://netty.io/news/2014/02/25/4-0-17-Final.html Author: Binh Nguyen <ngbinh@gmail.com> Author: Binh Nguyen <ngbinh@gmail.com> Closes #41 from ngbinh/master and squashes the following commits: a9498f4 [Binh Nguyen] update io.netty to 4.0.17.Final	2014-03-02 00:48:50 -08:00
Sean Owen	12bbca2065	SPARK-1084.1 (resubmitted) (Ported from https://github.com/apache/incubator-spark/pull/637 ) Author: Sean Owen <sowen@cloudera.com> Closes #31 from srowen/SPARK-1084.1 and squashes the following commits: 6c4a32c [Sean Owen] Suppress warnings about legitimate unchecked array creations, or change code to avoid it f35b833 [Sean Owen] Fix two misc javadoc problems 254e8ef [Sean Owen] Fix one new style error introduced in scaladoc warning commit 5b2fce2 [Sean Owen] Fix scaladoc invocation warning, and enable javac warnings properly, with plugin config updates 007762b [Sean Owen] Remove dead scaladoc links b8ff8cb [Sean Owen] Replace deprecated Ant <tasks> with <target>	2014-02-27 11:12:21 -08:00
Prashant Sharma	6ccd6c55bd	SPARK-1121 Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set Author: Prashant Sharma <prashant.s@imaginea.com> Closes #6 from ScrapCodes/SPARK-1121/avro-dep-fix and squashes the following commits: 9b29e34 [Prashant Sharma] Review feedback on PR 46ed2ad [Prashant Sharma] SPARK-1121-Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set	2014-02-26 23:40:49 -08:00
Raymond Liu	c852201ce9	For SPARK-1082, Use Curator for ZK interaction in standalone cluster Author: Raymond Liu <raymond.liu@intel.com> Closes #611 from colorant/curator and squashes the following commits: 7556aa1 [Raymond Liu] Address review comments af92e1f [Raymond Liu] Fix coding style 964f3c2 [Raymond Liu] Ignore NodeExists exception 6df2966 [Raymond Liu] Rewrite zookeeper client code with curator	2014-02-24 23:20:38 -08:00
Sean Owen	c0ef3afa82	SPARK-1071: Tidy logging strategy and use of log4j Prompted by a recent thread on the mailing list, I tried and failed to see if Spark can be made independent of log4j. There are a few cases where control of the underlying logging is pretty useful, and to do that, you have to bind to a specific logger. Instead I propose some tidying that leaves Spark's use of log4j, but gets rid of warnings and should still enable downstream users to switch. The idea is to pipe everything (except log4j) through SLF4J, and have Spark use SLF4J directly when logging, and where Spark needs to output info (REPL and tests), bind from SLF4J to log4j. This leaves the same behavior in Spark. It means that downstream users who want to use something except log4j should: - Exclude dependencies on log4j, slf4j-log4j12 from Spark - Include dependency on log4j-over-slf4j - Include dependency on another logger X, and another slf4j-X - Recreate any log config that Spark does, that is needed, in the other logger's config That sounds about right. Here are the key changes: - Include the jcl-over-slf4j shim everywhere by depending on it in core. - Exclude dependencies on commons-logging from third-party libraries. - Include the jul-to-slf4j shim everywhere by depending on it in core. - Exclude slf4j-* dependencies from third-party libraries to prevent collision or warnings - Added missing slf4j-log4j12 binding to GraphX, Bagel module tests And minor/incidental changes: - Update to SLF4J 1.7.5, which happily matches Hadoop 2’s version and is a recommended update over 1.7.2 - (Remove a duplicate HBase dependency declaration in SparkBuild.scala) - (Remove a duplicate mockito dependency declaration that was causing warnings and bugging me) Author: Sean Owen <sowen@cloudera.com> Closes #570 from srowen/SPARK-1071 and squashes the following commits: 52eac9f [Sean Owen] Add slf4j-over-log4j12 dependency to core (non-test) and remove it from things that depend on core. 
77a7fa9 [Sean Owen] SPARK-1071: Tidy logging strategy and use of log4j	2014-02-23 11:40:55 -08:00
Mark Hamstra	c2341c92bb	Merge pull request #542 from markhamstra/versionBump. Closes #542 . Version number to 1.0.0-SNAPSHOT Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore. @pwendell Author: Mark Hamstra <markhamstra@gmail.com> == Merge branch commits == commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71 Author: Mark Hamstra <markhamstra@gmail.com> Date: Wed Feb 5 09:30:32 2014 -0800 Version number to 1.0.0-SNAPSHOT	2014-02-08 16:00:43 -08:00
Josh Rosen	531d9d7576	Increase JUnit test verbosity under SBT. Upgrade junit-interface plugin from 0.9 to 0.10. I noticed that the JavaAPISuite tests didn't appear to display any output locally or under Jenkins, making it difficult to know whether they were running. This change increases the verbosity to more closely match the ScalaTest tests.	2014-01-25 16:32:44 -08:00
Mark Hamstra	147a943df0	Removed repl-bin and updated maven build doc.	2014-01-14 22:17:24 -08:00
Reynold Xin	e2d25d2dfe	Merge branch 'master' into graphx	2014-01-13 16:21:26 -08:00
Ankur Dave	b437ed62a8	graph -> graphx in pom.xml	2014-01-10 15:22:31 -08:00
Patrick Wendell	d86a85e9ca	Merge pull request #293 from pwendell/standalone-driver SPARK-998: Support Launching Driver Inside of Standalone Mode [NOTE: I need to bring the tests up to date with new changes, so for now they will fail] This patch provides support for launching driver programs inside of a standalone cluster manager. It also supports monitoring and re-launching of driver programs which is useful for long running, recoverable applications such as Spark Streaming jobs. For those jobs, this patch allows a deployment mode which is resilient to the failure of any worker node, failure of a master node (provided a multi-master setup), and even failures of the application itself, provided they are recoverable on a restart. Driver information, such as the status and logs from a driver, is displayed in the UI. There are a few small TODO's here, but the code is generally feature-complete. They are: - Bring tests up to date and add test coverage - Restarting on failure should be optional and maybe off by default. - See if we can re-use akka connections to facilitate clients behind a firewall A sensible place to start for review would be to look at the `DriverClient` class which presents users the ability to launch their driver program. I've also added an example program (`DriverSubmissionTest`) that allows you to test this locally and play around with killing workers, etc. Most of the code is devoted to persisting driver state in the cluster manager, exposing it in the UI, and dealing correctly with various types of failures. 
Instructions to test locally: - `sbt/sbt assembly/assembly examples/assembly` - start a local version of the standalone cluster manager ``` ./spark-class org.apache.spark.deploy.client.DriverClient \ -j -Dspark.test.property=something \ -e SPARK_TEST_KEY=SOMEVALUE \ launch spark://10.99.1.14:7077 \ ../path-to-examples-assembly-jar \ org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13 ``` - Go in the UI and make sure it started correctly, look at the output etc - Kill workers, the driver program, masters, etc.	2014-01-09 18:37:52 -08:00
Ankur Dave	91227566bc	Merge remote-tracking branch 'spark-upstream/master' into HEAD Conflicts: README.md core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala pom.xml project/SparkBuild.scala repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala	2014-01-08 21:19:08 -08:00
Patrick Wendell	3209a86f39	Add CDH Repository to Maven Build	2014-01-08 01:21:17 -08:00
Patrick Wendell	62b08faac5	Adding mockito to maven build	2014-01-08 00:45:41 -08:00
Patrick Wendell	bc81ce040d	Merge remote-tracking branch 'apache-github/master' into standalone-driver Conflicts: core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala pom.xml	2014-01-08 00:38:31 -08:00
Patrick Wendell	c0f0155eca	Merge pull request #313 from tdas/project-refactor Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc. At a high level, these are the following changes. 1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules. 2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._` . For Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`. 3. Each external project has its own scala and java unit tests. Note the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in the SparkBuild.scala . In the streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see comment inside the pom.xml for more information). 4. Jars of the external projects have been added to examples project but not to the assembly project. 5. In some files, imports have been rearranged to conform to the Spark coding guidelines.	2014-01-07 22:21:52 -08:00
Patrick Wendell	e21a707a13	Adding unit tests and some refactoring to promote testability.	2014-01-07 15:39:47 -08:00
Patrick Wendell	60edeb3d65	Merge pull request #338 from ScrapCodes/ning-upgrade SPARK-1005 Ning upgrade	2014-01-06 11:40:32 -08:00
Thomas Graves	1f7c090e4b	Change protobuf version for yarn alpha back to 2.4.1	2014-01-06 12:04:22 -06:00
Tathagata Das	3b4c4c7f4d	Merge remote-tracking branch 'apache/master' into project-refactor Conflicts: examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala	2014-01-06 03:05:52 -08:00
Prashant Sharma	355a033893	SPARK-1005 Ning upgrade	2014-01-06 14:38:27 +05:30
Raymond Liu	ebdfa6bb97	Using name yarn-alpha/yarn instead of yarn-2.0/yarn-2.2	2014-01-03 12:14:38 +08:00
Raymond Liu	aec96dd108	Change profile name new-yarn to hadoop2.2-yarn	2014-01-03 12:12:37 +08:00
Raymond Liu	d1528c7c8c	Fix pom for yarn code reorganize commit	2014-01-03 12:12:37 +08:00
liguoqiang	b5d0b3b0f7	restore core/pom.xml file modification	2014-01-01 11:30:08 +08:00
Reynold Xin	8b8e70ebde	Merge pull request #73 from falaki/ApproximateDistinctCount Approximate distinct count Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count distinct number of elements and distinct number of values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.	2013-12-31 17:48:24 -08:00
Tathagata Das	97630849ff	Added pom.xml for external projects and removed unnecessary dependencies and repositories from other poms and sbt.	2013-12-31 00:28:57 -08:00
Hossein Falaki	d50ccc5ca9	Using origin version	2013-12-30 15:08:34 -08:00
Binh Nguyen	040dd3ecd5	upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final	2013-12-24 14:58:18 -08:00
Patrick Wendell	c1c0f8099f	Clean-up	2013-12-16 22:01:27 -08:00
Patrick Wendell	c1fec89895	Cleanup	2013-12-16 21:56:21 -08:00
Patrick Wendell	ceb013f8b9	Remove trailing slashes from repository specifications. The correct format is to not have a trailing slash. For me this caused non-deterministic failures due to issues fetching certain artifacts. The issue was that some of the maven caches would fail to fetch the artifact (due to the way that the artifact path was concatenated with the repository) and this short-circuited the download process in a silent way. Here is what the log output looked like: Downloading: http://repo.maven.apache.org/maven2/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.pom [WARNING] The POM for org.spark-project.akka:akka-remote_2.10:jar:2.2.3-shaded-protobuf is missing, no dependency information available This was pretty brutal to debug since there was no error message anywhere and the path looks correct as reported by the Maven log.	2013-12-16 21:53:51 -08:00
Patrick Wendell	c6f95e603e	Attempt with extra repositories	2013-12-16 21:53:51 -08:00
Mark Hamstra	09ed7ddfa0	Use scala.binary.version in POMs	2013-12-15 12:39:58 -08:00
Patrick Wendell	6e8a96c7e7	Fix maven build issues in 2.10 branch	2013-12-13 23:14:08 -08:00
Prashant Sharma	589b83a18f	Disabled yarn 2.2 and added a message in the sbt build	2013-12-12 16:25:30 +05:30
Prashant Sharma	603af51bb5	Merge branch 'master' into akka-bug-fix Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala	2013-12-11 10:21:53 +05:30
Prashant Sharma	17db6a9041	Style fixes and addressed review comments at #221	2013-12-10 11:47:16 +05:30
Prashant Sharma	7ad6921ae0	Incorporated Patrick's feedback comment on #211 and made maven build/dep-resolution at least a bit faster.	2013-12-07 12:45:57 +05:30
Raymond Liu	4738818dd6	Fix pom.xml for maven build	2013-12-03 16:36:05 +08:00
Prashant Sharma	44fd30d3fb	Merge branch 'master' into scala-2.10-wip Conflicts: core/src/main/scala/org/apache/spark/rdd/RDD.scala project/SparkBuild.scala	2013-11-25 18:10:54 +05:30
Reynold Xin	6bcac986b2	Merge branch 'master' of github.com:apache/incubator-spark	2013-11-25 15:47:47 +08:00
LiGuoqiang	989203604e	Fix Maven build for metrics-graphite	2013-11-25 11:23:11 +08:00
Raymond Liu	a60620b76a	Merge branch 'master' into scala-2.10	2013-11-14 12:44:19 +08:00
Raymond Liu	0f2e3c6e31	Merge branch 'master' into scala-2.10	2013-11-13 16:55:11 +08:00
tgravescs	a35472e1dd	Allow spark on yarn to be run from HDFS. Allows the spark.jar, app.jar, and log4j.properties to be put into hdfs.	2013-11-04 16:16:28 -06:00
Ankur Dave	5064f9b2d2	Merge remote-tracking branch 'spark-upstream/master' Conflicts: project/SparkBuild.scala	2013-10-30 15:59:09 -07:00
Matei Zaharia	dadfc63b03	Fix Maven build to use MQTT repository	2013-10-23 15:29:22 -07:00
Reynold Xin	4e44d65b5e	Exclusion rules for Maven build files.	2013-10-19 12:35:55 -07:00
Hossein Falaki	13227aaa28	Added stream-lib dependency to Maven build	2013-10-18 14:10:24 -07:00
Joseph E. Gonzalez	1b22eef744	Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx	2013-10-15 16:15:19 -07:00
Henry Saputra	3fed3e2283	Update pom.xml to use version 13 of the ASF parent pom and add mailingLists element.	2013-10-14 23:10:54 -07:00
Joseph E. Gonzalez	ef7c369092	merged with upstream changes	2013-10-14 22:56:42 -07:00
Matei Zaharia	c71499b779	Merge pull request #19 from aarondav/master-zk Standalone Scheduler fault tolerance using ZooKeeper This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch `d5a96fe`), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from `d5a96fe`. Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.	2013-10-10 17:16:42 -07:00
Prashant Sharma	26860639c5	Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala project/SparkBuild.scala	2013-10-10 09:42:23 +05:30
Prashant Sharma	7be75682b9	Merge branch 'master' into wip-merge-master Conflicts: bagel/pom.xml core/pom.xml core/src/test/scala/org/apache/spark/ui/UISuite.scala examples/pom.xml mllib/pom.xml pom.xml project/SparkBuild.scala repl/pom.xml streaming/pom.xml tools/pom.xml In scala 2.10, a shorter representation is used for naming artifacts so changed to shorter scala version for artifacts and made it a property in pom.	2013-10-08 11:29:40 +05:30
Patrick Wendell	aa9fb84994	Merging build changes in from 0.8	2013-10-05 22:07:00 -07:00
Martin Weindel	9b0c9c893d	scala 2.10 requires Java 1.6, using Scala 2.10.3, resolved maven-scala-plugin warning	2013-10-05 21:41:09 +02:00
Prashant Sharma	5829692885	Merge branch 'master' into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala docs/_config.yml project/SparkBuild.scala repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala	2013-10-01 11:57:24 +05:30
Aaron Davidson	f549ea33d3	Standalone Scheduler fault tolerance using ZooKeeper This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from 194ba4b8. Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again. Forthcoming: Documentation, tests (! - only ad hoc testing has been performed so far) I do not intend for this commit to be merged until tests are added, but this patch should still be mostly reviewable until then.	2013-09-26 15:04:23 -07:00
Reynold Xin	3f283278b0	Removed scala -optimize flag.	2013-09-26 13:58:10 -07:00
Prashant Sharma	604dc40996	Sync with master and some build fixes	2013-09-26 11:40:02 +05:30
Prashant Sharma	7ff4c2d399	fixed maven build for scala 2.10	2013-09-26 10:48:24 +05:30
Patrick Wendell	6079721fa1	Update build version in master	2013-09-24 11:41:51 -07:00
Joseph E. Gonzalez	8b59fb72c4	Merging latest changes from spark main branch	2013-09-17 20:56:12 -07:00
Patrick Wendell	c856860c5b	Bumping Mesos version to 0.13.0	2013-09-15 12:46:26 -07:00
Prashant Sharma	a90e0eff59	version changed 2.9.3 -> 2.10 in shell script.	2013-09-15 12:47:20 +05:30
Prashant Sharma	383e151fd7	Merge branch 'master' of git://github.com/mesos/spark into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala project/SparkBuild.scala	2013-09-15 10:55:12 +05:30
Patrick Wendell	e9eba8c3ce	Use different Hadoop version for YARN artifacts. This uses a separate Hadoop version for YARN artifact. This means when people link against spark-yarn, things will resolve correctly.	2013-09-13 15:34:57 -07:00
Patrick Wendell	45ec1cc90e	Add git scm url for publishing	2013-09-12 13:47:31 -07:00
Matei Zaharia	f117dc6d0d	Add explicit jets3t dependency, which is excluded in hadoop-client	2013-09-10 06:39:25 +00:00
Matei Zaharia	0456384939	Merge pull request #911 from pwendell/ganglia-sink Adding Maven dependency for Ganglia	2013-09-09 09:57:54 -07:00
Patrick Wendell	528fdbae97	Adding Maven dependency	2013-09-09 09:32:18 -07:00
Jey Kottalam	70661246fd	Fix YARN assembly generation under Maven	2013-09-06 11:31:16 -07:00
Prashant Sharma	4106ae9fbf	Merged with master	2013-09-06 17:53:01 +05:30
Matei Zaharia	59218bdd49	Add Apache parent POM	2013-09-02 18:34:03 -07:00
Matei Zaharia	5701eb92c7	Fix some URLs	2013-09-01 14:13:16 -07:00
Matei Zaharia	46eecd110a	Initial work to rename package to org.apache.spark	2013-09-01 14:13:13 -07:00
Matei Zaharia	666d93c294	Update Maven build to create assemblies expected by new scripts This includes the following changes: - The "assembly" package now builds in Maven by default, and creates an assembly containing both hadoop-client and Spark, unlike the old BigTop distribution assembly that skipped hadoop-client - There is now a bigtop-dist package to build the old BigTop assembly - The repl-bin package is no longer built by default since the scripts don't rely on it; instead it can be enabled with -Prepl-bin - Py4J is now included in the assembly/lib folder as a local Maven repo, so that the Maven package can link to it - run-example now adds the original Spark classpath as well because the Maven examples assembly lists spark-core and such as provided - The various Maven projects add a spark-yarn dependency correctly	2013-08-29 21:19:06 -07:00
Matei Zaharia	8d81358a05	Provide more memory for tests	2013-08-29 21:19:06 -07:00
Reynold Xin	9db1e50344	Revert "Merge pull request #841 from rxin/json" This reverts commit `1fb1b09928`, reversing changes made to `c69c48947d`.	2013-08-26 11:05:14 -07:00
Matei Zaharia	ae8ba83ef2	Merge pull request #855 from jey/update-build-docs Update build docs	2013-08-22 10:14:54 -07:00
Mark Hamstra	ff6f1b0500	Synced sbt and maven builds	2013-08-21 13:50:24 -07:00
Jey Kottalam	31644a011c	Use "hadoop.version" property when specifying Hadoop YARN version too	2013-08-21 13:24:28 -07:00
Reynold Xin	af602ba9d3	Downgraded default build hadoop version to 1.0.4.	2013-08-21 11:38:24 -07:00
Matei Zaharia	aa2b89d98d	Merge remote-tracking branch 'jey/hadoop-agnostic' Conflicts: core/src/main/scala/spark/PairRDDFunctions.scala	2013-08-20 10:14:15 -07:00
Jey Kottalam	bdd861c6c3	Fix Maven build with Hadoop 0.23.9	2013-08-18 18:28:57 -07:00
Jey Kottalam	ad580b94d5	Maven build now also works with YARN	2013-08-16 13:50:12 -07:00
Jey Kottalam	11b42a84db	Maven build now works with CDH hadoop-2.0.0-mr1	2013-08-16 13:50:12 -07:00
Jey Kottalam	353fab2440	Initial changes to make Maven build agnostic of hadoop version	2013-08-16 13:50:12 -07:00
Reynold Xin	c961c19b7b	Use the JSON formatter from Scala library and removed dependency on lift-json. It made the JSON creation slightly more complicated, but removes one external dependency. The scala library also properly escapes "/" (which lift-json doesn't).	2013-08-15 18:23:01 -07:00
Jey Kottalam	a0f0848463	Update default version of Hadoop to 1.2.1	2013-08-15 16:50:37 -07:00
Matei Zaharia	d9588183fa	Update to Mesos 0.12.1	2013-08-13 18:51:35 -07:00
Patrick Wendell	a0133bfbad	Merge pull request #784 from jerryshao/dev-metrics-servlet Add MetricsServlet for Spark metrics system	2013-08-13 09:28:18 -07:00
jerryshao	320e87e7ab	Add MetricsServlet for Spark metrics system	2013-08-12 13:23:23 +08:00
Alexander Pivovarov	ca28f2e639	Changed yarn.version to 2.0.5 in pom.xml	2013-08-10 22:50:04 -07:00
Matei Zaharia	5a4003c1ac	Update to Chill 0.3.1	2013-08-08 13:30:27 -07:00
Joseph E. Gonzalez	499a0d8383	Merged graphx from @rxin into master	2013-08-06 12:28:29 -07:00
Matei Zaharia	b2b86c2575	Merge pull request #753 from shivaram/glm-refactor Build changes for ML lib	2013-07-31 15:51:39 -07:00
Reynold Xin	311aae76a2	Added Snappy dependency to Maven build files.	2013-07-30 17:25:42 -07:00
Shivaram Venkataraman	48851d4dd9	Add bagel, mllib to SBT assembly. Also add jblas dependency to mllib pom.xml	2013-07-30 14:03:15 -07:00
Matei Zaharia	8eb8b52997	Fix Chill version in Maven	2013-07-25 08:58:02 -07:00
ryanlecompte	30a369a808	update pom.xml	2013-07-24 20:55:48 -07:00
Matei Zaharia	c258718606	Fix Maven build errors after previous commits	2013-07-24 16:12:32 -07:00
Matei Zaharia	5584ebcbd3	Merge pull request #675 from c0s/assembly Building spark assembly for further consumption of the Spark project with a deployed cluster	2013-07-24 11:46:46 -07:00
jerryshao	a79f6077f0	Add Maven metrics library dependency and code changes	2013-07-24 14:57:47 +08:00
Josh Rosen	c83680434b	Add JavaAPICompletenessChecker. This is used to find methods in the Scala API that need to be ported to the Java API. To use it: ./run spark.tools.JavaAPICompletenessChecker Conflicts: project/SparkBuild.scala run run2.cmd	2013-07-22 16:11:49 -07:00
Konstantin Boudnik	f4d514810e	Building spark assembly for further consumption of the Spark project with a deployed cluster	2013-07-21 11:47:29 -07:00
Matei Zaharia	af3c9d5042	Add Apache license headers and LICENSE and NOTICE files	2013-07-16 17:21:33 -07:00
Matei Zaharia	00a14deb6d	Update to latest Scala Maven plugin and allow Zinc external compiler	2013-07-16 11:52:20 -07:00
Prashant Sharma	e86d5dbaad	Merge branch 'master' into master-merge Conflicts: README.md core/pom.xml core/src/main/scala/spark/deploy/JsonProtocol.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/deploy/master/Master.scala core/src/main/scala/spark/deploy/master/MasterWebUI.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala core/src/main/scala/spark/storage/BlockManagerUI.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml project/SparkBuild.scala streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala	2013-07-12 14:49:16 +05:30
Prashant Sharma	69ae7ea227	Removed some unnecessary code and fixed dependencies	2013-07-11 18:30:18 +05:30
Mark Hamstra	0b39d66f3f	pom cleanup	2013-07-08 16:07:09 -07:00
Matei Zaharia	fd6665122b	Fix some other references to Cloudera Avro and updated Avro version	2013-07-06 16:45:15 -07:00
Matei Zaharia	22161887ee	Merge pull request #676 from c0s/asf-avro Use standard ASF published avro module instead of a proprietory built one	2013-07-06 16:18:15 -07:00
Matei Zaharia	1ffadb2d9e	Merge remote-tracking branch 'pwendell/ui-updates' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml	2013-07-06 15:51:41 -07:00
Konstantin Boudnik	7687ed5292	Use standard ASF published avro module instead of a proprietory built one	2013-07-04 13:48:33 -07:00
Prashant Sharma	a5f1f6a907	Merge branch 'master' into master-merge Conflicts: core/pom.xml core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/RDDCheckpointData.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/Utils.scala core/src/main/scala/spark/api/python/PythonRDD.scala core/src/main/scala/spark/deploy/client/Client.scala core/src/main/scala/spark/deploy/master/MasterWebUI.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/ZippedRDD.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/BlockManagerMasterActor.scala core/src/main/scala/spark/storage/BlockManagerUI.scala core/src/main/scala/spark/util/AkkaUtils.scala core/src/test/scala/spark/SizeEstimatorSuite.scala pom.xml project/SparkBuild.scala repl/src/main/scala/spark/repl/SparkILoop.scala repl/src/test/scala/spark/repl/ReplSuite.scala streaming/src/main/scala/spark/streaming/StreamingContext.scala streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala	2013-07-03 11:43:26 +05:30
Matei Zaharia	39ae073b5c	Increase SLF4j version in Maven too	2013-06-30 17:11:14 -07:00
Matei Zaharia	5cfcd3c336	Remove Twitter4J specific repo since it's in Maven central	2013-06-29 15:37:27 -07:00
Reynold Xin	564d902d79	Merge branch 'master' of github.com:mesos/spark into graph Conflicts: run run2.cmd	2013-06-29 15:30:21 -07:00
Patrick Wendell	17f145f3bc	Updating Maven build	2013-06-22 10:31:36 -07:00
Matei Zaharia	7902baddc7	Update ASM to version 4.0	2013-06-19 13:34:30 +02:00
Prashant Sharma	6f28067f8d	Fixed maven build without netty fix	2013-06-14 21:03:21 +05:30
Mark Hamstra	3f96c6f27b	Fixed jvmArgs in maven build.	2013-06-12 17:24:22 -07:00
Reynold Xin	8abf4a8447	Merge branch 'master' of github.com:mesos/spark into graph	2013-06-10 12:28:43 -07:00
Konstantin Boudnik	d1286231e0	Sometime Maven build runs out of PermGen space.	2013-06-03 15:55:44 -07:00
Reynold Xin	b0403d3f2b	Merge branch 'master' of github.com:mesos/spark into graph Conflicts: run	2013-06-01 00:48:27 -07:00
Jey Kottalam	e7982c798e	Exclude old versions of Netty from Maven-based build	2013-05-18 21:24:58 -07:00
Reynold Xin	61cf176238	Added dependency on netty-all in Maven.	2013-05-16 14:31:26 -07:00
Reynold Xin	404f9ff617	Added derby dependency to Maven pom files for the JDBC Java test.	2013-05-14 23:28:34 -07:00
Jey Kottalam	4d8919d330	Update Maven build to Scala 2.9.3	2013-05-06 17:05:50 -07:00
Reynold Xin	f54bc544c5	Merge branch 'master' of github.com:mesos/spark into graph	2013-05-02 17:25:09 -07:00
Prashant Sharma	4041a2689e	Updated to latest stable scala 2.10.1 and akka 2.1.2	2013-05-01 11:35:35 +05:30
Mridul Muralidharan	dd515ca3ee	Attempt at fixing merge conflict	2013-04-24 09:24:17 +05:30
Mridul Muralidharan	5b85c715c8	Revert back to 2.0.2-alpha : 0.23.7 has protocol changes which break against cloudera	2013-04-24 02:57:51 +05:30
Mridul Muralidharan	8faf5c51c3	Patch from Thomas Graves to improve the YARN Client, and move to more production ready hadoop yarn branch	2013-04-24 02:31:57 +05:30
Prashant Sharma	185bb9525a	Manually merged scala-2.10 and master	2013-04-22 14:14:03 +05:30
Mridul Muralidharan	46779b4745	Move back to 2.0.2-alpha, since 2.0.3-alpha is not available in cloudera yet	2013-04-17 05:53:28 +05:30
Mridul Muralidharan	323ab8ff3b	Scala does not prevent variable shadowing ! Sick error due to it ...	2013-04-16 17:05:10 +05:30
Mridul Muralidharan	6798a09df8	Add support for building against hadoop2-yarn : adding new maven profile for it	2013-04-07 17:47:38 +05:30
Jey Kottalam	bc8ba222ff	Bump development version to 0.8.0	2013-03-28 15:42:01 -07:00
Reynold Xin	ba9d00c44a	Merge branch 'master' into graph Conflicts: run2.cmd	2013-03-18 18:30:14 +08:00
Mikhail Bautin	7fd2708eda	Add a log4j compile dependency to fix build in IntelliJ Also rename parent project to spark-parent (otherwise it shows up as "parent" in IntelliJ, which is very confusing).	2013-03-15 11:41:51 -07:00
Matei Zaharia	d4e29ea878	Update kryo-serializers version in pom.xml to match previous commit	2013-03-10 15:49:11 -07:00
Matei Zaharia	db9b90fdbd	Change version to 0.7.1-SNAPSHOT for development branch	2013-02-27 09:15:26 -08:00
Matei Zaharia	7e67c626ee	Change version number to 0.7.0	2013-02-25 20:30:47 -08:00
Matei Zaharia	6494cab19d	Update Hadoop dependency to 1.0.4	2013-02-25 15:38:21 -08:00
Tathagata Das	5ab37be983	Fixed class paths and dependencies based on Matei's comments.	2013-02-24 16:24:52 -08:00
Reynold Xin	19d3b059e3	Merge branch 'master' into graph	2013-02-19 12:44:05 -08:00
Reynold Xin	81c4d19c61	Maven and sbt build changes for SparkGraph.	2013-02-19 12:43:13 -08:00
Charles Reiss	6107957962	Merge remote-tracking branch 'base/master' into dag-sched-tests Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-02-02 00:33:30 -08:00
Mikhail Bautin	fe3eceab57	Remove activation of profiles by default See the discussion at https://github.com/mesos/spark/pull/355 for why default profile activation is a problem.	2013-01-31 13:30:41 -08:00
Charles Reiss	a34096a76d	Add easymock to POMs	2013-01-29 10:04:33 -08:00
Mikhail Bautin	325297e5c3	Add an Avro dependency to REPL to make it compile with Hadoop 2	2013-01-22 18:11:51 -08:00
Matei Zaharia	6e3754bf47	Add Maven build file for streaming, and fix some issues in SBT file As part of this, changed our Scala 2.9.2 Kafka library to be available as a local Maven repository, following the example in (http://blog.dub.podval.org/2010/01/maven-in-project-repository.html)	2013-01-20 19:22:24 -08:00
folone	a5403acd4e	Updated maven build for scala 2.10.	2013-01-20 14:42:16 +01:00
Mikhail Bautin	88d8f11365	Add missing dependency spray-json to Maven build	2013-01-13 00:46:25 -08:00
Shivaram Venkataraman	bbc56d85ed	Rename environment variable for hadoop profiles to hadoopVersion	2013-01-12 15:24:13 -08:00
Shivaram Venkataraman	9262522306	Activate hadoop2 profile in pom.xml with -Dhadoop=2	2013-01-10 22:07:34 -08:00
Shivaram Venkataraman	f7adb382ac	Activate hadoop1 if property hadoop is missing. hadoop2 can be activated now by using -Dhadoop -Phadoop2.	2013-01-08 03:19:43 -08:00
Shivaram Venkataraman	4bbe07e5ec	Activate hadoop1 profile by default for maven builds	2013-01-07 17:46:22 -08:00
Shivaram Venkataraman	fb3d4d5e85	Make default hadoop version 1.0.3 in pom.xml	2013-01-07 16:46:06 -08:00
Reynold Xin	a6bb41c6d3	Updated Kryo version for Maven pom file.	2012-12-21 16:25:50 -08:00
Thomas Dudziak	4af6cad37a	Fixed repl maven build to produce artifacts with the appropriate hadoop classifier and extracted repl fat-jar and debian packaging into a separate project to make Maven happy	2012-12-18 12:08:19 -08:00
Matei Zaharia	c10b229992	Merge pull request #319 from mbautin/cdh4.1.2 Bump CDH version for the Hadoop 2 profile to 4.1.2	2012-12-10 15:21:01 -08:00
Thomas Dudziak	c1d15ae3d5	Shaded repl jar for hadoop1 profile needs to include hadoop classes	2012-12-10 15:06:28 -08:00
Mikhail Bautin	450659079a	Bump CDH version for the Hadoop 2 profile to 4.1.2	2012-12-10 11:27:20 -08:00
Matei Zaharia	ccff0a089a	Use the same output directories that SBT had in subprojects This will make it easier to make the "run" script work with a Maven build	2012-12-10 10:58:56 -08:00
Thomas Dudziak	3b643e86bc	Updated versions in the pom.xml files to match current master	2012-11-27 17:50:42 -08:00
Thomas Dudziak	69297c64be	Addressed code review comments	2012-11-27 15:45:16 -08:00
Thomas Dudziak	811a32257b	Added maven and debian build files	2012-11-20 16:19:51 -08:00

380 commits