Commit graph

372 commits

Author SHA1 Message Date
tgravescs 17bb9a27b2 Add mockito to the sbt build 2013-11-11 10:01:23 -06:00
Josh Rosen a37ff0f1db Add spark-tools assembly to spark-class classpath.
This allows the JavaAPICompletenessChecker to be
run with Spark 0.8+.
2013-11-09 13:42:45 -08:00
Russell Cardullo ef85a51f85 Add graphite sink for metrics
This adds a metrics sink for graphite.  The sink must
be configured with the host and port of a graphite node
and optionally may be configured with a prefix that will
be prepended to all metrics that are sent to graphite.
2013-11-08 16:36:03 -08:00
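A minimal sketch of such a configuration, assuming the conf/metrics.properties mechanism and the GraphiteSink class path documented for Spark's metrics system; the exact key names at this commit may differ:

```
# conf/metrics.properties -- hypothetical host, port, and prefix values
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
# optional prefix prepended to every metric name sent to graphite
*.sink.graphite.prefix=spark-prod
```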
Patrick Wendell af4a529f6e Exclude jopt from kafka dependency.
Kafka uses an older version of jopt that causes bad conflicts with the version
used by spark-perf. It's not easy to remove this downstream because of the way
that spark-perf uses Spark (by including a spark assembly as an unmanaged jar).
This fixes the problem at its source by just never including it.
2013-10-25 09:20:30 -07:00
Prashant Sharma c77ca1fed9 Updating to latest akka 2.2.3, which fixes our only failing Driver Suite 2013-10-24 16:11:40 +05:30
Matei Zaharia dadfc63b03 Fix Maven build to use MQTT repository 2013-10-23 15:29:22 -07:00
Matei Zaharia dd659642e7 Merge pull request #64 from prabeesh/master
MQTT Adapter for Spark Streaming

MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol.
It was designed as an extremely lightweight publish/subscribe messaging transport. You can read more about it at http://mqtt.org/.

Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators, to mobile phones, embedded systems on vehicles, or laptops and full scale computers.

The protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions.

This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code footprint and network bandwidth are constrained.

MQTT is one of the most widely used protocols for the 'Internet of Things'. It is gaining traction as more and more devices get connected to the internet, all of them producing data. Researchers and companies predict that some 25 billion devices will be connected to the internet by 2015.

Plugins/support for MQTT are available in popular message queues such as RabbitMQ and ActiveMQ.

Support for MQTT in Spark will help people with Internet of Things (IoT) projects use Spark Streaming for their real-time data processing needs (from sensors and other embedded devices, etc.).
2013-10-23 15:07:59 -07:00
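A short Scala sketch of consuming an MQTT topic from Spark Streaming. The MQTTUtils.createStream entry point, broker URL, and topic below are assumptions based on how the adapter is exposed in later Spark releases; the API surface at this commit may differ:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.mqtt.MQTTUtils

object MqttWordCount {
  def main(args: Array[String]): Unit = {
    // hypothetical broker URL and topic, with a 1-second batch interval
    val ssc = new StreamingContext("local[2]", "MqttWordCount", Seconds(1))
    val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883", "sensors/temperature")
    // count words arriving on the MQTT topic in each batch
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```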
Matei Zaharia 731c94e91d Merge pull request #56 from jerryshao/kafka-0.8-dev
Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming

Conflicts:
	streaming/pom.xml
2013-10-21 23:31:38 -07:00
Matei Zaharia 8de9706b86 Merge pull request #66 from shivaram/sbt-assembly-deps
Add SBT target to assemble dependencies

This pull request is an attempt to address the long assembly build times during development. Instead of rebuilding the assembly jar for every Spark change, this pull request adds a new SBT target `spark` that packages all the Spark modules and builds an assembly of the dependencies.

So the workflow now looks something like this:

```
./sbt/sbt spark # Doing this once should suffice
## Make changes
./sbt/sbt compile
./sbt/sbt test or ./spark-shell
```
2013-10-18 20:32:39 -07:00
prabeesh 29245605bf remove unused dependency 2013-10-17 09:57:30 +05:30
Shivaram Venkataraman 0a4b76fcc2 Rename SBT target to assemble-deps. 2013-10-16 17:05:46 -07:00
prabeesh 06de3d516d added mqtt adapter library dependencies 2013-10-16 13:38:37 +05:30
Patrick Wendell 35befe07bb Fixing spark streaming example and a bug in examples build.
- Examples assembly included a log4j.properties which clobbered Spark's
- Example had an error where some classes weren't serializable
- Did some other clean-up in this example
2013-10-15 22:55:43 -07:00
Shivaram Venkataraman 051cd960d9 Merge branch 'master' of https://github.com/apache/incubator-spark into sbt-assembly-deps 2013-10-15 13:26:40 -07:00
jerryshao c23cd72b4b Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming 2013-10-12 20:00:42 +08:00
Shivaram Venkataraman c441904bce Add a comment and exclude tools 2013-10-11 18:23:15 -07:00
Matei Zaharia c71499b779 Merge pull request #19 from aarondav/master-zk
Standalone Scheduler fault tolerance using ZooKeeper

This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we keep the behavior from d5a96fe.

Additionally, places where a Master could be specified by a spark:// URL can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.
2013-10-10 17:16:42 -07:00
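A minimal Scala sketch of the comma-delimited master list described above, using the 0.8-era SparkContext constructor; the host names are hypothetical:

```scala
import org.apache.spark.SparkContext

object MultiMasterClient {
  def main(args: Array[String]): Unit = {
    // The comma-delimited list is only used to locate and register with the
    // current leader; once registered, the client follows whichever master
    // is subsequently elected.
    val sc = new SparkContext("spark://master1:7077,master2:7077", "FaultTolerantApp")
    println(sc.parallelize(1 to 10).count())
    sc.stop()
  }
}
```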
Prashant Sharma 26860639c5 Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
	project/SparkBuild.scala
2013-10-10 09:42:23 +05:30
Shivaram Venkataraman 484166d520 Add new SBT target for dependency assembly 2013-10-09 04:24:34 -07:00
Prashant Sharma 7be75682b9 Merge branch 'master' into wip-merge-master
Conflicts:
	bagel/pom.xml
	core/pom.xml
	core/src/test/scala/org/apache/spark/ui/UISuite.scala
	examples/pom.xml
	mllib/pom.xml
	pom.xml
	project/SparkBuild.scala
	repl/pom.xml
	streaming/pom.xml
	tools/pom.xml

In Scala 2.10 a shorter representation is used for naming artifacts,
 so the artifacts now use the shorter Scala version, which is made a property in the POM.
2013-10-08 11:29:40 +05:30
Reynold Xin 213b70a2db Merge pull request #31 from sundeepn/branch-0.8
Resolving package conflicts with hadoop 0.23.9

Hadoop 0.23.9 has a package conflict with easymock's dependencies.

(cherry picked from commit 023e3fdf00)
Signed-off-by: Reynold Xin <rxin@apache.org>
2013-10-07 10:54:22 -07:00
Martin Weindel 9b0c9c893d Scala 2.10 requires Java 1.6,
using Scala 2.10.3,
resolved maven-scala-plugin warning
2013-10-05 21:41:09 +02:00
Prashant Sharma c810ee0690 Merge branch 'master' into scala-2.10
Conflicts:
	core/src/test/scala/org/apache/spark/DistributedSuite.scala
	project/SparkBuild.scala
2013-10-05 15:52:57 +05:30
Du Li 9fd6bba60d ask ivy/sbt to check local maven repo under ~/.m2 2013-10-01 15:46:51 -07:00
Prashant Sharma 5829692885 Merge branch 'master' into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
	docs/_config.yml
	project/SparkBuild.scala
	repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Aaron Davidson f549ea33d3 Standalone Scheduler fault tolerance using ZooKeeper
This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we keep the behavior from 194ba4b8.

Additionally, places where a Master could be specified by a spark:// URL can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.

Forthcoming:
Documentation, tests (! - only ad hoc testing has been performed so far)
I do not intend for this commit to be merged until tests are added, but this patch should
still be mostly reviewable until then.
2013-09-26 15:04:23 -07:00
Reynold Xin 3f283278b0 Removed scala -optimize flag. 2013-09-26 13:58:10 -07:00
Reynold Xin c514cd1587 Merge pull request #930 from holdenk/master
Add mapPartitionsWithIndex
2013-09-26 13:48:20 -07:00
Prashant Sharma 604dc40996 Sync with master and some build fixes 2013-09-26 11:40:02 +05:30
Prashant Sharma 7ff4c2d399 fixed maven build for scala 2.10 2013-09-26 10:48:24 +05:30
Patrick Wendell 6079721fa1 Update build version in master 2013-09-24 11:41:51 -07:00
Prashant Sharma 276c37a51c Akka 2.2 migration 2013-09-22 08:20:12 +05:30
Patrick Wendell c856860c5b Bumping Mesos version to 0.13.0 2013-09-15 12:46:26 -07:00
Prashant Sharma 383e151fd7 Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
	project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Prashant Sharma 20c65bc334 Fixed repl suite 2013-09-15 10:43:06 +05:30
Holden Karau 68068977b8 Fix build on ubuntu 2013-09-14 20:51:11 -07:00
Patrick Wendell 91a59e6b10 Merge pull request #919 from mateiz/jets3t
Add explicit jets3t dependency, which is excluded in hadoop-client
2013-09-11 10:21:48 -07:00
Patrick Wendell 0c1985b153 Fix HDFS access bug with assembly build.
Due to this change in HDFS:
https://issues.apache.org/jira/browse/HADOOP-7549

there is a bug when using the new assembly builds. The symptom is that any HDFS access
results in an exception saying "No filesystem for scheme 'hdfs'". This adds a merge
strategy in the assembly build which fixes the problem.
2013-09-10 22:05:13 -07:00
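The fix boils down to merging the META-INF/services registrations (where HDFS's FileSystem implementations are declared) instead of letting one jar's copy win. A fragment-style sketch of what such an sbt-assembly rule looks like; the exact cases in SparkBuild.scala may differ:

```scala
// sketch of an sbt-assembly merge strategy in an sbt 0.12-era build definition
import sbtassembly.Plugin._
import AssemblyKeys._

mergeStrategy in assembly := {
  // combine the service registrations (e.g. org.apache.hadoop.fs.FileSystem)
  // from hadoop-common and hadoop-hdfs instead of keeping only one of them
  case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
  case m if m.toLowerCase.endsWith("manifest.mf")          => MergeStrategy.discard
  case _                                                   => MergeStrategy.first
}
```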
Matei Zaharia f117dc6d0d Add explicit jets3t dependency, which is excluded in hadoop-client 2013-09-10 06:39:25 +00:00
Patrick Wendell f68848d95d Merge pull request #906 from pwendell/ganglia-sink
Clean-up of Metrics Code/Docs and Add Ganglia Sink
2013-09-08 18:32:16 -07:00
Matei Zaharia 0b957997ad Merge pull request #908 from pwendell/master
Fix target JVM version in scala build
2013-09-08 15:30:16 -07:00
Patrick Wendell 27bd74c8ad Fix target JVM version in scala build 2013-09-08 14:37:45 -07:00
Patrick Wendell 8de8ee5d3c Ganglia sink 2013-09-08 10:08:18 -07:00
Patrick Wendell a8e376ec0f Merge pull request #904 from pwendell/master
Adding Apache license to two files
2013-09-07 21:16:01 -07:00
Patrick Wendell 6d2198643c Adding Apache license to two files 2013-09-07 20:46:58 -07:00
Jey Kottalam 30a32c8335 Minor YARN build cleanups 2013-09-06 11:31:16 -07:00
Prashant Sharma 4106ae9fbf Merged with master 2013-09-06 17:53:01 +05:30
Matei Zaharia 59218bdd49 Add Apache parent POM 2013-09-02 18:34:03 -07:00
Matei Zaharia 5701eb92c7 Fix some URLs 2013-09-01 14:13:16 -07:00
Matei Zaharia 46eecd110a Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
Matei Zaharia 666d93c294 Update Maven build to create assemblies expected by new scripts
This includes the following changes:
- The "assembly" package now builds in Maven by default, and creates an
  assembly containing both hadoop-client and Spark, unlike the old
  BigTop distribution assembly that skipped hadoop-client
- There is now a bigtop-dist package to build the old BigTop assembly
- The repl-bin package is no longer built by default since the scripts
  don't rely on it; instead it can be enabled with -Prepl-bin
- Py4J is now included in the assembly/lib folder as a local Maven repo,
  so that the Maven package can link to it
- run-example now adds the original Spark classpath as well because the
  Maven examples assembly lists spark-core and such as provided
- The various Maven projects add a spark-yarn dependency correctly
2013-08-29 21:19:06 -07:00
Matei Zaharia 8d81358a05 Provide more memory for tests 2013-08-29 21:19:06 -07:00
Matei Zaharia 53cd50c069 Change build and run instructions to use assemblies
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.

As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
Reynold Xin 9db1e50344 Revert "Merge pull request #841 from rxin/json"
This reverts commit 1fb1b09928, reversing
changes made to c69c48947d.
2013-08-26 11:05:14 -07:00
Jey Kottalam a9db1b7b6e Upgrade SBT IDE project generators 2013-08-23 10:27:18 -07:00
Jey Kottalam b7f9e6374a Fix SBT generation of IDE project files 2013-08-23 10:26:37 -07:00
Jey Kottalam 281b6c5f28 Re-add removed dependency on 'commons-daemon'
Fixes SBT build under Hadoop 0.23.9 and 2.0.4
2013-08-22 15:45:45 -07:00
Matei Zaharia ae8ba83ef2 Merge pull request #855 from jey/update-build-docs
Update build docs
2013-08-22 10:14:54 -07:00
Matei Zaharia 8a36fd09dd Merge pull request #854 from markhamstra/pomUpdate
Synced sbt and maven builds to use the same dependencies, etc.
2013-08-22 10:13:35 -07:00
Jey Kottalam f9cc1fbf27 Remove references to unsupported Hadoop versions 2013-08-21 17:14:36 -07:00
Mark Hamstra ff6f1b0500 Synced sbt and maven builds 2013-08-21 13:50:24 -07:00
Reynold Xin af602ba9d3 Downgraded default build hadoop version to 1.0.4. 2013-08-21 11:38:24 -07:00
Matei Zaharia aa2b89d98d Merge remote-tracking branch 'jey/hadoop-agnostic'
Conflicts:
	core/src/main/scala/spark/PairRDDFunctions.scala
2013-08-20 10:14:15 -07:00
Jey Kottalam 6f6944c807 Update SBT build to use simpler fix for Hadoop 0.23.9 2013-08-19 12:33:13 -07:00
Jey Kottalam 67b593607c Rename YARN build flag to SPARK_WITH_YARN 2013-08-16 14:00:05 -07:00
Jey Kottalam b1d99744a8 Fix SBT build under Hadoop 0.23.x 2013-08-16 13:50:12 -07:00
Jey Kottalam 8add2d7a59 Fix repl/assembly when YARN enabled 2013-08-16 13:50:12 -07:00
Jey Kottalam 3f98eff63a Allow make-distribution.sh to specify Hadoop version used 2013-08-16 13:50:09 -07:00
Reynold Xin c961c19b7b Use the JSON formatter from Scala library and removed dependency on lift-json.
It made the JSON creation slightly more complicated, but removes one external dependency. The Scala library also properly escapes "/" (which lift-json doesn't).
2013-08-15 18:23:01 -07:00
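A small illustration of the escaping difference, assuming the scala.util.parsing.json classes shipped with the Scala 2.9/2.10 standard library; exact output spacing may vary:

```scala
import scala.util.parsing.json.JSONObject

object JsonEscapeDemo extends App {
  val json = JSONObject(Map("name" -> "stage 1", "details" -> "/stages/stage/1"))
  // the standard-library formatter escapes '/' as '\/', which lift-json leaves alone
  println(json.toString())  // e.g. {"name" : "stage 1", "details" : "\/stages\/stage\/1"}
}
```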
Jey Kottalam a0f0848463 Update default version of Hadoop to 1.2.1 2013-08-15 16:50:37 -07:00
Jey Kottalam cb4ef19214 yarn support 2013-08-15 16:50:37 -07:00
Jey Kottalam 273b499b9a yarn sbt 2013-08-15 16:50:37 -07:00
Jey Kottalam 69c3bbf688 dynamically detect hadoop version 2013-08-15 16:50:37 -07:00
Matei Zaharia d9588183fa Update to Mesos 0.12.1 2013-08-13 18:51:35 -07:00
jerryshao 320e87e7ab Add MetricsServlet for Spark metrics system 2013-08-12 13:23:23 +08:00
Matei Zaharia dce5e47435 Merge pull request #800 from dlyubimov/HBASE_VERSION
Pull HBASE_VERSION in the head of sbt build
2013-08-09 21:53:45 -07:00
Matei Zaharia cd247ba5bb Merge pull request #786 from shivaram/mllib-java
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Dmitriy Lyubimov 27f674f82b fewer words 2013-08-09 13:54:41 -07:00
Dmitriy Lyubimov ae95b57469 Pull HBASE_VERSION in the head of sbt build 2013-08-09 12:45:18 -07:00
Matei Zaharia 5a4003c1ac Update to Chill 0.3.1 2013-08-08 13:30:27 -07:00
Shivaram Venkataraman 471fbadd0c Java examples, tests for KMeans and ALS
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
  easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
  called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
  examples project.

Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert a Scala RDD to a Java RDD easily
- Work around a bug where using double[] from Java leads to a class cast exception in
  KMeans init
2013-08-06 15:43:46 -07:00
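A short Scala sketch of the RDD[Rating]-based entry point described above; the exact ALS.train parameter list and the input format are assumptions for illustration:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "AlsSketch")
    // hypothetical input file with "user,product,rating" lines
    val ratings = sc.textFile("ratings.csv").map { line =>
      val Array(user, product, rating) = line.split(',')
      Rating(user.toInt, product.toInt, rating.toDouble)
    }
    // static-style method on the ALS object, taking RDD[Rating] directly
    val model = ALS.train(ratings, 10, 20, 0.01)  // rank, iterations, lambda
    println(model.predict(1, 42))
    sc.stop()
  }
}
```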
Matei Zaharia e466a55a6b Revert Mesos version to 0.9 since the 0.12 artifact has target Java 7 2013-08-01 15:45:21 -07:00
Matei Zaharia b2b86c2575 Merge pull request #753 from shivaram/glm-refactor
Build changes for ML lib
2013-07-31 15:51:39 -07:00
Matei Zaharia 14bf2fe039 Merge pull request #749 from benh/spark-executor-uri
Added property 'spark.executor.uri' for launching on Mesos.
2013-07-31 14:18:16 -07:00
Shivaram Venkataraman 15fd0d619d Add mllib, bagel to repl dependencies
Also don't build an assembly jar for them
2013-07-30 18:31:11 -07:00
Reynold Xin 3b1ced83fb Exclude older version of Snappy in streaming and examples. 2013-07-30 17:25:36 -07:00
Reynold Xin 368c58eac5 Merge branch 'lazy_file_open' of github.com:lyogavin/spark into compression
Conflicts:
	project/SparkBuild.scala
2013-07-30 16:04:18 -07:00
Shivaram Venkataraman 48851d4dd9 Add bagel, mllib to SBT assembly.
Also add jblas dependency to mllib pom.xml
2013-07-30 14:03:15 -07:00
Benjamin Hindman f6f46455eb Added property 'spark.executor.uri' for launching on Mesos without
requiring Spark to be installed. Using 'make-distribution.sh', a user
can put a Spark distribution at a URI supported by Mesos (e.g.,
'hdfs://...') and then set that when launching their job. Also added
SPARK_EXECUTOR_URI for the REPL.
2013-07-29 23:32:52 -07:00
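A minimal Scala sketch of using the new property; the URI, master address, and the System.setProperty style of configuration are assumptions reflecting the 0.8-era setup:

```scala
import org.apache.spark.SparkContext

object MesosUriExample {
  def main(args: Array[String]): Unit = {
    // point Mesos executors at a pre-built Spark distribution instead of a local install
    System.setProperty("spark.executor.uri", "hdfs://namenode:9000/dist/spark-0.8.0.tgz")
    val sc = new SparkContext("mesos://mesos-master:5050", "MesosUriExample")
    println(sc.parallelize(1 to 100).count())
    sc.stop()
  }
}
```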
ryanlecompte 8e0939f5a9 refactor Kryo serializer support to use chill/chill-java 2013-07-24 20:43:57 -07:00
jerryshao 5730193e0c Fix some typos 2013-07-24 14:57:47 +08:00
jerryshao 576528f0f9 Add dependency of Codahale's metrics library 2013-07-24 14:57:46 +08:00
Josh Rosen c83680434b Add JavaAPICompletenessChecker.
This is used to find methods in the Scala API that
need to be ported to the Java API.  To use it:

  ./run spark.tools.JavaAPICompletenessChecker
Conflicts:
	project/SparkBuild.scala
	run
	run2.cmd
2013-07-22 16:11:49 -07:00
Liang-Chi Hsieh d1738d72ba also exclude asm for hadoop2; hadoop1 does not appear to need that. 2013-07-20 00:37:24 +08:00
Liang-Chi Hsieh 3aad452653 fix a bug in the build process that pulls in two versions of ASM. 2013-07-19 02:29:46 +08:00
Matei Zaharia cad48edb70 Merge pull request #708 from ScrapCodes/dependencies-upgrade
Dependency upgrade Akka 2.0.3 -> 2.0.5
2013-07-16 21:41:28 -07:00
Matei Zaharia af3c9d5042 Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
Prashant Sharma 2748e73eb9 Dependency upgrade Akka 2.0.3 -> 2.0.5 2013-07-16 16:08:46 +05:30
Prashant Sharma 9d7781c4e1 Adding commons io as dependency 2013-07-15 12:03:48 +05:30
Prashant Sharma a3494d405d Merge branch 'master' of github.com:mesos/spark into scala-2.10
Conflicts:
	core/src/main/scala/spark/Utils.scala
	core/src/test/scala/spark/ui/UISuite.scala
	project/SparkBuild.scala
	run
2013-07-15 11:15:55 +05:30