ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	662ee9f321	Merge pull request #114 from soulmachine/master A little revise for the document	2013-10-26 11:35:59 -07:00
soulmachine	2eed6bbd10	A little revise for the document	2013-10-26 15:13:57 +08:00
Patrick Wendell	4ba32678e0	Adding improved error message when multiple assembly jars are present. This can happen easily if building different hadoop versions.	2013-10-25 19:01:15 -07:00
Matei Zaharia	bab496c120	Merge pull request #108 from alig/master Changes to enable executing by using HDFS as a synchronization point between driver and executors, as well as ensuring executors exit properly.	2013-10-25 18:28:43 -07:00
Matei Zaharia	d307db6e55	Merge pull request #102 from tdas/transform Added new Spark Streaming operations New operations - transformWith which allows arbitrary 2-to-1 DStream transform, added to Scala and Java API - StreamingContext.transform to allow arbitrary n-to-1 DStream - leftOuterJoin and rightOuterJoin between 2 DStreams, added to Scala and Java API - missing variations of join and cogroup added to Scala Java API - missing JavaStreamingContext.union Updated a number of Java and Scala API docs	2013-10-25 17:26:06 -07:00
Ali Ghodsi	eef261c892	fixing comments on PR	2013-10-25 16:48:33 -07:00
Kyle Ellrott	8236d5dcc4	More changes to the graph/pom.xml to make it match the other subprojects	2013-10-25 15:52:44 -07:00
Matei Zaharia	85e2cab6f6	Merge pull request #111 from kayousterhout/ui_name Properly display the name of a stage in the UI. This fixes a bug introduced by the fix for SPARK-940, which changed the UI to display the RDD name rather than the stage name. As a result, no name for the stage was shown when using the Spark shell, which meant that there was no way to click on the stage to see more details (e.g., the running tasks). This commit changes the UI back to using the stage name. @pwendell -- let me know if this change was intentional	2013-10-25 14:46:06 -07:00
Tathagata Das	dc9570782a	Merge branch 'apache-master' into transform	2013-10-25 14:22:23 -07:00
Kyle Ellrott	d39ac2eb40	Merge https://github.com/amplab/graphx	2013-10-25 13:16:05 -07:00
Kay Ousterhout	a9c8d83aaf	Properly display the name of a stage in the UI. This fixes a bug introduced by the fix for SPARK-940, which changed the UI to display the RDD name rather than the stage name. As a result, no name for the stage was shown when using the Spark shell, which meant that there was no way to click on the stage to see more details (e.g., the running tasks). This commit changes the UI back to using the stage name.	2013-10-25 12:00:09 -07:00
Reynold Xin	ab35ec4f0f	Merge pull request #110 from pwendell/master Exclude jopt from kafka dependency. Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.	2013-10-25 10:16:18 -07:00
Patrick Wendell	af4a529f6e	Exclude jopt from kafka dependency. Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.	2013-10-25 09:20:30 -07:00
Reynold Xin	4f2c9438b4	Merge pull request #109 from pwendell/master Adding Java/Java Streaming versions of `repartition` with associated tests	2013-10-24 22:32:02 -07:00
Patrick Wendell	ad5f579cbf	Style fixes	2013-10-24 22:18:53 -07:00
Patrick Wendell	e5f6d5697b	Spacing fix	2013-10-24 22:08:06 -07:00
Patrick Wendell	a351fd4aed	Small spacing fix	2013-10-24 21:16:30 -07:00
Patrick Wendell	31e92b72e3	Adding Java versions and associated tests	2013-10-24 21:14:56 -07:00
Reynold Xin	99ad4a613a	Merge pull request #106 from pwendell/master Add a `repartition` operator. This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful: 1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but they want to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's super confusing. 2. If a user has input data where the number of partitions is not known. E.g. > sc.textFile("some file").coalesce(50).... This is both vague semantically (am I growing or shrinking this RDD) but also, may not work correctly if the base RDD has fewer than 50 partitions. The new operator forces shuffles every time, so it will always produce exactly the number of new partitions. It also throws an exception rather than silently not-working if a bad input is passed. I am currently adding streaming tests (requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.	2013-10-24 17:08:39 -07:00
Patrick Wendell	39f6f75588	Some clean-up of tests	2013-10-24 16:43:33 -07:00
Tathagata Das	e962a6e6ee	Fixed accidental bug.	2013-10-24 15:17:26 -07:00
Patrick Wendell	9423532fab	Removing Java for now	2013-10-24 14:31:34 -07:00
Patrick Wendell	05ac9940ee	Adding tests	2013-10-24 14:31:34 -07:00
Patrick Wendell	2fda84fe3f	Always use a shuffle	2013-10-24 14:31:34 -07:00
Patrick Wendell	08c1a42d7d	Add a `repartition` operator. This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful: 1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but they want to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's super confusing. 2. If a user has input data where the number of partitions is not known. E.g. > sc.textFile("some file").coalesce(50).... This is both vague semantically (am I growing or shrinking this RDD) but also, may not work correctly if the base RDD has fewer than 50 partitions. The new operator forces shuffles every time, so it will always produce exactly the number of new partitions. It also throws an exception rather than silently not-working if a bad input is passed. I am currently adding streaming tests (requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.	2013-10-24 14:31:33 -07:00
Ali Ghodsi	05a0df2b9e	Makes Spark SIMR ready.	2013-10-24 11:59:51 -07:00
Reynold Xin	6f82c42690	Merge pull request #34 from jegonzal/AnalyticsCleanup Analytics Cleanup	2013-10-24 11:09:46 -07:00
Tathagata Das	0400aba1c0	Merge branch 'apache-master' into transform	2013-10-24 11:05:00 -07:00
Tathagata Das	bacfe5ebca	Added JavaStreamingContext.transform	2013-10-24 10:56:24 -07:00
Kyle Ellrott	59ec6b85d0	Merge branch 'master' of https://github.com/amplab/graphx	2013-10-24 10:29:24 -07:00
Matei Zaharia	1dc776b863	Merge pull request #93 from kayousterhout/ui_new_state Show "GETTING_RESULTS" state in UI. This commit adds a set of calls using the SparkListener interface that indicate when a task is remotely fetching results, so that we can display this (potentially time-consuming) phase of execution to users through the UI.	2013-10-23 22:05:52 -07:00
Reynold Xin	c4b187d1db	Merge pull request #105 from pwendell/doc-fix Fixing broken links in programming guide Unfortunately these are broken in 0.8.0.	2013-10-23 21:56:18 -07:00
Patrick Wendell	4e093b88f8	Fixing broken links in programming guide	2013-10-23 21:28:23 -07:00
Kay Ousterhout	b45352e373	Clear akka frame size property in tests	2013-10-23 18:23:28 -07:00
Reynold Xin	a098438c48	Merge pull request #103 from JoshRosen/unpersist-fix Add unpersist() to JavaDoubleRDD and JavaPairRDD. This fixes a minor inconsistency where [unpersist() was only available on JavaRDD](https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3CCE8D8748.68C0%25YannLuppo%40livenation.com%3E) and not JavaPairRDD / JavaDoubleRDD. I also added support for the new optional `blocking` argument added in 0.8. Please merge this into branch-0.8, too.	2013-10-23 18:03:08 -07:00
Kay Ousterhout	c42f5d1787	Fixed broken tests	2013-10-23 17:35:01 -07:00
Josh Rosen	210858ac02	Add unpersist() to JavaDoubleRDD and JavaPairRDD. Also add support for new optional `blocking` argument.	2013-10-23 17:27:01 -07:00
Kay Ousterhout	a5f8f54ecd	Merge remote-tracking branch 'upstream/master' into ui_new_state Conflicts: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala	2013-10-23 16:06:28 -07:00
Matei Zaharia	dadfc63b03	Fix Maven build to use MQTT repository	2013-10-23 15:29:22 -07:00
Matei Zaharia	dd659642e7	Merge pull request #64 from prabeesh/master MQTT Adapter for Spark Streaming MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it here http://mqtt.org/ Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators, to mobile phones, embedded systems on vehicles, or laptops and full scale computers. The protocol was invented by Andy Stanford-Clark of IBM, and Arlen Nipper of Cirrus Link Solutions This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where line of code and network bandwidth is a constraint. MQTT is one of the widely used protocol for 'Internet of Things'. This protocol is getting much attraction as anything and everything is getting connected to internet and they all produce data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015. Plugin/Support for MQTT is available in popular MQs like RabbitMQ, ActiveMQ etc. Support for MQTT in Spark will help people with Internet of Things (IoT) projects to use Spark Streaming for their real time data processing needs (from sensors and other embedded devices etc).	2013-10-23 15:07:59 -07:00
Tathagata Das	9fccb17a5f	Removed Function3.call() based on Josh's comment.	2013-10-23 12:07:07 -07:00
Joseph E. Gonzalez	9d1e5946fe	Merge branch 'master' of https://github.com/amplab/graphx into AnalyticsCleanup	2013-10-23 00:26:06 -07:00
Joseph E. Gonzalez	c30624dcbb	Adding dynamic pregel, fixing bugs in PageRank, and adding basic analytics unit tests.	2013-10-23 00:25:45 -07:00
Tathagata Das	fe8626efd1	Merge branch 'apache-master' into transform	2013-10-22 23:40:40 -07:00
Tathagata Das	72d2e1dd77	Fixed bug in Java transformWith, added more Java testcases for transform and transformWith, added missing variations of Java join and cogroup, updated various Scala and Java API docs.	2013-10-22 23:35:51 -07:00
Matei Zaharia	452aa36d67	Merge pull request #97 from ewencp/pyspark-system-properties Add classmethod to SparkContext to set system properties. Add a new classmethod to SparkContext to set system properties like is possible in Scala/Java. Unlike the Java/Scala implementations, there's no access to System until the JVM bridge is created. Since SparkContext handles that, move the initialization of the JVM connection to a separate classmethod that can safely be called repeatedly as long as the same instance (or no instance) is provided.	2013-10-22 23:15:33 -07:00
Joseph E. Gonzalez	0bd92ed8d0	Fixing a bug in pregel where the initial vertex-program results are lost.	2013-10-22 19:10:51 -07:00
Reynold Xin	9dfcf53a08	Merge pull request #100 from JoshRosen/spark-902 Remove redundant Java Function call() definitions This should fix [SPARK-902](https://spark-project.atlassian.net/browse/SPARK-902), an issue where some Java API Function classes could cause AbstractMethodErrors when user code is compiled using the Eclipse compiler. Thanks to @MartinWeindel for diagnosing this problem. (This PR subsumes #30).	2013-10-22 16:01:42 -07:00
Dan Crankshaw	49d5cdac33	Merge pull request #30 from jegonzal/VertexSetRDD_Tests Testing and Documenting VertexSetRDD	2013-10-22 15:38:02 -07:00
Joseph E. Gonzalez	be8269af07	Merge branch 'VertexSetRDD_Tests' into AnalyticsCleanup	2013-10-22 15:03:49 -07:00

4918 commits