ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Ankur Dave	098768e0b9	Merge pull request #37 from jegonzal/AnalyticsCleanup Updated Connected Components and Pregel Docs	2013-10-29 15:08:36 -07:00
Joseph E. Gonzalez	08c7b040d6	Documented the VertexSetRDD	2013-10-29 15:03:13 -07:00
Joseph E. Gonzalez	ede329336d	Fixing a scaladoc bug in graph generators.	2013-10-29 14:50:12 -07:00
Joseph E. Gonzalez	15958ca65a	Reindenting documentation.	2013-10-29 14:01:24 -07:00
Joseph E. Gonzalez	d316cad9b1	Documented Graph.apply functions.	2013-10-29 13:58:04 -07:00
Joseph E. Gonzalez	19da8820fc	Minor modifications to documentation.	2013-10-29 11:06:06 -07:00
Joseph E. Gonzalez	77626d1507	Adding collect neighbors and documenting GraphOps.	2013-10-29 11:05:42 -07:00
Joseph E. Gonzalez	942de98433	Making suggested changes.	2013-10-29 10:19:49 -07:00
Reynold Xin	f0e23a023c	Merge pull request #119 from soulmachine/master A little revise for the document	2013-10-29 01:41:44 -04:00
Joseph E. Gonzalez	d6a902f309	Finished updating connected components to use a Pregel-like abstraction and created a series of tests in the AnalyticsSuite.	2013-10-28 11:52:26 -07:00
soulmachine	a197137fde	A little revise for the document	2013-10-29 00:28:56 +08:00
Josh Rosen	2d7cf6a271	Restructure BlockInfo fields to reduce memory use.	2013-10-27 23:01:03 -07:00
Matei Zaharia	aec9bf9060	Merge pull request #112 from kayousterhout/ui_task_attempt_id Display both task ID and task attempt ID in UI, and rename taskId to taskAttemptId Previously only the task attempt ID was shown in the UI; this was confusing because the job can be shown as complete while there are tasks still running. Showing the task ID in addition to the attempt ID makes it clear which tasks are redundant. This commit also renames taskId to taskAttemptId in TaskInfo and in the local/cluster schedulers. This identifier was used to uniquely identify attempts, not tasks, so the current naming was confusing. The new naming is also more consistent with map reduce.	2013-10-27 19:32:00 -07:00
Reynold Xin	d4df4749a8	Merge pull request #115 from aarondav/shuffle-fix Eliminate extra memory usage when shuffle file consolidation is disabled Otherwise, we see SPARK-946 even when shuffle file consolidation is disabled. Fixing SPARK-946 is still forthcoming.	2013-10-27 22:11:21 -04:00
Stephen Haberman	a6ae2b4832	Handle ConcurrentModificationExceptions in SparkContext init. System.getProperties.toMap will fail-fast when concurrently modified, and it seems like some other thread started by SparkContext does a System.setProperty during its initialization. Handle this by just looping on ConcurrentModificationException, which seems the safest, since the non-fail-fast methods (Hashtable.entrySet) have undefined behavior under concurrent modification.	2013-10-27 14:08:32 -05:00
Joseph E. Gonzalez	a2287ae138	Implementing connected components on top of a Pregel-like abstraction.	2013-10-27 10:42:11 -07:00
Aaron Davidson	4261e834cb	Use flag instead of name check.	2013-10-26 23:53:38 -07:00
Aaron Davidson	596f18479e	Eliminate extra memory usage when shuffle file consolidation is disabled Otherwise, we see SPARK-946 even when shuffle file consolidation is disabled. Fixing SPARK-946 is still forthcoming.	2013-10-26 22:35:01 -07:00
Kay Ousterhout	ae22b4dd99	Display both task ID and task index in UI	2013-10-26 22:18:39 -07:00
Joseph E. Gonzalez	6a0fbc0374	Updating the GraphLab API to match the changes made to the Pregel API.	2013-10-26 15:44:19 -07:00
Joseph E. Gonzalez	08024c938c	Adding more documentation to the Pregel API as well as additional functionality including the ability to specify the edge direction along which messages are computed.	2013-10-26 15:42:51 -07:00
Joseph E. Gonzalez	00e73833cc	Fixing a bug in reverse edge direction.	2013-10-26 15:10:30 -07:00
Patrick Wendell	e018f2d0ae	Merge pull request #113 from pwendell/master Improve error message when multiple assembly jars are present. This can happen easily if building different hadoop versions. Right now it gives a class not found exception.	2013-10-26 11:39:15 -07:00
Reynold Xin	662ee9f321	Merge pull request #114 from soulmachine/master A little revise for the document	2013-10-26 11:35:59 -07:00
soulmachine	2eed6bbd10	A little revise for the document	2013-10-26 15:13:57 +08:00
Patrick Wendell	4ba32678e0	Adding improved error message when multiple assembly jars are present. This can happen easily if building different hadoop versions.	2013-10-25 19:01:15 -07:00
Matei Zaharia	bab496c120	Merge pull request #108 from alig/master Changes to enable executing by using HDFS as a synchronization point between driver and executors, as well as ensuring executors exit properly.	2013-10-25 18:28:43 -07:00
Matei Zaharia	d307db6e55	Merge pull request #102 from tdas/transform Added new Spark Streaming operations New operations - transformWith which allows arbitrary 2-to-1 DStream transform, added to Scala and Java API - StreamingContext.transform to allow arbitrary n-to-1 DStream - leftOuterJoin and rightOuterJoin between 2 DStreams, added to Scala and Java API - missing variations of join and cogroup added to Scala Java API - missing JavaStreamingContext.union Updated a number of Java and Scala API docs	2013-10-25 17:26:06 -07:00
Ali Ghodsi	eef261c892	fixing comments on PR	2013-10-25 16:48:33 -07:00
Kyle Ellrott	8236d5dcc4	More changes to the graph/pom.xml to make it match the other subprojects	2013-10-25 15:52:44 -07:00
Matei Zaharia	85e2cab6f6	Merge pull request #111 from kayousterhout/ui_name Properly display the name of a stage in the UI. This fixes a bug introduced by the fix for SPARK-940, which changed the UI to display the RDD name rather than the stage name. As a result, no name for the stage was shown when using the Spark shell, which meant that there was no way to click on the stage to see more details (e.g., the running tasks). This commit changes the UI back to using the stage name. @pwendell -- let me know if this change was intentional	2013-10-25 14:46:06 -07:00
Tathagata Das	dc9570782a	Merge branch 'apache-master' into transform	2013-10-25 14:22:23 -07:00
Kyle Ellrott	d39ac2eb40	Merge https://github.com/amplab/graphx	2013-10-25 13:16:05 -07:00
Kay Ousterhout	a9c8d83aaf	Properly display the name of a stage in the UI. This fixes a bug introduced by the fix for SPARK-940, which changed the UI to display the RDD name rather than the stage name. As a result, no name for the stage was shown when using the Spark shell, which meant that there was no way to click on the stage to see more details (e.g., the running tasks). This commit changes the UI back to using the stage name.	2013-10-25 12:00:09 -07:00
Reynold Xin	ab35ec4f0f	Merge pull request #110 from pwendell/master Exclude jopt from kafka dependency. Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.	2013-10-25 10:16:18 -07:00
Patrick Wendell	af4a529f6e	Exclude jopt from kafka dependency. Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.	2013-10-25 09:20:30 -07:00
Reynold Xin	4f2c9438b4	Merge pull request #109 from pwendell/master Adding Java/Java Streaming versions of `repartition` with associated tests	2013-10-24 22:32:02 -07:00
Patrick Wendell	ad5f579cbf	Style fixes	2013-10-24 22:18:53 -07:00
Patrick Wendell	e5f6d5697b	Spacing fix	2013-10-24 22:08:06 -07:00
Patrick Wendell	a351fd4aed	Small spacing fix	2013-10-24 21:16:30 -07:00
Patrick Wendell	31e92b72e3	Adding Java versions and associated tests	2013-10-24 21:14:56 -07:00
Reynold Xin	99ad4a613a	Merge pull request #106 from pwendell/master Add a `repartition` operator. This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful: 1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but they want to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's super confusing. 2. If a user has input data where the number of partitions is not known. E.g. > sc.textFile("some file").coalesce(50).... This is both vague semantically (am I growing or shrinking this RDD) but also, may not work correctly if the base RDD has fewer than 50 partitions. The new operator forces shuffles every time, so it will always produce exactly the number of new partitions. It also throws an exception rather than silently not-working if a bad input is passed. I am currently adding streaming tests (requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.	2013-10-24 17:08:39 -07:00
Patrick Wendell	39f6f75588	Some clean-up of tests	2013-10-24 16:43:33 -07:00
Tathagata Das	e962a6e6ee	Fixed accidental bug.	2013-10-24 15:17:26 -07:00
Patrick Wendell	9423532fab	Removing Java for now	2013-10-24 14:31:34 -07:00
Patrick Wendell	05ac9940ee	Adding tests	2013-10-24 14:31:34 -07:00
Patrick Wendell	2fda84fe3f	Always use a shuffle	2013-10-24 14:31:34 -07:00
Patrick Wendell	08c1a42d7d	Add a `repartition` operator. This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful: 1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but they want to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's super confusing. 2. If a user has input data where the number of partitions is not known. E.g. > sc.textFile("some file").coalesce(50).... This is both vague semantically (am I growing or shrinking this RDD) but also, may not work correctly if the base RDD has fewer than 50 partitions. The new operator forces shuffles every time, so it will always produce exactly the number of new partitions. It also throws an exception rather than silently not-working if a bad input is passed. I am currently adding streaming tests (requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.	2013-10-24 14:31:33 -07:00
Ali Ghodsi	05a0df2b9e	Makes Spark SIMR ready.	2013-10-24 11:59:51 -07:00
Reynold Xin	6f82c42690	Merge pull request #34 from jegonzal/AnalyticsCleanup Analytics Cleanup	2013-10-24 11:09:46 -07:00

5041 commits