Commit graph

2267 commits

Author SHA1 Message Date
Joseph E. Gonzalez bb58aa5330 Added some stub code to address the case where a vertex could occur multiple times in the vertex table or where a vertex in the edge list may not appear in the vertex table.
Moving IndexedRDD into the graphx source tree and removing dependencies in /core.
2013-10-18 18:15:32 -07:00
Ankur Dave 971f824014 Revert unnecessary changes to core
While benchmarking, we accidentally committed some unnecessary changes
to core such as adding logging. These changes make it more difficult to
merge from Spark upstream, so this commit reverts them.
2013-10-18 16:07:38 -07:00
Joseph E. Gonzalez 1856b37e9d Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx 2013-10-18 12:21:19 -07:00
Joseph E. Gonzalez 3f3d28c73f Switching from Seq to IndexedSeq 2013-10-17 19:55:36 -07:00
Joseph E. Gonzalez 9a03c5fe28 This commit accomplishes three goals:
1) Further simplification of the IndexedRDD operations (eliminating some)
2) Aggressive reuse of HashMaps
3) Pipelining join operations within indexedrdd
2013-10-17 19:01:48 -07:00
Kay Ousterhout 809f547633 Fixed unit tests 2013-10-16 23:16:12 -07:00
Kay Ousterhout ec512583ab Removed TaskSchedulerListener interface.
The interface was used only by the DAG scheduler (so it wasn't necessary
to define the additional interface), and the naming makes it very
confusing when reading the code (because "listener" was used
to describe the DAG scheduler, rather than SparkListeners, which
implement a nearly-identical interface but serve a different
function).
2013-10-16 16:57:42 -07:00
Joseph E. Gonzalez 57ac9073ae Introducing unique indexedrdd and adding numerous specialized joins 2013-10-16 04:08:22 -07:00
Joseph E. Gonzalez 59700c0c2a switched to more efficient implementation of reduce by key 2013-10-16 00:18:37 -07:00
Joseph E. Gonzalez 80e4ec3278 IndexedRDD now only supports unique keys 2013-10-16 00:16:44 -07:00
Matei Zaharia 4e46fde818 Merge pull request #62 from harveyfeng/master
Make TaskContext's stageId publicly accessible.
2013-10-15 23:14:27 -07:00
Harvey Feng 65b46236e7 Proper formatting for SparkHadoopWriter class extensions. 2013-10-15 21:51:52 -07:00
Matei Zaharia 6dbd2208ff Merge pull request #34 from kayousterhout/rename
Renamed StandaloneX to CoarseGrainedX.

(as suggested by @rxin here https://github.com/apache/incubator-spark/pull/14)

The previous names were confusing because the components weren't just
used in Standalone mode.  The scheduler used for Standalone
mode is called SparkDeploySchedulerBackend, so referring to the base class
as StandaloneSchedulerBackend was misleading.
2013-10-15 19:02:57 -07:00
Joseph E. Gonzalez 3cb6dffce0 adding indexed reduce by key 2013-10-15 18:55:06 -07:00
Harvey Feng c4c76e37a7 Fix line length > 100 chars in SparkHadoopWriter 2013-10-15 18:35:59 -07:00
Harvey Feng 5b8083fee5 Make TaskContext's stageId publicly accessible. 2013-10-15 18:06:37 -07:00
Joseph E. Gonzalez 1b22eef744 Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx 2013-10-15 16:15:19 -07:00
Kay Ousterhout f95a2be045 Fixed build error after merging in master 2013-10-15 14:51:37 -07:00
Kay Ousterhout acc7638f7c Merge remote branch 'upstream/master' into rename 2013-10-15 14:43:56 -07:00
Kay Ousterhout 707ad8cc4f Unified daemon thread pools 2013-10-15 14:23:43 -07:00
Reynold Xin f41feb7b33 Bump up logging level to warning for failed tasks. 2013-10-14 23:35:32 -07:00
Joseph E. Gonzalez 6a13d02319 Merging changes for IndexedRDD branch 2013-10-14 23:30:36 -07:00
Joseph E. Gonzalez 6700ccd7d5 Introducing indexedrdd
The rest of indexed rdd
2013-10-14 23:30:35 -07:00
Joseph E. Gonzalez 4755f42d78 moving indexedrdd to the correct location 2013-10-14 23:13:27 -07:00
Joseph E. Gonzalez ef7c369092 merged with upstream changes 2013-10-14 22:56:42 -07:00
Reynold Xin 9cd8786e4a Merge branch 'master' of github.com:apache/incubator-spark into kill
Conflicts:
	core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
2013-10-14 21:51:30 -07:00
Joseph E. Gonzalez bf059691f0 Adding a few extra comments. 2013-10-14 19:59:11 -07:00
Joseph E. Gonzalez 11a44d0ec9 Introducing indexedrdd
The rest of indexed rdd
2013-10-14 19:46:42 -07:00
Reynold Xin 3b11f43e36 Merge pull request #57 from aarondav/bid
Refactor BlockId into an actual type

Converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now:

+ Type safety
+ Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types.
+ Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported.
  (I'm looking at you, shuffle file consolidation.)
+ It will only get harder to make this change as time goes on.

Downside is, of course, that this is a very invasive change touching a lot of different files, which will inevitably lead to merge conflicts for many.
2013-10-14 14:20:01 -07:00
Aaron Davidson 4a45019fb0 Address Matei's comments 2013-10-14 00:24:17 -07:00
Joseph E. Gonzalez 637b67da56 merging changes from upstream benchmarking branch 2013-10-13 19:54:09 -07:00
Joseph E. Gonzalez 494472a6cc Integrated IndexedRDD into graph design. 2013-10-13 19:42:32 -07:00
Aaron Davidson da896115ec Change BlockId filename to name + rest of Patrick's comments 2013-10-13 11:15:02 -07:00
Aaron Davidson d60352283c Add unit test and address rest of Reynold's comments 2013-10-12 22:45:15 -07:00
Aaron Davidson a395911138 Refactor BlockId into an actual type
This is an unfortunately invasive change which converts all of our BlockId
strings into actual BlockId types. Here are some advantages of doing this now:

+ Type safety

+ Code clarity - it's now obvious what the key of a shuffle or rdd block is,
  for instance. Additionally, appearing in tuple/map type signatures is a big
  readability bonus. A Seq[(String, BlockStatus)] is not very clear.
  Further, we can now use more Scala features, like matching on BlockId types.

+ Explicit usage - we can now formally tell where various BlockIds are being used
  (without doing string searches); this makes updating current BlockIds a much
  clearer process, and compiler-supported.
  (I'm looking at you, shuffle file consolidation.)

+ It will only get harder to make this change as time goes on.

Since this touches a lot of files, it'd be best to either get this patch
in quickly or throw it on the ground to avoid too many secondary merge conflicts.
2013-10-12 22:44:57 -07:00
Reynold Xin 99796904ae Merge pull request #52 from harveyfeng/hadoop-closure
Add an optional closure parameter to HadoopRDD instantiation to use when creating local JobConfs.

Having HadoopRDD accept this optional closure eliminates the need for the HadoopFileRDD added earlier. It makes the HadoopRDD more general, in that the caller can specify any JobConf initialization flow.
2013-10-12 21:23:26 -07:00
Harvey Feng 6c32aab87d Remove the new HadoopRDD constructor from SparkContext API, plus some minor style changes. 2013-10-12 21:02:08 -07:00
Reynold Xin 88866ea9c9 Fixed PairRDDFunctionsSuite after removing InterruptibleRDD. 2013-10-12 20:05:23 -07:00
Reynold Xin 6b288b75d4 Job cancellation: address Matei's code review feedback. 2013-10-12 15:53:31 -07:00
Reynold Xin ab0940f0c2 Job cancellation: addressed code review feedback round 2 from Kay. 2013-10-11 18:15:04 -07:00
Reynold Xin 97ffebbe87 Fixed dagscheduler suite because of a logging message change. 2013-10-11 16:18:22 -07:00
Reynold Xin a61cf40ab9 Job cancellation: addressed code review feedback from Kay. 2013-10-11 15:58:14 -07:00
Dan Crankshaw c4a23f95c3 Updated code so benchmarks actually run. 2013-10-11 22:57:43 +00:00
Matei Zaharia fb25f32300 Merge pull request #53 from witgo/master
Add a zookeeper compile dependency to fix build in maven
2013-10-11 15:44:43 -07:00
Matei Zaharia d6ead47809 Merge pull request #32 from mridulm/master
Address review comments, move to incubator spark

Also includes a small fix to speculative execution.

<edit> Continued from https://github.com/mesos/spark/pull/914 </edit>
2013-10-11 15:43:01 -07:00
Reynold Xin e2047d3927 Making takeAsync and collectAsync deterministic. 2013-10-11 13:04:45 -07:00
Reynold Xin 09f7609254 Properly handle interrupted exception in FutureAction. 2013-10-11 11:20:15 -07:00
LiGuoqiang fc60c412ab Add a zookeeper compile dependency to fix build in maven 2013-10-11 16:31:47 +08:00
Reynold Xin 42fb1df694 Merge branch 'master' of github.com:apache/incubator-spark into kill
Conflicts:
	core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala
2013-10-10 23:48:05 -07:00
Reynold Xin d9e724e756 Fixed the broken local scheduler test. 2013-10-10 23:08:13 -07:00