Commit graph

4772 commits

Author SHA1 Message Date
Harvey Feng 5b8083fee5 Make TaskContext's stageId publicly accessible. 2013-10-15 18:06:37 -07:00
Joseph E. Gonzalez 9058f261fe Addressing issue where statistics are not computed correctly 2013-10-15 17:39:09 -07:00
Joseph E. Gonzalez 1b22eef744 Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx 2013-10-15 16:15:19 -07:00
Joseph E. Gonzalez 194bb03d16 Resolved closure capture issues by addressing capture through implicit variables. 2013-10-15 15:10:41 -07:00
Kay Ousterhout f95a2be045 Fixed build error after merging in master 2013-10-15 14:51:37 -07:00
Kay Ousterhout acc7638f7c Merge remote branch 'upstream/master' into rename 2013-10-15 14:43:56 -07:00
Kay Ousterhout 707ad8cc4f Unified daemon thread pools 2013-10-15 14:23:43 -07:00
Joseph E. Gonzalez 7241cf1632 Updating unit tests. 2013-10-15 14:18:03 -07:00
Matei Zaharia 3249e0e90d Merge pull request #59 from rxin/warning
Bump up logging level to warning for failed tasks.
2013-10-15 14:12:33 -07:00
Joseph E. Gonzalez 345e1e94cc Still trying to resolve issues with capture. 2013-10-15 14:01:38 -07:00
Shivaram Venkataraman 051cd960d9 Merge branch 'master' of https://github.com/apache/incubator-spark into sbt-assembly-deps 2013-10-15 13:26:40 -07:00
Joseph E. Gonzalez b64337ec40 Trying to resolve issues with closure capture. 2013-10-15 13:02:17 -07:00
Reynold Xin 678dec6680 Merge pull request #58 from hsaputra/update-pom-asf
Update pom.xml to use version 13 of the ASF parent pom

Update pom.xml to use version 13 of the ASF parent pom.
Add mailingList element to pom.xml.
2013-10-15 10:51:46 -07:00
Joseph E. Gonzalez e7d0320000 More refactoring and documentation, including renaming data to attr for vertex and edge data and eliminating the vertex type. 2013-10-15 02:20:06 -07:00
KarthikTunga 6c6b146fc2 Merge branch 'master' of https://github.com/apache/incubator-spark
Updating local branch
2013-10-15 00:46:35 -07:00
KarthikTunga d2c86e7188 SPARK-627 - reading --config argument 2013-10-15 00:35:44 -07:00
Reynold Xin f41feb7b33 Bump up logging level to warning for failed tasks. 2013-10-14 23:35:32 -07:00
Joseph E. Gonzalez 6a13d02319 Merging changes for IndexedRDD branch 2013-10-14 23:30:36 -07:00
Joseph E. Gonzalez 6700ccd7d5 Introducing indexedrdd
The rest of indexed rdd
2013-10-14 23:30:35 -07:00
Joseph E. Gonzalez 4755f42d78 moving indexedrdd to the correct location 2013-10-14 23:13:27 -07:00
Henry Saputra 3fed3e2283 Update pom.xml to use version 13 of the ASF parent pom and add mailingLists element. 2013-10-14 23:10:54 -07:00
Joseph E. Gonzalez ef7c369092 merged with upstream changes 2013-10-14 22:56:42 -07:00
Patrick Wendell e33b1839e2 Merge pull request #29 from rxin/kill
Job killing

Moving https://github.com/mesos/spark/pull/935 here

The high level idea is to have an "interrupted" field in TaskContext, and a task should check that flag to determine if its execution should continue. For convenience, I provide an InterruptibleIterator which wraps around a normal iterator but checks for the interrupted flag. I also provide an InterruptibleRDD that wraps around an existing RDD.
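The flag-checking idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `SketchTaskContext`, `SketchInterruptibleIterator`, and `InterruptSketch` are hypothetical names, and the real TaskContext carries much more state than a single flag.

```scala
// Hypothetical stand-in for TaskContext; only the "interrupted" flag matters here.
class SketchTaskContext {
  @volatile var interrupted: Boolean = false
}

// Wraps a normal iterator and stops yielding elements once the flag is set.
class SketchInterruptibleIterator[T](context: SketchTaskContext, delegate: Iterator[T])
    extends Iterator[T] {
  def hasNext: Boolean = !context.interrupted && delegate.hasNext
  def next(): T = delegate.next()
}

object InterruptSketch {
  // Consumes an unbounded iterator and sets the flag after the third element.
  def run(): List[Int] = {
    val ctx = new SketchTaskContext
    val it = new SketchInterruptibleIterator(ctx, Iterator.from(1))
    val seen = scala.collection.mutable.ListBuffer.empty[Int]
    while (it.hasNext) {
      val x = it.next()
      seen += x
      if (x == 3) ctx.interrupted = true
    }
    seen.toList
  }
}
```

In this sketch the wrapper simply reports no further elements once the flag is set; whether the actual implementation ends iteration quietly or raises an error is not stated in the description above.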

As part of this pull request, I added an AsyncRDDActions class that provides a number of RDD actions that return a FutureJob (extending scala.concurrent.Future). The FutureJob can be used to kill the job execution, or to wait until the job finishes.
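A cancellable asynchronous action of this kind might look like the sketch below. It is an assumption-laden illustration: `SketchJobHandle` and `AsyncSketch.countAsync` are hypothetical names, and the handle wraps a Future instead of extending it (as the description says FutureJob does) purely to keep the example short.

```scala
import java.util.concurrent.CancellationException
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical handle: exposes the result as a Future plus a cancel() method.
class SketchJobHandle[T](val future: Future[T], cancelled: AtomicBoolean) {
  def cancel(): Unit = cancelled.set(true)
}

object AsyncSketch {
  // Counts elements on a separate thread, checking the cancellation flag
  // before each element so an unbounded input can still be stopped.
  def countAsync(data: Iterator[Int]): SketchJobHandle[Long] = {
    val cancelled = new AtomicBoolean(false)
    val promise = Promise[Long]()
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        val result = Try {
          var n = 0L
          while (data.hasNext) {
            if (cancelled.get) throw new CancellationException("job cancelled")
            data.next()
            n += 1
          }
          n
        }
        promise.complete(result)
      }
    })
    worker.setDaemon(true)
    worker.start()
    new SketchJobHandle(promise.future, cancelled)
  }
}
```

A caller can either block on the result, for example `Await.result(AsyncSketch.countAsync(Iterator(1, 2, 3)).future, 5.seconds)`, or call `cancel()` on the handle, after which the future fails with a CancellationException.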

This is NOT ready for merging yet. Remaining TODOs:

1. Add unit tests
2. Add job killing functionality for local scheduler (current job killing functionality only works in cluster scheduler)

-------------

Update on Oct 10, 2013:

This is ready!

Related future work:
- Figure out how to handle the job triggered by RangePartitioner (this one is tough; might become future work)
- Java API
- Python API
2013-10-14 22:25:47 -07:00
Reynold Xin 9cd8786e4a Merge branch 'master' of github.com:apache/incubator-spark into kill
Conflicts:
	core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
2013-10-14 21:51:30 -07:00
Joseph E. Gonzalez bf059691f0 Adding a few extra comments. 2013-10-14 19:59:11 -07:00
Joseph E. Gonzalez 11a44d0ec9 Introducing indexedrdd
The rest of indexed rdd
2013-10-14 19:46:42 -07:00
Joseph E. Gonzalez 67bb39c54b Removing extraneous code 2013-10-14 18:49:05 -07:00
Reynold Xin 3b11f43e36 Merge pull request #57 from aarondav/bid
Refactor BlockId into an actual type

Converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now:

+ Type safety
+ Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types.
+ Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported.
  (I'm looking at you, shuffle file consolidation.)
+ It will only get harder to make this change as time goes on.

Downside is, of course, that this is a very invasive change touching a lot of different files, which will inevitably lead to merge conflicts for many.
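To illustrate the type-safety and pattern-matching points, here is a simplified sketch of what typed block identifiers could look like. The class names, fields, and string formats are assumptions made for the example and are not taken from the patch.

```scala
// Hypothetical, simplified block identifiers; every name here is illustrative.
sealed trait SketchBlockId {
  def name: String
}

case class SketchRDDBlockId(rddId: Int, splitIndex: Int) extends SketchBlockId {
  def name: String = s"rdd_${rddId}_$splitIndex"
}

case class SketchShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int)
    extends SketchBlockId {
  def name: String = s"shuffle_${shuffleId}_${mapId}_$reduceId"
}

object BlockIdSketch {
  // Matching on the type replaces parsing or prefix-checking a raw string,
  // and the sealed trait lets the compiler warn about unhandled cases.
  def describe(id: SketchBlockId): String = id match {
    case SketchRDDBlockId(rdd, split) => s"partition $split of RDD $rdd"
    case SketchShuffleBlockId(s, m, r) => s"shuffle $s output from map $m for reduce $r"
  }
}
```

With types like these, a signature such as `Seq[(SketchBlockId, BlockStatus)]` documents itself in a way that `Seq[(String, BlockStatus)]` does not.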
2013-10-14 14:20:01 -07:00
Aaron Davidson 4a45019fb0 Address Matei's comments 2013-10-14 00:24:17 -07:00
Joseph E. Gonzalez bff223454a trying to address issues with GraphImpl being caught in closures. 2013-10-13 22:27:10 -07:00
Joseph E. Gonzalez f89e6e5cbf removing benchmark code 2013-10-13 20:45:01 -07:00
Joseph E. Gonzalez 141c22e28c merging in master changes 2013-10-13 20:43:23 -07:00
Joseph E. Gonzalez 637b67da56 merging changes from upstream benchmarking branch 2013-10-13 19:54:09 -07:00
Joseph E. Gonzalez 494472a6cc Integrated IndexedRDD into graph design. 2013-10-13 19:42:32 -07:00
Aaron Davidson da896115ec Change BlockId filename to name + rest of Patrick's comments 2013-10-13 11:15:02 -07:00
Aaron Davidson d60352283c Add unit test and address rest of Reynold's comments 2013-10-12 22:45:15 -07:00
Aaron Davidson a395911138 Refactor BlockId into an actual type
This is an unfortunately invasive change which converts all of our BlockId
strings into actual BlockId types. Here are some advantages of doing this now:

+ Type safety

+ Code clarity - it's now obvious what the key of a shuffle or rdd block is,
  for instance. Additionally, appearing in tuple/map type signatures is a big
  readability bonus. A Seq[(String, BlockStatus)] is not very clear.
  Further, we can now use more Scala features, like matching on BlockId types.

+ Explicit usage - we can now formally tell where various BlockIds are being used
  (without doing string searches); this makes updating current BlockIds a much
  clearer process, and compiler-supported.
  (I'm looking at you, shuffle file consolidation.)

+ It will only get harder to make this change as time goes on.

Since this touches a lot of files, it'd be best to either get this patch
in quickly or throw it on the ground to avoid too many secondary merge conflicts.
2013-10-12 22:44:57 -07:00
Reynold Xin 99796904ae Merge pull request #52 from harveyfeng/hadoop-closure
Add an optional closure parameter to HadoopRDD instantiation to use when creating local JobConfs.

Having HadoopRDD accept this optional closure eliminates the need for the HadoopFileRDD added earlier. It makes the HadoopRDD more general, in that the caller can specify any JobConf initialization flow.
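The optional-closure pattern described above can be sketched as follows. `SketchConf` and `SketchReader` are hypothetical stand-ins for JobConf and HadoopRDD, and the parameter name and configuration keys are assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Hadoop JobConf: just a mutable key/value map.
class SketchConf {
  val settings: mutable.Map[String, String] = mutable.Map.empty
}

// Hypothetical reader: if a closure is supplied, it is applied to every
// locally created conf, so the caller controls the initialization flow.
class SketchReader(path: String, initLocalConf: Option[SketchConf => Unit] = None) {
  def localConf(): SketchConf = {
    val conf = new SketchConf
    conf.settings("input.path") = path
    initLocalConf.foreach(init => init(conf))
    conf
  }
}
```

A caller that needs file-specific setup passes it as the closure, for example `new SketchReader("/data", Some(c => c.settings("custom.key") = "value"))`, while other callers omit the argument. This is why a dedicated subclass for the file-based case becomes unnecessary.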
2013-10-12 21:23:26 -07:00
Harvey Feng 6c32aab87d Remove the new HadoopRDD constructor from SparkContext API, plus some minor style changes. 2013-10-12 21:02:08 -07:00
Reynold Xin 88866ea9c9 Fixed PairRDDFunctionsSuite after removing InterruptibleRDD. 2013-10-12 20:05:23 -07:00
Reynold Xin 6b288b75d4 Job cancellation: address Matei's code review feedback. 2013-10-12 15:53:31 -07:00
jerryshao c23cd72b4b Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming 2013-10-12 20:00:42 +08:00
Dan Crankshaw 1a961dd1f2 Fixed connected components CL params 2013-10-12 01:47:38 +00:00
Shivaram Venkataraman c441904bce Add a comment and exclude tools 2013-10-11 18:23:15 -07:00
Reynold Xin ab0940f0c2 Job cancellation: addressed code review feedback round 2 from Kay. 2013-10-11 18:15:04 -07:00
Dan Crankshaw 1e5535cfcf Added connected components back 2013-10-11 16:38:52 -07:00
Reynold Xin 97ffebbe87 Fixed dagscheduler suite because of a logging message change. 2013-10-11 16:18:22 -07:00
Reynold Xin dca80094d3 Merge pull request #54 from aoiwelle/remove_unused_imports
Remove unnecessary mutable imports

It appears that the imports aren't necessary here.
2013-10-11 16:08:15 -07:00
Dan Crankshaw 543a54dffa Tried to fix some indenting 2013-10-11 16:07:49 -07:00
Reynold Xin a61cf40ab9 Job cancellation: addressed code review feedback from Kay. 2013-10-11 15:58:14 -07:00