Stephen Haberman
63fe225587
Simplify SubtractedRDD in preparation for subtractByKey.
2013-03-13 17:17:34 -05:00
Imran Rashid
8fef5b9c5f
refactoring of TaskMetrics
2013-03-03 16:34:04 -08:00
Imran Rashid
d36abdb053
Merge branch 'master' into stageInfo
2013-03-03 15:20:46 -08:00
Tathagata Das
dff53d1b94
Merge branch 'mesos-master' into streaming
2013-02-24 12:17:22 -08:00
Stephen Haberman
f442e7d83c
Update for split->partition rename.
2013-02-24 00:27:14 -06:00
Stephen Haberman
cec87a0653
Merge branch 'master' into subtract
2013-02-23 23:27:55 -06:00
Matei Zaharia
d942d39072
Handle exceptions in RecordReader.close() better (suggested by Jim Donahue)
2013-02-23 11:19:07 -08:00
Imran Rashid
0f37b43b40
make the ShuffleFetcher responsible for collecting shuffle metrics, which gives us metrics for CoGroupedRDD and ShuffledRDD
2013-02-21 16:56:28 -08:00
Imran Rashid
ff127cfcd3
Merge branch 'master' into stageInfo
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/storage/BlockManager.scala
2013-02-21 15:16:21 -08:00
Imran Rashid
baab23abdf
TaskContext does not hold a reference to Task; instead, it has a shared instance of TaskMetrics with Task
2013-02-21 14:13:01 -08:00
Tathagata Das
334ab92441
Fixed bug in CheckpointSuite
2013-02-20 10:26:36 -08:00
Tathagata Das
1cb725e417
Merge branch 'mesos-master' into streaming
2013-02-20 09:55:35 -08:00
Tathagata Das
fb9956256d
Merge branch 'mesos-master' into streaming
...
Conflicts:
core/src/main/scala/spark/rdd/CheckpointRDD.scala
streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Reynold Xin
130f704baf
Added a method to create PartitionPruningRDD.
2013-02-19 16:03:52 -08:00
Matei Zaharia
06e5e6627f
Renamed "splits" to "partitions"
2013-02-17 22:13:26 -08:00
Matei Zaharia
340cc54e47
Merge pull request #471 from stephenh/parallelrdd
...
Move ParallelCollection into spark.rdd package.
2013-02-16 16:39:15 -08:00
Stephen Haberman
924f47dd11
Add RDD.subtract.
...
Instead of reusing the cogroup primitive, this adds a SubtractedRDD
that knows it only needs to keep rdd1's values (per split) in memory.
2013-02-16 13:38:42 -06:00
Stephen Haberman
e7713adb99
Move ParallelCollection into spark.rdd package.
2013-02-16 13:20:48 -06:00
Stephen Haberman
ae2234687d
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 13:10:31 -06:00
Imran Rashid
bffee929ab
Merge branch 'master' into stageInfo
...
Conflicts:
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/storage/BlockManager.scala
2013-02-15 10:35:04 -08:00
Imran Rashid
e9f53ec0ea
undo change to onCompleteCallbacks
2013-02-11 09:36:49 -08:00
Tathagata Das
16baea62bc
Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits.
2013-02-10 19:14:49 -08:00
Imran Rashid
b7d9e24394
use TaskMetrics to gather all stats; lots of plumbing to get it all the way back to driver
2013-02-10 14:18:52 -08:00
Stephen Haberman
2a18cd826c
Add back return types.
2013-02-09 10:12:04 -06:00
Stephen Haberman
a9c8d53cfa
Clean up RDDs, mainly to use getSplits.
...
Also made sure clearDependencies() was calling super, to ensure
the getSplits/getDependencies vars in the RDD base class get
cleaned up.
2013-02-05 22:16:59 -06:00
Stephen Haberman
f4d43cb43e
Remove unneeded zipWithIndex.
...
Also rename r->rdd and remove unneeded extra type info.
2013-02-05 21:26:45 -06:00
Imran Rashid
379564c7e0
setup plumbing to get task metrics; lots of unfinished parts, but basic flow in place
2013-02-05 18:30:21 -08:00
Stephen Haberman
8bd0e888f3
Inline mergePair to look more like the narrow dep branch.
...
No functionality changes, I think this is just more consistent
given mergePair isn't called multiple times/recursive.
Also added a comment to explain the usual case of having two parent RDDs.
2013-02-05 17:50:25 -06:00
Imran Rashid
e319ac74c1
cogrouped RDD stores the amount of time taken to read shuffle data in each task
2013-02-05 10:18:16 -08:00
Imran Rashid
295b534398
task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount
2013-02-05 10:18:16 -08:00
Reynold Xin
f9af9cee6f
Moved PruneDependency into PartitionPruningRDD.scala.
2013-02-01 00:02:46 -08:00
Reynold Xin
6289d9654e
Removed the TODO comment from PartitionPruningRDD.
2013-01-31 17:49:36 -08:00
Reynold Xin
5b0fc265c2
Changed PartitionPruningRDD's split to make sure it returns the correct split index.
2013-01-31 17:48:39 -08:00
Matei Zaharia
64ba6a8c2c
Simplify checkpointing code and RDD class a little:
...
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.
- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.
- A few of the RDD subclasses are simpler.
- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.
- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Reynold Xin
67a43bc7e6
Added a clearDependencies method in PartitionPruningRDD.
2013-01-23 23:06:52 -08:00
Reynold Xin
c109f29c97
Updated PruneDependency to change "split" to "partition".
2013-01-23 22:22:03 -08:00
Reynold Xin
eedc542a02
Removed pruneSplits method in RDD and renamed SplitsPruningRDD to PartitionPruningRDD.
2013-01-23 22:14:23 -08:00
Reynold Xin
81004b967e
Marked prev RDD as transient in SplitsPruningRDD.
2013-01-23 21:54:27 -08:00
Reynold Xin
636e912f32
Created a PruneDependency to properly assign dependency for SplitsPruningRDD.
2013-01-23 21:21:55 -08:00
Reynold Xin
eb222b7206
Added pruneSplits method to RDD.
2013-01-23 15:29:02 -08:00
Stephen Haberman
ffd1623595
Minor cleanup.
2013-01-21 15:55:46 -06:00
Tathagata Das
cd1521cfdb
Merge branch 'master' into streaming
...
Conflicts:
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/FilteredRDD.scala
docs/_layouts/global.html
docs/index.md
run
2013-01-15 12:08:51 -08:00
Tathagata Das
131be5d62e
Fixed bug in RDD checkpointing.
2013-01-14 03:28:25 -08:00
Matei Zaharia
cb867e9ffb
Merge branch 'master' of github.com:mesos/spark
2013-01-13 19:34:32 -08:00
Matei Zaharia
72408e8dfa
Make filter preserve partitioner info, since it can
2013-01-13 19:34:07 -08:00
Reynold Xin
be7166146b
Removed the use of getOrElse to avoid Scala wrapper for every call.
2013-01-13 15:27:28 -08:00
Reynold Xin
bd336f5f40
Changed CoGroupedRDD's hash map from Scala to Java.
2013-01-10 17:13:04 -08:00
Tathagata Das
3b0a3b89ac
Added better docs for RDDCheckpointData
2013-01-07 14:55:49 -08:00
Tathagata Das
7e0271b438
Refactored a whole lot to push all DStreams into the spark.streaming.dstream package.
2012-12-30 15:19:55 -08:00
Tathagata Das
836042bb9f
Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge
...
Conflicts:
core/src/main/scala/spark/ParallelCollection.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/CartesianRDD.scala
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/CoalescedRDD.scala
core/src/main/scala/spark/rdd/FilteredRDD.scala
core/src/main/scala/spark/rdd/FlatMappedRDD.scala
core/src/main/scala/spark/rdd/GlommedRDD.scala
core/src/main/scala/spark/rdd/HadoopRDD.scala
core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
core/src/main/scala/spark/rdd/MappedRDD.scala
core/src/main/scala/spark/rdd/PipedRDD.scala
core/src/main/scala/spark/rdd/SampledRDD.scala
core/src/main/scala/spark/rdd/ShuffledRDD.scala
core/src/main/scala/spark/rdd/UnionRDD.scala
core/src/main/scala/spark/scheduler/ResultTask.scala
core/src/test/scala/spark/CheckpointSuite.scala
2012-12-26 19:09:01 -08:00