Matei Zaharia
64ba6a8c2c
Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be
called only once, so subclasses can safely do computation in there
without worrying about caching the results.
- The management of a "splits_" variable that is cleared out when we
checkpoint an RDD is now done in the RDD class.
- A few of the RDD subclasses are simpler.
- CheckpointRDD's compute() method no longer assumes that it is given a
CheckpointRDDSplit -- it can work just as well on a split from the
original RDD, because it only looks at its index. This is important
because things like UnionRDD and ZippedRDD remember the parent's
splits as part of their own and wouldn't work on checkpointed parents.
- RDD.iterator can now reuse cached data if an RDD is computed before it
is checkpointed. It seems like it wouldn't do this before (it always
called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Matei Zaharia
a1ecec8d79
Merge branch 'master' of github.com:mesos/spark
2013-01-28 22:08:44 -08:00
Matei Zaharia
f6eb1f0825
Merge pull request #413 from pwendell/stage-logging
SPARK-658: Adding logging of stage duration
2013-01-28 22:01:52 -08:00
Patrick Wendell
7ee824e42e
Units from ms -> s
2013-01-28 21:48:32 -08:00
Matei Zaharia
dda2ce017c
Merge pull request #424 from pwendell/logging-cleanup
Some DEBUG-level log cleanup.
2013-01-28 21:18:54 -08:00
Patrick Wendell
1f9b486a8b
Some DEBUG-level log cleanup.
A few changes to make the DEBUG-level logs less
noisy and more readable.
- Moved a few very frequent messages to Trace
- Changed some BlockManager log messages to make them
more understandable
SPARK-666 #resolve
2013-01-28 20:29:35 -08:00
Imran Rashid
efff7bfb33
add long and float accumulatorparams
2013-01-28 20:23:11 -08:00
Patrick Wendell
501433f1d5
Making submission time a field
2013-01-28 10:45:57 -08:00
Patrick Wendell
c423be7d8e
Renaming stage finished function
2013-01-28 10:45:57 -08:00
Patrick Wendell
07f568e1bf
SPARK-658: Adding logging of stage duration
2013-01-28 10:45:57 -08:00
Matei Zaharia
286f8f876f
Change time unit in MetadataCleaner to seconds
2013-01-28 01:29:27 -08:00
Matei Zaharia
f03d9760fd
Clean up BlockManagerUI a little (make it not be an object, merge with
Directives, and bind to a random port)
2013-01-27 23:56:14 -08:00
Matei Zaharia
909850729e
Rename more things from slave to executor
2013-01-27 23:17:20 -08:00
Matei Zaharia
44b4a0f88f
Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia
6ad8540b40
Merge pull request #401 from squito/blockmanager_ui
Blockmanager ui
2013-01-27 15:51:08 -08:00
Matei Zaharia
49f6472c0f
Merge pull request #418 from woggling/reregister-deadlock
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00
Charles Reiss
58fc6b2bed
Handle duplicate registrations better.
2013-01-26 18:30:44 -08:00
Charles Reiss
ad4232b4da
Fix deadlock in BlockManager reregistration triggered by failed updates.
2013-01-26 18:30:38 -08:00
Josh Rosen
d49cf0e587
Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
This workaround is easier than rewriting JavaRDDLike in Java.
2013-01-26 16:13:18 -08:00
Imran Rashid
49c05608f5
add metadatacleaner for persistentRdd map
2013-01-25 17:04:16 -08:00
Stephen Haberman
8efbda0b17
Call executeOnCompleteCallbacks in more finally blocks.
2013-01-25 14:55:33 -06:00
Imran Rashid
a1d9d1767d
fixup 1cadaa1, changed api of map
2013-01-25 10:05:26 -08:00
Imran Rashid
1cadaa164e
switch to TimeStampedHashMap for storing persistent Rdds
2013-01-25 09:30:21 -08:00
Imran Rashid
539491bbc3
code reformatting
2013-01-25 09:29:59 -08:00
Patrick Wendell
b6fc6e6752
SPARK-541: Adding a warning for invalid Master URL
Right now Spark silently parses master URLs which do not match any
known regex as a Mesos URL. The Mesos error message when an invalid URL gets
passed is really confusing, so this warns the user when the implicit
conversion is happening.
2013-01-24 14:31:23 -08:00
Matei Zaharia
0fe173a3a5
Merge pull request #410 from rxin/splitpruningrdd
Added a clearDependencies method in PartitionPruningRDD.
2013-01-23 23:10:15 -08:00
Reynold Xin
67a43bc7e6
Added a clearDependencies method in PartitionPruningRDD.
2013-01-23 23:06:52 -08:00
Matei Zaharia
fe5e4812fc
Merge pull request #409 from rxin/splitpruningrdd
Added pruneSplits method to RDD.
2013-01-23 22:23:22 -08:00
Reynold Xin
c109f29c97
Updated PruneDependency to change "split" to "partition".
2013-01-23 22:22:03 -08:00
Reynold Xin
eedc542a02
Removed pruneSplits method in RDD and renamed SplitsPruningRDD to
PartitionPruningRDD.
2013-01-23 22:14:23 -08:00
Reynold Xin
81004b967e
Marked prev RDD as transient in SplitsPruningRDD.
2013-01-23 21:54:27 -08:00
Reynold Xin
636e912f32
Created a PruneDependency to properly assign dependency for
SplitsPruningRDD.
2013-01-23 21:21:55 -08:00
Matei Zaharia
548856a224
Merge remote-tracking branch 'woggling/remove-machines'
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-01-23 15:44:17 -08:00
Reynold Xin
eb222b7206
Added pruneSplits method to RDD.
2013-01-23 15:29:02 -08:00
Matei Zaharia
1dd82743e0
Fix compile error due to cherry-pick
2013-01-23 13:07:27 -08:00
Imran Rashid
e1985bfa04
be sure to set class loader of kryo instances
2013-01-23 12:51:09 -08:00
Charles Reiss
be4a115a7e
Clarify TODO.
2013-01-23 12:48:45 -08:00
Matei Zaharia
1a3aeeca23
Merge pull request #407 from woggling/no-cache-tracker
Eliminate CacheTracker
2013-01-23 12:28:48 -08:00
Charles Reiss
e1027ca639
Actually add CacheManager.
2013-01-23 12:22:11 -08:00
Matei Zaharia
4147e1d47b
Merge pull request #406 from tdas/master
Changed StorageLevel and BlockManagerId API to prevent duplication in memory
2013-01-23 12:18:31 -08:00
Matei Zaharia
4d77d554e1
Merge pull request #394 from JoshRosen/add_file_fix
Add SparkFiles.get() API to access files added through addFile().
2013-01-23 12:16:30 -08:00
Josh Rosen
ae2ed2947d
Allow PySpark's SparkFiles to be used from driver
Fix minor documentation formatting issues.
2013-01-23 10:58:50 -08:00
Tathagata Das
79d55700ce
One more fix. Made even default constructor of BlockManagerId private to prevent such problems in the future.
2013-01-23 01:57:09 -08:00
Charles Reiss
d209b6b764
Extra debugging from hostLost()
2013-01-23 01:35:14 -08:00
Charles Reiss
9a27062260
Force generation increment after shuffle map stage
2013-01-23 01:34:44 -08:00
Tathagata Das
155f31398d
Made StorageLevel constructor private, and added StorageLevels.create() to the Java API. Updates scala and java programming guides.
2013-01-23 01:10:26 -08:00
Tathagata Das
5e11f1e51f
Modified StorageLevel API to ensure zero duplicate objects.
2013-01-22 23:42:53 -08:00
Tathagata Das
bacade6caf
Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite.
2013-01-22 22:55:26 -08:00
Charles Reiss
2849931000
Eliminate CacheTracker.
Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster
queries.
Adds CacheManager to locally coordinate computation of cached RDDs.
2013-01-22 22:19:30 -08:00
Matei Zaharia
ebaa8f6519
Merge remote-tracking branch 'stephenh/cleanup'
Conflicts:
core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-01-22 21:05:45 -08:00