Commit graph

890 commits

Author SHA1 Message Date
Matei Zaharia 64ba6a8c2c Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.

- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.

- A few of the RDD subclasses are simpler.

- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.

- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Matei Zaharia a1ecec8d79 Merge branch 'master' of github.com:mesos/spark 2013-01-28 22:08:44 -08:00
Matei Zaharia f6eb1f0825 Merge pull request #413 from pwendell/stage-logging
SPARK-658: Adding logging of stage duration
2013-01-28 22:01:52 -08:00
Patrick Wendell 7ee824e42e Units from ms -> s 2013-01-28 21:48:32 -08:00
Matei Zaharia dda2ce017c Merge pull request #424 from pwendell/logging-cleanup
Some DEBUG-level log cleanup.
2013-01-28 21:18:54 -08:00
Patrick Wendell 1f9b486a8b Some DEBUG-level log cleanup.
A few changes to make the DEBUG-level logs less
noisy and more readable.

- Moved a few very frequent messages to Trace
- Changed some BlockManger log messages to make them
  more understandable

SPARK-666 #resolve
2013-01-28 20:29:35 -08:00
Imran Rashid efff7bfb33 add long and float accumulatorparams 2013-01-28 20:23:11 -08:00
Patrick Wendell 501433f1d5 Making submission time a field 2013-01-28 10:45:57 -08:00
Patrick Wendell c423be7d8e Renaming stage finished function 2013-01-28 10:45:57 -08:00
Patrick Wendell 07f568e1bf SPARK-658: Adding logging of stage duration 2013-01-28 10:45:57 -08:00
Matei Zaharia 286f8f876f Change time unit in MetadataCleaner to seconds 2013-01-28 01:29:27 -08:00
Matei Zaharia f03d9760fd Clean up BlockManagerUI a little (make it not be an object, merge with
Directives, and bind to a random port)
2013-01-27 23:56:14 -08:00
Matei Zaharia 909850729e Rename more things from slave to executor 2013-01-27 23:17:20 -08:00
Matei Zaharia 44b4a0f88f Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia 6ad8540b40 Merge pull request #401 from squito/blockmanager_ui
Blockmanager ui
2013-01-27 15:51:08 -08:00
Matei Zaharia 49f6472c0f Merge pull request #418 from woggling/reregister-deadlock
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00
Charles Reiss 58fc6b2bed Handle duplicate registrations better. 2013-01-26 18:30:44 -08:00
Charles Reiss ad4232b4da Fix deadlock in BlockManager reregistration triggered by failed updates. 2013-01-26 18:30:38 -08:00
Josh Rosen d49cf0e587 Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
This workaround is easier than rewriting JavaRDDLike in Java.
2013-01-26 16:13:18 -08:00
Imran Rashid 49c05608f5 add metadatacleaner for persisentRdd map 2013-01-25 17:04:16 -08:00
Stephen Haberman 8efbda0b17 Call executeOnCompleteCallbacks in more finally blocks. 2013-01-25 14:55:33 -06:00
Imran Rashid a1d9d1767d fixup 1cadaa1, changed api of map 2013-01-25 10:05:26 -08:00
Imran Rashid 1cadaa164e switch to TimeStampedHashMap for storing persistent Rdds 2013-01-25 09:30:21 -08:00
Imran Rashid 539491bbc3 code reformatting 2013-01-25 09:29:59 -08:00
Patrick Wendell b6fc6e6752 SPARK-541: Adding a warning for invalid Master URL
Right now Spark silently parses master URL's which do not match any
known regex as a Mesos URL. The Mesos error message when an invalid URL gets
passed is really confusing, so this warns the user when the implicit
conversion is happening.
2013-01-24 14:31:23 -08:00
Matei Zaharia 0fe173a3a5 Merge pull request #410 from rxin/splitpruningrdd
Added a clearDependencies method in PartitionPruningRDD.
2013-01-23 23:10:15 -08:00
Reynold Xin 67a43bc7e6 Added a clearDependencies method in PartitionPruningRDD. 2013-01-23 23:06:52 -08:00
Matei Zaharia fe5e4812fc Merge pull request #409 from rxin/splitpruningrdd
Added pruntSplits method to RDD.
2013-01-23 22:23:22 -08:00
Reynold Xin c109f29c97 Updated PruneDependency to change "split" to "partition". 2013-01-23 22:22:03 -08:00
Reynold Xin eedc542a02 Removed pruneSplits method in RDD and renamed SplitsPruningRDD to
PartitionPruningRDD.
2013-01-23 22:14:23 -08:00
Reynold Xin 81004b967e Marked prev RDD as transient in SplitsPruningRDD. 2013-01-23 21:54:27 -08:00
Reynold Xin 636e912f32 Created a PruneDependency to properly assign dependency for
SplitsPruningRDD.
2013-01-23 21:21:55 -08:00
Matei Zaharia 548856a224 Merge remote-tracking branch 'woggling/remove-machines'
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-01-23 15:44:17 -08:00
Reynold Xin eb222b7206 Added pruntSplits method to RDD. 2013-01-23 15:29:02 -08:00
Matei Zaharia 1dd82743e0 Fix compile error due to cherry-pick 2013-01-23 13:07:27 -08:00
Imran Rashid e1985bfa04 be sure to set class loader of kryo instances 2013-01-23 12:51:09 -08:00
Charles Reiss be4a115a7e Clarify TODO. 2013-01-23 12:48:45 -08:00
Matei Zaharia 1a3aeeca23 Merge pull request #407 from woggling/no-cache-tracker
Eliminate CacheTracker
2013-01-23 12:28:48 -08:00
Charles Reiss e1027ca639 Actually add CacheManager. 2013-01-23 12:22:11 -08:00
Matei Zaharia 4147e1d47b Merge pull request #406 from tdas/master
Changed StorageLevel and BlockManagerId API to prevent duplication in memory
2013-01-23 12:18:31 -08:00
Matei Zaharia 4d77d554e1 Merge pull request #394 from JoshRosen/add_file_fix
Add SparkFiles.get() API to access files added through addFile().
2013-01-23 12:16:30 -08:00
Josh Rosen ae2ed2947d Allow PySpark's SparkFiles to be used from driver
Fix minor documentation formatting issues.
2013-01-23 10:58:50 -08:00
Tathagata Das 79d55700ce One more fix. Made even default constructor of BlockManagerId private to prevent such problems in the future. 2013-01-23 01:57:09 -08:00
Charles Reiss d209b6b764 Extra debugging from hostLost() 2013-01-23 01:35:14 -08:00
Charles Reiss 9a27062260 Force generation increment after shuffle map stage 2013-01-23 01:34:44 -08:00
Tathagata Das 155f31398d Made StorageLevel constructor private, and added StorageLevels.create() to the Java API. Updates scala and java programming guides. 2013-01-23 01:10:26 -08:00
Tathagata Das 5e11f1e51f Modified StorageLevel API to ensure zero duplicate objects. 2013-01-22 23:42:53 -08:00
Tathagata Das bacade6caf Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite. 2013-01-22 22:55:26 -08:00
Charles Reiss 2849931000 Eliminate CacheTracker.
Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster
queries.

Adds CacheManager to locally coordinate computation of cached RDDs.
2013-01-22 22:19:30 -08:00
Matei Zaharia ebaa8f6519 Merge remote-tracking branch 'stephenh/cleanup'
Conflicts:
	core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-01-22 21:05:45 -08:00