Commit graph

2041 commits

Author SHA1 Message Date
Imran Rashid c1df24d085 rename Slaves --> Executor 2013-01-30 18:51:14 -08:00
Imran Rashid b92259ba57 Merge branch 'master' into blockmanager_info 2013-01-29 09:45:10 -08:00
Matei Zaharia 64ba6a8c2c Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.

- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.

- A few of the RDD subclasses are simpler.

- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.

- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Matei Zaharia b29599e5cf Fix code that depended on metadata cleaner interval being in minutes 2013-01-28 22:24:47 -08:00
Matei Zaharia a1ecec8d79 Merge branch 'master' of github.com:mesos/spark 2013-01-28 22:08:44 -08:00
Matei Zaharia f6eb1f0825 Merge pull request #413 from pwendell/stage-logging
SPARK-658: Adding logging of stage duration
2013-01-28 22:01:52 -08:00
Patrick Wendell 7ee824e42e Units from ms -> s 2013-01-28 21:48:32 -08:00
Matei Zaharia dda2ce017c Merge pull request #424 from pwendell/logging-cleanup
Some DEBUG-level log cleanup.
2013-01-28 21:18:54 -08:00
Matei Zaharia 8160f03ac4 Merge pull request #423 from squito/long_float_accums
add long and float accumulatorparams
2013-01-28 21:18:01 -08:00
Patrick Wendell 1f9b486a8b Some DEBUG-level log cleanup.
A few changes to make the DEBUG-level logs less
noisy and more readable.

- Moved a few very frequent messages to Trace
- Changed some BlockManger log messages to make them
  more understandable

SPARK-666 #resolve
2013-01-28 20:29:35 -08:00
Imran Rashid efff7bfb33 add long and float accumulatorparams 2013-01-28 20:23:11 -08:00
Imran Rashid 0f22c4207f better formatting for RDDInfo 2013-01-28 20:07:53 -08:00
Imran Rashid a423ee546c expose RDD & storage info directly via SparkContext 2013-01-28 20:07:53 -08:00
Patrick Wendell 501433f1d5 Making submission time a field 2013-01-28 10:45:57 -08:00
Patrick Wendell c423be7d8e Renaming stage finished function 2013-01-28 10:45:57 -08:00
Patrick Wendell 07f568e1bf SPARK-658: Adding logging of stage duration 2013-01-28 10:45:57 -08:00
Matei Zaharia 286f8f876f Change time unit in MetadataCleaner to seconds 2013-01-28 01:29:27 -08:00
Matei Zaharia f03d9760fd Clean up BlockManagerUI a little (make it not be an object, merge with
Directives, and bind to a random port)
2013-01-27 23:56:14 -08:00
Matei Zaharia 909850729e Rename more things from slave to executor 2013-01-27 23:17:20 -08:00
Matei Zaharia 44b4a0f88f Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia b9e2d9efec Merge pull request #419 from shivaram/ec2-ip-change
Detect whether we run on EC2 using ec2-metadata as well
2013-01-27 18:41:11 -08:00
Matei Zaharia 6ad8540b40 Merge pull request #401 from squito/blockmanager_ui
Blockmanager ui
2013-01-27 15:51:08 -08:00
Shivaram Venkataraman 717b221cca Detect whether we run on EC2 using ec2-metadata as well 2013-01-26 23:03:11 -08:00
Matei Zaharia 49f6472c0f Merge pull request #418 from woggling/reregister-deadlock
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00
Charles Reiss 58fc6b2bed Handle duplicate registrations better. 2013-01-26 18:30:44 -08:00
Charles Reiss ad4232b4da Fix deadlock in BlockManager reregistration triggered by failed updates. 2013-01-26 18:30:38 -08:00
Matei Zaharia ec2dadb521 Merge pull request #417 from JoshRosen/spark-668
Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
2013-01-26 16:20:57 -08:00
Josh Rosen d49cf0e587 Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
This workaround is easier than rewriting JavaRDDLike in Java.
2013-01-26 16:13:18 -08:00
Imran Rashid 49c05608f5 add metadatacleaner for persisentRdd map 2013-01-25 17:04:16 -08:00
Matei Zaharia 2435b7b5b7 Merge pull request #416 from stephenh/morefinally
Call executeOnCompleteCallbacks in more finally blocks.
2013-01-25 15:33:26 -08:00
Stephen Haberman 8efbda0b17 Call executeOnCompleteCallbacks in more finally blocks. 2013-01-25 14:55:33 -06:00
Imran Rashid a1d9d1767d fixup 1cadaa1, changed api of map 2013-01-25 10:05:26 -08:00
Imran Rashid 1cadaa164e switch to TimeStampedHashMap for storing persistent Rdds 2013-01-25 09:30:21 -08:00
Imran Rashid 539491bbc3 code reformatting 2013-01-25 09:29:59 -08:00
Matei Zaharia 04bfee2d08 Merge pull request #411 from stephenh/localsparkcontext
Add LocalSparkContext to manage common sc variable.
2013-01-24 23:00:50 -08:00
Stephen Haberman ec43a51b38 Merge branch 'master' into localsparkcontext
Conflicts:
	core/src/test/scala/spark/FileServerSuite.scala
	core/src/test/scala/spark/RDDSuite.scala
2013-01-24 21:17:30 -06:00
Matei Zaharia 45e6dd65b2 Merge pull request #412 from pwendell/master-url-warning
SPARK-541: Adding a warning for invalid Master URL
2013-01-24 15:58:43 -08:00
Patrick Wendell b6fc6e6752 SPARK-541: Adding a warning for invalid Master URL
Right now Spark silently parses master URL's which do not match any
known regex as a Mesos URL. The Mesos error message when an invalid URL gets
passed is really confusing, so this warns the user when the implicit
conversion is happening.
2013-01-24 14:31:23 -08:00
Matei Zaharia a2f4891d1d Merge pull request #396 from JoshRosen/spark-653
Make PySpark AccumulatorParam an abstract base class
2013-01-24 13:05:03 -08:00
Stephen Haberman 230bda2047 Add LocalSparkContext to manage common sc variable. 2013-01-24 11:01:01 -06:00
Matei Zaharia 0fe173a3a5 Merge pull request #410 from rxin/splitpruningrdd
Added a clearDependencies method in PartitionPruningRDD.
2013-01-23 23:10:15 -08:00
Reynold Xin 67a43bc7e6 Added a clearDependencies method in PartitionPruningRDD. 2013-01-23 23:06:52 -08:00
Matei Zaharia fe5e4812fc Merge pull request #409 from rxin/splitpruningrdd
Added pruntSplits method to RDD.
2013-01-23 22:23:22 -08:00
Reynold Xin c109f29c97 Updated PruneDependency to change "split" to "partition". 2013-01-23 22:22:03 -08:00
Reynold Xin eedc542a02 Removed pruneSplits method in RDD and renamed SplitsPruningRDD to
PartitionPruningRDD.
2013-01-23 22:14:23 -08:00
Reynold Xin 81004b967e Marked prev RDD as transient in SplitsPruningRDD. 2013-01-23 21:54:27 -08:00
Reynold Xin 636e912f32 Created a PruneDependency to properly assign dependency for
SplitsPruningRDD.
2013-01-23 21:21:55 -08:00
Reynold Xin 45cd50d5fe Updated assert == to ===. 2013-01-23 16:06:58 -08:00
Matei Zaharia 548856a224 Merge remote-tracking branch 'woggling/remove-machines'
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-01-23 15:44:17 -08:00
Reynold Xin c24b3819dd Added an extra assert for split size check. 2013-01-23 15:34:59 -08:00