2061 commits

Author SHA1 Message Date
Stephen Haberman 418e36caa8 Add more private declarations. 2013-01-31 17:18:33 -06:00
Matei Zaharia 55327a283e Merge pull request #430 from pwendell/pyspark-guide
Minor improvements to PySpark docs
2013-01-30 15:35:29 -08:00
Patrick Wendell 3f945e3b83 Make module help available in python shell.
Also, adds a line in doc explaining how to use.
2013-01-30 15:04:06 -08:00
Patrick Wendell 58a7d320d7 Include packaging and launching pyspark in guide.
It's nicer if all the commands you need are made explicit.
2013-01-30 15:04:02 -08:00
Matei Zaharia d12330bd2c Merge pull request #426 from woggling/conn-manager-ips
Remember ConnectionManagerId used to initiate SendingConnections
2013-01-30 15:02:53 -08:00
Matei Zaharia 612a9fee71 Merge pull request #428 from woggling/mesos-exec-id
Make ExecutorIDs include SlaveIDs when running Mesos
2013-01-30 15:01:46 -08:00
Matei Zaharia dfb721b970 Merge pull request #429 from stephenh/includemessage
Include message and exitStatus if available.
2013-01-30 15:01:24 -08:00
Stephen Haberman 871476d506 Include message and exitStatus if available. 2013-01-30 16:56:46 -06:00
Charles Reiss 252845d304 Remove remnants of attempt to use slaveId-executorId in MesosExecutorBackend 2013-01-30 10:38:06 -08:00
Charles Reiss f7de6978c1 Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use
the Mesos ExecutorID as a Spark ExecutorID.
2013-01-30 09:38:57 -08:00
Charles Reiss 16a0789e10 Remember ConnectionManagerId used to initiate SendingConnections.
This prevents ConnectionManager from getting confused if a machine
has multiple host names and the one getHostName() finds happens
not to be the one that was passed from, e.g., the BlockManagerMaster.
2013-01-29 18:13:59 -08:00
Matei Zaharia d54b10b6ad Merge remote-tracking branch 'stephenh/removefailedjob'
Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
2013-01-29 18:12:29 -08:00
Matei Zaharia ccb67ff2ca Merge pull request #425 from stephenh/toDebugString
Add RDD.toDebugString.
2013-01-29 10:44:18 -08:00
Matei Zaharia 9ae11603b4 Merge pull request #415 from stephenh/driver
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Matei Zaharia 64ba6a8c2c Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.

- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.

- A few of the RDD subclasses are simpler.

- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.

- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Matei Zaharia b29599e5cf Fix code that depended on metadata cleaner interval being in minutes 2013-01-28 22:24:47 -08:00
Stephen Haberman cbf72bffa5 Include name, if set, in RDD.toString(). 2013-01-29 00:20:36 -06:00
Stephen Haberman 3cda14af3f Add number of splits. 2013-01-29 00:12:31 -06:00
Matei Zaharia a1ecec8d79 Merge branch 'master' of github.com:mesos/spark 2013-01-28 22:08:44 -08:00
Stephen Haberman 951cfd9ba2 Add JavaRDDLike.toDebugString(). 2013-01-29 00:02:17 -06:00
Matei Zaharia f6eb1f0825 Merge pull request #413 from pwendell/stage-logging
SPARK-658: Adding logging of stage duration
2013-01-28 22:01:52 -08:00
Stephen Haberman b45857c965 Add RDD.toDebugString.
Original idea by Nathan Kronenfeld.
2013-01-28 23:56:56 -06:00
Patrick Wendell 7ee824e42e Units from ms -> s 2013-01-28 21:48:32 -08:00
Stephen Haberman 13368818af Merge branch 'master' into driver
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/SparkEnv.scala
	core/src/main/scala/spark/deploy/LocalSparkCluster.scala
	core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/ThreadingTest.scala
	core/src/test/scala/spark/MapOutputTrackerSuite.scala
2013-01-28 23:30:24 -06:00
Matei Zaharia dda2ce017c Merge pull request #424 from pwendell/logging-cleanup
Some DEBUG-level log cleanup.
2013-01-28 21:18:54 -08:00
Matei Zaharia 8160f03ac4 Merge pull request #423 from squito/long_float_accums
add long and float accumulatorparams
2013-01-28 21:18:01 -08:00
Patrick Wendell 1f9b486a8b Some DEBUG-level log cleanup.
A few changes to make the DEBUG-level logs less
noisy and more readable.

- Moved a few very frequent messages to Trace
- Changed some BlockManager log messages to make them
  more understandable

SPARK-666 #resolve
2013-01-28 20:29:35 -08:00
Imran Rashid efff7bfb33 add long and float accumulatorparams 2013-01-28 20:23:11 -08:00
Patrick Wendell 501433f1d5 Making submission time a field 2013-01-28 10:45:57 -08:00
Patrick Wendell c423be7d8e Renaming stage finished function 2013-01-28 10:45:57 -08:00
Patrick Wendell 07f568e1bf SPARK-658: Adding logging of stage duration 2013-01-28 10:45:57 -08:00
Matei Zaharia 286f8f876f Change time unit in MetadataCleaner to seconds 2013-01-28 01:29:27 -08:00
Matei Zaharia f03d9760fd Clean up BlockManagerUI a little (make it not be an object, merge with
Directives, and bind to a random port)
2013-01-27 23:56:14 -08:00
Matei Zaharia 909850729e Rename more things from slave to executor 2013-01-27 23:17:20 -08:00
Matei Zaharia 44b4a0f88f Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia b9e2d9efec Merge pull request #419 from shivaram/ec2-ip-change
Detect whether we run on EC2 using ec2-metadata as well
2013-01-27 18:41:11 -08:00
Matei Zaharia 6ad8540b40 Merge pull request #401 from squito/blockmanager_ui
Blockmanager ui
2013-01-27 15:51:08 -08:00
Shivaram Venkataraman 717b221cca Detect whether we run on EC2 using ec2-metadata as well 2013-01-26 23:03:11 -08:00
Matei Zaharia 49f6472c0f Merge pull request #418 from woggling/reregister-deadlock
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00
Charles Reiss 58fc6b2bed Handle duplicate registrations better. 2013-01-26 18:30:44 -08:00
Charles Reiss ad4232b4da Fix deadlock in BlockManager reregistration triggered by failed updates. 2013-01-26 18:30:38 -08:00
Matei Zaharia ec2dadb521 Merge pull request #417 from JoshRosen/spark-668
Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
2013-01-26 16:20:57 -08:00
Josh Rosen d49cf0e587 Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
This workaround is easier than rewriting JavaRDDLike in Java.
2013-01-26 16:13:18 -08:00
Imran Rashid 49c05608f5 add metadatacleaner for persistentRdd map 2013-01-25 17:04:16 -08:00
Matei Zaharia 2435b7b5b7 Merge pull request #416 from stephenh/morefinally
Call executeOnCompleteCallbacks in more finally blocks.
2013-01-25 15:33:26 -08:00
Stephen Haberman 8efbda0b17 Call executeOnCompleteCallbacks in more finally blocks. 2013-01-25 14:55:33 -06:00
Imran Rashid a1d9d1767d fixup 1cadaa1, changed api of map 2013-01-25 10:05:26 -08:00
Imran Rashid 1cadaa164e switch to TimeStampedHashMap for storing persistent Rdds 2013-01-25 09:30:21 -08:00
Imran Rashid 539491bbc3 code reformatting 2013-01-25 09:29:59 -08:00
Stephen Haberman 7dfb82a992 Replace old 'master' term with 'driver'. 2013-01-25 11:03:00 -06:00