Commit graph

2132 commits

Author SHA1 Message Date
Mikhail Bautin fe3eceab57 Remove activation of profiles by default
See the discussion at https://github.com/mesos/spark/pull/355 for why
default profile activation is a problem.
2013-01-31 13:30:41 -08:00
Imran Rashid 02a6761589 Merge branch 'master' into blockmanager_info
Conflicts:
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
2013-01-30 18:52:35 -08:00
Imran Rashid c1df24d085 rename Slaves --> Executor 2013-01-30 18:51:14 -08:00
Matei Zaharia 55327a283e Merge pull request #430 from pwendell/pyspark-guide
Minor improvements to PySpark docs
2013-01-30 15:35:29 -08:00
Patrick Wendell 3f945e3b83 Make module help available in python shell.
Also, adds a line in doc explaining how to use.
2013-01-30 15:04:06 -08:00
Patrick Wendell 58a7d320d7 Inclue packaging and launching pyspark in guide.
It's nicer if all the commands you need are made explicit.
2013-01-30 15:04:02 -08:00
Matei Zaharia d12330bd2c Merge pull request #426 from woggling/conn-manager-ips
Remember ConnectionManagerId used to initiate SendingConnections
2013-01-30 15:02:53 -08:00
Matei Zaharia 612a9fee71 Merge pull request #428 from woggling/mesos-exec-id
Make ExecutorIDs include SlaveIDs when running Mesos
2013-01-30 15:01:46 -08:00
Matei Zaharia dfb721b970 Merge pull request #429 from stephenh/includemessage
Include message and exitStatus if availalbe.
2013-01-30 15:01:24 -08:00
Stephen Haberman 871476d506 Include message and exitStatus if availalbe. 2013-01-30 16:56:46 -06:00
Charles Reiss 252845d304 Remove remants of attempt to use slaveId-executorId in MesosExecutorBackend 2013-01-30 10:38:06 -08:00
Charles Reiss f7de6978c1 Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use
the Mesos ExecutorID as a Spark ExecutorID.
2013-01-30 09:38:57 -08:00
Charles Reiss 7f51458774 Comment at top of DAGSchedulerSuite 2013-01-30 09:34:53 -08:00
Charles Reiss 9c0bae75ad Change DAGSchedulerSuite to run DAGScheduler in the same Thread. 2013-01-30 09:22:07 -08:00
Charles Reiss 178b89204c Refactor DAGScheduler more to allow testing without a separate thread. 2013-01-30 09:19:55 -08:00
Charles Reiss 4bf3d7ea12 Clear spark.master.port to cleanup for other tests 2013-01-29 19:05:58 -08:00
Charles Reiss 9eac7d01f0 Add DAGScheduler tests. 2013-01-29 18:55:43 -08:00
Charles Reiss a3d14c0404 Refactoring to DAGScheduler to aid testing 2013-01-29 18:55:42 -08:00
Charles Reiss 0f81025eca Add easymock to SBT configuration. 2013-01-29 18:55:42 -08:00
Charles Reiss 16a0789e10 Remember ConnectionManagerId used to initiate SendingConnections.
This prevents ConnectionManager from getting confused if a machine
has multiple host names and the one getHostName() finds happens
not to be the one that was passed from, e.g., the BlockManagerMaster.
2013-01-29 18:13:59 -08:00
Matei Zaharia d54b10b6ad Merge remote-tracking branch 'stephenh/removefailedjob'
Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
2013-01-29 18:12:29 -08:00
Matei Zaharia ccb67ff2ca Merge pull request #425 from stephenh/toDebugString
Add RDD.toDebugString.
2013-01-29 10:44:18 -08:00
Matei Zaharia 9ae11603b4 Merge pull request #415 from stephenh/driver
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Charles Reiss a34096a76d Add easymock to POMs 2013-01-29 10:04:33 -08:00
Imran Rashid b92259ba57 Merge branch 'master' into blockmanager_info 2013-01-29 09:45:10 -08:00
Matei Zaharia 64ba6a8c2c Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.

- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.

- A few of the RDD subclasses are simpler.

- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.

- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Matei Zaharia b29599e5cf Fix code that depended on metadata cleaner interval being in minutes 2013-01-28 22:24:47 -08:00
Stephen Haberman cbf72bffa5 Include name, if set, in RDD.toString(). 2013-01-29 00:20:36 -06:00
Stephen Haberman 3cda14af3f Add number of splits. 2013-01-29 00:12:31 -06:00
Matei Zaharia a1ecec8d79 Merge branch 'master' of github.com:mesos/spark 2013-01-28 22:08:44 -08:00
Stephen Haberman 951cfd9ba2 Add JavaRDDLike.toDebugString(). 2013-01-29 00:02:17 -06:00
Matei Zaharia f6eb1f0825 Merge pull request #413 from pwendell/stage-logging
SPARK-658: Adding logging of stage duration
2013-01-28 22:01:52 -08:00
Stephen Haberman b45857c965 Add RDD.toDebugString.
Original idea by Nathan Kronenfeld.
2013-01-28 23:56:56 -06:00
Patrick Wendell 7ee824e42e Units from ms -> s 2013-01-28 21:48:32 -08:00
Stephen Haberman 13368818af Merge branch 'master' into driver
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/SparkEnv.scala
	core/src/main/scala/spark/deploy/LocalSparkCluster.scala
	core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/ThreadingTest.scala
	core/src/test/scala/spark/MapOutputTrackerSuite.scala
2013-01-28 23:30:24 -06:00
Matei Zaharia dda2ce017c Merge pull request #424 from pwendell/logging-cleanup
Some DEBUG-level log cleanup.
2013-01-28 21:18:54 -08:00
Matei Zaharia 8160f03ac4 Merge pull request #423 from squito/long_float_accums
add long and float accumulatorparams
2013-01-28 21:18:01 -08:00
Patrick Wendell 1f9b486a8b Some DEBUG-level log cleanup.
A few changes to make the DEBUG-level logs less
noisy and more readable.

- Moved a few very frequent messages to Trace
- Changed some BlockManger log messages to make them
  more understandable

SPARK-666 #resolve
2013-01-28 20:29:35 -08:00
Imran Rashid efff7bfb33 add long and float accumulatorparams 2013-01-28 20:23:11 -08:00
Imran Rashid cec9c768c2 convenient name available in StageInfo 2013-01-28 20:09:41 -08:00
Imran Rashid 01d77f329f expose stageInfo in SparkContext 2013-01-28 20:09:40 -08:00
Imran Rashid 38b83bc66b can get task runtime summary from task info 2013-01-28 20:09:40 -08:00
Imran Rashid b88daee916 simple util to summarize distributions 2013-01-28 20:09:40 -08:00
Imran Rashid b14841455c track task completion in DAGScheduler, and send a stageCompleted event with taskInfo to SparkListeners 2013-01-28 20:09:40 -08:00
Imran Rashid 0f22c4207f better formatting for RDDInfo 2013-01-28 20:07:53 -08:00
Imran Rashid a423ee546c expose RDD & storage info directly via SparkContext 2013-01-28 20:07:53 -08:00
Patrick Wendell 501433f1d5 Making submission time a field 2013-01-28 10:45:57 -08:00
Patrick Wendell c423be7d8e Renaming stage finished function 2013-01-28 10:45:57 -08:00
Patrick Wendell 07f568e1bf SPARK-658: Adding logging of stage duration 2013-01-28 10:45:57 -08:00
Matei Zaharia 286f8f876f Change time unit in MetadataCleaner to seconds 2013-01-28 01:29:27 -08:00