Commit graph

1112 commits

Author SHA1 Message Date
Matei Zaharia a4611d66f0 Merge pull request #449 from stephenh/longerdriversuite
Increase DriverSuite timeout.
2013-02-05 17:58:22 -08:00
Stephen Haberman 0e19093fd8 Handle Terminated to avoid endless DeathPactExceptions.
Credit to Roland Kuhn, Akka's tech lead, for pointing out this
fix. It is obvious in hindsight, but StandaloneExecutorBackend.preStart's
catch block would never (ever) get hit, because all of the
operations in preStart are async.

So, the System.exit in the catch block was skipped, and instead
Akka was sending Terminated messages that we didn't handle, so
Akka turned them into DeathPactExceptions, which started
a postRestart/preStart infinite loop.
2013-02-05 18:58:00 -06:00
Stephen Haberman 1ba3393ceb Increase DriverSuite timeout. 2013-02-05 17:56:50 -06:00
Stephen Haberman 8bd0e888f3 Inline mergePair to look more like the narrow dep branch.
No functionality changes; I think this is just more consistent,
given that mergePair isn't called multiple times or recursively.

Also added a comment to explain the usual case of having two parent RDDs.
2013-02-05 17:50:25 -06:00
Imran Rashid 1704b124d8 add as many fetch requests as we can, subject to maxBytesInFlight 2013-02-05 14:33:52 -08:00
Imran Rashid cfab1a3528 add as many fetch requests as we can, subject to maxBytesInFlight 2013-02-05 14:31:46 -08:00
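The rule "add as many fetch requests as we can, subject to maxBytesInFlight" can be illustrated with a small, single-threaded Python sketch. It is not the BlockManager code: `FetchRequest`, `fill_in_flight`, and the byte sizes are hypothetical, and letting one request through when nothing is in flight (so an oversized request cannot stall the fetcher) is an assumption of this sketch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FetchRequest:
    # Hypothetical request: a remote address and the total size of the blocks it asks for.
    address: str
    size: int


def fill_in_flight(pending, bytes_in_flight, max_bytes_in_flight):
    """Send queued requests while the outstanding byte total stays within the cap.

    Returns (requests sent, new bytes-in-flight total). When nothing is in
    flight, one request is always sent, even if it alone exceeds the cap.
    """
    sent = []
    while pending and (bytes_in_flight == 0 or
                       bytes_in_flight + pending[0].size <= max_bytes_in_flight):
        req = pending.popleft()
        bytes_in_flight += req.size
        sent.append(req)
    return sent, bytes_in_flight


# Usage: with a 100-byte cap, only the first two 40-byte requests go out.
queue = deque([FetchRequest("a", 40), FetchRequest("b", 40), FetchRequest("c", 40)])
sent, in_flight = fill_in_flight(queue, 0, 100)
```

A caller would invoke this again each time a response arrives and subtract that response's size from the in-flight total.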
Imran Rashid 696e4b2167 track remoteFetchTime 2013-02-05 14:29:16 -08:00
Imran Rashid b29f9cc978 BlockManager.getMultiple returns a custom iterator, to enable tracking of shuffle performance 2013-02-05 14:00:44 -08:00
Imran Rashid e319ac74c1 cogrouped RDD stores the amount of time taken to read shuffle data in each task 2013-02-05 10:18:16 -08:00
Imran Rashid 295b534398 task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount 2013-02-05 10:18:16 -08:00
Imran Rashid 9df7e2ae55 Shuffle Fetchers use a timed iterator 2013-02-05 10:18:16 -08:00
Imran Rashid 1ad77c4766 add TimedIterator 2013-02-05 10:18:15 -08:00
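A timed iterator wraps another iterator and accumulates the time spent waiting for each element. The Python sketch below shows that idea only. The class and its `total_seconds`/`count` fields are hypothetical and do not reproduce the Scala TimedIterator added in this commit.

```python
import time


class TimedIterator:
    """Hypothetical wrapper that measures time spent inside the wrapped iterator.

    Only the time spent in the underlying __next__ is counted, which is what
    you want when the wrapped iterator blocks on, e.g., shuffle fetches.
    """

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.total_seconds = 0.0
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        start = time.perf_counter()
        try:
            item = next(self._it)
        finally:
            # Record the elapsed time even for the final StopIteration.
            self.total_seconds += time.perf_counter() - start
        self.count += 1
        return item
```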
Imran Rashid 843084d69d track total bytes written by ShuffleMapTasks 2013-02-05 10:18:15 -08:00
Imran Rashid b430d2359d Merge branch 'master' into stageInfo
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
	core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-02-04 21:40:44 -08:00
Matei Zaharia f6ec547ea7 Small fix to test for distinct 2013-02-04 13:14:54 -08:00
Matei Zaharia aa4ee1e9e5 Fix failing test 2013-02-04 11:06:31 -08:00
Matei Zaharia f7b4e428be Merge pull request #445 from JoshRosen/pyspark_fixes
Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()
2013-02-03 21:36:36 -08:00
Matei Zaharia 3bfaf3ab1d Merge pull request #379 from stephenh/sparkmem
Add spark.executor.memory to differentiate executor memory from spark-shell
2013-02-02 23:58:23 -08:00
Matei Zaharia 88ee6163a1 Merge pull request #422 from squito/blockmanager_info
RDDInfo available from SparkContext
2013-02-02 23:44:13 -08:00
Matei Zaharia cd4ca93679 Merge pull request #436 from stephenh/removeextraloop
Once we find a split with no block, we don't have to look for more.
2013-02-02 23:39:28 -08:00
Matei Zaharia d5daaab381 Merge pull request #442 from stephenh/fixsystemnames
Fix createActorSystem not actually using the systemName parameter.
2013-02-02 23:38:46 -08:00
Matei Zaharia 9163c3705d Formatting 2013-02-02 23:34:47 -08:00
Josh Rosen 8fbd5380b7 Fetch fewer objects in PySpark's take() method. 2013-02-03 06:44:49 +00:00
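One way to fetch fewer objects in a take() is to scan partitions in order and stop as soon as enough elements have been collected. This Python sketch assumes partitions are plain iterables. It is not PySpark's actual RDD.take() implementation.

```python
from itertools import islice


def take(partitions, num):
    """Return the first `num` elements across `partitions`, in order.

    Pulls only as many items from each partition as are still needed and
    stops before touching the next partition once `num` items are collected.
    """
    if num <= 0:
        return []
    items = []
    for part in partitions:
        items.extend(islice(part, num - len(items)))
        if len(items) >= num:
            break  # Don't request any further partitions.
    return items
```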
Matei Zaharia 34a7bcdb3a Formatting 2013-02-02 19:40:30 -08:00
Stephen Haberman 7aba123f0c Further simplify checking for Nil. 2013-02-02 13:53:28 -06:00
Charles Reiss 6107957962 Merge remote-tracking branch 'base/master' into dag-sched-tests
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-02-02 00:33:30 -08:00
Stephen Haberman cae8a6795c Fix dangling old variable names. 2013-02-02 02:15:39 -06:00
Stephen Haberman 696eec32c9 Move executorMemory up into SchedulerBackend. 2013-02-02 02:03:26 -06:00
Stephen Haberman 103c375ba0 Merge branch 'master' into sparkmem 2013-02-02 01:57:18 -06:00
Stephen Haberman 28e0cb9f31 Fix createActorSystem not actually using the systemName parameter.
This meant all system names were "spark", which worked, but didn't
lead to the most intuitive log output.

This fixes createActorSystem to use the passed system name, and
refactors Master/Worker to encapsulate their system/actor names
instead of having the clients guess at them.

Note that the driver system name, "spark", is left as is, and is
still repeated a few times, but that seems like a separate issue.
2013-02-02 01:11:37 -06:00
Charles Reiss 1fd5ee323d Code review changes: add sc.stop; style of multiline comments; parens on procedure calls. 2013-02-01 22:33:38 -08:00
Matei Zaharia ae26911ec0 Add back test for distinct without parens 2013-02-01 21:07:24 -08:00
Stephen Haberman 12c1eb4756 Reduce the amount of duplicate logging Akka does to stdout.
Given we have Akka logging go through SLF4j to log4j, we don't need
all the extra noise of Akka's stdout logger that is supposedly only
used during Akka init time but seems to continue logging lots of
noisy network events that we either don't care about or are in the
log4j logs anyway.

See:

http://doc.akka.io/docs/akka/2.0/general/configuration.html

    # Log level for the very basic logger activated during AkkaApplication startup
    # Options: ERROR, WARNING, INFO, DEBUG
    # stdout-loglevel = "WARNING"
2013-02-01 21:21:44 -06:00
Matei Zaharia 8b3041c723 Reduced the memory usage of reduce and similar operations
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.
2013-02-01 15:38:42 -08:00
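Merging partial results as they arrive, instead of first collecting them all into an array, can be sketched with a thread pool standing in for the cluster. The names `incremental_reduce` and `_reduce_partition` are hypothetical. Because partial results are merged in completion order, the sketch assumes `f` is associative and commutative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

_EMPTY = object()  # Marks an empty partition or "no result yet".


def _reduce_partition(part, f):
    acc = _EMPTY
    for x in part:
        acc = x if acc is _EMPTY else f(acc, x)
    return acc


def incremental_reduce(partitions, f, max_workers=4):
    """Reduce partitions in parallel and fold each partial result in as it finishes.

    The "driver" side keeps a single running value rather than an array of
    every partition's result.
    """
    result = _EMPTY
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_reduce_partition, p, f) for p in partitions]
        for fut in as_completed(futures):
            partial = fut.result()
            if partial is _EMPTY:
                continue  # Empty partitions contribute nothing.
            result = partial if result is _EMPTY else f(result, partial)
    if result is _EMPTY:
        raise ValueError("reduce of empty collection")
    return result
```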
Matei Zaharia 4529876db0 Merge branch 'master' of github.com:mesos/spark 2013-02-01 14:07:38 -08:00
Matei Zaharia 9970926ede formatting 2013-02-01 14:07:34 -08:00
Matei Zaharia 79c24abe4c Merge pull request #432 from stephenh/moreprivacy
Add more private declarations.
2013-02-01 14:06:55 -08:00
Matei Zaharia de340ddf0b Merge pull request #437 from stephenh/cancelmetacleaner
Stop BlockManager's metadataCleaner.
2013-02-01 12:59:25 -08:00
Imran Rashid c6190067ae remove unneeded (and unused) filter on block info 2013-02-01 09:55:25 -08:00
Stephen Haberman 59c57e48df Stop BlockManager's metadataCleaner. 2013-02-01 10:34:02 -06:00
Matei Zaharia 571af31304 Merge pull request #433 from rxin/master
Changed PartitionPruningRDD's split to make sure it returns the correct split index.
2013-02-01 00:32:41 -08:00
Imran Rashid 8a0a5ed533 track total partitions, in addition to cached partitions; use scala string formatting 2013-02-01 00:23:38 -08:00
Imran Rashid f127f2ae76 fixup merge (master -> driver renaming) 2013-02-01 00:20:49 -08:00
Reynold Xin f9af9cee6f Moved PruneDependency into PartitionPruningRDD.scala. 2013-02-01 00:02:46 -08:00
Matei Zaharia 7e2e046e37 Merge pull request #434 from pwendell/python-exceptions
SPARK-673: Capture and re-throw Python exceptions
2013-01-31 21:58:26 -08:00
Patrick Wendell 39ab83e957 Small fix from last commit 2013-01-31 21:52:52 -08:00
Patrick Wendell c33f0ef41a Some style cleanup 2013-01-31 21:50:02 -08:00
Patrick Wendell 3446d5c8d6 SPARK-673: Capture and re-throw Python exceptions
This patch alters the Python <-> executor protocol to pass on
exception data when exceptions occur in user Python code.
2013-01-31 18:06:11 -08:00
Reynold Xin 6289d9654e Removed the TODO comment from PartitionPruningRDD. 2013-01-31 17:49:36 -08:00
Reynold Xin 5b0fc265c2 Changed PartitionPruningRDD's split to make sure it returns the correct
split index.
2013-01-31 17:48:39 -08:00
Stephen Haberman 782187c210 Once we find a split with no block, we don't have to look for more. 2013-01-31 18:27:25 -06:00
Stephen Haberman 418e36caa8 Add more private declarations. 2013-01-31 17:18:33 -06:00
Mikhail Bautin fe3eceab57 Remove activation of profiles by default
See the discussion at https://github.com/mesos/spark/pull/355 for why
default profile activation is a problem.
2013-01-31 13:30:41 -08:00
Imran Rashid 02a6761589 Merge branch 'master' into blockmanager_info
Conflicts:
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
2013-01-30 18:52:35 -08:00
Imran Rashid c1df24d085 rename Slaves --> Executor 2013-01-30 18:51:14 -08:00
Matei Zaharia d12330bd2c Merge pull request #426 from woggling/conn-manager-ips
Remember ConnectionManagerId used to initiate SendingConnections
2013-01-30 15:02:53 -08:00
Matei Zaharia 612a9fee71 Merge pull request #428 from woggling/mesos-exec-id
Make ExecutorIDs include SlaveIDs when running Mesos
2013-01-30 15:01:46 -08:00
Stephen Haberman 871476d506 Include message and exitStatus if available. 2013-01-30 16:56:46 -06:00
Charles Reiss 252845d304 Remove remnants of attempt to use slaveId-executorId in MesosExecutorBackend 2013-01-30 10:38:06 -08:00
Charles Reiss f7de6978c1 Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use
the Mesos ExecutorID as a Spark ExecutorID.
2013-01-30 09:38:57 -08:00
Charles Reiss 7f51458774 Comment at top of DAGSchedulerSuite 2013-01-30 09:34:53 -08:00
Charles Reiss 9c0bae75ad Change DAGSchedulerSuite to run DAGScheduler in the same Thread. 2013-01-30 09:22:07 -08:00
Charles Reiss 178b89204c Refactor DAGScheduler more to allow testing without a separate thread. 2013-01-30 09:19:55 -08:00
Charles Reiss 4bf3d7ea12 Clear spark.master.port to clean up for other tests 2013-01-29 19:05:58 -08:00
Charles Reiss 9eac7d01f0 Add DAGScheduler tests. 2013-01-29 18:55:43 -08:00
Charles Reiss a3d14c0404 Refactoring to DAGScheduler to aid testing 2013-01-29 18:55:42 -08:00
Charles Reiss 16a0789e10 Remember ConnectionManagerId used to initiate SendingConnections.
This prevents ConnectionManager from getting confused if a machine
has multiple host names and the one getHostName() finds happens
not to be the one that was passed from, e.g., the BlockManagerMaster.
2013-01-29 18:13:59 -08:00
Matei Zaharia d54b10b6ad Merge remote-tracking branch 'stephenh/removefailedjob'
Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
2013-01-29 18:12:29 -08:00
Matei Zaharia ccb67ff2ca Merge pull request #425 from stephenh/toDebugString
Add RDD.toDebugString.
2013-01-29 10:44:18 -08:00
Matei Zaharia 9ae11603b4 Merge pull request #415 from stephenh/driver
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Charles Reiss a34096a76d Add easymock to POMs 2013-01-29 10:04:33 -08:00
Imran Rashid b92259ba57 Merge branch 'master' into blockmanager_info 2013-01-29 09:45:10 -08:00
Matei Zaharia 64ba6a8c2c Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.

- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.

- A few of the RDD subclasses are simpler.

- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.

- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Stephen Haberman cbf72bffa5 Include name, if set, in RDD.toString(). 2013-01-29 00:20:36 -06:00
Stephen Haberman 3cda14af3f Add number of splits. 2013-01-29 00:12:31 -06:00
Matei Zaharia a1ecec8d79 Merge branch 'master' of github.com:mesos/spark 2013-01-28 22:08:44 -08:00
Stephen Haberman 951cfd9ba2 Add JavaRDDLike.toDebugString(). 2013-01-29 00:02:17 -06:00
Matei Zaharia f6eb1f0825 Merge pull request #413 from pwendell/stage-logging
SPARK-658: Adding logging of stage duration
2013-01-28 22:01:52 -08:00
Stephen Haberman b45857c965 Add RDD.toDebugString.
Original idea by Nathan Kronenfeld.
2013-01-28 23:56:56 -06:00
Patrick Wendell 7ee824e42e Units from ms -> s 2013-01-28 21:48:32 -08:00
Stephen Haberman 13368818af Merge branch 'master' into driver
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/SparkEnv.scala
	core/src/main/scala/spark/deploy/LocalSparkCluster.scala
	core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/ThreadingTest.scala
	core/src/test/scala/spark/MapOutputTrackerSuite.scala
2013-01-28 23:30:24 -06:00
Matei Zaharia dda2ce017c Merge pull request #424 from pwendell/logging-cleanup
Some DEBUG-level log cleanup.
2013-01-28 21:18:54 -08:00
Patrick Wendell 1f9b486a8b Some DEBUG-level log cleanup.
A few changes to make the DEBUG-level logs less
noisy and more readable.

- Moved a few very frequent messages to Trace
- Changed some BlockManager log messages to make them
  more understandable

SPARK-666 #resolve
2013-01-28 20:29:35 -08:00
Imran Rashid efff7bfb33 add long and float accumulatorparams 2013-01-28 20:23:11 -08:00
Imran Rashid cec9c768c2 convenient name available in StageInfo 2013-01-28 20:09:41 -08:00
Imran Rashid 01d77f329f expose stageInfo in SparkContext 2013-01-28 20:09:40 -08:00
Imran Rashid 38b83bc66b can get task runtime summary from task info 2013-01-28 20:09:40 -08:00
Imran Rashid b88daee916 simple util to summarize distributions 2013-01-28 20:09:40 -08:00
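A minimal Python sketch of summarizing a distribution, such as task run times, into a mean plus a few quantiles. The function name, the default quantiles, and the nearest-rank method are assumptions of this sketch, not the utility added in this commit.

```python
import math


def summarize(values, quantiles=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return (mean, {quantile: value}) for a non-empty list of numbers.

    Uses the nearest-rank method on the sorted data; interpolating between
    neighbouring ranks would be an equally reasonable choice.
    """
    if not values:
        raise ValueError("cannot summarize an empty distribution")
    data = sorted(values)
    n = len(data)
    mean = sum(data) / n
    picks = {}
    for q in quantiles:
        # Nearest rank is ceil(q * n), clamped to [1, n], then made 0-based.
        rank = min(max(math.ceil(q * n), 1), n)
        picks[q] = data[rank - 1]
    return mean, picks
```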
Imran Rashid b14841455c track task completion in DAGScheduler, and send a stageCompleted event with taskInfo to SparkListeners 2013-01-28 20:09:40 -08:00
Imran Rashid 0f22c4207f better formatting for RDDInfo 2013-01-28 20:07:53 -08:00
Imran Rashid a423ee546c expose RDD & storage info directly via SparkContext 2013-01-28 20:07:53 -08:00
Patrick Wendell 501433f1d5 Making submission time a field 2013-01-28 10:45:57 -08:00
Patrick Wendell c423be7d8e Renaming stage finished function 2013-01-28 10:45:57 -08:00
Patrick Wendell 07f568e1bf SPARK-658: Adding logging of stage duration 2013-01-28 10:45:57 -08:00
Matei Zaharia 286f8f876f Change time unit in MetadataCleaner to seconds 2013-01-28 01:29:27 -08:00
Matei Zaharia f03d9760fd Clean up BlockManagerUI a little (make it not be an object, merge with
Directives, and bind to a random port)
2013-01-27 23:56:14 -08:00
Matei Zaharia 909850729e Rename more things from slave to executor 2013-01-27 23:17:20 -08:00
Matei Zaharia 44b4a0f88f Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
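The reason for keying by executor ID is easy to see in a toy registry. If the map were keyed by hostname, a second executor on the same machine would overwrite the first. With unique executor IDs, both survive. `ExecutorRegistry` and its methods are hypothetical, written in Python only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutorRegistry:
    """Hypothetical registry mapping executor ID -> hostname.

    Several executors may share one hostname (e.g. "localhost" in unit
    tests) because the key is the executor ID, not the host.
    """

    executors: dict = field(default_factory=dict)

    def register(self, executor_id, hostname):
        if executor_id in self.executors:
            raise ValueError(f"duplicate executor ID {executor_id}")
        self.executors[executor_id] = hostname

    def executors_on_host(self, hostname):
        return sorted(eid for eid, host in self.executors.items() if host == hostname)
```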
Matei Zaharia 6ad8540b40 Merge pull request #401 from squito/blockmanager_ui
Blockmanager ui
2013-01-27 15:51:08 -08:00
Matei Zaharia 49f6472c0f Merge pull request #418 from woggling/reregister-deadlock
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00