Commit graph

971 commits

Author SHA1 Message Date
Imran Rashid d9461b15d3 cleanup a bunch of imports 2013-02-10 21:41:40 -08:00
Imran Rashid 383af599bb SparkContext.addSparkListener; "std" listener in StatsReportListener 2013-02-10 14:19:37 -08:00
Imran Rashid b7d9e24394 use TaskMetrics to gather all stats; lots of plumbing to get it all the way back to driver 2013-02-10 14:18:52 -08:00
Imran Rashid 04e828f7c1 general fixes to Distribution, plus some tests 2013-02-08 19:07:36 -08:00
Imran Rashid 379564c7e0 setup plumbing to get task metrics; lots of unfinished parts, but basic flow in place 2013-02-05 18:30:21 -08:00
Imran Rashid 1704b124d8 add as many fetch requests as we can, subject to maxBytesInFlight 2013-02-05 14:33:52 -08:00
Imran Rashid 696e4b2167 track remoteFetchTime 2013-02-05 14:29:16 -08:00
Imran Rashid b29f9cc978 BlockManager.getMultiple returns a custom iterator, to enable tracking of shuffle performance 2013-02-05 14:00:44 -08:00
Imran Rashid e319ac74c1 cogrouped RDD stores the amount of time taken to read shuffle data in each task 2013-02-05 10:18:16 -08:00
Imran Rashid 295b534398 task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount 2013-02-05 10:18:16 -08:00
Imran Rashid 9df7e2ae55 Shuffle Fetchers use a timed iterator 2013-02-05 10:18:16 -08:00
Imran Rashid 1ad77c4766 add TimedIterator 2013-02-05 10:18:15 -08:00
Imran Rashid 843084d69d track total bytes written by ShuffleMapTasks 2013-02-05 10:18:15 -08:00
Imran Rashid b430d2359d Merge branch 'master' into stageInfo
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
	core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-02-04 21:40:44 -08:00
Matei Zaharia f7b4e428be Merge pull request #445 from JoshRosen/pyspark_fixes
Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()
2013-02-03 21:36:36 -08:00
Matei Zaharia 3bfaf3ab1d Merge pull request #379 from stephenh/sparkmem
Add spark.executor.memory to differentiate executor memory from spark-shell
2013-02-02 23:58:23 -08:00
Matei Zaharia 88ee6163a1 Merge pull request #422 from squito/blockmanager_info
RDDInfo available from SparkContext
2013-02-02 23:44:13 -08:00
Matei Zaharia cd4ca93679 Merge pull request #436 from stephenh/removeextraloop
Once we find a split with no block, we don't have to look for more.
2013-02-02 23:39:28 -08:00
Matei Zaharia d5daaab381 Merge pull request #442 from stephenh/fixsystemnames
Fix createActorSystem not actually using the systemName parameter.
2013-02-02 23:38:46 -08:00
Matei Zaharia 9163c3705d Formatting 2013-02-02 23:34:47 -08:00
Josh Rosen 8fbd5380b7 Fetch fewer objects in PySpark's take() method. 2013-02-03 06:44:49 +00:00
Matei Zaharia 34a7bcdb3a Formatting 2013-02-02 19:40:30 -08:00
Stephen Haberman 7aba123f0c Further simplify checking for Nil. 2013-02-02 13:53:28 -06:00
Charles Reiss 6107957962 Merge remote-tracking branch 'base/master' into dag-sched-tests
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-02-02 00:33:30 -08:00
Stephen Haberman cae8a6795c Fix dangling old variable names. 2013-02-02 02:15:39 -06:00
Stephen Haberman 696eec32c9 Move executorMemory up into SchedulerBackend. 2013-02-02 02:03:26 -06:00
Stephen Haberman 103c375ba0 Merge branch 'master' into sparkmem 2013-02-02 01:57:18 -06:00
Stephen Haberman 28e0cb9f31 Fix createActorSystem not actually using the systemName parameter.
This meant all system names were "spark", which worked, but didn't
lead to the most intuitive log output.

This fixes createActorSystem to use the passed system name, and
refactors Master/Worker to encapsulate their system/actor names
instead of having the clients guess at them.

Note that the driver system name, "spark", is left as is, and is
still repeated a few times, but that seems like a separate issue.
2013-02-02 01:11:37 -06:00
Stephen Haberman 12c1eb4756 Reduce the amount of duplicate logging Akka does to stdout.
Given we have Akka logging go through SLF4j to log4j, we don't need
all the extra noise of Akka's stdout logger that is supposedly only
used during Akka init time but seems to continue logging lots of
noisy network events that we either don't care about or are in the
log4j logs anyway.

See:

http://doc.akka.io/docs/akka/2.0/general/configuration.html

    # Log level for the very basic logger activated during AkkaApplication startup
    # Options: ERROR, WARNING, INFO, DEBUG
    # stdout-loglevel = "WARNING"
2013-02-01 21:21:44 -06:00
Matei Zaharia 8b3041c723 Reduced the memory usage of reduce and similar operations
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.
2013-02-01 15:38:42 -08:00
Matei Zaharia 4529876db0 Merge branch 'master' of github.com:mesos/spark 2013-02-01 14:07:38 -08:00
Matei Zaharia 9970926ede formatting 2013-02-01 14:07:34 -08:00
Matei Zaharia 79c24abe4c Merge pull request #432 from stephenh/moreprivacy
Add more private declarations.
2013-02-01 14:06:55 -08:00
Matei Zaharia de340ddf0b Merge pull request #437 from stephenh/cancelmetacleaner
Stop BlockManagers metadataCleaner.
2013-02-01 12:59:25 -08:00
Imran Rashid c6190067ae remove unneeded (and unused) filter on block info 2013-02-01 09:55:25 -08:00
Stephen Haberman 59c57e48df Stop BlockManagers metadataCleaner. 2013-02-01 10:34:02 -06:00
Matei Zaharia 571af31304 Merge pull request #433 from rxin/master
Changed PartitionPruningRDD's split to make sure it returns the correct split index.
2013-02-01 00:32:41 -08:00
Imran Rashid 8a0a5ed533 track total partitions, in addition to cached partitions; use scala string formatting 2013-02-01 00:23:38 -08:00
Imran Rashid f127f2ae76 fixup merge (master -> driver renaming) 2013-02-01 00:20:49 -08:00
Reynold Xin f9af9cee6f Moved PruneDependency into PartitionPruningRDD.scala. 2013-02-01 00:02:46 -08:00
Patrick Wendell 39ab83e957 Small fix from last commit 2013-01-31 21:52:52 -08:00
Patrick Wendell c33f0ef41a Some style cleanup 2013-01-31 21:50:02 -08:00
Patrick Wendell 3446d5c8d6 SPARK-673: Capture and re-throw Python exceptions
This patch alters the Python <-> executor protocol to pass on
exception data when they occur in user Python code.
2013-01-31 18:06:11 -08:00
Reynold Xin 6289d9654e Removed the TODO comment from PartitionPruningRDD. 2013-01-31 17:49:36 -08:00
Reynold Xin 5b0fc265c2 Changed PartitionPruningRDD's split to make sure it returns the correct
split index.
2013-01-31 17:48:39 -08:00
Stephen Haberman 782187c210 Once we find a split with no block, we don't have to look for more. 2013-01-31 18:27:25 -06:00
Stephen Haberman 418e36caa8 Add more private declarations. 2013-01-31 17:18:33 -06:00
Imran Rashid 02a6761589 Merge branch 'master' into blockmanager_info
Conflicts:
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
2013-01-30 18:52:35 -08:00
Imran Rashid c1df24d085 rename Slaves --> Executor 2013-01-30 18:51:14 -08:00
Matei Zaharia d12330bd2c Merge pull request #426 from woggling/conn-manager-ips
Remember ConnectionManagerId used to initiate SendingConnections
2013-01-30 15:02:53 -08:00