Stephen Haberman
8bd0e888f3
Inline mergePair to look more like the narrow dep branch.
...
No functionality changes, I think this is just more consistent
given mergePair isn't called multiple times/recursive.
Also added a comment to explain the usual case of having two parent RDDs.
2013-02-05 17:50:25 -06:00
Imran Rashid
1704b124d8
add as many fetch requests as we can, subject to maxBytesInFlight
2013-02-05 14:33:52 -08:00
Imran Rashid
cfab1a3528
add as many fetch requests as we can, subject to maxBytesInFlight
2013-02-05 14:31:46 -08:00
Imran Rashid
696e4b2167
track remoteFetchTime
2013-02-05 14:29:16 -08:00
Imran Rashid
b29f9cc978
BlockManager.getMultiple returns a custom iterator, to enable tracking of shuffle performance
2013-02-05 14:00:44 -08:00
Matei Zaharia
2d9eca9fbb
Merge pull request #447 from pwendell/streaming-constructor
...
Streaming constructor which takes JavaSparkContext
2013-02-05 11:45:44 -08:00
Patrick Wendell
7eea64aa4c
Streaming constructor which takes JavaSparkContext
...
It's sometimes helpful to directly pass a JavaSparkContext,
and take advantage of the various constructors available for that.
2013-02-05 11:43:16 -08:00
Imran Rashid
e319ac74c1
cogrouped RDD stores the amount of time taken to read shuffle data in each task
2013-02-05 10:18:16 -08:00
Imran Rashid
295b534398
task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount
2013-02-05 10:18:16 -08:00
Imran Rashid
9df7e2ae55
Shuffle Fetchers use a timed iterator
2013-02-05 10:18:16 -08:00
Imran Rashid
1ad77c4766
add TimedIterator
2013-02-05 10:18:15 -08:00
Imran Rashid
843084d69d
track total bytes written by ShuffleMapTasks
2013-02-05 10:18:15 -08:00
haitao.yao
f609182e5b
Merge branch 'mesos'
2013-02-05 14:09:45 +08:00
Imran Rashid
b430d2359d
Merge branch 'master' into stageInfo
...
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-02-04 21:40:44 -08:00
Patrick Wendell
cc37601ecb
Adding an example with an OLAP roll-up
2013-02-04 14:18:11 -08:00
Matei Zaharia
f6ec547ea7
Small fix to test for distinct
2013-02-04 13:14:54 -08:00
Matei Zaharia
aa4ee1e9e5
Fix failing test
2013-02-04 11:06:31 -08:00
Matei Zaharia
f7b4e428be
Merge pull request #445 from JoshRosen/pyspark_fixes
...
Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()
2013-02-03 21:36:36 -08:00
Josh Rosen
e61729113d
Remove unnecessary doctest __main__ methods.
2013-02-03 21:29:40 -08:00
haitao.yao
faa4d9e31f
Merge branch 'mesos'
2013-02-04 11:40:15 +08:00
Patrick Wendell
b14322956c
Starvation check in Standlone scheduler
2013-02-03 12:45:10 -08:00
Patrick Wendell
667860448a
Starvation check in ClusterScheduler
2013-02-03 12:45:04 -08:00
Matei Zaharia
3bfaf3ab1d
Merge pull request #379 from stephenh/sparkmem
...
Add spark.executor.memory to differentiate executor memory from spark-shell
2013-02-02 23:58:23 -08:00
Matei Zaharia
88ee6163a1
Merge pull request #422 from squito/blockmanager_info
...
RDDInfo available from SparkContext
2013-02-02 23:44:13 -08:00
Matei Zaharia
cd4ca93679
Merge pull request #436 from stephenh/removeextraloop
...
Once we find a split with no block, we don't have to look for more.
2013-02-02 23:39:28 -08:00
Matei Zaharia
d5daaab381
Merge pull request #442 from stephenh/fixsystemnames
...
Fix createActorSystem not actually using the systemName parameter.
2013-02-02 23:38:46 -08:00
Matei Zaharia
9163c3705d
Formatting
2013-02-02 23:34:47 -08:00
Josh Rosen
8fbd5380b7
Fetch fewer objects in PySpark's take() method.
2013-02-03 06:44:49 +00:00
Josh Rosen
2415c18f48
Fix reporting of PySpark doctest failures.
2013-02-03 06:44:11 +00:00
Matei Zaharia
34a7bcdb3a
Formatting
2013-02-02 19:40:30 -08:00
Matei Zaharia
85019d76a4
Merge pull request #427 from woggling/dag-sched-tests
...
Tests for DAGScheduler
2013-02-02 19:09:59 -08:00
Stephen Haberman
7aba123f0c
Further simplify checking for Nil.
2013-02-02 13:53:28 -06:00
Charles Reiss
6107957962
Merge remote-tracking branch 'base/master' into dag-sched-tests
...
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-02-02 00:33:30 -08:00
Stephen Haberman
cae8a6795c
Fix dangling old variable names.
2013-02-02 02:15:39 -06:00
Stephen Haberman
696eec32c9
Move executorMemory up into SchedulerBackend.
2013-02-02 02:03:26 -06:00
Stephen Haberman
103c375ba0
Merge branch 'master' into sparkmem
2013-02-02 01:57:18 -06:00
Stephen Haberman
28e0cb9f31
Fix createActorSystem not actually using the systemName parameter.
...
This meant all system names were "spark", which worked, but didn't
lead to the most intuitive log output.
This fixes createActorSystem to use the passed system name, and
refactors Master/Worker to encapsulate their system/actor names
instead of having the clients guess at them.
Note that the driver system name, "spark", is left as is, and is
still repeated a few times, but that seems like a separate issue.
2013-02-02 01:11:37 -06:00
Charles Reiss
1fd5ee323d
Code review changes: add sc.stop; style of multiline comments; parens on procedure calls.
2013-02-01 22:33:38 -08:00
Matei Zaharia
ae26911ec0
Add back test for distinct without parens
2013-02-01 21:07:24 -08:00
Matei Zaharia
7ae4b6a23d
Merge pull request #441 from stephenh/lessnoisyakka
...
Reduce the amount of duplicate logging Akka does to stdout.
2013-02-01 21:03:37 -08:00
Stephen Haberman
12c1eb4756
Reduce the amount of duplicate logging Akka does to stdout.
...
Given we have Akka logging go through SLF4j to log4j, we don't need
all the extra noise of Akka's stdout logger that is supposedly only
used during Akka init time but seems to continue logging lots of
noisy network events that we either don't care about or are in the
log4j logs anyway.
See:
http://doc.akka.io/docs/akka/2.0/general/configuration.html
# Log level for the very basic logger activated during AkkaApplication startup
# Options: ERROR, WARNING, INFO, DEBUG
# stdout-loglevel = "WARNING"
2013-02-01 21:21:44 -06:00
Matei Zaharia
8b3041c723
Reduced the memory usage of reduce and similar operations
...
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.
2013-02-01 15:38:42 -08:00
Matei Zaharia
4529876db0
Merge branch 'master' of github.com:mesos/spark
2013-02-01 14:07:38 -08:00
Matei Zaharia
9970926ede
formatting
2013-02-01 14:07:34 -08:00
Matei Zaharia
79c24abe4c
Merge pull request #432 from stephenh/moreprivacy
...
Add more private declarations.
2013-02-01 14:06:55 -08:00
Matei Zaharia
de340ddf0b
Merge pull request #437 from stephenh/cancelmetacleaner
...
Stop BlockManagers metadataCleaner.
2013-02-01 12:59:25 -08:00
Matei Zaharia
0455650713
Merge pull request #439 from JoshRosen/spark-580
...
Use spark.local.dir for PySpark temp files (SPARK-580).
2013-02-01 12:07:42 -08:00
Josh Rosen
e211f405bc
Use spark.local.dir for PySpark temp files (SPARK-580).
2013-02-01 11:50:27 -08:00
Matei Zaharia
b6a6092177
Merge pull request #438 from JoshRosen/spark-674
...
Do not launch JavaGateways on workers (SPARK-674).
2013-02-01 11:29:47 -08:00
Josh Rosen
9cc6ff9c4e
Do not launch JavaGateways on workers (SPARK-674).
...
The problem was that the gateway was being initialized whenever the
pyspark.context module was loaded. The fix uses lazy initialization
that occurs only when SparkContext instances are actually constructed.
I also made the gateway and jvm variables private.
This change results in ~3-4x performance improvement when running the
PySpark unit tests.
2013-02-01 11:13:10 -08:00