Josh Rosen
e9fb25426e
Remove hack workaround for SPARK-668.
...
Renaming the type parameters solves this problem (see SPARK-694).
I tried this fix earlier, but it didn't work because I didn't run
`sbt/sbt clean` first.
2013-02-11 11:19:20 -08:00
Matei Zaharia
da8afbc77e
Some bug and formatting fixes to FT
...
Conflicts:
core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
2013-02-10 22:43:38 -08:00
root
1b47fa2752
Detect hard crashes of workers using a heartbeat mechanism.
...
Also fixes some issues in the rest of the code with detecting workers this way.
Conflicts:
core/src/main/scala/spark/deploy/master/Master.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
2013-02-10 22:28:28 -08:00
Matei Zaharia
8c66c49962
Tweak web UI so that people don't get confused about master URL format
...
Conflicts:
core/src/main/twirl/spark/deploy/master/index.scala.html
core/src/main/twirl/spark/deploy/worker/index.scala.html
2013-02-10 21:58:34 -08:00
Tathagata Das
16baea62bc
Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits.
2013-02-10 19:14:49 -08:00
Stephen Haberman
680f42e6cd
Change defaultPartitioner to use upstream split size.
...
Previously it used the SparkContext.defaultParallelism, which occasionally
ended up being a very bad guess. Looking at upstream RDDs seems to make
better use of the context.
Also sorted the upstream RDDs by partition size first, as if we have
a hugely-partitioned RDD and tiny-partitioned RDD, it is unlikely
we want the resulting RDD to be tiny-partitioned.
2013-02-10 02:27:03 -06:00
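A minimal sketch of the idea described in the commit above, not the actual Spark code. `FakeRdd`, `HashPartitionerSketch` and `defaultPartitionerSketch` are hypothetical stand-ins so the example runs without Spark, and the sketch models only the "sort upstream RDDs by partition count and use the largest" step.

```scala
// Hypothetical stand-ins; these are not Spark classes.
final case class FakeRdd(name: String, numSplits: Int)
final case class HashPartitionerSketch(numPartitions: Int)

object DefaultPartitionerSketch {
  // Pick the partition count from the largest upstream RDD instead of
  // falling back to a global default parallelism.
  def defaultPartitionerSketch(rdd: FakeRdd, others: FakeRdd*): HashPartitionerSketch = {
    // Sort upstream RDDs by split count, largest first.
    val bySize = (rdd +: others).sortBy(_.numSplits).reverse
    HashPartitionerSketch(bySize.head.numSplits)
  }
}
```

With a 2-split RDD and a 2000-split RDD as inputs, the sketch picks 2000 partitions, so the result is not forced down to the tiny RDD's partitioning.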
Patrick Wendell
2ed791fd7f
Minor fixes
2013-02-09 22:00:38 -08:00
Patrick Wendell
1859c9f93c
Changing to use Timer based on code review
2013-02-09 21:55:17 -08:00
Matei Zaharia
ccb1ca4a23
Merge pull request #448 from squito/fetch_maxBytesInFlight
...
add as many fetch requests as we can, subject to maxBytesInFlight
2013-02-09 18:15:18 -08:00
Matei Zaharia
f750daa510
Merge pull request #452 from stephenh/misc
...
Add RDD.coalesce, clean up some RDDs, other misc.
2013-02-09 18:12:56 -08:00
Stephen Haberman
4619ee0787
Move JavaRDDLike.coalesce into the right places.
2013-02-09 20:05:42 -06:00
Stephen Haberman
fb7599870f
Fix JavaRDDLike.coalesce return type.
2013-02-09 16:10:52 -06:00
Stephen Haberman
2a18cd826c
Add back return types.
2013-02-09 10:12:04 -06:00
Stephen Haberman
da52b16b38
Remove RDD.coalesce default arguments.
2013-02-09 10:11:54 -06:00
Mark Hamstra
b8863a79d3
Merge branch 'master' of https://github.com/mesos/spark into commutative
...
Conflicts:
core/src/main/scala/spark/RDD.scala
2013-02-08 18:26:00 -08:00
Mark Hamstra
934a53c8b6
Change docs on 'reduce' since the merging of local reduces no longer preserves
ordering, so the reduce function must also be commutative.
2013-02-05 22:19:58 -08:00
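To illustrate why commutativity matters once per-partition results are merged in arrival order, here is a small sketch. `ReduceOrderSketch` and `mergeInArrivalOrder` are hypothetical names, and the sequences stand in for the local reduce results of three partitions; this is plain Scala, not Spark's implementation.

```scala
object ReduceOrderSketch {
  // Merge per-partition results in whatever order they happen to arrive.
  def mergeInArrivalOrder[T](partials: Seq[T])(f: (T, T) => T): T =
    partials.reduceLeft(f)

  val sums = Seq(3, 7, 11)
  val strings = Seq("ab", "cd", "ef")

  // Addition is associative and commutative: arrival order does not matter.
  val sumA = mergeInArrivalOrder(sums)(_ + _)
  val sumB = mergeInArrivalOrder(sums.reverse)(_ + _)

  // String concatenation is associative but not commutative:
  // a different arrival order gives a different result.
  val strA = mergeInArrivalOrder(strings)(_ + _)         // "abcdef"
  val strB = mergeInArrivalOrder(strings.reverse)(_ + _) // "efcdab"
}
```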
Stephen Haberman
a9c8d53cfa
Clean up RDDs, mainly to use getSplits.
...
Also made sure clearDependencies() was calling super, to ensure
the getSplits/getDependencies vars in the RDD base class get
cleaned up.
2013-02-05 22:16:59 -06:00
Stephen Haberman
f4d43cb43e
Remove unneeded zipWithIndex.
...
Also rename r->rdd and remove unneeded extra type info.
2013-02-05 21:26:45 -06:00
Stephen Haberman
f2bc748013
Add RDD.coalesce.
2013-02-05 21:23:36 -06:00
Stephen Haberman
67df7f2fa2
Add private, minor formatting.
2013-02-05 21:08:21 -06:00
Matei Zaharia
9cfa068379
Merge pull request #450 from stephenh/inlinemergepair
...
Inline mergePair to look more like the narrow dep branch.
2013-02-05 18:28:44 -08:00
Stephen Haberman
870b2aaf5d
Merge branch 'master' into fixdeathpactexception
...
Conflicts:
core/src/main/scala/spark/deploy/worker/Worker.scala
2013-02-05 20:27:09 -06:00
Stephen Haberman
0e19093fd8
Handle Terminated to avoid endless DeathPactExceptions.
...
Credit to Roland Kuhn, Akka's tech lead, for pointing out this
very obvious fix: StandaloneExecutorBackend.preStart's
catch block would never (ever) get hit, because all of the
operations in preStart are async.
So, the System.exit in the catch block was skipped, and instead
Akka was sending Terminated messages which, since we didn't
handle them, turned into DeathPactExceptions, which started
a postRestart/preStart infinite loop.
2013-02-05 18:58:00 -06:00
Stephen Haberman
8bd0e888f3
Inline mergePair to look more like the narrow dep branch.
...
No functionality changes, I think this is just more consistent
given mergePair isn't called multiple times/recursive.
Also added a comment to explain the usual case of having two parent RDDs.
2013-02-05 17:50:25 -06:00
Imran Rashid
cfab1a3528
add as many fetch requests as we can, subject to maxBytesInFlight
2013-02-05 14:31:46 -08:00
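The commit subject above describes a greedy policy: keep issuing fetch requests until the in-flight byte budget is used up. A rough sketch of such a loop follows. `FetchThrottleSketch`, `FetchRequest` and `sendWhilePossible` are hypothetical names, and the rule that one request is always allowed when nothing is in flight is an assumption of this sketch, not something the commit message states.

```scala
import scala.collection.mutable

object FetchThrottleSketch {
  // Hypothetical request type: an identifier plus its size in bytes.
  final case class FetchRequest(id: String, sizeBytes: Long)

  // Dequeue and "send" requests while the in-flight budget allows it.
  // Returns the requests sent and the updated in-flight byte count.
  def sendWhilePossible(
      pending: mutable.Queue[FetchRequest],
      bytesInFlight: Long,
      maxBytesInFlight: Long): (List[FetchRequest], Long) = {
    var inFlight = bytesInFlight
    val sent = mutable.ListBuffer.empty[FetchRequest]
    // Assumption: always allow one request when nothing is in flight, so a
    // request larger than the whole budget cannot stall forever.
    while (pending.nonEmpty &&
        (inFlight == 0 || inFlight + pending.head.sizeBytes <= maxBytesInFlight)) {
      val req = pending.dequeue()
      inFlight += req.sizeBytes
      sent += req
    }
    (sent.toList, inFlight)
  }
}
```

A caller would invoke this once at startup and again each time a fetch completes and frees part of the budget.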
haitao.yao
f609182e5b
Merge branch 'mesos'
2013-02-05 14:09:45 +08:00
Matei Zaharia
f7b4e428be
Merge pull request #445 from JoshRosen/pyspark_fixes
...
Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()
2013-02-03 21:36:36 -08:00
haitao.yao
faa4d9e31f
Merge branch 'mesos'
2013-02-04 11:40:15 +08:00
Patrick Wendell
b14322956c
Starvation check in Standalone scheduler
2013-02-03 12:45:10 -08:00
Patrick Wendell
667860448a
Starvation check in ClusterScheduler
2013-02-03 12:45:04 -08:00
Matei Zaharia
3bfaf3ab1d
Merge pull request #379 from stephenh/sparkmem
...
Add spark.executor.memory to differentiate executor memory from spark-shell
2013-02-02 23:58:23 -08:00
Matei Zaharia
88ee6163a1
Merge pull request #422 from squito/blockmanager_info
...
RDDInfo available from SparkContext
2013-02-02 23:44:13 -08:00
Matei Zaharia
cd4ca93679
Merge pull request #436 from stephenh/removeextraloop
...
Once we find a split with no block, we don't have to look for more.
2013-02-02 23:39:28 -08:00
Matei Zaharia
d5daaab381
Merge pull request #442 from stephenh/fixsystemnames
...
Fix createActorSystem not actually using the systemName parameter.
2013-02-02 23:38:46 -08:00
Matei Zaharia
9163c3705d
Formatting
2013-02-02 23:34:47 -08:00
Josh Rosen
8fbd5380b7
Fetch fewer objects in PySpark's take() method.
2013-02-03 06:44:49 +00:00
Matei Zaharia
34a7bcdb3a
Formatting
2013-02-02 19:40:30 -08:00
Stephen Haberman
7aba123f0c
Further simplify checking for Nil.
2013-02-02 13:53:28 -06:00
Charles Reiss
6107957962
Merge remote-tracking branch 'base/master' into dag-sched-tests
...
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-02-02 00:33:30 -08:00
Stephen Haberman
cae8a6795c
Fix dangling old variable names.
2013-02-02 02:15:39 -06:00
Stephen Haberman
696eec32c9
Move executorMemory up into SchedulerBackend.
2013-02-02 02:03:26 -06:00
Stephen Haberman
103c375ba0
Merge branch 'master' into sparkmem
2013-02-02 01:57:18 -06:00
Stephen Haberman
28e0cb9f31
Fix createActorSystem not actually using the systemName parameter.
...
This meant all system names were "spark", which worked, but didn't
lead to the most intuitive log output.
This fixes createActorSystem to use the passed system name, and
refactors Master/Worker to encapsulate their system/actor names
instead of having the clients guess at them.
Note that the driver system name, "spark", is left as is, and is
still repeated a few times, but that seems like a separate issue.
2013-02-02 01:11:37 -06:00
Stephen Haberman
12c1eb4756
Reduce the amount of duplicate logging Akka does to stdout.
...
Given we have Akka logging go through SLF4j to log4j, we don't need
all the extra noise of Akka's stdout logger that is supposedly only
used during Akka init time but seems to continue logging lots of
noisy network events that we either don't care about or are in the
log4j logs anyway.
See:
http://doc.akka.io/docs/akka/2.0/general/configuration.html
# Log level for the very basic logger activated during AkkaApplication startup
# Options: ERROR, WARNING, INFO, DEBUG
# stdout-loglevel = "WARNING"
2013-02-01 21:21:44 -06:00
Matei Zaharia
8b3041c723
Reduced the memory usage of reduce and similar operations
...
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.
2013-02-01 15:38:42 -08:00
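The change described above can be pictured as replacing a buffer of all task results with a single running accumulator. The following is a sketch under that reading only. `IncrementalMergeSketch`, `IncrementalReducer` and `onResult` are hypothetical names, not Spark APIs.

```scala
object IncrementalMergeSketch {
  // Holds at most one merged value instead of one result per task.
  final class IncrementalReducer[T](f: (T, T) => T) {
    private var acc: Option[T] = None

    // Called once per task result, in arrival order.
    def onResult(value: T): Unit = {
      acc = acc match {
        case Some(soFar) => Some(f(soFar, value))
        case None        => Some(value)
      }
    }

    // Final value once every task has reported; fails if nothing arrived.
    def result: T =
      acc.getOrElse(throw new UnsupportedOperationException("empty collection"))
  }
}
```

Driver memory then stays proportional to one result rather than to the number of tasks, at the cost of merging in arrival order.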
Matei Zaharia
4529876db0
Merge branch 'master' of github.com:mesos/spark
2013-02-01 14:07:38 -08:00
Matei Zaharia
9970926ede
formatting
2013-02-01 14:07:34 -08:00
Matei Zaharia
79c24abe4c
Merge pull request #432 from stephenh/moreprivacy
...
Add more private declarations.
2013-02-01 14:06:55 -08:00
Matei Zaharia
de340ddf0b
Merge pull request #437 from stephenh/cancelmetacleaner
...
Stop BlockManagers metadataCleaner.
2013-02-01 12:59:25 -08:00
Imran Rashid
c6190067ae
remove unneeded (and unused) filter on block info
2013-02-01 09:55:25 -08:00
Stephen Haberman
59c57e48df
Stop BlockManagers metadataCleaner.
2013-02-01 10:34:02 -06:00
Matei Zaharia
571af31304
Merge pull request #433 from rxin/master
...
Changed PartitionPruningRDD's split to make sure it returns the correct split index.
2013-02-01 00:32:41 -08:00
Imran Rashid
8a0a5ed533
track total partitions, in addition to cached partitions; use scala string formatting
2013-02-01 00:23:38 -08:00
Imran Rashid
f127f2ae76
fixup merge (master -> driver renaming)
2013-02-01 00:20:49 -08:00
Reynold Xin
f9af9cee6f
Moved PruneDependency into PartitionPruningRDD.scala.
2013-02-01 00:02:46 -08:00
haitao.yao
b57570fd12
Merge branch 'mesos'
2013-02-01 14:06:45 +08:00
Patrick Wendell
39ab83e957
Small fix from last commit
2013-01-31 21:52:52 -08:00
Patrick Wendell
c33f0ef41a
Some style cleanup
2013-01-31 21:50:02 -08:00
Patrick Wendell
3446d5c8d6
SPARK-673: Capture and re-throw Python exceptions
...
This patch alters the Python <-> executor protocol to pass on
exception data when they occur in user Python code.
2013-01-31 18:06:11 -08:00
Reynold Xin
6289d9654e
Removed the TODO comment from PartitionPruningRDD.
2013-01-31 17:49:36 -08:00
Reynold Xin
5b0fc265c2
Changed PartitionPruningRDD's split to make sure it returns the correct
split index.
2013-01-31 17:48:39 -08:00
Stephen Haberman
782187c210
Once we find a split with no block, we don't have to look for more.
2013-01-31 18:27:25 -06:00
Stephen Haberman
418e36caa8
Add more private declarations.
2013-01-31 17:18:33 -06:00
haitao.yao
3190483b98
bug fix for javadoc
2013-01-31 14:23:51 +08:00
Imran Rashid
02a6761589
Merge branch 'master' into blockmanager_info
...
Conflicts:
core/src/main/scala/spark/storage/BlockManagerMaster.scala
2013-01-30 18:52:35 -08:00
Imran Rashid
c1df24d085
rename Slaves --> Executor
2013-01-30 18:51:14 -08:00
Matei Zaharia
d12330bd2c
Merge pull request #426 from woggling/conn-manager-ips
...
Remember ConnectionManagerId used to initiate SendingConnections
2013-01-30 15:02:53 -08:00
Matei Zaharia
612a9fee71
Merge pull request #428 from woggling/mesos-exec-id
...
Make ExecutorIDs include SlaveIDs when running Mesos
2013-01-30 15:01:46 -08:00
Stephen Haberman
871476d506
Include message and exitStatus if available.
2013-01-30 16:56:46 -06:00
Charles Reiss
252845d304
Remove remnants of attempt to use slaveId-executorId in MesosExecutorBackend
2013-01-30 10:38:06 -08:00
Charles Reiss
f7de6978c1
Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use
the Mesos ExecutorID as a Spark ExecutorID.
2013-01-30 09:38:57 -08:00
Charles Reiss
178b89204c
Refactor DAGScheduler more to allow testing without a separate thread.
2013-01-30 09:19:55 -08:00
Charles Reiss
a3d14c0404
Refactoring to DAGScheduler to aid testing
2013-01-29 18:55:42 -08:00
Charles Reiss
16a0789e10
Remember ConnectionManagerId used to initiate SendingConnections.
...
This prevents ConnectionManager from getting confused if a machine
has multiple host names and the one getHostName() finds happens
not to be the one that was passed from, e.g., the BlockManagerMaster.
2013-01-29 18:13:59 -08:00
Matei Zaharia
d54b10b6ad
Merge remote-tracking branch 'stephenh/removefailedjob'
...
Conflicts:
core/src/main/scala/spark/deploy/master/Master.scala
2013-01-29 18:12:29 -08:00
Matei Zaharia
ccb67ff2ca
Merge pull request #425 from stephenh/toDebugString
...
Add RDD.toDebugString.
2013-01-29 10:44:18 -08:00
Matei Zaharia
9ae11603b4
Merge pull request #415 from stephenh/driver
...
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Imran Rashid
b92259ba57
Merge branch 'master' into blockmanager_info
2013-01-29 09:45:10 -08:00
Matei Zaharia
64ba6a8c2c
Simplify checkpointing code and RDD class a little:
...
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.
- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.
- A few of the RDD subclasses are simpler.
- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.
- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Stephen Haberman
cbf72bffa5
Include name, if set, in RDD.toString().
2013-01-29 00:20:36 -06:00
Stephen Haberman
3cda14af3f
Add number of splits.
2013-01-29 00:12:31 -06:00
Matei Zaharia
a1ecec8d79
Merge branch 'master' of github.com:mesos/spark
2013-01-28 22:08:44 -08:00
Stephen Haberman
951cfd9ba2
Add JavaRDDLike.toDebugString().
2013-01-29 00:02:17 -06:00
Matei Zaharia
f6eb1f0825
Merge pull request #413 from pwendell/stage-logging
...
SPARK-658: Adding logging of stage duration
2013-01-28 22:01:52 -08:00
Stephen Haberman
b45857c965
Add RDD.toDebugString.
...
Original idea by Nathan Kronenfeld.
2013-01-28 23:56:56 -06:00
Patrick Wendell
7ee824e42e
Units from ms -> s
2013-01-28 21:48:32 -08:00
Stephen Haberman
13368818af
Merge branch 'master' into driver
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/SparkEnv.scala
core/src/main/scala/spark/deploy/LocalSparkCluster.scala
core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/ThreadingTest.scala
core/src/test/scala/spark/MapOutputTrackerSuite.scala
2013-01-28 23:30:24 -06:00
Matei Zaharia
dda2ce017c
Merge pull request #424 from pwendell/logging-cleanup
...
Some DEBUG-level log cleanup.
2013-01-28 21:18:54 -08:00
Patrick Wendell
1f9b486a8b
Some DEBUG-level log cleanup.
...
A few changes to make the DEBUG-level logs less
noisy and more readable.
- Moved a few very frequent messages to Trace
- Changed some BlockManager log messages to make them
  more understandable
SPARK-666 #resolve
2013-01-28 20:29:35 -08:00
Imran Rashid
efff7bfb33
add long and float accumulatorparams
2013-01-28 20:23:11 -08:00
Imran Rashid
0f22c4207f
better formatting for RDDInfo
2013-01-28 20:07:53 -08:00
Imran Rashid
a423ee546c
expose RDD & storage info directly via SparkContext
2013-01-28 20:07:53 -08:00
Patrick Wendell
501433f1d5
Making submission time a field
2013-01-28 10:45:57 -08:00
Patrick Wendell
c423be7d8e
Renaming stage finished function
2013-01-28 10:45:57 -08:00
Patrick Wendell
07f568e1bf
SPARK-658: Adding logging of stage duration
2013-01-28 10:45:57 -08:00
Matei Zaharia
286f8f876f
Change time unit in MetadataCleaner to seconds
2013-01-28 01:29:27 -08:00
Matei Zaharia
f03d9760fd
Clean up BlockManagerUI a little (make it not be an object, merge with
Directives, and bind to a random port)
2013-01-27 23:56:14 -08:00
Matei Zaharia
909850729e
Rename more things from slave to executor
2013-01-27 23:17:20 -08:00
Matei Zaharia
44b4a0f88f
Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia
6ad8540b40
Merge pull request #401 from squito/blockmanager_ui
...
Blockmanager ui
2013-01-27 15:51:08 -08:00