Commit graph

2461 commits

Author SHA1 Message Date
Prashant Sharma 489862a657 Remote death watch has a funny bug.
https://gist.github.com/ScrapCodes/4805fd84906e40b7b03d
2013-11-25 18:00:02 +05:30
Prashant Sharma 77929cfeed Fine-tuned defaults for Akka and restored tracking of disassociated events, since they are delivered when a remote TCP socket is closed. Also made the transport failure heartbeat interval larger, as it is mostly not needed now that we use remote death watch instead. 2013-11-25 14:13:21 +05:30
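
The two commits above lean on Akka's remote death watch. As a hedged sketch of the mechanism (the actor names and the reaction are illustrative, not Spark's actual code): watching a remote ActorRef delivers a Terminated message when the remote system disassociates, e.g. because its TCP socket closed, which is what makes frequent transport-failure heartbeats largely redundant.

```scala
import akka.actor.{Actor, ActorRef, Terminated}

class Supervisor(remoteWorker: ActorRef) extends Actor {
  // Register interest in the remote actor's lifecycle; Akka delivers
  // Terminated(remoteWorker) once the remote system is disassociated.
  context.watch(remoteWorker)

  def receive = {
    case Terminated(ref) =>
      // React to the peer's death here (re-register, fail over, or shut down).
      context.stop(self)
  }
}
```
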
Prashant Sharma 95d8dbce91 Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10-temp
Conflicts:
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala
	streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
2013-11-21 12:34:46 +05:30
Prashant Sharma 199e9cf02d Merge branch 'scala210-master' of github.com:colorant/incubator-spark into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/deploy/client/Client.scala
	core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
	core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
	core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
2013-11-21 11:55:48 +05:30
Reynold Xin 2fead510f7 Merge branch 'master' of github.com:tbfenet/incubator-spark
PartitionPruningRDD is using index from parent

I was getting an ArrayIndexOutOfBoundsException after doing a union on a pruned RDD. The index it was using on the partition was the index in the original RDD, not in the new pruned RDD.
2013-11-21 07:15:55 +08:00
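
A hedged sketch of the fix (the names `PrunedPartition` and `prune` are illustrative, not necessarily the classes Spark uses): give pruned partitions their own consecutive indices rather than reusing the parent's, so operations like union that expect indices 0..n-1 stay in bounds.

```scala
import org.apache.spark.Partition

// Carries this partition's position in the *pruned* RDD; the parent's
// partition (with its original index) is kept only for delegation.
class PrunedPartition(idx: Int, val parentSplit: Partition) extends Partition {
  override val index: Int = idx
}

// Re-index while pruning, so downstream operations see indices 0..n-1.
def prune(parent: Array[Partition], keep: Int => Boolean): Array[Partition] =
  parent.filter(p => keep(p.index))
    .zipWithIndex
    .map { case (split, i) => new PrunedPartition(i, split) }
```
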
Henry Saputra 43dfac5132 Merge branch 'master' into removesemicolonscala 2013-11-19 16:57:57 -08:00
Henry Saputra 10be58f251 Another set of changes to remove unnecessary semicolons (;) from Scala code.
Passes sbt/sbt compile and test
2013-11-19 16:56:23 -08:00
Matei Zaharia f568912f85 Merge pull request #181 from BlackNiuza/fix_tasks_number
correct number of tasks in ExecutorsUI

Index `a` is not `execId` here
2013-11-19 16:11:31 -08:00
tgravescs 4093e9393a Improve Spark on YARN error handling 2013-11-19 12:44:00 -06:00
Henry Saputra 9c934b640f Remove semicolons at the end of lines to make the Scala code more idiomatic.
Also remove unused imports as I found them along the way.
Remove explicit return statements when returning a value in Scala code.

Passes compile and tests.
2013-11-19 10:19:03 -08:00
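
For illustration, the flavor of change these cleanup commits make (example code, not lines from the actual diff):

```scala
// Before: trailing semicolons and an explicit return.
def countBefore(xs: Seq[Int]): Int = {
  val n = xs.length;
  return n;
}

// After: the last expression is the result, so neither is needed.
def countAfter(xs: Seq[Int]): Int = xs.length
```
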
Matthew Taylor f639b65eab PartitionPruningRDD is using index from parent (review changes) 2013-11-19 10:48:48 +00:00
Matthew Taylor 13b9bf494b PartitionPruningRDD is using index from parent 2013-11-19 06:27:33 +00:00
shiyun.wxm eda05fa439 use HashSet.empty[Long] instead of Seq[Long] 2013-11-18 13:31:14 +08:00
Aaron Davidson 85763f4942 Add PrimitiveVectorSuite and fix bug in resize() 2013-11-17 18:16:51 -08:00
Reynold Xin 16a2286d6d Return the vector itself for trim and resize method in PrimitiveVector. 2013-11-17 17:52:02 -08:00
BlackNiuza ecfbaf2442 rename "a" to "statusId" 2013-11-18 09:51:40 +08:00
Reynold Xin c30979c7d6 Slightly enhanced PrimitiveVector:
1. Added a trim() method.
2. Added a size method.
3. Renamed getUnderlyingArray to array.
4. Minor documentation update.
2013-11-17 17:09:40 -08:00
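
Taken together, the PrimitiveVector commits above suggest a class along these lines (a simplified, hedged reconstruction, not the exact Spark source):

```scala
import scala.reflect.ClassTag

class PrimitiveVector[@specialized(Long, Int, Double) V: ClassTag](initialSize: Int = 64) {
  private var numElements = 0
  private var _array = new Array[V](initialSize)

  def apply(i: Int): V = _array(i)

  def +=(value: V): Unit = {
    if (numElements == _array.length) resize(_array.length * 2)
    _array(numElements) = value
    numElements += 1
  }

  def size: Int = numElements

  /** Shrink the backing array to exactly `size` elements; returns this vector. */
  def trim(): PrimitiveVector[V] = resize(numElements)

  /** Resize the backing array; truncates the logical size when shrinking
    * (plausibly the subject of the resize() bug fix above). Returns this vector. */
  def resize(newLength: Int): PrimitiveVector[V] = {
    val newArray = new Array[V](newLength)
    _array.copyToArray(newArray)
    _array = newArray
    if (newLength < numElements) numElements = newLength
    this
  }

  /** Formerly getUnderlyingArray; exposes the raw backing array. */
  def array: Array[V] = _array
}
```
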
BlackNiuza b60839e56a correct number of tasks in ExecutorsUI 2013-11-17 21:38:57 +08:00
Matei Zaharia 1b5b358309 Merge pull request #178 from hsaputra/simplecleanupcode
Simple cleanup on Spark's Scala code

Simple cleanup on Spark's Scala code while testing some modules:
-) Remove some unused imports as I found them
-) Remove ";" from import statements
-) Remove () at the end of method calls like size that have no side effects.
2013-11-16 11:44:10 -08:00
Henry Saputra c33f802044 Simple cleanup on Spark's Scala code while testing core and yarn modules:
-) Remove some unused imports as I found them
-) Remove ";" from import statements
-) Remove () at the end of method calls like size that have no side effects.
2013-11-15 10:32:20 -08:00
Matei Zaharia 96e0fb4630 Merge pull request #173 from kayousterhout/scheduler_hang
Fix bug where scheduler could hang after task failure.

When a task fails, we need to call reviveOffers() so that the
task can be rescheduled on a different machine. In the current code,
the state in ClusterTaskSetManager indicating which tasks are
pending may be updated after revive offers is called (there's a
race condition here), so when revive offers is called, the task set
manager does not yet realize that there are failed tasks that need
to be relaunched.

This isn't currently unit tested but will be once my pull request for
merging the cluster and local schedulers goes in -- at which point
many more of the unit tests will exercise the code paths through
the cluster scheduler (currently the failure test suite uses the local
scheduler, which is why we didn't see this bug before).
2013-11-14 22:29:28 -08:00
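
A minimal sketch of the ordering that closes this race (the types here are illustrative stand-ins, not Spark's actual scheduler classes):

```scala
trait SchedulerBackend { def reviveOffers(): Unit }

class TaskSetManager {
  private val pending = scala.collection.mutable.Queue[Long]()
  def addPendingTask(tid: Long): Unit = pending.enqueue(tid)
  def hasPendingTasks: Boolean = pending.nonEmpty
}

class Scheduler(backend: SchedulerBackend, manager: TaskSetManager) {
  def handleFailedTask(tid: Long): Unit = {
    // Update the pending-task bookkeeping first...
    manager.addPendingTask(tid)
    // ...and only then revive offers, so the offer handler already sees the
    // failed task and can relaunch it. Reversing these steps is the race.
    backend.reviveOffers()
  }
}
```
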
Aaron Davidson f629ba95b6 Various merge corrections
I've diff'd this patch against my own -- since they were both created
independently, this means that two sets of eyes have gone over all the
merge conflicts that were created, so I'm feeling significantly more
confident in the resulting PR.

@rxin has looked at the changes to the repl and is resoundingly
confident that they are correct.
2013-11-14 22:13:09 -08:00
Matei Zaharia dfd40e9f6f Merge pull request #175 from kayousterhout/no_retry_not_serializable
Don't retry tasks when they fail due to a NotSerializableException

As with my previous pull request, this will be unit tested once the Cluster and Local schedulers get merged.
2013-11-14 19:44:50 -08:00
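
The reasoning, as a small illustrative sketch (not the actual TaskSetManager code): a NotSerializableException is deterministic, so relaunching the task on another machine cannot succeed, and the task set should be aborted instead.

```scala
import java.io.NotSerializableException

def shouldRetry(failure: Throwable): Boolean = failure match {
  case _: NotSerializableException => false // deterministic: abort the task set
  case _                           => true  // possibly transient: retry elsewhere
}
```
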
Matei Zaharia ed25105fd9 Merge pull request #174 from ahirreddy/master
Write Spark UI url to driver file on HDFS

This makes the SIMR code path simpler
2013-11-14 19:43:55 -08:00
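
A hedged sketch of the handoff using the standard Hadoop FileSystem API (the method and file name here are illustrative):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def writeDriverFile(uiUrl: String, file: String): Unit = {
  val fs = FileSystem.get(new Configuration())
  val out = fs.create(new Path(file), true) // overwrite any stale file
  try out.writeUTF(uiUrl)                   // the SIMR client reads this back
  finally out.close()
}
```
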
Raymond Liu d4cd32330e Some fixes for previous master merge commits 2013-11-15 10:22:31 +08:00
Kay Ousterhout 29c88e408e Don't retry tasks when they fail due to a NotSerializableException 2013-11-14 15:15:19 -08:00
Kay Ousterhout b4546ba9e6 Fix bug where scheduler could hang after task failure.
When a task fails, we need to call reviveOffers() so that the
task can be rescheduled on a different machine. In the current code,
the state in ClusterTaskSetManager indicating which tasks are
pending may be updated after revive offers is called (there's a
race condition here), so when revive offers is called, the task set
manager does not yet realize that there are failed tasks that need
to be relaunched.
2013-11-14 13:55:03 -08:00
Reynold Xin 1a4cfbea33 Merge pull request #169 from kayousterhout/mesos_fix
Don't ignore spark.cores.max when using Mesos Coarse mode

totalCoresAcquired is decremented but never incremented, causing Spark to effectively ignore spark.cores.max in coarse grained Mesos mode.
2013-11-14 10:32:11 -08:00
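
A hedged, illustrative sketch of the bookkeeping this restores (not the actual Mesos backend code): the counter has to move in both directions for the `spark.cores.max` check to ever bind.

```scala
class CoarseCoreCounter(maxCores: Int) {
  private var totalCoresAcquired = 0

  // With the increment missing, this check effectively always passed.
  def canAcquire(cores: Int): Boolean = totalCoresAcquired + cores <= maxCores

  def onOfferAccepted(cores: Int): Unit = totalCoresAcquired += cores
  def onExecutorLost(cores: Int): Unit  = totalCoresAcquired -= cores
}
```
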
Lian, Cheng cc8995c8f4 Fixed a scaladoc typo in HadoopRDD.scala 2013-11-14 18:17:05 +08:00
Kay Ousterhout 5125cd3466 Don't ignore spark.cores.max when using Mesos Coarse mode 2013-11-13 23:06:17 -08:00
Raymond Liu a60620b76a Merge branch 'master' into scala-2.10 2013-11-14 12:44:19 +08:00
Matei Zaharia 2054c61a18 Merge pull request #159 from liancheng/dagscheduler-actor-refine
Migrate the daemon thread started by DAGScheduler to Akka actor

`DAGScheduler` adopts an event queue and a daemon thread that polls it to process events sent to the `DAGScheduler`.  This is a classical actor use case.  By migrating this thread to an Akka actor, we may benefit from both cleaner code and better performance (the context-switching cost of an Akka actor is much lower than that of a native thread).

But things become a little complicated when taking existing test code into consideration.

Code in `DAGSchedulerSuite` is somewhat tightly coupled with `DAGScheduler`, and directly calls `DAGScheduler.processEvent` instead of posting event messages to `DAGScheduler`.  To minimize code change, I chose to let the actor delegate messages to `processEvent`.  Maybe this doesn't follow conventional actor usage, but I tried to make it obviously correct.

Another tricky part is that, since `DAGScheduler` depends on the `ActorSystem` provided by its field `env`, `env` cannot be null.  But the `dagScheduler` field created in `DAGSchedulerSuite.before` was given a null `env`.  What's more, `BlockManager.blockIdsToBlockManagers` checks whether `env` is null to determine whether to run the production code or the test code (bad smell here, huh?).  I went through all callers of `BlockManager.blockIdsToBlockManagers`, and made sure that if `env != null` holds, then `blockManagerMaster == null` must also hold.  That's the logic behind `BlockManager.scala` [line 896](https://github.com/liancheng/incubator-spark/compare/dagscheduler-actor-refine?expand=1#diff-2b643ea78c1add0381754b1f47eec132L896).

Finally, since `DAGScheduler` instances are always `start()`ed after creation, I removed the `start()` method and start the `eventProcessActor` within the constructor.
2013-11-13 16:49:55 -08:00
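
A hedged sketch of the shape of this migration (event and class names are illustrative): the actor simply delegates each message to the existing `processEvent` handler, which is what keeps `DAGSchedulerSuite`'s direct calls working.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

sealed trait DAGSchedulerEvent
case object ResubmitFailedStages extends DAGSchedulerEvent

class EventProcessActor(processEvent: DAGSchedulerEvent => Unit) extends Actor {
  // Pure delegation: the existing processEvent logic (and its tests) stay intact.
  def receive = { case event: DAGSchedulerEvent => processEvent(event) }
}

object EventProcessActor {
  // Started at construction time, replacing the removed start() method.
  def start(system: ActorSystem, handler: DAGSchedulerEvent => Unit): ActorRef =
    system.actorOf(Props(new EventProcessActor(handler)))
}
```
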
Ahir Reddy 0ea1f8b225 Write Spark UI url to driver file on HDFS 2013-11-13 15:23:36 -08:00
Matei Zaharia 39af914b27 Merge pull request #166 from ahirreddy/simr-spark-ui
SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients
2013-11-13 08:39:05 -08:00
Raymond Liu 0f2e3c6e31 Merge branch 'master' into scala-2.10 2013-11-13 16:55:11 +08:00
Matei Zaharia b8bf04a085 Merge pull request #160 from xiajunluan/JIRA-923
Fix bug JIRA-923

Fix column sort issue in UI for JIRA-923.
https://spark-project.atlassian.net/browse/SPARK-923

Conflicts:
	core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
	core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
2013-11-12 16:19:50 -08:00
Ahir Reddy ccb099e804 SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients 2013-11-12 15:58:41 -08:00
Prashant Sharma 6860b79f6e Remove deprecated actorFor and use actorSelection everywhere. 2013-11-12 12:43:53 +05:30
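
For reference, the shape of that swap (the path and names are illustrative): `actorFor` was deprecated in Akka 2.2 in favor of `actorSelection`, which resolves the path lazily on each message.

```scala
import akka.actor.{ActorSelection, ActorSystem}

def lookupWorker(system: ActorSystem, host: String, port: Int): ActorSelection =
  // Previously: system.actorFor("akka://..."), now deprecated.
  system.actorSelection(s"akka.tcp://sparkWorker@$host:$port/user/Worker")
```
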
Prashant Sharma a8bfdd4377 Enabled remote death watch and a way to configure the timeouts for akka heartbeats. 2013-11-12 12:04:00 +05:30
Andrew xia e13da05424 fix format error 2013-11-11 19:15:45 +08:00
Andrew xia 37d2f3749e cut lines to less than 100 2013-11-11 15:49:32 +08:00
Andrew xia b3208063af Fix bug JIRA-923 2013-11-11 15:39:10 +08:00
Lian, Cheng e2a43b3dcc Made some changes according to suggestions from @aarondav 2013-11-11 12:21:54 +08:00
Lian, Cheng ba55285177 Put the periodical resubmitFailedStages() call into a scheduled task 2013-11-11 01:25:35 +08:00
Reynold Xin c845611fc3 Moved the Spark internal class registration for Kryo into an object, and added more classes (e.g. MapStatus, BlockManagerId) to the registration. 2013-11-09 23:00:08 -08:00
Reynold Xin 7c5f70d873 Call Kryo setReferences before calling user specified Kryo registrator. 2013-11-09 22:43:36 -08:00
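
A hedged sketch of the ordering these two commits establish (the registrator trait here is a stand-in and the wiring is illustrative): configure references and Spark's internal classes first, then run the user registrator so it can override the defaults.

```scala
import com.esotericsoftware.kryo.Kryo

trait KryoRegistrator { def registerClasses(kryo: Kryo): Unit }

def newKryo(userRegistrator: Option[KryoRegistrator]): Kryo = {
  val kryo = new Kryo()
  // Set before the user registrator runs, so a registrator that flips the
  // flag still has the last word.
  kryo.setReferences(true)
  // Spark-internal classes (the commit mentions MapStatus and BlockManagerId,
  // among others) would be registered here, before the user's classes.
  userRegistrator.foreach(_.registerClasses(kryo))
  kryo
}
```
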
Matei Zaharia 87954d4c85 Merge pull request #154 from soulmachine/ClusterScheduler
Replace the thread inside ClusterScheduler.start() with an Akka scheduler

Threads are precious resources, so we shouldn't abuse them.
2013-11-09 17:53:25 -08:00
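
A hedged sketch of that substitution (the interval and task body are illustrative): Akka's scheduler runs the periodic check on a shared dispatcher instead of tying up a dedicated thread.

```scala
import scala.concurrent.duration._
import akka.actor.{ActorSystem, Cancellable}

def startPeriodicCheck(system: ActorSystem)(check: () => Unit): Cancellable = {
  import system.dispatcher // ExecutionContext for the scheduled task
  system.scheduler.schedule(initialDelay = 1.second, interval = 1.second) {
    check()
  }
}
```
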
Reynold Xin 83bf1920c8 Merge pull request #155 from rxin/jobgroup
Don't reset job group when a new job description is set.
2013-11-09 15:40:29 -08:00
Reynold Xin 28f27097cf Don't reset job group when a new job description is set. 2013-11-09 13:59:31 -08:00
Matei Zaharia 8af99f2356 Merge pull request #149 from tgravescs/fixSecureHdfsAccess
Fix secure hdfs access for spark on yarn

https://github.com/apache/incubator-spark/pull/23 broke secure HDFS access. Not sure if it works with secure HDFS on standalone. Fixing it at least for Spark on YARN.

The jobconf broadcasting change also broke secure HDFS access, as it didn't take into account code calling getPartitions before the SparkContext is initialized. The DAGScheduler does this as it tries to getShuffleMapStage.
2013-11-09 13:48:00 -08:00