Commit graph

4306 commits

Author SHA1 Message Date
Kay Ousterhout 7b5ae23a37 Renamed StandaloneX to CoarseGrainedX.
The previous names were confusing because the components weren't just
used in Standalone mode -- in fact, the scheduler used for Standalone
mode is called SparkDeploySchedulerBackend. So, the previous names
were misleading.
2013-10-04 13:56:43 -07:00
Andre Schumacher c84946fe21 Fixing SPARK-602: PythonPartitioner
Currently PythonPartitioner determines partition ID by hashing a
byte-array representation of PySpark's key. This PR lets
PythonPartitioner use the actual partition ID, which is required e.g.
for sorting via PySpark.
2013-10-04 11:56:47 -07:00
Nick Pentreath 93b96b44d7 Adding implicit feedback ALS to MLlib user guide 2013-10-04 14:39:44 +02:00
Nick Pentreath c6ceaeae50 Style fix using 'if' rather than 'match' on boolean 2013-10-04 13:52:53 +02:00
Nick Pentreath 6a7836cddc Fixing closing brace indentation 2013-10-04 13:33:01 +02:00
Nick Pentreath 0bd9b373d1 Reverting to using comma-delimited split 2013-10-04 13:30:33 +02:00
Nick Pentreath 1cbdcb9cb6 Merge remote-tracking branch 'upstream/master' into implicit-als 2013-10-04 13:25:34 +02:00
Reynold Xin d29e8035a0 Added countAsync and various unit tests for async actions. 2013-10-03 15:13:44 -07:00
Matei Zaharia 232765f7b2 Merge pull request #26 from Du-Li/master
fixed a wildcard bug in make-distribution.sh; ask sbt to check local
maven repo in project/SparkBuild.scala

(1) fixed a wildcard bug in make-distribution.sh:
with the wildcard * in quotes, this cp command failed. it worked after
moving the wildcard out quotes.

(2) ask sbt to check local maven repo in SparkBuild.scala:
To build Spark (0.9.0-SNAPSHOT) with the HEAD of mesos (0.15.0), I must
do "make maven-install" under mesos/build, which publishes the java .jar
file under ~/.m2. However, when building Spark (after pointing mesos to
version 0.15.0), sbt uses ivy which by default only checks ~/.ivy2. This
change is to tell sbt to also check ~/.m2.
2013-10-03 12:00:48 -07:00
Matei Zaharia 405e69bb20 Merge pull request #25 from CruncherBigData/master
Update README: updated the link
2013-10-03 10:52:41 -07:00
Matei Zaharia 49dbfccf6b Merge pull request #28 from tgravescs/sparYarnAppName
Allow users to set the application name for Spark on Yarn
2013-10-03 10:52:06 -07:00
tgravescs 0fff4ee852 Adding in the --addJars option to make SparkContext.addJar work on yarn and cleanup
the classpaths
2013-10-03 11:52:16 -05:00
tgravescs c021b8c202 Add default value to usage statement 2013-10-03 08:07:19 -05:00
Reynold Xin 802bfb870d - Created AsyncRDDActions.
- Make FutureJob a Scala Future instead of Java Future.
2013-10-03 01:22:28 -07:00
Reynold Xin e8e917f209 Merge branch 'master' into kill
Conflicts:
	core/src/main/scala/org/apache/spark/TaskEndReason.scala
	core/src/main/scala/org/apache/spark/executor/Executor.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
	core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
2013-10-02 23:01:34 -07:00
Matei Zaharia e597ea34a6 Merge pull request #10 from kayousterhout/results_through-bm
Send Task results through the block manager when larger than Akka frame size (fixes SPARK-669).

This change requires adding an extra failure mode: tasks can complete
successfully, but the result gets lost or flushed from the block manager
before it's been fetched.

This change also moves the deserialization of tasks into a separate thread, so it's no longer part of the DAG scheduler's tight loop. This should improve scheduler throughput, particularly when tasks are sending back large results.

Thanks Josh for writing the original version of this patch!

This is duplicated from the mesos/spark repo: https://github.com/mesos/spark/pull/835
2013-10-02 21:14:24 -07:00
Reynold Xin 1c48ba0d9f Merge remote-tracking branch 'origin' into kill
Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
	core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
2013-10-02 16:40:44 -07:00
tgravescs bc3b20abdc Allow users to set the application name for Spark on Yarn 2013-10-02 12:54:17 -05:00
David McCauley 1577b373a9 SPARK-921 - Add Application UI URL to ApplicationInfo Json output 2013-10-02 15:03:41 +01:00
David McCauley 351da54676 SPARK-920 - JSON endpoint URI scheme part (spark://) duplicated 2013-10-02 13:23:38 +01:00
Du Li 9fd6bba60d ask ivy/sbt to check local maven repo under ~/.m2 2013-10-01 15:46:51 -07:00
Du Li 0d19f00e9e fixed a bug of using wildcard in quotes 2013-10-01 15:42:06 -07:00
CruncherBigData c85f720588 Update README 2013-10-01 09:05:03 -07:00
Kay Ousterhout 0dcad2edcb Added additional unit test for repeated task failures 2013-09-30 23:26:15 -07:00
Kay Ousterhout dea4677c88 Fixed compilation errors and broken test. 2013-09-30 22:07:01 -07:00
Kay Ousterhout 8deda427bc Merge remote-tracking branch 'upstream/master' into results_through-bm
Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterScheduler.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
	core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala
2013-09-30 10:16:58 -07:00
Kay Ousterhout 58b764b7c6 Addressed Matei's code review comments 2013-09-30 10:11:59 -07:00
Grace Huang 892fb8ffa8 remedy the line-wrap while exceeding 100 chars 2013-09-30 20:12:55 +08:00
Harvey Feng 7d06bdde1d Merge HadoopDatasetRDD into HadoopRDD. 2013-09-29 20:08:03 -07:00
Grace Huang 4b68be5f3c SPARK-900 Use coarser grained naming for metrics 2013-09-27 14:47:38 +08:00
Harvey Feng 417085716a Merge remote-tracking branch 'oldsparkme/hadoopRDD-broadcast-change' into hadoop-config-cache 2013-09-26 15:49:42 -07:00
Aaron Davidson 42d72308fb Add license notices 2013-09-26 15:45:20 -07:00
Aaron Davidson f549ea33d3 Standalone Scheduler fault tolerance using ZooKeeper
This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file
system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead.

Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we will keep the behavior of from 194ba4b8.

Additionally, places where a Master could be specificied by a spark:// url can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.

Forthcoming:
Documentation, tests (! - only ad hoc testing has been performed so far)
I do not intend for this commit to be merged until tests are added, but this patch should
still be mostly reviewable until then.
2013-09-26 15:04:23 -07:00
Aaron Davidson d5a96feccb Standalone Scheduler fault recovery
Implements a basic form of Standalone Scheduler fault recovery. In particular,
this allows faults to be manually recovered from by means of restarting the
Master process on the same machine. This is the majority of the code necessary
for general fault tolerance, which will first elect a leader and then recover
the Master state.

In order to enable fault recovery, the Master will persist a small amount of state related
to the registration of Workers and Applications to disk. If the Master is started and
sees that this state is still around, it will enter Recovery mode, during which time it
will not schedule any new Executors on Workers (but it does accept the registration of
new Clients and Workers).

At this point, the Master attempts to reconnect to all Workers and Client applications
that were registered at the time of failure. After confirming either the existence
or nonexistence of all such nodes (within a certain timeout), the Master will exit
Recovery mode and resume normal scheduling.
2013-09-26 14:59:35 -07:00
Reynold Xin 714fdabd99 Merge pull request #17 from rxin/optimize
Remove -optimize flag
2013-09-26 14:28:55 -07:00
Reynold Xin 13eced723f Merge pull request #16 from pwendell/master
Bug fix in master build
2013-09-26 14:18:19 -07:00
Reynold Xin 70a0b993d4 Merge pull request #14 from kayousterhout/untangle_scheduler
Improved organization of scheduling packages.

This commit does not change any code -- only file organization.
Please let me know if there was some masterminded strategy behind
the existing organization that I failed to understand!

There are two components of this change:
(1) Moving files out of the cluster package, and down
a level to the scheduling package. These files are all used by
the local scheduler in addition to the cluster scheduler(s), so
should not be in the cluster package. As a result of this change,
none of the files in the local package reference files in the
cluster package.

(2) Moving the mesos package to within the cluster package.
The mesos scheduling code is for a cluster, and represents a
specific case of cluster scheduling (the Mesos-related classes
often subclass cluster scheduling classes). Thus, the most logical
place for it seems to be within the cluster package.

The one thing about the scheduling code that seems a little funny to me
is the naming of the SchedulerBackends.  The StandaloneSchedulerBackend
is not just for Standalone mode, but instead is used by Mesos coarse grained
mode and Yarn, and the backend that *is* just for Standalone mode is instead called SparkDeploySchedulerBackend. I didn't change this because I wasn't sure if there
was a reason for this naming that I'm just not aware of.
2013-09-26 14:11:54 -07:00
Reynold Xin 76677b8fa1 Merge pull request #670 from jey/ec2-ssh-improvements
EC2 SSH improvements
2013-09-26 14:03:46 -07:00
Reynold Xin 3f283278b0 Removed scala -optimize flag. 2013-09-26 13:58:10 -07:00
Reynold Xin c514cd1587 Merge pull request #930 from holdenk/master
Add mapPartitionsWithIndex
2013-09-26 13:48:20 -07:00
Patrick Wendell e2ff59af72 Bug fix in master build 2013-09-26 13:06:51 -07:00
Reynold Xin 560ee5c9bb Merge pull request #7 from wannabeast/memorystore-fixes
some minor fixes to MemoryStore

This is a repeat of #5, moved to its own branch in my repo.

This makes all updates to   on ; it skips on synchronizing the reads where it can get away with it.
2013-09-26 11:27:34 -07:00
Patrick Wendell 6566a19b38 Merge pull request #9 from rxin/limit
Smarter take/limit implementation.
2013-09-26 08:01:04 -07:00
Kay Ousterhout d85fe41b2b Improved organization of scheduling packages.
This commit does not change any code -- only file organization.

There are two components of this change:
(1) Moving files out of the cluster package, and down
a level to the scheduling package. These files are all used by
the local scheduler in addition to the cluster scheduler(s), so
should not be in the cluster package. As a result of this change,
none of the files in the local package reference files in the
cluster package.

(2) Moving the mesos package to within the cluster package.
The mesos scheduling code is for a cluster, and represents a
specific case of cluster scheduling (the Mesos-related classes
often subclass cluster scheduling classes). Thus, the most logical
place for it is within the cluster package.
2013-09-25 12:45:46 -07:00
Patrick Wendell 9d34838bde Merge remote-tracking branch 'apache-github/pr/13' into HEAD 2013-09-24 15:31:12 -07:00
Patrick Wendell 6079721fa1 Update build version in master 2013-09-24 11:41:51 -07:00
Holden Karau 0cef683553 Fix formatting :) 2013-09-23 19:39:42 -07:00
Reynold Xin 7220e8f90b Merge remote-tracking branch 'pr/12'
Fix spacing so java.io.tmpdir doesn't run on with SPARK_JAVA_OPTS
2013-09-23 14:02:21 -07:00
Y.CORP.YAHOO.COM\tgraves a314b30733 Fix spacing so that the java.io.tmpdir doesn't run on with SPARK_JAVA_OPTS 2013-09-23 14:48:17 -05:00
Reynold Xin 0d2e5c3e24 Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/incubator-spark 2013-09-23 11:55:55 -07:00