https://issues.apache.org/jira/browse/SPARK-1901
Author: Zhen Peng <zhenpeng01@baidu.com>
Closes#854 from zhpengg/bugfix-worker-kills-executor and squashes the following commits:
21d380b [Zhen Peng] add some error messages
506cea6 [Zhen Peng] add some docs for killProcess()
a0b9860 [Zhen Peng] [SPARK-1901] worker should make sure executor has exited before updating executor's info
bugfix worker DriverStateChanged state should match DriverState.FAILED
Author: lianhuiwang <lianhuiwang09@gmail.com>
Closes#864 from lianhuiwang/master and squashes the following commits:
480ce94 [lianhuiwang] address aarondav comments
f2b5970 [lianhuiwang] bugfix worker DriverStateChanged state should match DriverState.FAILED
`var cachedPeers: Seq[BlockManagerId] = null` is used in `def replicate(blockId: BlockId, data: ByteBuffer, level: StorageLevel)` without proper protection.
There are two place will call `replicate(blockId, bytesAfterPut, level)`
* 17f3075bc4/core/src/main/scala/org/apache/spark/storage/BlockManager.scala (L644) runs in `connectionManager.futureExecContext`
* 17f3075bc4/core/src/main/scala/org/apache/spark/storage/BlockManager.scala (L752) `doPut` runs in `connectionManager.handleMessageExecutor`. `org.apache.spark.storage.BlockManagerWorker` calls `blockManager.putBytes` in `connectionManager.handleMessageExecutor`.
As they run in different `Executor`s, this is a race condition which may cause the memory pointed by `cachedPeers` is not correct even if `cachedPeers != null`.
The race condition of `onReceiveCallback` is that it's set in `BlockManagerWorker` but read in a different thread in `ConnectionManager.handleMessageExecutor`.
Author: zsxwing <zsxwing@gmail.com>
Closes#887 from zsxwing/SPARK-1932 and squashes the following commits:
524f69c [zsxwing] SPARK-1932: Fix race conditions in onReceiveCallback and cachedPeers
https://issues.apache.org/jira/browse/SPARK-1933
Author: Reynold Xin <rxin@apache.org>
Closes#888 from rxin/addfile and squashes the following commits:
8c402a3 [Reynold Xin] Updated comment.
ff6c162 [Reynold Xin] SPARK-1933: Throw a more meaningful exception when a directory is passed to addJar/addFile.
DAGScheduler does not handle local task OOM properly, and will wait for the job result forever.
Author: Zhen Peng <zhenpeng01@baidu.com>
Closes#883 from zhpengg/bugfix-dag-scheduler-oom and squashes the following commits:
76f7eda [Zhen Peng] remove redundant memory allocations
aa63161 [Zhen Peng] SPARK-1929 DAGScheduler suspended by local task OOM
Self explanatory.
Author: Patrick Wendell <pwendell@gmail.com>
Closes#878 from pwendell/java-constructor and squashes the following commits:
2cc1605 [Patrick Wendell] HOTFIX: Add no-arg SparkContext constructor in Java
Author: Zhen Peng <zhenpeng01@baidu.com>
Closes#827 from zhpengg/bugfix-executor-id-not-found and squashes the following commits:
cd8bb65 [Zhen Peng] bugfix: check executor id existence when executor exit
If I run the following on a YARN cluster
```
bin/spark-submit sheep.py --master yarn-client
```
it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
```
bin/spark-submit file:/path/to/sheep.py --master yarn-client
```
However, this also fails. This time it is because python does not understand URI schemes.
This PR fixes this by automatically resolving all paths passed as command line argument to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it.
Much of the code is originally written by @mengxr. Tested on YARN cluster. More tests pending.
Author: Andrew Or <andrewor14@gmail.com>
Closes#853 from andrewor14/submit-paths and squashes the following commits:
0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
3c36587 [Andrew Or] Improve error messages (minor)
854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
3bb0359 [Andrew Or] Update more comments (minor)
2a1f8a0 [Andrew Or] Update comments (minor)
6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
a68c4d1 [Andrew Or] Handle Windows python file path correctly
427a250 [Andrew Or] Resolve paths properly for Windows
a591a4a [Andrew Or] Update tests for resolving URIs
6c8621c [Andrew Or] Move resolveURIs to Utils
db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
f542dce [Andrew Or] Fix outdated tests
691c4ce [Andrew Or] Ignore special primary resource names
5342ac7 [Andrew Or] Add missing space in error message
02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly
Due to perhaps zombie processes on Jenkins, it seems that at least 10
Spark ports are in use. It also doesn't matter that the port increases
when used, it could in fact go down -- the only part that matters is
that it selects a different port rather than failing to bind.
Changed test to match this.
Thanks to @andrewor14 for helping diagnose this.
Author: Aaron Davidson <aaron@databricks.com>
Closes#857 from aarondav/tiny and squashes the following commits:
c199ec8 [Aaron Davidson] Fix UISuite unit test that fails under Jenkins contention
Sent secondary jars to distributed cache of all containers and add the cached jars to classpath before executors start. Tested on a YARN cluster (CDH-5.0).
`spark-submit --jars` also works in standalone server and `yarn-client`. Thanks for @andrewor14 for testing!
I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet.
CC: @dbtsai @sryza
Author: Xiangrui Meng <meng@databricks.com>
Closes#848 from mengxr/yarn-classpath and squashes the following commits:
23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods
a40f6ed [Xiangrui Meng] standalone -> cluster
65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
11e5354 [Xiangrui Meng] minor changes
3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn
It was in the wrong package
Author: Andrew Or <andrewor14@gmail.com>
Closes#839 from andrewor14/jdbc-suite and squashes the following commits:
f948c5a [Andrew Or] cache -> cache()
b215279 [Andrew Or] Move JdbcRDDSuite to the correct package
Updated version of #821
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Ghidireac <bogdang@u448a5b0a73d45358d94a.ant.amazon.com>
Closes#835 from tdas/SPARK-1877 and squashes the following commits:
f346f71 [Tathagata Das] Addressed Patrick's comments.
fee0c5d [Ghidireac] SPARK-1877: ClassNotFoundException when loading RDD with serialized objects
scheduler.error() will mask the error if there are active tasks. Being removed is a cataclysmic event for Spark applications, and should probably be treated as such.
Author: Aaron Davidson <aaron@databricks.com>
Closes#832 from aarondav/i-love-u and squashes the following commits:
9f1200f [Aaron Davidson] SPARK-1689: Spark application should die when removed by Master
See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop2 and Hive ran out of PermGen space in spark-shell, when those things added up with the Scala compiler.
Note that users can still override it by setting their own Java options with this change. Their options will come later in the command string than the -XX:MaxPermSize=128m.
Author: Matei Zaharia <matei@databricks.com>
Closes#823 from mateiz/spark-1879 and squashes the following commits:
6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
- Look for JARs in the right place
- Launch examples the same way as on Unix
- Load datanucleus JARs if they exist
- Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
- Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)
Author: Matei Zaharia <matei@databricks.com>
Closes#819 from mateiz/win-fixes and squashes the following commits:
d558f96 [Matei Zaharia] Fix comment
228577b [Matei Zaharia] Review comments
d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
Just a small change. I think it's good not to scare people who are using the old options.
Author: Patrick Wendell <pwendell@gmail.com>
Closes#810 from pwendell/warnings and squashes the following commits:
cb8a311 [Patrick Wendell] Make deprecation warning less severe
**Problem.** For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.
**Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent.
**Details.** `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply call pass the file directly to Spark submit and let it handle the rest.
For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.
This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.
Author: Andrew Or <andrewor14@gmail.com>
Closes#799 from andrewor14/pyspark-submit and squashes the following commits:
bf37e36 [Andrew Or] Minor changes
01066fa [Andrew Or] bin/pyspark for Windows
c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
1866f85 [Andrew Or] Windows is not cooperating
456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
b7ba0d8 [Andrew Or] Address a few comments (minor)
06eb138 [Andrew Or] Use shlex instead of writing our own parser
05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
6fba412 [Andrew Or] Deal with quotes + address various comments
fe4c8a7 [Andrew Or] Update --help for bin/pyspark
afe47bf [Andrew Or] Fix spark shell
f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a371d26 [Andrew Or] Route bin/pyspark through Spark submit
Author: Michael Armbrust <michael@databricks.com>
Closes#808 from marmbrus/confClasspath and squashes the following commits:
4c31d57 [Michael Armbrust] Look in spark conf instead of system properties when propagating configuration to executors.
This causes an unrecoverable error for applications that are running for longer
than 7 days that have jars added to the SparkContext, as the jars are cleaned up
even though the application is still running.
Author: Aaron Davidson <aaron@databricks.com>
Closes#800 from aarondav/shitty-defaults and squashes the following commits:
a573fbb [Aaron Davidson] SPARK-1860: Do not cleanup application work/ directories by default
Author: Huajian Mao <huajianmao@gmail.com>
Closes#798 from huajianmao/patch-1 and squashes the following commits:
208a454 [Huajian Mao] A typo in Task
1b515af [Huajian Mao] A typo in the message
This is a few changes based on the original patch by @scrapcodes.
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>
Closes#785 from pwendell/package-docs and squashes the following commits:
c32b731 [Patrick Wendell] Changes based on Prashant's patch
c0463d3 [Prashant Sharma] added eof new line
ce8bf73 [Prashant Sharma] Added eof new line to all files.
4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs
Author: Patrick Wendell <pwendell@gmail.com>
Closes#784 from pwendell/group-by-key and squashes the following commits:
9b4505f [Patrick Wendell] Small fix
6347924 [Patrick Wendell] Documentation: Encourage use of reduceByKey instead of groupByKey.
Running SparkPi example gave this error.
```
Pi is roughly 3.14374
14/05/14 18:16:19 ERROR Utils: Uncaught exception in thread SparkListenerBus
scala.runtime.NonLocalReturnControl$mcV$sp
```
This is due to the catch-all in the SparkListenerBus, which logged control throwable used by scala system
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes#783 from tdas/controlexception-fix and squashes the following commits:
a466c8d [Tathagata Das] Ignored control exceptions when logging all exceptions.
Author: andrewor14 <andrewor14@gmail.com>
Closes#780 from andrewor14/submit-typo and squashes the following commits:
e70e057 [andrewor14] propertes -> properties
After having been invited to make the change in 6bee01dd04 (commitcomment-6284165) by @witgo.
Author: Jacek Laskowski <jacek@japila.pl>
Closes#748 from jaceklaskowski/sparkenv-string-interpolation and squashes the following commits:
be6ebac [Jacek Laskowski] String interpolation + some other small changes
This is nicer than relying on new SparkContext(new SparkConf())
Author: Patrick Wendell <pwendell@gmail.com>
Closes#774 from pwendell/spark-context and squashes the following commits:
ef9f12f [Patrick Wendell] SPARK-1833 - Have an empty SparkContext constructor.
As "99 ms" up to 99 ms
As "0.1 s" from 0.1 s up to 0.9 s
https://issues.apache.org/jira/browse/SPARK-1829
Compare the first image to the second here: http://imgur.com/RaLEsSZ,7VTlgfo#0
Author: Andrew Ash <andrew@andrewash.com>
Closes#768 from ash211/spark-1829 and squashes the following commits:
1c15b8e [Andrew Ash] SPARK-1829 Format sub-second durations more appropriately
If the intended behavior was that uncaught exceptions thrown in functions being run by the Akka scheduler would end up being handled by the default uncaught exception handler set in Executor, and if that behavior is, in fact, correct, then this is a way to accomplish that. I'm not certain, though, that we shouldn't be doing something different to handle uncaught exceptions from some of these scheduled functions.
In any event, this PR covers all of the cases I comment on in [SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620).
Author: Mark Hamstra <markhamstra@gmail.com>
Closes#622 from markhamstra/SPARK-1620 and squashes the following commits:
071d193 [Mark Hamstra] refactored post-SPARK-1772
1a6a35e [Mark Hamstra] another style fix
d30eb94 [Mark Hamstra] scalastyle
3573ecd [Mark Hamstra] Use wrapped try/catch in Utils.tryOrExit
8fc0439 [Mark Hamstra] Make functions run by the Akka scheduler use Executor's UncaughtExceptionHandler
This PR replaces the Schedulable data structures in Pool.scala with thread-safe ones from java. Note that Scala's `with SynchronizedBuffer` trait is soon to be deprecated in 2.11 because it is ["inherently unreliable"](http://www.scala-lang.org/api/2.11.0/index.html#scala.collection.mutable.SynchronizedBuffer). We should slowly drift away from `SynchronizedBuffer` in other places too.
Note that this PR introduces an API-breaking change; `sc.getAllPools` now returns an Array rather than an ArrayBuffer. This is because we want this method to return an immutable copy rather than one may potentially confuse the user if they try to modify the copy, which takes no effect on the original data structure.
Author: Andrew Or <andrewor14@gmail.com>
Closes#762 from andrewor14/pool-npe and squashes the following commits:
383e739 [Andrew Or] JavaConverters -> JavaConversions
3f32981 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
769be19 [Andrew Or] Assorted minor changes
2189247 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
05ad9e9 [Andrew Or] Fix test - contains is not the same as containsKey
0921ea0 [Andrew Or] var -> val
07d720c [Andrew Or] Synchronize Schedulable data structures
...loper api
Author: Koert Kuipers <koert@tresata.com>
Closes#764 from koertkuipers/feat-rdd-developerapi and squashes the following commits:
8516dd2 [Koert Kuipers] SPARK-1801. expose InterruptibleIterator and TaskKilledException in developer api
Add the implementation for ApproximateCountDistinct to SparkSql. We use the HyperLogLog algorithm implemented in stream-lib, and do the count in two phases: 1) counting the number of distinct elements in each partitions, and 2) merge the HyperLogLog results from different partitions.
A simple serializer and test cases are added as well.
Author: larvaboy <larvaboy@gmail.com>
Closes#737 from larvaboy/master and squashes the following commits:
bd8ef3f [larvaboy] Add support of user-provided standard deviation to ApproxCountDistinct.
9ba8360 [larvaboy] Fix alignment and null handling issues.
95b4067 [larvaboy] Add a test case for count distinct and approximate count distinct.
f57917d [larvaboy] Add the parser for the approximate count.
a2d5d10 [larvaboy] Add ApproximateCountDistinct aggregates and functions.
7ad273a [larvaboy] Add SparkSql serializer for HyperLogLog.
1d9aacf [larvaboy] Fix a minor typo in the toString method of the Count case class.
653542b [larvaboy] Fix a couple of minor typos.
This change adds a new partitioner which allows users
to specify # of keys per partition.
Author: Syed Hashmi <shashmi@cloudera.com>
Closes#721 from syedhashmi/master and squashes the following commits:
4ca94cc [Syed Hashmi] [SPARK-1784] Add a new partitioner
JIRA issue: [SPARK-1527](https://issues.apache.org/jira/browse/SPARK-1527)
getName() only gets the last component of the file path. When deleting test-generated directories,
we should pass the generated directory's absolute path to DiskBlockManager.
Author: Ye Xianjin <advancedxy@gmail.com>
This patch had conflicts when merged, resolved by
Committer: Patrick Wendell <pwendell@gmail.com>
Closes#436 from advancedxy/SPARK-1527 and squashes the following commits:
4678bab [Ye Xianjin] change rootDir*.getname to rootDir*.getAbsolutePath so the temporary directories are deleted when the test is finished.
The solution is to wrap a try / catch / log around the posting of each event to each listener.
Author: Andrew Or <andrewor14@gmail.com>
Closes#759 from andrewor14/listener-die and squashes the following commits:
aee5107 [Andrew Or] Merge branch 'master' of github.com:apache/spark into listener-die
370939f [Andrew Or] Remove two layers of indirection
422d278 [Andrew Or] Explicitly throw an exception instead of 1 / 0
0df0e2a [Andrew Or] Try/catch and log exceptions when posting events
This patch checks top-level closure arguments to `ClosureCleaner.clean` for `return` statements and raises an exception if it finds any. This is mainly a user-friendliness addition, since programs with return statements in closure arguments will currently fail upon RDD actions with a less-than-intuitive error message.
Author: William Benton <willb@redhat.com>
Closes#717 from willb/spark-571 and squashes the following commits:
c41eb7d [William Benton] Another test case for SPARK-571
30c42f4 [William Benton] Stylistic cleanups
559b16b [William Benton] Stylistic cleanups from review
de13b79 [William Benton] Style fixes
295b6a5 [William Benton] Forbid return statements in closure arguments.
b017c47 [William Benton] Added a test for SPARK-571
Author: Sandy Ryza <sandy@cloudera.com>
Closes#753 from sryza/sandy-spark-1815 and squashes the following commits:
957a8ac [Sandy Ryza] SPARK-1815. SparkContext should not be marked DeveloperApi
What they really mean is SPARK_DAEMON_***JAVA***_OPTS
Author: Andrew Or <andrewor14@gmail.com>
Closes#751 from andrewor14/spark-daemon-opts and squashes the following commits:
70c41f9 [Andrew Or] SPARK_DAEMON_OPTS -> SPARK_DAEMON_JAVA_OPTS
Author: Andrew Ash <andrew@andrewash.com>
Closes#743 from ash211/patch-4 and squashes the following commits:
c959f3b [Andrew Ash] Typo: resond -> respond
This seems strictly better, and I think it's justified only the grounds of
clean-up. It might also fix issues with path conversions, but I haven't
yet isolated any instance of that happening.
/cc @srowen @tdas
Author: Patrick Wendell <pwendell@gmail.com>
Closes#749 from pwendell/broadcast-cleanup and squashes the following commits:
d6d54f2 [Patrick Wendell] SPARK-1623: Use File objects instead of string's in HTTPBroadcast
This was changed, but in fact, it's used for things other than tests.
So I've changed it back.
Author: Patrick Wendell <pwendell@gmail.com>
Closes#747 from pwendell/executor-env and squashes the following commits:
36a60a5 [Patrick Wendell] Rename testExecutorEnvs --> executorEnvs.
Three issues related to temp files that tests generate – these should be touched up for hygiene but are not urgent.
Modules have a log4j.properties which directs the unit-test.log output file to a directory like `[module]/target/unit-test.log`. But this ends up creating `[module]/[module]/target/unit-test.log` instead of former.
The `work/` directory is not deleted by "mvn clean", in the parent and in modules. Neither is the `checkpoint/` directory created under the various external modules.
Many tests create a temp directory, which is not usually deleted. This can be largely resolved by calling `deleteOnExit()` at creation and trying to call `Utils.deleteRecursively` consistently to clean up, sometimes in an `@After` method.
_If anyone seconds the motion, I can create a more significant change that introduces a new test trait along the lines of `LocalSparkContext`, which provides management of temp directories for subclasses to take advantage of._
Author: Sean Owen <sowen@cloudera.com>
Closes#732 from srowen/SPARK-1798 and squashes the following commits:
5af578e [Sean Owen] Try to consistently delete test temp dirs and files, and set deleteOnExit() for each
b21b356 [Sean Owen] Remove work/ and checkpoint/ dirs with mvn clean
bdd0f41 [Sean Owen] Remove duplicate module dir in log4j.properties output path for tests
Enabled Mesos (0.18.1) dependency with shaded protobuf
Why is this needed?
Avoids any protobuf version collision between Mesos and any other
dependency in Spark e.g. Hadoop HDFS 2.2+ or 1.0.4.
Ticket: https://issues.apache.org/jira/browse/SPARK-1806
* Should close https://issues.apache.org/jira/browse/SPARK-1433
Author berngp
Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com>
Closes#741 from berngp/feature/SPARK-1806 and squashes the following commits:
5d70646 [Bernardo Gomez Palacio] SPARK-1806: Upgrade Mesos dependency to 0.18.1
The main issue this patch fixes is [SPARK-1772](https://issues.apache.org/jira/browse/SPARK-1772), in which Executors may not die when fatal exceptions (e.g., OOM) are thrown. This patch causes Executors to delegate to the ExecutorUncaughtExceptionHandler when a fatal exception is thrown.
This patch also continues the fight in the neverending war against `case t: Throwable =>`, by only catching Exceptions in many places, and adding a wrapper for Threads and Runnables to make sure any uncaught exceptions are at least printed to the logs.
It also turns out that it is unlikely that the IndestructibleActorSystem actually works, given testing ([here](https://gist.github.com/aarondav/ca1f0cdcd50727f89c0d)). The uncaughtExceptionHandler is not called from the places that we expected it would be.
[SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620) deals with part of this issue, but refactoring our Actor Systems to ensure that exceptions are dealt with properly is a much bigger change, outside the scope of this PR.
Author: Aaron Davidson <aaron@databricks.com>
Closes#715 from aarondav/throwable and squashes the following commits:
f9b9bfe [Aaron Davidson] Remove other redundant 'throw e'
e937a0a [Aaron Davidson] Address Prashant and Matei's comments
1867867 [Aaron Davidson] [RFC] SPARK-1772 Stop catching Throwable, let Executors die
This patch adds better balancing when performing a repartition of an
RDD. Previously the elements in the RDD were hash partitioned, meaning
if the RDD was skewed certain partitions would end up being very large.
This commit adds load balancing of elements across the repartitioned
RDD splits. The load balancing is not perfect: a given output partition
can have up to N more elements than the average if there are N input
partitions. However, some randomization is used to minimize the
probabiliy that this happens.
Author: Patrick Wendell <pwendell@gmail.com>
Closes#727 from pwendell/load-balance and squashes the following commits:
f9da752 [Patrick Wendell] Response to Matei's feedback
acfa46a [Patrick Wendell] SPARK-1770: Load balance elements when repartitioning.
Author: witgo <witgo@qq.com>
Closes#728 from witgo/scala_home and squashes the following commits:
cdfd8be [witgo] Merge branch 'master' of https://github.com/apache/spark into scala_home
fac094a [witgo] remove outdated runtime Information scala home
SparkSubmit ignores `--jars` for YARN client. This is a bug.
This PR also automatically adds the application jar to `spark.jar`. Previously, when running as yarn-client, you must specify the jar additionally through `--files` (because `--jars` didn't work). Now you don't have to explicitly specify it through either.
Tested on a YARN cluster.
Author: Andrew Or <andrewor14@gmail.com>
Closes#710 from andrewor14/yarn-jars and squashes the following commits:
35d1928 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
c27bf6c [Andrew Or] For yarn-cluster and python, do not add primaryResource to spark.jar
c92c5bf [Andrew Or] Minor cleanups
269f9f3 [Andrew Or] Fix format
013d840 [Andrew Or] Fix tests
1407474 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
3bb75e8 [Andrew Or] Allow SparkSubmit --jars to take effect in yarn-client mode
TL;DR is there is a bit of JAR hell trouble with Netty, that can be mostly resolved and will resolve a test failure.
I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamingSuite, and have for a short while (is it just me?)
velvia notes:
"I have found a workaround. If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in newer version of Jetty."
There are at least 3 versions of Netty in play in the build:
- the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
- the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
- but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
The POMs try to exclude other versions of netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue.
The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final.
But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Down-grading to 3.6.6.Final across the board made some Spark code not compile.
If the build *keeps* io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation.
So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:
- Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
- Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
- Change a bit of test code that actually depended on Netty 3.x, to use 4.x equivalent
- Update SBT build accordingly
A better change would be to update Akka far enough such that it agrees on Netty 4.x, but I don't know if that's feasible.
Author: Sean Owen <sowen@cloudera.com>
Closes#723 from srowen/SPARK-1789 and squashes the following commits:
43661b7 [Sean Owen] Update and add Netty excludes to prevent some JAR conflicts that cause test issues