Commit graph

4190 commits

Author SHA1 Message Date
lianhuiwang f5e63751f0 [SPARK-5173]support python application running on yarn cluster mode
now when we run python application on yarn cluster mode through spark-submit, spark-submit does not support python application on yarn cluster mode. so i modify code of submit and yarn's AM in order to support it.
through specifying .py file or primaryResource file via spark-submit, we can make pyspark run in yarn-cluster mode.
example:spark-submit --master yarn-master --num-executors 1 --driver-memory 1g --executor-memory 1g  xx.py --primaryResource yy.conf
this config is same as pyspark on yarn-client mode.
firstly,we put local path of .py or primaryResource to yarn's dist.files.that can be distributed on slave nodes.and then in spark-submit we transfer --py-files and --primaryResource to yarn.Client and use "org.apache.spark.deploy.PythonRunner" to user class that can run .py files on ApplicationMaster.
in yarn.Client we transfer --py-files and --primaryResource to  ApplicationMaster.
in ApplicationMaster, user's class is org.apache.spark.deploy.PythonRunner, and user's args is primaryResource and -py-files. so that can make pyspark run on ApplicationMaster.
JoshRosen tgravescs sryza

Author: lianhuiwang <lianhuiwang09@gmail.com>
Author: Wang Lianhui <lianhuiwang09@gmail.com>

Closes #3976 from lianhuiwang/SPARK-5173 and squashes the following commits:

28a8a58 [lianhuiwang] fix variable name
67f8cee [lianhuiwang] update with andrewor's comments
0319ae3 [lianhuiwang] address with sryza's comments
2385ef6 [lianhuiwang] address with sryza's comments
03640ab [lianhuiwang] add sparkHome to env
47d2fc3 [lianhuiwang] fix test
2adc8f5 [lianhuiwang] add spark.test.home
d60bc60 [lianhuiwang] fix test
5b30064 [lianhuiwang] add test
097a5ec [lianhuiwang] fix line length exceeds 100
905a106 [lianhuiwang] update with sryza and andrewor 's comments
f1f55b6 [lianhuiwang] when yarn-cluster, all python files can be non-local
172eec1 [Wang Lianhui] fix a min submit's bug
9c941bc [lianhuiwang] support python application running on yarn cluster mode
2015-02-02 12:32:28 -08:00
Sandy Ryza b2047b55c5 SPARK-4585. Spark dynamic executor allocation should use minExecutors as...
... initial number

Author: Sandy Ryza <sandy@cloudera.com>

Closes #4051 from sryza/sandy-spark-4585 and squashes the following commits:

d1dd039 [Sandy Ryza] Add spark.dynamicAllocation.initialNumExecutors and make min and max not required
b7c59dc [Sandy Ryza] SPARK-4585. Spark dynamic executor allocation should use minExecutors as initial number
2015-02-02 12:27:08 -08:00
Sandy Ryza 6f341310bf SPARK-5492. Thread statistics can break with older Hadoop versions
Author: Sandy Ryza <sandy@cloudera.com>

Closes #4305 from sryza/sandy-spark-5492 and squashes the following commits:

b7d4497 [Sandy Ryza] SPARK-5492. Thread statistics can break with older Hadoop versions
2015-02-02 00:54:06 -08:00
jerryshao 63dfe21dc7 [SPARK-5478][UI][Minor] Add missing right parentheses
![UI](https://dl.dropboxusercontent.com/u/19230832/Capture.PNG)

Author: jerryshao <saisai.shao@intel.com>

Closes #4267 from jerryshao/SPARK-5478 and squashes the following commits:

9fe51cc [jerryshao] Add missing right parentheses
2015-02-01 23:56:13 -08:00
Patrick Wendell a15f6e31fc [SPARK-3996]: Shade Jetty in Spark deliverables
(v2 of this patch with a fix that was only relevant for the maven build).

This patch piggy-back's on vanzin's work to simplify the Guava shading,
and adds Jetty as a shaded library in Spark. Other than adding Jetty,
it consilidates the <artifactSet>'s into the root pom. I found it was
a bit easier to follow that way, since you don't need to look into
child pom's to find out specific artifact sets included in shading.

Author: Patrick Wendell <patrick@databricks.com>

Closes #4285 from pwendell/jetty and squashes the following commits:

d3e7f4e [Patrick Wendell] Fix for shaded deps causing compile errors
19f0710 [Patrick Wendell] More code review feedback
961452d [Patrick Wendell] Responding to feedback from Marcello
6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
2015-02-01 21:13:57 -08:00
Tom Panning 1ca0a1014e [SPARK-5176] The thrift server does not support cluster mode
Output an error message if the thrift server is started in cluster mode.

Author: Tom Panning <tom.panning@nextcentury.com>

Closes #4137 from tpanningnextcen/spark-5176-thrift-cluster-mode-error and squashes the following commits:

f5c0509 [Tom Panning] [SPARK-5176] The thrift server does not support cluster mode
2015-02-01 17:57:31 -08:00
zsxwing 883bc88d52 [SPARK-4859][Core][Streaming] Refactor LiveListenerBus and StreamingListenerBus
This PR refactors LiveListenerBus and StreamingListenerBus and extracts the common codes to a parent class `ListenerBus`.

It also includes bug fixes in #3710:
1. Fix the race condition of queueFullErrorMessageLogged in LiveListenerBus and StreamingListenerBus to avoid outputing `queue-full-error` logs multiple times.
2. Make sure the SHUTDOWN message will be delivered to listenerThread, so that we can make sure listenerThread will always be able to exit.
3. Log the error from listener rather than crashing listenerThread in StreamingListenerBus.

During fixing the above bugs, we find it's better to make LiveListenerBus and StreamingListenerBus have the same bahaviors. Then there will be many duplicated codes in LiveListenerBus and StreamingListenerBus.

Therefore, I extracted their common codes to `ListenerBus` as a parent class: LiveListenerBus and StreamingListenerBus only need to extend `ListenerBus` and implement `onPostEvent` (how to process an event) and `onDropEvent` (do something when droppping an event).

Author: zsxwing <zsxwing@gmail.com>

Closes #4006 from zsxwing/SPARK-4859-refactor and squashes the following commits:

c8dade2 [zsxwing] Fix the code style after renaming
5715061 [zsxwing] Rename ListenerHelper to ListenerBus and the original ListenerBus to AsynchronousListenerBus
f0ef647 [zsxwing] Fix the code style
4e85ffc [zsxwing] Merge branch 'master' into SPARK-4859-refactor
d2ef990 [zsxwing] Add private[spark]
4539f91 [zsxwing] Remove final to pass MiMa tests
a9dccd3 [zsxwing] Remove SparkListenerShutdown
7cc04c3 [zsxwing] Refactor LiveListenerBus and StreamingListenerBus and make them share same code base
2015-02-01 17:48:41 -08:00
Ryan Williams 80bd715a3e [SPARK-5422] Add support for sending Graphite metrics via UDP
Depends on [SPARK-5413](https://issues.apache.org/jira/browse/SPARK-5413) / #4209, included here, will rebase once the latter's merged.

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #4218 from ryan-williams/udp and squashes the following commits:

ebae393 [Ryan Williams] Add support for sending Graphite metrics via UDP
cb58262 [Ryan Williams] bump metrics dependency to v3.1.0
2015-01-31 23:41:05 -08:00
Sean Owen c84d5a10e8 SPARK-3359 [CORE] [DOCS] sbt/sbt unidoc doesn't work with Java 8
These are more `javadoc` 8-related changes I spotted while investigating. These should be helpful in any event, but this does not nearly resolve SPARK-3359, which may never be feasible while using `unidoc` and `javadoc` 8.

Author: Sean Owen <sowen@cloudera.com>

Closes #4193 from srowen/SPARK-3359 and squashes the following commits:

5b33f66 [Sean Owen] Additional scaladoc fixes for javadoc 8; still not going to be javadoc 8 compatible
2015-01-31 10:40:42 -08:00
Reynold Xin 636408311d [SPARK-5307] Add a config option for SerializationDebugger.
Just in case there is a bug in the SerializationDebugger that makes error reporting worse than it was.

Author: Reynold Xin <rxin@databricks.com>

Closes #4297 from rxin/ser-config and squashes the following commits:

f1d4629 [Reynold Xin] [SPARK-5307] Add a config option for SerializationDebugger.
2015-01-31 00:06:36 -08:00
Reynold Xin 740a56862b [SPARK-5307] SerializationDebugger
This patch adds a SerializationDebugger that is used to add serialization path to a NotSerializableException. When a NotSerializableException is encountered, the debugger visits the object graph to find the path towards the object that cannot be serialized, and constructs information to help user to find the object.

The patch uses the internals of JVM serialization (in particular, heavy usage of ObjectStreamClass). Compared with an earlier attempt, this one provides extra information including field names, array offsets, writeExternal calls, etc.

An example serialization stack:
```
Serialization stack:
  - object not serializable (class: org.apache.spark.serializer.NotSerializable, value: org.apache.spark.serializer.NotSerializable2c43caa4)
  - element of array (index: 0)
  - array (class [Ljava.lang.Object;, size 1)
  - field (class: org.apache.spark.serializer.SerializableArray, name: arrayField, type: class [Ljava.lang.Object;)
  - object (class org.apache.spark.serializer.SerializableArray, org.apache.spark.serializer.SerializableArray193c5908)
  - writeExternal data
  - externalizable object (class org.apache.spark.serializer.ExternalizableClass, org.apache.spark.serializer.ExternalizableClass320bdadc)
```

Author: Reynold Xin <rxin@databricks.com>

Closes #4098 from rxin/SerializationDebugger and squashes the following commits:

553b3ff [Reynold Xin] Update SerializationDebuggerSuite.scala
572d0cb [Reynold Xin] Disable automatically when reflection fails.
b349b77 [Reynold Xin] [SPARK-5307] SerializationDebugger to help debug NotSerializableException - take 2
2015-01-30 22:34:10 -08:00
Sandy Ryza 254eaa4d35 SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714
Previously I had tried to solve this with by adding a line in Spark's log4j-defaults.properties.

The issue with the message in log4j-defaults.properties was that the log4j.properties packaged inside Hadoop was getting picked up instead. While it would be ideal to fix that as well, we still want to quiet this in situations where a user supplies their own custom log4j properties.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #4192 from sryza/sandy-spark-5393 and squashes the following commits:

4d5dedc [Sandy Ryza] Only set log level if unset
46e07c5 [Sandy Ryza] SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714
2015-01-30 11:31:54 -06:00
Davies Liu 5c746eedda [SPARK-5395] [PySpark] fix python process leak while coalesce()
Currently, the Python process is released into pool only after the task had finished, it cause many process forked if coalesce() is called.

This PR will change it to release the process as soon as read all the data from it (finish the partition), then a process could be reused to process multiple partitions in a single task.

Author: Davies Liu <davies@databricks.com>

Closes #4238 from davies/py_leak and squashes the following commits:

ec80a43 [Davies Liu] add @volatile
6da437a [Davies Liu] address comments
24ed322 [Davies Liu] fix python process leak while coalesce()
2015-01-29 17:28:37 -08:00
Patrick Wendell d2071e8f45 Revert "[WIP] [SPARK-3996]: Shade Jetty in Spark deliverables"
This reverts commit f240fe390b.
2015-01-29 17:14:27 -08:00
Patrick Wendell f240fe390b [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
This patch piggy-back's on vanzin's work to simplify the Guava shading,
and adds Jetty as a shaded library in Spark. Other than adding Jetty,
it consilidates the \<artifactSet\>'s into the root pom. I found it was
a bit easier to follow that way, since you don't need to look into
child pom's to find out specific artifact sets included in shading.

Author: Patrick Wendell <patrick@databricks.com>

Closes #4252 from pwendell/jetty and squashes the following commits:

19f0710 [Patrick Wendell] More code review feedback
961452d [Patrick Wendell] Responding to feedback from Marcello
6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
2015-01-29 16:31:19 -08:00
Marcelo Vanzin f9e569452e [SPARK-5466] Add explicit guava dependencies where needed.
One side-effect of shading guava is that it disappears as a transitive
dependency. For Hadoop 2.x, this was masked by the fact that Hadoop
itself depends on guava. But certain versions of Hadoop 1.x also
shade guava, leaving either no guava or some random version pulled
by another dependency on the classpath.

So be explicit about the dependency in modules that use guava directly,
which is the right thing to do anyway.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #4272 from vanzin/SPARK-5466 and squashes the following commits:

e3f30e5 [Marcelo Vanzin] Dependency for catalyst is not needed.
d3b2c84 [Marcelo Vanzin] [SPARK-5466] Add explicit guava dependencies where needed.
2015-01-29 13:00:45 -08:00
Xiangrui Meng 4ee79c71af [SPARK-5430] move treeReduce and treeAggregate from mllib to core
We have seen many use cases of `treeAggregate`/`treeReduce` outside the ML domain. Maybe it is time to move them to Core. pwendell

Author: Xiangrui Meng <meng@databricks.com>

Closes #4228 from mengxr/SPARK-5430 and squashes the following commits:

20ad40d [Xiangrui Meng] exclude tree* from mima
e89a43e [Xiangrui Meng] fix compile and update java doc
3ae1a4b [Xiangrui Meng] add treeReduce/treeAggregate to Python
6f948c5 [Xiangrui Meng] add treeReduce/treeAggregate to JavaRDDLike
d600b6c [Xiangrui Meng] move treeReduce and treeAggregate to core
2015-01-28 17:26:03 -08:00
Michael Nazario e023112d33 [SPARK-5441][pyspark] Make SerDeUtil PairRDD to Python conversions more robust
SerDeUtil.pairRDDToPython and SerDeUtil.pythonToPairRDD now both support empty RDDs by checking the result of take(1) instead of calling first which throws an exception.

Author: Michael Nazario <mnazario@palantir.com>

Closes #4236 from mnazario/feature/empty-first and squashes the following commits:

a531c0c [Michael Nazario] Added regression tests for SPARK-5441
e3b2fb6 [Michael Nazario] Added acceptance of the empty case
2015-01-28 13:58:46 -08:00
Ryan Williams a731314c31 [SPARK-5417] Remove redundant executor-id set() call
This happens inside SparkEnv initialization as of #4194

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #4213 from ryan-williams/exec-id-set and squashes the following commits:

b3e4f7b [Ryan Williams] Remove redundant executor-id set() call
2015-01-28 13:04:52 -08:00
Andrew Or 84b6ecdef6 [SPARK-5437] Fix DriverSuite and SparkSubmitSuite timeout issues
In DriverSuite, we currently set a timeout of 60 seconds. If after this time the process has not terminated, we leak the process because we never destroy it.

In SparkSubmitSuite, we currently do not have a timeout so the test can hang indefinitely.

Author: Andrew Or <andrew@databricks.com>

Closes #4230 from andrewor14/fix-driver-suite and squashes the following commits:

f5c80fd [Andrew Or] Fix timeout behaviors in both suites
8092c36 [Andrew Or] Stop SparkContext after every individual test
2015-01-28 12:53:22 -08:00
Sean Owen 9b18009b83 SPARK-1934 [CORE] "this" reference escape to "selectorThread" during construction in ConnectionManager
This change reshuffles the order of initialization in `ConnectionManager` so that the last thing that happens is running `selectorThread`, which invokes a method that relies on object state in `ConnectionManager`

zsxwing also reported a similar problem in `BlockManager` in the JIRA, but I can't find a similar pattern there. Maybe it was subsequently fixed?

Author: Sean Owen <sowen@cloudera.com>

Closes #4225 from srowen/SPARK-1934 and squashes the following commits:

c4dec3b [Sean Owen] Init all object state in ConnectionManager constructor before starting thread in constructor that accesses object's state
2015-01-28 12:44:35 -08:00
Winston Chen 453d7999b8 [SPARK-5361]Multiple Java RDD <-> Python RDD conversions not working correctly
This is found through reading RDD from `sc.newAPIHadoopRDD` and writing it back using `rdd.saveAsNewAPIHadoopFile` in pyspark.

It turns out that whenever there are multiple RDD conversions from JavaRDD to PythonRDD then back to JavaRDD, the exception below happens:

```
15/01/16 10:28:31 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 7)
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.util.ArrayList
	at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:157)
	at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:153)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
```

The test case code below reproduces it:

```
from pyspark.rdd import RDD

dl = [
    (u'2', {u'director': u'David Lean'}),
    (u'7', {u'director': u'Andrew Dominik'})
]

dl_rdd = sc.parallelize(dl)
tmp = dl_rdd._to_java_object_rdd()
tmp2 = sc._jvm.SerDe.javaToPython(tmp)
t = RDD(tmp2, sc)
t.count()

tmp = t._to_java_object_rdd()
tmp2 = sc._jvm.SerDe.javaToPython(tmp)
t = RDD(tmp2, sc)
t.count() # it blows up here during the 2nd time of conversion
```

Author: Winston Chen <wchen@quid.com>

Closes #4146 from wingchen/master and squashes the following commits:

903df7d [Winston Chen] SPARK-5361, update to toSeq based on the PR
5d90a83 [Winston Chen] SPARK-5361, make python pretty, so to pass PEP 8 checks
126be6b [Winston Chen] SPARK-5361, add in test case
4cf1187 [Winston Chen] SPARK-5361, add in test case
9f1a097 [Winston Chen] add in tuple handling while converting form python RDD back to JavaRDD
2015-01-28 11:08:44 -08:00
Kousuke Saruta 0b35fcd7f0 [SPARK-5291][CORE] Add timestamp and reason why an executor is removed to SparkListenerExecutorAdded and SparkListenerExecutorRemoved
Recently `SparkListenerExecutorAdded` and `SparkListenerExecutorRemoved` are added.
I think it's useful if they have timestamp and the reason why an executor is removed.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4082 from sarutak/SPARK-5291 and squashes the following commits:

a026ff2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5291
979dfe1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5291
cf9f9080 [Kousuke Saruta] Fixed test case
1f2a89b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5291
243f2a60 [Kousuke Saruta] Modified MesosSchedulerBackendSuite
a527c35 [Kousuke Saruta] Added timestamp to SparkListenerExecutorAdded
2015-01-28 11:02:51 -08:00
Marcelo Vanzin 37a5e272f8 [SPARK-4809] Rework Guava library shading.
The current way of shading Guava is a little problematic. Code that
depends on "spark-core" does not see the transitive dependency, yet
classes in "spark-core" actually depend on Guava. So it's a little
tricky to run unit tests that use spark-core classes, since you need
a compatible version of Guava in your dependencies when running the
tests. This can become a little tricky, and is kind of a bad user
experience.

This change modifies the way Guava is shaded so that it's applied
uniformly across the Spark build. This means Guava is shaded inside
spark-core itself, so that the dependency issues above are solved.
Aside from that, all Spark sub-modules have their Guava references
relocated, so that they refer to the relocated classes now packaged
inside spark-core. Before, this was only done by the time the assembly
was built, so projects that did not end up inside the assembly (such
as streaming backends) could still reference the original location
of Guava classes.

The Guava classes are added to the "first" artifact Spark generates
(network-common), so that all downstream modules have the needed
classes available. Since "network-common" is a dependency of spark-core,
all Spark apps should get the relocated classes automatically.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3658 from vanzin/SPARK-4809 and squashes the following commits:

3c93e42 [Marcelo Vanzin] Shade Guava in the network-common artifact.
5d69ec9 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
b3104fc [Marcelo Vanzin] Add comment.
941848f [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
f78c48a [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
8053dd4 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
107d7da [Marcelo Vanzin] Add fix for SPARK-5052 (PR #3874).
40b8723 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
4a4ed42 [Marcelo Vanzin] [SPARK-4809] Rework Guava library shading.
2015-01-28 00:29:29 -08:00
Sandy Ryza b1b35ca2e4 SPARK-5199. FS read metrics should support CombineFileSplits and track bytes from all FSs
...mbineFileSplits

Author: Sandy Ryza <sandy@cloudera.com>

Closes #4050 from sryza/sandy-spark-5199 and squashes the following commits:

864514b [Sandy Ryza] Add tests and fix bug
0d504f1 [Sandy Ryza] Prettify
915c7e6 [Sandy Ryza] Get metrics from all filesystems
cdbc3e8 [Sandy Ryza] SPARK-5199. Input metrics should show up for InputFormats that return CombineFileSplits
2015-01-27 15:42:55 -08:00
Elmer Garduno 661e0fca5d [SPARK-5052] Add common/base classes to fix guava methods signatures.
Fixes problems with incorrect method signatures related to shaded classes. For discussion see the jira issue.

Author: Elmer Garduno <elmerg@google.com>

Closes #3874 from elmer-garduno/fix_guava_signatures and squashes the following commits:

aa5d8e0 [Elmer Garduno] Unshade common/base[Function|Supplier] classes to fix guava methods signatures.
2015-01-26 17:40:48 -08:00
Sean Owen 0497ea51ac SPARK-960 [CORE] [TEST] JobCancellationSuite "two jobs sharing the same stage" is broken
This reenables and fixes this test, after addressing two issues:

- The Semaphore that was intended to be shared locally was being serialized and copied; it's now a static member in the companion object as in other tests
- Later changes to Spark means that cancelling the first task will not cancel the shared stage and therefore the second task should succeed

Author: Sean Owen <sowen@cloudera.com>

Closes #4180 from srowen/SPARK-960 and squashes the following commits:

43da66f [Sean Owen] Fix 'two jobs sharing the same stage' test and reenable it: truly share a Semaphore locally as intended, and update expectation of failure in non-cancelled task
2015-01-26 14:32:27 -08:00
Sean Owen 54e7b456dd SPARK-4147 [CORE] Reduce log4j dependency
Defer use of log4j class until it's known that log4j 1.2 is being used. This may avoid dealing with log4j dependencies for callers that reroute slf4j to another logging framework. The only change is to push one half of the check in the original `if` condition inside. This is a trivial change, may or may not actually solve a problem, but I think it's all that makes sense to do for SPARK-4147.

Author: Sean Owen <sowen@cloudera.com>

Closes #4190 from srowen/SPARK-4147 and squashes the following commits:

4e99942 [Sean Owen] Defer use of log4j class until it's known that log4j 1.2 is being used. This may avoid dealing with log4j dependencies for callers that reroute slf4j to another logging framework.
2015-01-26 14:23:42 -08:00
Davies Liu 142093179a [SPARK-5355] use j.u.c.ConcurrentHashMap instead of TrieMap
j.u.c.ConcurrentHashMap is more battle tested.

cc rxin JoshRosen pwendell

Author: Davies Liu <davies@databricks.com>

Closes #4208 from davies/safe-conf and squashes the following commits:

c2182dc [Davies Liu] address comments, fix tests
3a1d821 [Davies Liu] fix test
da14ced [Davies Liu] Merge branch 'master' of github.com:apache/spark into safe-conf
ae4d305 [Davies Liu] change to j.u.c.ConcurrentMap
f8fa1cf [Davies Liu] change to TrieMap
a1d769a [Davies Liu] make SparkConf thread-safe
2015-01-26 12:51:32 -08:00
CodingCat 8df9435512 [SPARK-5268] don't stop CoarseGrainedExecutorBackend for irrelevant DisassociatedEvent
https://issues.apache.org/jira/browse/SPARK-5268

In CoarseGrainedExecutorBackend, we subscribe DisassociatedEvent in executor backend actor and exit the program upon receive such event...

let's consider the following case

The user may develop an Akka-based program which starts the actor with Spark's actor system and communicate with an external actor system (e.g. an Akka-based receiver in spark streaming which communicates with an external system) If the external actor system fails or disassociates with the actor within spark's system with purpose, we may receive DisassociatedEvent and the executor is restarted.

This is not the expected behavior.....

----

This is a simple fix to check the event before making the quit decision

Author: CodingCat <zhunansjtu@gmail.com>

Closes #4063 from CodingCat/SPARK-5268 and squashes the following commits:

4d7d48e [CodingCat] simplify the log
18c36f4 [CodingCat] more descriptive log
f299e0b [CodingCat] clean log
1632e79 [CodingCat] check whether DisassociatedEvent is relevant before quit
2015-01-25 19:28:53 -08:00
Kay Ousterhout fc2168f04e [SPARK-5326] Show fetch wait time as optional metric in the UI
With this change, here's what the UI looks like:

![image](https://cloud.githubusercontent.com/assets/1108612/5809994/1ec8a904-9ff4-11e4-8f24-6a59a1a858f7.png)

If you want to locally test this, you need to spin up multiple executors, because the shuffle read metrics are only shown for data read remotely.

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #4110 from kayousterhout/SPARK-5326 and squashes the following commits:

610051e [Kay Ousterhout] Josh style comments
5feaa28 [Kay Ousterhout] What is the difference here??
aa129cb [Kay Ousterhout] Removed inadvertent change
721c742 [Kay Ousterhout] Improved tooltip
f3a7111 [Kay Ousterhout] Style fix
679b4e9 [Kay Ousterhout] [SPARK-5326] Show fetch wait time as optional metric in the UI
2015-01-25 16:48:26 -08:00
Kousuke Saruta 8f5c827b01 [SPARK-5344][WebUI] HistoryServer cannot recognize that inprogress file was renamed to completed file
`FsHistoryProvider` tries to update application status but if `checkForLogs` is called before `.inprogress` file is renamed to completed file, the file is not recognized as completed.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4132 from sarutak/SPARK-5344 and squashes the following commits:

9658008 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5344
d2c72b6 [Kousuke Saruta] Fixed update issue of FsHistoryProvider
2015-01-25 15:34:20 -08:00
Ryan Williams aea25482c3 [SPARK-5402] log executor ID at executor-construction time
also rename "slaveHostname" to "executorHostname"

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #4195 from ryan-williams/exec and squashes the following commits:

e60a7bb [Ryan Williams] log executor ID at executor-construction time
2015-01-25 14:20:02 -08:00
Ryan Williams 2d9887bae5 [SPARK-5401] set executor ID before creating MetricsSystem
Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #4194 from ryan-williams/metrics and squashes the following commits:

7c5a33f [Ryan Williams] set executor ID before creating MetricsSystem
2015-01-25 14:17:59 -08:00
Idan Zalzberg 412a58e118 Add comment about defaultMinPartitions
Added a comment about using math.min for choosing default partition count

Author: Idan Zalzberg <idanzalz@gmail.com>

Closes #4102 from idanz/patch-2 and squashes the following commits:

50e9d58 [Idan Zalzberg] Update SparkContext.scala
2015-01-25 11:28:05 -08:00
zsxwing 0d1e67ee9b [SPARK-5214][Test] Add a test to demonstrate EventLoop can be stopped in the event thread
Author: zsxwing <zsxwing@gmail.com>

Closes #4174 from zsxwing/SPARK-5214-unittest and squashes the following commits:

443e564 [zsxwing] Change the check interval to 5ms
7aaa2d7 [zsxwing] Add a test to demonstrate EventLoop can be stopped in the event thread
2015-01-24 11:00:35 -08:00
Josh Rosen cef1f092a6 [SPARK-5063] More helpful error messages for several invalid operations
This patch adds more helpful error messages for invalid programs that define nested RDDs, broadcast RDDs, perform actions inside of transformations (e.g. calling `count()` from inside of `map()`), and call certain methods on stopped SparkContexts.  Currently, these invalid programs lead to confusing NullPointerExceptions at runtime and have been a major source of questions on the mailing list and StackOverflow.

In a few cases, I chose to log warnings instead of throwing exceptions in order to avoid any chance that this patch breaks programs that worked "by accident" in earlier Spark releases (e.g. programs that define nested RDDs but never run any jobs with them).

In SparkContext, the new `assertNotStopped()` method is used to check whether methods are being invoked on a stopped SparkContext.  In some cases, user programs will not crash in spite of calling methods on stopped SparkContexts, so I've only added `assertNotStopped()` calls to methods that always throw exceptions when called on stopped contexts (e.g. by dereferencing a null `dagScheduler` pointer).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3884 from JoshRosen/SPARK-5063 and squashes the following commits:

a38774b [Josh Rosen] Fix spelling typo
a943e00 [Josh Rosen] Convert two exceptions into warnings in order to avoid breaking user programs in some edge-cases.
2d0d7f7 [Josh Rosen] Fix test to reflect 1.2.1 compatibility
3f0ea0c [Josh Rosen] Revert two unintentional formatting changes
8e5da69 [Josh Rosen] Remove assertNotStopped() calls for methods that were sometimes safe to call on stopped SC's in Spark 1.2
8cff41a [Josh Rosen] IllegalStateException fix
6ef68d0 [Josh Rosen] Fix Python line length issues.
9f6a0b8 [Josh Rosen] Add improved error messages to PySpark.
13afd0f [Josh Rosen] SparkException -> IllegalStateException
8d404f3 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-5063
b39e041 [Josh Rosen] Fix BroadcastSuite test which broadcasted an RDD
99cc09f [Josh Rosen] Guard against calling methods on stopped SparkContexts.
34833e8 [Josh Rosen] Add more descriptive error message.
57cc8a1 [Josh Rosen] Add error message when directly broadcasting RDD.
15b2e6b [Josh Rosen] [SPARK-5063] Useful error messages for nested RDDs and actions inside of transformations
2015-01-23 17:53:15 -08:00
Davies Liu 9bad062268 [SPARK-5355] make SparkConf thread-safe
The SparkConf is not thread-safe, but is accessed by many threads. The getAll() could return parts of the configs if another thread is access it.

This PR changes SparkConf.settings to a thread-safe TrieMap.

Author: Davies Liu <davies@databricks.com>

Closes #4143 from davies/safe-conf and squashes the following commits:

f8fa1cf [Davies Liu] change to TrieMap
a1d769a [Davies Liu] make SparkConf thread-safe
2015-01-21 16:51:54 -08:00
wangfei 3be2a887bf [SPARK-4984][CORE][WEBUI] Adding a pop-up containing the full job description when it is very long
In some case the job description will be very long, such as a long sql. refer to #3718
This PR add a pop-up for job description when it is long.

![image](https://cloud.githubusercontent.com/assets/7018048/5847400/c757cbbc-a207-11e4-891f-528821c2e68d.png)

![image](https://cloud.githubusercontent.com/assets/7018048/5847409/d434b2b4-a207-11e4-8813-03a74b43d766.png)

Author: wangfei <wangfei1@huawei.com>

Closes #3819 from scwf/popup-descrip-ui and squashes the following commits:

ba02b83 [wangfei] address comments
a7c5e7b [wangfei] spot that it's been truncated
fbf6162 [wangfei] Merge branch 'master' into popup-descrip-ui
0bca96d [wangfei] remove no use val
4b55c3b [wangfei] fix style issue
353c6f4 [wangfei] pop up the description of job with a styled read-only text form field
2015-01-21 15:27:42 -08:00
Sandy Ryza 2eeada373e SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnA...
...llocator

The goal of this PR is to simplify YarnAllocator as much as possible and get it up to the level of code quality we see in the rest of Spark.

In service of this, it does a few things:
* Uses AMRMClient APIs for matching containers to requests.
* Adds calls to AMRMClient.removeContainerRequest so that, when we use a container, we don't end up requesting it again.
* Removes YarnAllocator's host->rack cache. YARN's RackResolver already does this caching, so this is redundant.
* Adds tests for basic YarnAllocator functionality.
* Breaks up the allocateResources method, which was previously nearly 300 lines.
* A little bit of stylistic cleanup.
* Fixes a bug that causes three times the requests to be filed when preferred host locations are given.

The patch is lossy. In particular, it loses the logic for trying to avoid containers bunching up on nodes. As I understand it, the logic that's gone is:

* If, in a single response from the RM, we receive a set of containers on a node, and prefer some number of containers on that node greater than 0 but less than the number we received, give back the delta between what we preferred and what we received.

This seems like a weird way to avoid bunching E.g. it does nothing to avoid bunching when we don't request containers on particular nodes.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #3765 from sryza/sandy-spark-1714 and squashes the following commits:

32a5942 [Sandy Ryza] Muffle RackResolver logs
74f56dd [Sandy Ryza] Fix a couple comments and simplify requestTotalExecutors
60ea4bd [Sandy Ryza] Fix scalastyle
ca35b53 [Sandy Ryza] Simplify further
e9cf8a6 [Sandy Ryza] Fix YarnClusterSuite
257acf3 [Sandy Ryza] Remove locality stuff and more cleanup
59a3c5e [Sandy Ryza] Take out rack stuff
5f72fd5 [Sandy Ryza] Further documentation and cleanup
89edd68 [Sandy Ryza] SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator
2015-01-21 10:31:54 -06:00
WangTao 8c06a5faac [SPARK-5336][YARN]spark.executor.cores must not be less than spark.task.cpus
https://issues.apache.org/jira/browse/SPARK-5336

Author: WangTao <barneystinson@aliyun.com>
Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #4123 from WangTaoTheTonic/SPARK-5336 and squashes the following commits:

6c9676a [WangTao] Update ClientArguments.scala
9632d3a [WangTaoTheTonic] minor comment fix
d03d6fa [WangTaoTheTonic] import ordering should be alphabetical'
3112af9 [WangTao] spark.executor.cores must not be less than spark.task.cpus
2015-01-21 09:42:30 -06:00
Kousuke Saruta 9a151ce58b [SPARK-5294][WebUI] Hide tables in AllStagePages for "Active Stages, Completed Stages and Failed Stages" when they are empty
Related to SPARK-5228 and #4028, `AllStagesPage` also should hide the table for  `ActiveStages`, `CompleteStages` and `FailedStages` when they are empty.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4083 from sarutak/SPARK-5294 and squashes the following commits:

a7625c1 [Kousuke Saruta] Fixed conflicts
2015-01-20 16:40:46 -08:00
Kousuke Saruta 769aced9e7 [SPARK-5329][WebUI] UIWorkloadGenerator should stop SparkContext.
UIWorkloadGenerator don't stop SparkContext. I ran UIWorkloadGenerator and try to watch the result at WebUI but Jobs are marked as finished.
It's because SparkContext is not stopped.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4112 from sarutak/SPARK-5329 and squashes the following commits:

bcc0fa9 [Kousuke Saruta] Disabled scalastyle for a bock comment
86a3b95 [Kousuke Saruta] Fixed UIWorkloadGenerator to stop SparkContext in it
2015-01-20 12:40:55 -08:00
Jacek Lewandowski c93a57f0d6 SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840...
... by Piotr Kolaczkowski)

Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>

Closes #4113 from jacek-lewandowski/SPARK-4660-master and squashes the following commits:

a5e84ca [Jacek Lewandowski] SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840 by Piotr Kolaczkowski)
2015-01-20 12:38:01 -08:00
Jongyoul Lee 9d9294aebf [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException
- Rewind ByteBuffer before making ByteString

(This fixes a bug introduced in #3849 / SPARK-4014)

Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #4119 from jongyoul/SPARK-5333 and squashes the following commits:

c6693a8 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - changed logDebug location
4141f58 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Added license information
2190606 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Adjusted imported libraries
b7f5517 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Rewind ByteBuffer before making ByteString
2015-01-20 10:18:10 -08:00
Sean Owen 306ff187af SPARK-5270 [CORE] Provide isEmpty() function in RDD API
Pretty minor, but submitted for consideration -- this would at least help people make this check in the most efficient way I know.

Author: Sean Owen <sowen@cloudera.com>

Closes #4074 from srowen/SPARK-5270 and squashes the following commits:

66885b8 [Sean Owen] Add note that JavaRDDLike should not be implemented by user code
2e9b490 [Sean Owen] More tests, and Mima-exclude the new isEmpty method in JavaRDDLike
28395ff [Sean Owen] Add isEmpty to Java, Python
7dd04b7 [Sean Owen] Add efficient RDD.isEmpty()
2015-01-19 22:50:45 -08:00
zsxwing e69fb8c75a [SPARK-5214][Core] Add EventLoop and change DAGScheduler to an EventLoop
This PR adds a simple `EventLoop` and use it to replace Actor in DAGScheduler. `EventLoop` is a general class to support that posting events in multiple threads and handling events in a single event thread.

Author: zsxwing <zsxwing@gmail.com>

Closes #4016 from zsxwing/event-loop and squashes the following commits:

aefa1ce [zsxwing] Add protected to on*** methods
5cfac83 [zsxwing] Remove null check of eventProcessLoop
dba35b2 [zsxwing] Add a test that onReceive swallows InterruptException
460f7b3 [zsxwing] Use volatile instead of Atomic things in unit tests
227bf33 [zsxwing] Add a stop flag and some tests
37f79c6 [zsxwing] Fix docs
55fb6f6 [zsxwing] Add private[spark] to EventLoop
1f73eac [zsxwing] Fix the import order
3b2e59c [zsxwing] Add EventLoop and change DAGScheduler to an EventLoop
2015-01-19 18:15:51 -08:00
Jongyoul Lee 4a4f9ccba2 [SPARK-5088] Use spark-class for running executors directly
Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #3897 from jongyoul/SPARK-5088 and squashes the following commits:

8232aa8 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Added a listenerBus for fixing test cases
932289f [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Rebased from master
613cb47 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Fixed code if spark.executor.uri doesn't have any value - Added test cases
ff57bda [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Adjusted orders of import
97e4bd4 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Changed command for using spark-class directly - Delete sbin/spark-executor and moved some codes into spark-class' case statement
2015-01-19 02:01:56 -08:00
Ilya Ganelin 3453d578ad [SPARK-3288] All fields in TaskMetrics should be private and use getters/setters
I've updated the fields and all usages of these fields in the Spark code. I've verified that this did not break anything on my local repo.

Author: Ilya Ganelin <ilya.ganelin@capitalone.com>

Closes #4020 from ilganeli/SPARK-3288 and squashes the following commits:

39f3810 [Ilya Ganelin] resolved merge issues
e446287 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288
b8c05cb [Ilya Ganelin] Missed making a variable private
6444391 [Ilya Ganelin] Made inc/dec functions private[spark]
1149e78 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288
26b312b [Ilya Ganelin] Debugging tests
17146c2 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288
5525c20 [Ilya Ganelin] Completed refactoring to make vars in TaskMetrics class private
c64da4f [Ilya Ganelin] Partially updated task metrics to make some vars private
2015-01-19 01:32:36 -08:00
Prashant Sharma 851b6a9bba SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage.
![screenshot from 2015-01-16 13 43 25](https://cloud.githubusercontent.com/assets/992952/5773256/d61df300-9d85-11e4-9b5a-6730058839fa.png)

This is a first step towards having time remaining estimates for queued and running jobs. See SPARK-5216

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #4043 from ScrapCodes/SPARK-5216/5217-show-waiting-stages and squashes the following commits:

3b11803 [Prashant Sharma] Review feedback.
0992842 [Prashant Sharma] Switched to Linked hashmap, changed the order to active->pending->completed->failed. And changed pending stages to not reverse sort.
c19d82a [Prashant Sharma] SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage.
2015-01-19 01:28:42 -08:00
Kousuke Saruta ecf943d353 [WebUI] Fix collapse of WebUI layout
When we decrease the width of browsers, the header of WebUI wraps and collapses like as following image.

![2015-01-11 19 49 37](https://cloud.githubusercontent.com/assets/4736016/5698887/b0b9aeee-99cd-11e4-9020-08f3f0014de0.png)

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3995 from sarutak/fixed-collapse-webui-layout and squashes the following commits:

3e60b5b [Kousuke Saruta] Modified line-height property in webui.css
7bfb5fb [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fixed-collapse-webui-layout
5d83e18 [Kousuke Saruta] Fixed collapse of WebUI layout
2015-01-16 12:19:08 -08:00
Kousuke Saruta e8422c521b [SPARK-5231][WebUI] History Server shows wrong job submission time.
History Server doesn't show collect job submission time.
It's because `JobProgressListener` updates job submission time every time `onJobStart` method is invoked from `ReplayListenerBus`.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4029 from sarutak/SPARK-5231 and squashes the following commits:

0af9e22 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5231
da8bd14 [Kousuke Saruta] Made submissionTime in SparkListenerJobStartas and completionTime in SparkListenerJobEnd as regular Long
0412a6a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5231
26b9b99 [Kousuke Saruta] Fixed the test cases
2d47bd3 [Kousuke Saruta] Fixed to record job submission time and completion time collectly
2015-01-16 10:05:11 -08:00
Ye Xianjin e200ac8e53 [SPARK-5201][CORE] deal with int overflow in the ParallelCollectionRDD.slice method
There is an int overflow in the ParallelCollectionRDD.slice method. That's originally reported by SaintBacchus.
```
sc.makeRDD(1 to (Int.MaxValue)).count       // result = 0
sc.makeRDD(1 to (Int.MaxValue - 1)).count   // result = 2147483646 = Int.MaxValue - 1
sc.makeRDD(1 until (Int.MaxValue)).count    // result = 2147483646 = Int.MaxValue - 1
```
see https://github.com/apache/spark/pull/2874 for more details.
This pr try to fix the overflow. However, There's another issue I don't address.
```
val largeRange = Int.MinValue to Int.MaxValue
largeRange.length // throws java.lang.IllegalArgumentException: -2147483648 to 2147483647 by 1: seqs cannot contain more than Int.MaxValue elements.
```

So, the range we feed to sc.makeRDD cannot contain more than Int.MaxValue elements. This is the limitation of Scala. However I think  we may want to support that kind of range. But the fix is beyond this pr.

srowen andrewor14 would you mind take a look at this pr?

Author: Ye Xianjin <advancedxy@gmail.com>

Closes #4002 from advancedxy/SPARk-5201 and squashes the following commits:

96265a1 [Ye Xianjin] Update slice method comment and some responding docs.
e143d7a [Ye Xianjin] Update inclusive range check for splitting inclusive range.
b3f5577 [Ye Xianjin] We can include the last element in the last slice in general for inclusive range, hence eliminate the need to check Int.MaxValue or Int.MinValue.
7d39b9e [Ye Xianjin] Convert the two cases pattern matching to one case.
651c959 [Ye Xianjin] rename sign to needsInclusiveRange. add some comments
196f8a8 [Ye Xianjin] Add test cases for ranges end with Int.MaxValue or Int.MinValue
e66e60a [Ye Xianjin] Deal with inclusive and exclusive ranges in one case. If the range is inclusive and the end of the range is (Int.MaxValue or Int.MinValue), we should use inclusive range instead of exclusive
2015-01-16 09:21:33 -08:00
WangTaoTheTonic 2be82b1e66 [SPARK-1507][YARN]specify # cores for ApplicationMaster
Based on top of changes in https://github.com/apache/spark/pull/3806.

https://issues.apache.org/jira/browse/SPARK-1507

`--driver-cores` and `spark.driver.cores` for all cluster modes and `spark.yarn.am.cores` for yarn client mode.

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>

Closes #4018 from WangTaoTheTonic/SPARK-1507 and squashes the following commits:

01419d3 [WangTaoTheTonic] amend the args name
b255795 [WangTaoTheTonic] indet thing
d86557c [WangTaoTheTonic] some comments amend
43c9392 [WangTao] fix compile error
b39a100 [WangTao] specify # cores for ApplicationMaster
2015-01-16 09:16:56 -08:00
Kostas Sakellis a79a9f923c [SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds
When calculating the input metrics there was an assumption that one task only reads from one block - this is not true for some operations including coalesce. This patch simply increments the task's input metrics if previous ones existed of the same read method.

A limitation to this patch is that if a task reads from two different blocks of different read methods, one will override the other.

Author: Kostas Sakellis <kostas@cloudera.com>

Closes #3120 from ksakellis/kostas-spark-4092 and squashes the following commits:

54e6658 [Kostas Sakellis] Drops metrics if conflicting read methods exist
f0e0cc5 [Kostas Sakellis] Add bytesReadCallback to InputMetrics
a2a36d4 [Kostas Sakellis] CR feedback
5a0c770 [Kostas Sakellis] [SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds
2015-01-15 18:48:39 -08:00
Kostas Sakellis 96c2c714f4 [SPARK-4857] [CORE] Adds Executor membership events to SparkListener
Adds onExecutorAdded and onExecutorRemoved events to the SparkListener. This will allow a client to get notified when an executor has been added/removed and provide additional information such as how many vcores it is consuming.

In addition, this commit adds a SparkListenerAdapter to the Java API that provides default implementations to the SparkListener. This is to get around the fact that default implementations for traits don't work in Java. Having Java clients extend SparkListenerAdapter moving forward will prevent breakage in java when we add new events to SparkListener.

Author: Kostas Sakellis <kostas@cloudera.com>

Closes #3711 from ksakellis/kostas-spark-4857 and squashes the following commits:

946d2c5 [Kostas Sakellis] Added executorAdded/Removed events to MesosSchedulerBackend
b1d054a [Kostas Sakellis] Remove executorInfo from ExecutorRemoved event
1727b38 [Kostas Sakellis] Renamed ExecutorDetails back to ExecutorInfo and other CR feedback
14fe78d [Kostas Sakellis] Added executor added/removed events to json protocol
93d087b [Kostas Sakellis] [SPARK-4857] [CORE] Adds Executor membership events to SparkListener
2015-01-15 17:53:42 -08:00
Kousuke Saruta 65858ba555 [Minor] Fix tiny typo in BlockManager
In BlockManager, there is a word `BlockTranserService` but I think it's typo for `BlockTransferService`.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4046 from sarutak/fix-tiny-typo and squashes the following commits:

a3e2a2f [Kousuke Saruta] Fixed tiny typo in BlockManager
2015-01-15 17:07:44 -08:00
Josh Rosen 259936be71 [SPARK-4014] Add TaskContext.attemptNumber and deprecate TaskContext.attemptId
`TaskContext.attemptId` is misleadingly-named, since it currently returns a taskId, which uniquely identifies a particular task attempt within a particular SparkContext, instead of an attempt number, which conveys how many times a task has been attempted.

This patch deprecates `TaskContext.attemptId` and add `TaskContext.taskId` and `TaskContext.attemptNumber` fields.  Prior to this change, it was impossible to determine whether a task was being re-attempted (or was a speculative copy), which made it difficult to write unit tests for tasks that fail on early attempts or speculative tasks that complete faster than original tasks.

Earlier versions of the TaskContext docs suggest that `attemptId` behaves like `attemptNumber`, so there's an argument to be made in favor of changing this method's implementation.  Since we've decided against making that change in maintenance branches, I think it's simpler to add better-named methods and retain the old behavior for `attemptId`; if `attemptId` behaved differently in different branches, then this would cause confusing build-breaks when backporting regression tests that rely on the new `attemptId` behavior.

Most of this patch is fairly straightforward, but there is a bit of trickiness related to Mesos tasks: since there's no field in MesosTaskInfo to encode the attemptId, I packed it into the `data` field alongside the task binary.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3849 from JoshRosen/SPARK-4014 and squashes the following commits:

89d03e0 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014
5cfff05 [Josh Rosen] Introduce wrapper for serializing Mesos task launch data.
38574d4 [Josh Rosen] attemptId -> taskAttemptId in PairRDDFunctions
a180b88 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014
1d43aa6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014
eee6a45 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014
0b10526 [Josh Rosen] Use putInt instead of putLong (silly mistake)
8c387ce [Josh Rosen] Use local with maxRetries instead of local-cluster.
cbe4d76 [Josh Rosen] Preserve attemptId behavior and deprecate it:
b2dffa3 [Josh Rosen] Address some of Reynold's minor comments
9d8d4d1 [Josh Rosen] Doc typo
1e7a933 [Josh Rosen] [SPARK-4014] Change TaskContext.attemptId to return attempt number instead of task ID.
fd515a5 [Josh Rosen] Add failing test for SPARK-4014
2015-01-14 11:45:40 -08:00
Kousuke Saruta 9d4449c4b3 [SPARK-5228][WebUI] Hide tables for "Active Jobs/Completed Jobs/Failed Jobs" when they are empty
In current WebUI, tables for Active Stages, Completed Stages, Skipped Stages and Failed Stages are hidden when they are empty while tables for Active Jobs, Completed Jobs and Failed Jobs are not hidden though they are empty.

This is before my patch is applied.

![2015-01-13 14 13 03](https://cloud.githubusercontent.com/assets/4736016/5730793/2b73d6f4-9b32-11e4-9a24-1784d758c644.png)

And this is after my patch is applied.

![2015-01-13 14 38 13](https://cloud.githubusercontent.com/assets/4736016/5730797/359ea2da-9b32-11e4-97b0-544739ddbf4c.png)

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4028 from sarutak/SPARK-5228 and squashes the following commits:

b1e6e8b [Kousuke Saruta] Fixed a small typo
daab563 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5228
9493a1d [Kousuke Saruta] Modified AllJobPage.scala so that hide Active Jobs/Completed Jobs/Failed Jobs when they are empty
2015-01-14 11:10:29 -08:00
WangTaoTheTonic 9dea64e53a [SPARK-4697][YARN]System properties should override environment variables
I found some arguments in yarn module take environment variables before system properties while the latter override the former in core module.

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>

Closes #3557 from WangTaoTheTonic/SPARK4697 and squashes the following commits:

836b9ef [WangTaoTheTonic] fix type mismatch
e3e486a [WangTaoTheTonic] remove the comma
1262d57 [WangTaoTheTonic] handle spark.app.name and SPARK_YARN_APP_NAME in SparkSubmitArguments
bee9447 [WangTaoTheTonic] wrong brace
81833bb [WangTaoTheTonic] rebase
40934b4 [WangTaoTheTonic] just switch blocks
5f43f45 [WangTao] System property can override environment variable
2015-01-13 09:50:14 -08:00
WangTaoTheTonic f7741a9a72 [SPARK-5006][Deploy]spark.port.maxRetries doesn't work
https://issues.apache.org/jira/browse/SPARK-5006

I think the issue is produced in https://github.com/apache/spark/pull/1777.

Not digging mesos's backend yet. Maybe should add same logic either.

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>

Closes #3841 from WangTaoTheTonic/SPARK-5006 and squashes the following commits:

8cdf96d [WangTao] indent thing
2d86d65 [WangTaoTheTonic] fix line length
7cdfd98 [WangTaoTheTonic] fit for new HttpServer constructor
61a370d [WangTaoTheTonic] some minor fixes
bc6e1ec [WangTaoTheTonic] rebase
67bcb46 [WangTaoTheTonic] put conf at 3rd position, modify suite class, add comments
f450cd1 [WangTaoTheTonic] startServiceOnPort will use a SparkConf arg
29b751b [WangTaoTheTonic] rebase as ExecutorRunnableUtil changed to ExecutorRunnable
396c226 [WangTaoTheTonic] make the grammar more like scala
191face [WangTaoTheTonic] invalid value name
62ec336 [WangTaoTheTonic] spark.port.maxRetries doesn't work
2015-01-13 09:29:25 -08:00
Michael Armbrust a3978f3e15 [SPARK-5078] Optionally read from SPARK_LOCAL_HOSTNAME
Current spark lets you set the ip address using SPARK_LOCAL_IP, but then this is given to akka after doing a reverse DNS lookup. This makes it difficult to run spark in Docker. You can already change the hostname that is used programmatically, but it would be nice to be able to do this with an environment variable as well.

Author: Michael Armbrust <michael@databricks.com>

Closes #3893 from marmbrus/localHostnameEnv and squashes the following commits:

85045b6 [Michael Armbrust] Optionally read from SPARK_LOCAL_HOSTNAME
2015-01-12 11:57:59 -08:00
lianhuiwang ef9224e080 [SPARK-5102][Core]subclass of MapStatus needs to be registered with Kryo
CompressedMapStatus and HighlyCompressedMapStatus needs to be registered with Kryo, because they are subclass of MapStatus.

Author: lianhuiwang <lianhuiwang09@gmail.com>

Closes #4007 from lianhuiwang/SPARK-5102 and squashes the following commits:

9d2238a [lianhuiwang] remove register of MapStatus
05a285d [lianhuiwang] subclass of MapStatus needs to be registered with Kryo
2015-01-12 10:57:12 -08:00
zsxwing 6942b974ad [SPARK-4951][Core] Fix the issue that a busy executor may be killed
A few changes to fix this issue:

1. Handle the case that receiving `SparkListenerTaskStart` before `SparkListenerBlockManagerAdded`.
2. Don't add `executorId` to `removeTimes` when the executor is busy.
3. Use `HashMap.retain` to safely traverse the HashMap and remove items.
4. Use the same lock in ExecutorAllocationManager and ExecutorAllocationListener to fix the race condition in `totalPendingTasks`.
5. Move the blocking codes out of the message processing code in YarnSchedulerActor.

Author: zsxwing <zsxwing@gmail.com>

Closes #3783 from zsxwing/SPARK-4951 and squashes the following commits:

d51fa0d [zsxwing] Add comments
2e365ce [zsxwing] Remove expired executors from 'removeTimes' and add idle executors back when a new executor joins
49f61a9 [zsxwing] Eliminate duplicate executor registered warnings
d4c4e9a [zsxwing] Minor fixes for the code style
05f6238 [zsxwing] Move the blocking codes out of the message processing code
105ba3a [zsxwing] Fix the race condition in totalPendingTasks
d5c615d [zsxwing] Fix the issue that a busy executor may be killed
2015-01-11 16:23:28 -08:00
lewuathe 1656aae2b4 [SPARK-5073] spark.storage.memoryMapThreshold have two default value
Because major OS page sizes is about 4KB, the default value of spark.storage.memoryMapThreshold is integrated to 2 * 4096

Author: lewuathe <lewuathe@me.com>

Closes #3900 from Lewuathe/integrate-memoryMapThreshold and squashes the following commits:

e417acd [lewuathe] [SPARK-5073] Update docs/configuration
834aba4 [lewuathe] [SPARK-5073] Fix style
adcea33 [lewuathe] [SPARK-5073] Integrate memory map threshold to 2MB
fcce2e5 [lewuathe] [SPARK-5073] spark.storage.memoryMapThreshold have two default value
2015-01-11 13:50:42 -08:00
wangfei 92d9a704ce [SPARK-4871][SQL] Show sql statement in spark ui when run sql with spark-sql
Author: wangfei <wangfei1@huawei.com>

Closes #3718 from scwf/sparksqlui and squashes the following commits:

e0d6b5d [wangfei] format fix
383b505 [wangfei] fix conflicts
4d2038a [wangfei] using setJobDescription
df79837 [wangfei] fix compile error
92ce834 [wangfei] show sql statement in spark ui when run sql use spark-sql
2015-01-10 17:04:56 -08:00
mcheah e0f28e010c [SPARK-4737] Task set manager properly handles serialization errors
Dealing with [SPARK-4737], the handling of serialization errors should not be the DAGScheduler's responsibility. The task set manager now catches the error and aborts the stage.

If the TaskSetManager throws a TaskNotSerializableException, the TaskSchedulerImpl will return an empty list of task descriptions, because no tasks were started. The scheduler should abort the stage gracefully.

Note that I'm not too familiar with this part of the codebase and its place in the overall architecture of the Spark stack. If implementing it this way will have any averse side effects please voice that loudly.

Author: mcheah <mcheah@palantir.com>

Closes #3638 from mccheah/task-set-manager-properly-handle-ser-err and squashes the following commits:

1545984 [mcheah] Some more style fixes from Andrew Or.
5267929 [mcheah] Fixing style suggestions from Andrew Or.
dfa145b [mcheah] Fixing style from Josh Rosen's feedback
b2a430d [mcheah] Not returning empty seq when a task set cannot be serialized.
94844d7 [mcheah] Fixing compilation error, one brace too many
5f486f4 [mcheah] Adding license header for fake task class
bf5e706 [mcheah] Fixing indentation.
097e7a2 [mcheah] [SPARK-4737] Catching task serialization exception in TaskSetManager
2015-01-09 14:16:20 -08:00
WangTaoTheTonic e966452060 [SPARK-1953][YARN]yarn client mode Application Master memory size is same as driver memory...
... size

Ways to set Application Master's memory on yarn-client mode:
1.  `spark.yarn.am.memory` in SparkConf or System Properties
2.  default value 512m

Note: this arguments is only available in yarn-client mode.

Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #3607 from WangTaoTheTonic/SPARK4181 and squashes the following commits:

d5ceb1b [WangTaoTheTonic] spark.driver.memeory is used in both modes
6c1b264 [WangTaoTheTonic] rebase
b8410c0 [WangTaoTheTonic] minor optiminzation
ddcd592 [WangTaoTheTonic] fix the bug produced in rebase and some improvements
3bf70cc [WangTaoTheTonic] rebase and give proper hint
987b99d [WangTaoTheTonic] disable --driver-memory in client mode
2b27928 [WangTaoTheTonic] inaccurate description
b7acbb2 [WangTaoTheTonic] incorrect method invoked
2557c5e [WangTaoTheTonic] missing a single blank
42075b0 [WangTaoTheTonic] arrange the args and warn logging
69c7dba [WangTaoTheTonic] rebase
1960d16 [WangTaoTheTonic] fix wrong comment
7fa9e2e [WangTaoTheTonic] log a warning
f6bee0e [WangTaoTheTonic] docs issue
d619996 [WangTaoTheTonic] Merge branch 'master' into SPARK4181
b09c309 [WangTaoTheTonic] use code format
ab16bb5 [WangTaoTheTonic] fix bug and add comments
44e48c2 [WangTaoTheTonic] minor fix
6fd13e1 [WangTaoTheTonic] add overhead mem and remove some configs
0566bb8 [WangTaoTheTonic] yarn client mode Application Master memory size is same as driver memory size
2015-01-09 13:23:13 -08:00
Kay Ousterhout b6aa557300 [SPARK-1143] Separate pool tests into their own suite.
The current TaskSchedulerImplSuite includes some tests that are
actually for the TaskSchedulerImpl, but the remainder of the tests avoid using
the TaskSchedulerImpl entirely, and actually test the pool and scheduling
algorithm mechanisms. This commit separates the pool/scheduling algorithm
tests into their own suite, and also simplifies those tests.

The pull request replaces #339.

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #3967 from kayousterhout/SPARK-1143 and squashes the following commits:

8a898c4 [Kay Ousterhout] [SPARK-1143] Separate pool tests into their own suite.
2015-01-09 09:47:06 -08:00
Marcelo Vanzin 48cecf673c [SPARK-4048] Enhance and extend hadoop-provided profile.
This change does a few things to make the hadoop-provided profile more useful:

- Create new profiles for other libraries / services that might be provided by the infrastructure
- Simplify and fix the poms so that the profiles are only activated while building assemblies.
- Fix tests so that they're able to run when the profiles are activated
- Add a new env variable to be used by distributions that use these profiles to provide the runtime
  classpath for Spark jobs and daemons.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:

82eb688 [Marcelo Vanzin] Add a comment.
eb228c0 [Marcelo Vanzin] Fix borked merge.
4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
371ebee [Marcelo Vanzin] Review feedback.
52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
322f882 [Marcelo Vanzin] Fix merge fail.
f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9640503 [Marcelo Vanzin] Cleanup child process log message.
115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
d1399ed [Marcelo Vanzin] Restore jetty dependency.
82a54b9 [Marcelo Vanzin] Remove unused profile.
5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
2015-01-08 17:15:13 -08:00
Kousuke Saruta a00af6bec5 [SPARK-4973][CORE] Local directory in the driver of client-mode continues remaining even if application finished when external shuffle is enabled
When we enables external shuffle service, local directories in the driver of client-mode continue remaining even if application has finished.
I think local directories for drivers should be deleted.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3811 from sarutak/SPARK-4973 and squashes the following commits:

ad944ab [Kousuke Saruta] Fixed DiskBlockManager to cleanup local directory if it's the driver
43770da [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4973
88feecd [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4973
d99718e [Kousuke Saruta] Fixed SparkSubmit.scala and DiskBlockManager.scala in order to delete local directories of the driver of local-mode when external shuffle service is enabled
2015-01-08 13:43:09 -08:00
Eric Moyer 538f221627 Document that groupByKey will OOM for large keys
This pull request is my own work and I license it under Spark's open-source license.

This contribution is an improvement to the documentation. I documented that the maximum number of values per key for groupByKey is limited by available RAM (see [Datablox][datablox link] and [the spark mailing list][list link]).

Just saying that better performance is available is not sufficient. Sometimes you need to do a group-by - your operation needs all the items available in order to complete. This warning explains the problem.

[datablox link]: http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
[list link]: http://apache-spark-user-list.1001560.n3.nabble.com/Understanding-RDD-GroupBy-OutOfMemory-Exceptions-tp11427p11466.html

Author: Eric Moyer <eric_moyer@yahoo.com>

Closes #3936 from RadixSeven/better-group-by-docs and squashes the following commits:

5b6f4e9 [Eric Moyer] groupByKey docs naming updates
238e81b [Eric Moyer] Doc that groupByKey will OOM for large keys
2015-01-08 11:55:23 -08:00
Kousuke Saruta 0a597276db [Minor] Fix the value represented by spark.executor.id for consistency.
The property  `spark.executor.id` can represent both `driver` and `<driver>`  for one driver.
It's inconsistent.

This issue is minor so I didn't file this in JIRA.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3812 from sarutak/fix-driver-identifier and squashes the following commits:

d885498 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-driver-identifier
4275663 [Kousuke Saruta] Fixed the value represented by spark.executor.id of local mode
2015-01-08 11:35:56 -08:00
Zhang, Liye 06dc4b5206 [SPARK-4989][CORE] avoid wrong eventlog conf cause cluster down in standalone mode
when enabling eventlog in standalone mode, if give the wrong configuration, the standalone cluster will down (cause master restart, lose connection with workers).
How to reproduce: just give an invalid value to "spark.eventLog.dir", for example: spark.eventLog.dir=hdfs://tmp/logdir1, hdfs://tmp/logdir2. This will throw illegalArgumentException, which will cause the Master restart. And the whole cluster is not available.

Author: Zhang, Liye <liye.zhang@intel.com>

Closes #3824 from liyezhang556520/wrongConf4Cluster and squashes the following commits:

3c24d98 [Zhang, Liye] revert change with logwarning and excetption for FileNotFoundException
3c1ac2e [Zhang, Liye] change var to val
a49c52f [Zhang, Liye] revert wrong modification
12eee85 [Zhang, Liye] add more message in log and on webUI
5c1fa33 [Zhang, Liye] cache exceptions when eventlog with wrong conf
2015-01-08 10:40:26 -08:00
zsxwing 2b729d2250 [SPARK-5126][Core] Verify Spark urls before creating Actors so that invalid urls can crash the process.
Because `actorSelection` will return `deadLetters` for an invalid path,  Worker keeps quiet for an invalid master url. It's better to log an error so that people can find such problem quickly.

This PR will check the url before sending to `actorSelection`, throw and log a SparkException for an invalid url.

Author: zsxwing <zsxwing@gmail.com>

Closes #3927 from zsxwing/SPARK-5126 and squashes the following commits:

9d429ee [zsxwing] Create a utility method in Utils to parse Spark url; verify urls before creating Actors so that invalid urls can crash the process.
8286e51 [zsxwing] Check the url before sending to Akka and log the error if the url is invalid
2015-01-07 23:01:30 -08:00
hushan[胡珊] d345ebebd5 [SPARK-5132][Core]Correct stage Attempt Id key in stageInfofromJson
SPARK-5132:
stageInfoToJson: Stage Attempt Id
stageInfoFromJson: Attempt Id

Author: hushan[胡珊] <hushan@xiaomi.com>

Closes #3932 from suyanNone/json-stage and squashes the following commits:

41419ab [hushan[胡珊]] Correct stage Attempt Id key in stageInfofromJson
2015-01-07 12:09:12 -08:00
Masayoshi TSUZUKI 6e74edeca3 [SPARK-2458] Make failed application log visible on History Server
Enabled HistoryServer to show incomplete applications.
We can see the log for incomplete applications by clicking the bottom link.

Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>

Closes #3467 from tsudukim/feature/SPARK-2458-2 and squashes the following commits:

76205d2 [Masayoshi TSUZUKI] Fixed and added test code.
29a04a9 [Masayoshi TSUZUKI] Merge branch 'master' of github.com:tsudukim/spark into feature/SPARK-2458-2
f9ef854 [Masayoshi TSUZUKI] Added space between "if" and "(". Fixed "Incomplete" as capitalized in the web UI. Modified double negative variable name.
9b465b0 [Masayoshi TSUZUKI] Modified typo and better implementation.
3ed8a41 [Masayoshi TSUZUKI] Modified too long lines.
08ea14d [Masayoshi TSUZUKI] [SPARK-2458] Make failed application log visible on History Server
2015-01-07 07:32:53 -08:00
Sean Owen 4cba6eb420 SPARK-4159 [CORE] Maven build doesn't run JUnit test suites
This PR:

- Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
- Tells `surefire` to test only Java tests
- Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.

For me this causes the Scala and Java tests to be run once each, it seems, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run.

Author: Sean Owen <sowen@cloudera.com>

Closes #3651 from srowen/SPARK-4159 and squashes the following commits:

2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
2015-01-06 12:02:08 -08:00
Reynold Xin bbcba3a943 [SPARK-5093] Set spark.network.timeout to 120s consistently.
Author: Reynold Xin <rxin@databricks.com>

Closes #3903 from rxin/timeout-120 and squashes the following commits:

7c2138e [Reynold Xin] [SPARK-5093] Set spark.network.timeout to 120s consistently.
2015-01-05 15:19:53 -08:00
Jongyoul Lee 1c0e7ce056 [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environme...
...nt at all.

- fixed a scope of runAsSparkUser from MesosExecutorDriver.run to MesosExecutorBackend.launchTask
- See the Jira Issue for more details.

Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #3741 from jongyoul/SPARK-4465 and squashes the following commits:

46ad71e [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed unused import
3d6631f [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed comments and adjusted indentations
2343f13 [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - fixed a scope of runAsSparkUser from MesosExecutorDriver.run to MesosExecutorBackend.launchTask
2015-01-05 12:05:09 -08:00
WangTao ce39b34404 [SPARK-5057] Log message in failed askWithReply attempts
https://issues.apache.org/jira/browse/SPARK-5057

Author: WangTao <barneystinson@aliyun.com>
Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #3875 from WangTaoTheTonic/SPARK-5057 and squashes the following commits:

1503487 [WangTao] use string interpolation
706c8a7 [WangTaoTheTonic] log more messages
2015-01-05 12:00:02 -08:00
Varun Saxena d3f07fd23c [SPARK-4688] Have a single shared network timeout in Spark
[SPARK-4688] Have a single shared network timeout in Spark

Author: Varun Saxena <vsaxena.varun@gmail.com>
Author: varunsaxena <vsaxena.varun@gmail.com>

Closes #3562 from varunsaxena/SPARK-4688 and squashes the following commits:

6e97f72 [Varun Saxena] [SPARK-4688] Single shared network timeout
cd783a2 [Varun Saxena] SPARK-4688
d6f8c29 [Varun Saxena] SCALA-4688
9562b15 [Varun Saxena] SPARK-4688
a75f014 [varunsaxena] SPARK-4688
594226c [varunsaxena] SPARK-4688
2015-01-05 10:32:37 -08:00
zsxwing 5c506cecb9 [SPARK-5074][Core] Fix a non-deterministic test failure
Add `assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))` to make sure `sparkListener` receive the message.

Author: zsxwing <zsxwing@gmail.com>

Closes #3889 from zsxwing/SPARK-5074 and squashes the following commits:

e61c198 [zsxwing] Fix a non-deterministic test failure
2015-01-04 21:18:33 -08:00
zsxwing 27e7f5a723 [SPARK-5083][Core] Fix a flaky test in TaskResultGetterSuite
Because `sparkEnv.blockManager.master.removeBlock` is asynchronous, we need to make sure the block has already been removed before calling `super.enqueueSuccessfulTask`.

Author: zsxwing <zsxwing@gmail.com>

Closes #3894 from zsxwing/SPARK-5083 and squashes the following commits:

d97c03d [zsxwing] Fix a flaky test in TaskResultGetterSuite
2015-01-04 21:09:21 -08:00
zsxwing 6c726a3fbd [SPARK-5069][Core] Fix the race condition of TaskSchedulerImpl.dagScheduler
It's not necessary to set `TaskSchedulerImpl.dagScheduler` in preStart. It's safe to set it after `initializeEventProcessActor()`.

Author: zsxwing <zsxwing@gmail.com>

Closes #3887 from zsxwing/SPARK-5069 and squashes the following commits:

d95894f [zsxwing] Fix the race condition of TaskSchedulerImpl.dagScheduler
2015-01-04 21:06:04 -08:00
zsxwing 72396522bc [SPARK-5067][Core] Use '===' to compare well-defined case class
A simple fix would be adding `assert(e1.appId == e2.appId)` for `SparkListenerApplicationStart`. But actually we can use `===` for well-defined case class directly. Therefore, instead of fixing this issue, I use `===` to compare those well-defined case classes (all fields have implemented a correct `equals` method, such as primitive types)

Author: zsxwing <zsxwing@gmail.com>

Closes #3886 from zsxwing/SPARK-5067 and squashes the following commits:

0a51711 [zsxwing] Use '===' to compare well-defined case class
2015-01-04 21:03:17 -08:00
Josh Rosen 939ba1f8f6 [SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs
This patch disables output spec. validation for jobs launched through Spark Streaming, since this interferes with checkpoint recovery.

Hadoop OutputFormats have a `checkOutputSpecs` method which performs certain checks prior to writing output, such as checking whether the output directory already exists.  SPARK-1100 added checks for FileOutputFormat, SPARK-1677 (#947) added a SparkConf configuration to disable these checks, and SPARK-2309 (#1088) extended these checks to run for all OutputFormats, not just FileOutputFormat.

In Spark Streaming, we might have to re-process a batch during checkpoint recovery, so `save` actions may be called multiple times.  In addition to `DStream`'s own save actions, users might use `transform` or `foreachRDD` and call the `RDD` and `PairRDD` save actions.  When output spec. validation is enabled, the second calls to these actions will fail due to existing output.

This patch automatically disables output spec. validation for jobs submitted by the Spark Streaming scheduler.  This is done by using Scala's `DynamicVariable` to propagate the bypass setting without having to mutate SparkConf or introduce a global variable.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3832 from JoshRosen/SPARK-4835 and squashes the following commits:

36eaf35 [Josh Rosen] Add comment explaining use of transform() in test.
6485cf8 [Josh Rosen] Add test case in Streaming; fix bug for transform()
7b3e06a [Josh Rosen] Remove Streaming-specific setting to undo this change; update conf. guide
bf9094d [Josh Rosen] Revise disableOutputSpecValidation() comment to not refer to Spark Streaming.
e581d17 [Josh Rosen] Deduplicate isOutputSpecValidationEnabled logic.
762e473 [Josh Rosen] [SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs.
2015-01-04 20:26:18 -08:00
Dale 3fddc9468f [SPARK-4787] Stop SparkContext if a DAGScheduler init error occurs
Author: Dale <tigerquoll@outlook.com>

Closes #3809 from tigerquoll/SPARK-4787 and squashes the following commits:

5661e01 [Dale] [SPARK-4787] Ensure that call to stop() doesn't lose the exception by using a finally block.
2172578 [Dale] [SPARK-4787] Stop context properly if an exception occurs during DAGScheduler initialization.
2015-01-04 13:29:13 -08:00
Brennon York b96008d552 [SPARK-794][Core] Remove sleep() in ClusterScheduler.stop
Removed `sleep()` from the `stop()` method of the `TaskSchedulerImpl` class which, from the JIRA ticket, is believed to be a legacy artifact slowing down testing originally introduced in the `ClusterScheduler` class.

Author: Brennon York <brennon.york@capitalone.com>

Closes #3851 from brennonyork/SPARK-794 and squashes the following commits:

04c3e64 [Brennon York] Removed sleep() from the stop() method
2015-01-04 12:40:39 -08:00
Josh Rosen 012839807c [HOTFIX] Bind web UI to ephemeral port in DriverSuite
The job launched by DriverSuite should bind the web UI to an ephemeral port, since it looks like port contention in this test has caused a large number of Jenkins failures when many builds are started simultaneously.  Our tests already disable the web UI, but this doesn't affect subprocesses launched by our tests.  In this case, I've opted to bind to an ephemeral port instead of disabling the UI because disabling features in this test may mask its ability to catch certain bugs.

See also: e24d3a9

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3873 from JoshRosen/driversuite-webui-port and squashes the following commits:

48cd05c [Josh Rosen] [HOTFIX] Bind web UI to ephemeral port in DriverSuite.
2015-01-01 15:03:54 -08:00
Reynold Xin 7749dd6c36 [SPARK-5038] Add explicit return type for implicit functions.
As we learned in #3580, not explicitly typing implicit functions can lead to compiler bugs and potentially unexpected runtime behavior.

This is a follow up PR for rest of Spark (outside Spark SQL). The original PR for Spark SQL can be found at https://github.com/apache/spark/pull/3859

Author: Reynold Xin <rxin@databricks.com>

Closes #3860 from rxin/implicit and squashes the following commits:

73702f9 [Reynold Xin] [SPARK-5038] Add explicit return type for implicit functions.
2014-12-31 17:07:47 -08:00
Josh Rosen e24d3a9a29 [HOTFIX] Disable Spark UI in SparkSubmitSuite tests
This should fix a major cause of build breaks when running many parallel tests.
2014-12-31 14:13:09 -08:00
Brennon York 8e14c5eb55 [SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest.
Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object.

Author: Brennon York <brennon.york@capitalone.com>

Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits:

5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class
8d20936 [Brennon York] updated to reset the error message back to the default
a043039 [Brennon York] updated to split the uri and jar vals
8da7cbf [Brennon York] fixes SPARK-4298
2014-12-31 11:54:10 -08:00
Josh Rosen 352ed6bbe3 [SPARK-1010] Clean up uses of System.setProperty in unit tests
Several of our tests call System.setProperty (or test code which implicitly sets system properties) and don't always reset/clear the modified properties, which can create ordering dependencies between tests and cause hard-to-diagnose failures.

This patch removes most uses of System.setProperty from our tests, since in most cases we can use SparkConf to set these configurations (there are a few exceptions, including the tests of SparkConf itself).

For the cases where we continue to use System.setProperty, this patch introduces a `ResetSystemProperties` ScalaTest mixin class which snapshots the system properties before individual tests and to automatically restores them on test completion / failure.  See the block comment at the top of the ResetSystemProperties class for more details.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3739 from JoshRosen/cleanup-system-properties-in-tests and squashes the following commits:

0236d66 [Josh Rosen] Replace setProperty uses in two example programs / tools
3888fe3 [Josh Rosen] Remove setProperty use in LocalJavaStreamingContext
4f4031d [Josh Rosen] Add note on why SparkSubmitSuite needs ResetSystemProperties
4742a5b [Josh Rosen] Clarify ResetSystemProperties trait inheritance ordering.
0eaf0b6 [Josh Rosen] Remove setProperty call in TaskResultGetterSuite.
7a3d224 [Josh Rosen] Fix trait ordering
3fdb554 [Josh Rosen] Remove setProperty call in TaskSchedulerImplSuite
bee20df [Josh Rosen] Remove setProperty calls in SparkContextSchedulerCreationSuite
655587c [Josh Rosen] Remove setProperty calls in JobCancellationSuite
3f2f955 [Josh Rosen] Remove System.setProperty calls in DistributedSuite
cfe9cce [Josh Rosen] Remove use of system properties in SparkContextSuite
8783ab0 [Josh Rosen] Remove TestUtils.setSystemProperty, since it is subsumed by the ResetSystemProperties trait.
633a84a [Josh Rosen] Remove use of system properties in FileServerSuite
25bfce2 [Josh Rosen] Use ResetSystemProperties in UtilsSuite
1d1aa5a [Josh Rosen] Use ResetSystemProperties in SizeEstimatorSuite
dd9492b [Josh Rosen] Use ResetSystemProperties in AkkaUtilsSuite
b0daff2 [Josh Rosen] Use ResetSystemProperties in BlockManagerSuite
e9ded62 [Josh Rosen] Use ResetSystemProperties in TaskSchedulerImplSuite
5b3cb54 [Josh Rosen] Use ResetSystemProperties in SparkListenerSuite
0995c4b [Josh Rosen] Use ResetSystemProperties in SparkContextSchedulerCreationSuite
c83ded8 [Josh Rosen] Use ResetSystemProperties in SparkConfSuite
51aa870 [Josh Rosen] Use withSystemProperty in ShuffleSuite
60a63a1 [Josh Rosen] Use ResetSystemProperties in JobCancellationSuite
14a92e4 [Josh Rosen] Use withSystemProperty in FileServerSuite
628f46c [Josh Rosen] Use ResetSystemProperties in DistributedSuite
9e3e0dd [Josh Rosen] Add ResetSystemProperties test fixture mixin; use it in SparkSubmitSuite.
4dcea38 [Josh Rosen] Move withSystemProperty to TestUtils class.
2014-12-30 18:12:20 -08:00
Josh Rosen efa80a531e [SPARK-4882] Register PythonBroadcast with Kryo so that PySpark works with KryoSerializer
This PR fixes an issue where PySpark broadcast variables caused NullPointerExceptions if KryoSerializer was used.  The fix is to register PythonBroadcast with Kryo so that it's deserialized with a KryoJavaSerializer.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3831 from JoshRosen/SPARK-4882 and squashes the following commits:

0466c7a [Josh Rosen] Register PythonBroadcast with Kryo.
d5b409f [Josh Rosen] Enable registrationRequired, which would have caught this bug.
069d8a7 [Josh Rosen] Add failing test for SPARK-4882
2014-12-30 09:29:52 -08:00
Zhang, Liye 9077e721cd [SPARK-4920][UI] add version on master and worker page for standalone mode
Author: Zhang, Liye <liye.zhang@intel.com>

Closes #3769 from liyezhang556520/spark-4920_WebVersion and squashes the following commits:

3bb7e0d [Zhang, Liye] add version on master and worker page
2014-12-30 09:19:47 -08:00
Yash Datta 9bc0df6804 SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
takeOrdered should skip reduce step in case mapped RDDs have no partitions. This prevents the mentioned exception :

4. run query
SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
Error trace
java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)

Author: Yash Datta <Yash.Datta@guavus.com>

Closes #3830 from saucam/fix_takeorder and squashes the following commits:

5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
2014-12-29 13:49:45 -08:00
Kousuke Saruta 8d72341ab7 [Minor] Fix a typo of type parameter in JavaUtils.scala
In JavaUtils.scala, thare is a typo of type parameter. In addition, the type information is removed at the time of compile by erasure.

This issue is really minor so I don't  file in JIRA.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3789 from sarutak/fix-typo-in-javautils and squashes the following commits:

e20193d [Kousuke Saruta] Fixed a typo of type parameter
82bc5d9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-typo-in-javautils
99f6f63 [Kousuke Saruta] Fixed a typo of type parameter in JavaUtils.scala
2014-12-29 12:05:08 -08:00
YanTangZhai 815de54002 [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem

Author: YanTangZhai <hakeemzhai@tencent.com>
Author: yantangzhai <tyz0303@163.com>

Closes #3785 from YanTangZhai/SPARK-4946 and squashes the following commits:

9ca6541 [yantangzhai] [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master
718afeb [YanTangZhai] Merge pull request #12 from apache/master
6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
e249846 [YanTangZhai] Merge pull request #10 from apache/master
d26d982 [YanTangZhai] Merge pull request #9 from apache/master
76d4027 [YanTangZhai] Merge pull request #8 from apache/master
03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
8a00106 [YanTangZhai] Merge pull request #6 from apache/master
cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
cdef539 [YanTangZhai] Merge pull request #1 from apache/master
2014-12-29 11:30:54 -08:00
GuoQiang Li 080ceb771a [SPARK-4952][Core]Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails
Author: GuoQiang Li <witgo@qq.com>

Closes #3788 from witgo/SPARK-4952 and squashes the following commits:

d903529 [GuoQiang Li] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails
2014-12-26 23:31:29 -08:00