Commit graph

3752 commits

Reynold Xin 66e1c40c67 Minor fix for the previous commit. 2014-09-27 22:18:02 -07:00
Dale 9966d1a8aa SPARK-CORE [SPARK-3651] Group common CoarseGrainedSchedulerBackend variables together
From [SPARK-3651]:
In CoarseGrainedSchedulerBackend, we have:

    private val executorActor = new HashMap[String, ActorRef]
    private val executorAddress = new HashMap[String, Address]
    private val executorHost = new HashMap[String, String]
    private val freeCores = new HashMap[String, Int]
    private val totalCores = new HashMap[String, Int]

We only ever put / remove stuff from these maps together. It would simplify the code if we consolidate these all into one map as we have done in JobProgressListener in https://issues.apache.org/jira/browse/SPARK-2299.
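
For illustration, a minimal sketch of the consolidation, assuming simplified field types (the `ExecutorData` name appears in the squash history below; the actor-related types here are placeholders):

```scala
import scala.collection.mutable.HashMap

// AnyRef stands in for the Akka ActorRef/Address types, which are not shown here.
case class ExecutorData(
    executorActor: AnyRef,
    executorAddress: AnyRef,
    executorHost: String,
    var freeCores: Int,
    totalCores: Int)

object SchedulerState {
  // One map keyed by executor ID replaces the five parallel maps, so
  // registration and removal each touch a single entry.
  private val executorDataMap = new HashMap[String, ExecutorData]

  def register(id: String, data: ExecutorData): Unit = executorDataMap(id) = data
  def remove(id: String): Option[ExecutorData] = executorDataMap.remove(id)
}
```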

Author: Dale <tigerquoll@outlook.com>

Closes #2533 from tigerquoll/SPARK-3651 and squashes the following commits:

d1be0a9 [Dale] [SPARK-3651]  implemented suggested changes. Changed a reference from executorInfo to executorData to be consistent with other usages
6890663 [Dale] [SPARK-3651]  implemented suggested changes
7d671cf [Dale] [SPARK-3651]  Grouped variables under a ExecutorDataObject, and reference them via a map entry as they are all retrieved under the same key
2014-09-27 22:08:10 -07:00
Reynold Xin 5b922bb458 [SPARK-3543] Clean up Java TaskContext implementation.
This addresses some minor issues in https://github.com/apache/spark/pull/2425

Author: Reynold Xin <rxin@apache.org>

Closes #2557 from rxin/TaskContext and squashes the following commits:

a51e5f6 [Reynold Xin] [SPARK-3543] Clean up Java TaskContext implementation.
2014-09-27 14:46:00 -07:00
Reynold Xin 436a7730b6 Minor cleanup to tighten visibility and remove compilation warning.
Author: Reynold Xin <rxin@apache.org>

Closes #2555 from rxin/cleanup and squashes the following commits:

6add199 [Reynold Xin] Minor cleanup to tighten visibility and remove compilation warning.
2014-09-27 00:57:26 -07:00
Erik Erlandson 2d972fd84a [SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.

Author: Erik Erlandson <eerlands@redhat.com>

Closes #1689 from erikerlandson/spark-1021-pr and squashes the following commits:

50b6da6 [Erik Erlandson] use standard getIteratorSize in countAsync
4e334a9 [Erik Erlandson] exception mystery fixed by fixing bug in ComplexFutureAction
b88b5d4 [Erik Erlandson] tweak async actions to use ComplexFutureAction[T] so they handle RangePartitioner sampling job properly
b2b20e8 [Erik Erlandson] Fix bug in exception passing with ComplexFutureAction[T]
ca8913e [Erik Erlandson] RangePartition sampling job -> FutureAction
7143f97 [Erik Erlandson] [SPARK-1021] modify range bounds variable to be thread safe
ac67195 [Erik Erlandson] [SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.
2014-09-26 23:15:10 -07:00
Prashant Sharma 5e34855cf0 [SPARK-3543] Write TaskContext in Java and expose it through a static accessor.
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Shashank Sharma <shashank21j@gmail.com>

Closes #2425 from ScrapCodes/SPARK-3543/withTaskContext and squashes the following commits:

8ae414c [Shashank Sharma] CR
ee8bd00 [Prashant Sharma] Added internal API in docs comments.
ddb8cbe [Prashant Sharma] Moved setting the thread local to where TaskContext is instantiated.
a7d5e23 [Prashant Sharma] Added doc comments.
edf945e [Prashant Sharma] Code review git add -A
f716fd1 [Prashant Sharma] introduced thread local for getting the task context.
333c7d6 [Prashant Sharma] Translated Task context from scala to java.
2014-09-26 21:29:54 -07:00
Daoyuan Wang 30461c6ac3 [SPARK-3695] Shuffle fetch fail output should include the detailed host and port in the error message

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #2539 from adrian-wang/fetchfail and squashes the following commits:

6c1b1e0 [Daoyuan Wang] shuffle fetch fail output
2014-09-26 11:26:53 -07:00
zsxwing 86bce76498 SPARK-2634: Change MapOutputTrackerWorker.mapStatuses to ConcurrentHashMap
MapOutputTrackerWorker.mapStatuses is used concurrently, so it should be thread-safe. This bug has already been fixed in #1328. Nevertheless, since #1328 won't be merged soon, I am sending this trivial fix in the hope that the issue can be resolved soon.
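
A minimal sketch of the change, with a placeholder value type (the real map holds per-shuffle map status data):

```scala
import java.util.concurrent.ConcurrentHashMap

class MapStatusCache {
  // ConcurrentHashMap gives thread-safe get/put without external locking,
  // unlike a plain mutable HashMap read and written from many threads.
  private val mapStatuses = new ConcurrentHashMap[Int, Array[Byte]]()

  def put(shuffleId: Int, statuses: Array[Byte]): Unit =
    mapStatuses.put(shuffleId, statuses)

  def get(shuffleId: Int): Option[Array[Byte]] =
    Option(mapStatuses.get(shuffleId))
}
```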

Author: zsxwing <zsxwing@gmail.com>

Closes #1541 from zsxwing/SPARK-2634 and squashes the following commits:

d450053 [zsxwing] SPARK-2634: Change MapOutputTrackerWorker.mapStatuses to ConcurrentHashMap
2014-09-25 18:24:01 -07:00
epahomov 9b56e249e0 [SPARK-3690] When closing shuffle writers we swallow a more important exception
Author: epahomov <pahomov.egor@gmail.com>

Closes #2537 from epahomov/SPARK-3690 and squashes the following commits:

a0b7de4 [epahomov] [SPARK-3690] Closing shuffle writers we swallow more important exception
2014-09-25 14:50:12 -07:00
Aaron Staple 8ca4ecb6a5 [SPARK-546] Add full outer join to RDD and DStream.
leftOuterJoin and rightOuterJoin are already implemented.  This patch adds fullOuterJoin.
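
For intuition, a sketch of full outer join semantics on plain Scala maps, assuming the RDD version behaves analogously: keys present on only one side pair with None on the other.

```scala
object FullOuterJoinSketch {
  def fullOuterJoin[K, V, W](left: Map[K, V], right: Map[K, W]): Map[K, (Option[V], Option[W])] =
    (left.keySet ++ right.keySet).map(k => k -> (left.get(k), right.get(k))).toMap

  def main(args: Array[String]): Unit =
    // Map(1 -> (Some(a),None), 2 -> (Some(b),Some(x)), 3 -> (None,Some(y)))
    println(fullOuterJoin(Map(1 -> "a", 2 -> "b"), Map(2 -> "x", 3 -> "y")))
}
```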

Author: Aaron Staple <aaron.staple@gmail.com>

Closes #1395 from staple/SPARK-546 and squashes the following commits:

1f5595c [Aaron Staple] Fix python style
7ac0aa9 [Aaron Staple] [SPARK-546] Add full outer join to RDD and DStream.
3b5d137 [Aaron Staple] In JavaPairDStream, make class tag specification in rightOuterJoin consistent with other functions.
31f2956 [Aaron Staple] Fix left outer join documentation comments.
2014-09-24 20:39:09 -07:00
Mubarak Seyed 729952a5ef [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI
This is a refactored version of the original PR https://github.com/apache/spark/pull/1723 by mubarak

Please take a look andrewor14, mubarak

Author: Mubarak Seyed <mubarak.seyed@gmail.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #2464 from tdas/streaming-callsite and squashes the following commits:

dc54c71 [Tathagata Das] Made changes based on PR comments.
390b45d [Tathagata Das] Fixed minor bugs.
904cd92 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-callsite
7baa427 [Tathagata Das] Refactored getCallSite and setCallSite to make it simpler. Also added unit test for DStream creation site.
b9ed945 [Mubarak Seyed] Adding streaming utils
c461cf4 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
ceb43da [Mubarak Seyed] Changing default regex function name
8c5d443 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
196121b [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
491a1eb [Mubarak Seyed] Removing streaming visibility from getRDDCreationCallSite in DStream
33a7295 [Mubarak Seyed] Fixing review comments: Merging both setCallSite methods
c26d933 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
f51fd9f [Mubarak Seyed] Fixing scalastyle, Regex for Utils.getCallSite, and changing method names in DStream
5051c58 [Mubarak Seyed] Getting return value of compute() into variable and call setCallSite(prevCallSite) only once. Adding return for other code paths (for None)
a207eb7 [Mubarak Seyed] Fixing code review comments
ccde038 [Mubarak Seyed] Removing Utils import from MappedDStream
2a09ad6 [Mubarak Seyed] Changes in Utils.scala for SPARK-1853
1d90cc3 [Mubarak Seyed] Changes for SPARK-1853
5f3105a [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
70f494f [Mubarak Seyed] Changes for SPARK-1853
1500deb [Mubarak Seyed] Changes in Spark Streaming UI
9d38d3c [Mubarak Seyed] [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI
d466d75 [Mubarak Seyed] Changes for spark streaming UI
2014-09-23 15:09:12 -07:00
Andrew Or b3fef50e22 [SPARK-3653] Respect SPARK_*_MEMORY for cluster mode
`SPARK_DRIVER_MEMORY` was only used to start the `SparkSubmit` JVM, which becomes the driver only in client mode but not cluster mode. In cluster mode, this property is simply not propagated to the worker nodes.

`SPARK_EXECUTOR_MEMORY` is picked up from `SparkContext`, but in cluster mode the driver runs on one of the worker machines, where this environment variable may not be set.

Author: Andrew Or <andrewor14@gmail.com>

Closes #2500 from andrewor14/memory-env-vars and squashes the following commits:

6217b38 [Andrew Or] Respect SPARK_*_MEMORY for cluster mode
2014-09-23 14:00:33 -07:00
Sandy Ryza d79238d03a SPARK-3612. Executor shouldn't quit if heartbeat message fails to reach the driver

Author: Sandy Ryza <sandy@cloudera.com>

Closes #2487 from sryza/sandy-spark-3612 and squashes the following commits:

2b7353d [Sandy Ryza] SPARK-3612. Executor shouldn't quit if heartbeat message fails to reach the driver
2014-09-23 13:44:18 -07:00
Marcelo Vanzin 8dfe79ffb2 [SPARK-3647] Add more exceptions to Guava relocation.
Guava's Optional refers to some package private classes / methods, and
when those are relocated the code stops working, throwing exceptions.
So add the affected classes to the exception list too, and add a unit
test.

(Note that this unit test only really makes sense in maven, since we
don't relocate in the sbt build. Also, JavaAPISuite doesn't seem to
be run by "mvn test" - I had to manually add command line options to
enable it.)

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #2496 from vanzin/SPARK-3647 and squashes the following commits:

84f58d7 [Marcelo Vanzin] [SPARK-3647] Add more exceptions to Guava relocation.
2014-09-23 13:42:00 -07:00
Ian Hummel a0454efe21 [SPARK-3595] Respect configured OutputCommitters when calling saveAsHadoopFile
Addresses the issue in https://issues.apache.org/jira/browse/SPARK-3595, namely saveAsHadoopFile hardcoding the OutputCommitter.  This is not ideal when running Spark jobs that write to S3, especially when running them from an EMR cluster where the default OutputCommitter is a DirectOutputCommitter.
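
A minimal sketch, assuming the old (`mapred`) Hadoop API: the committer class set on the `JobConf` is what `saveAsHadoopFile` should respect instead of a hardcoded default; an S3-friendly committer class could be swapped in when writing to object stores.

```scala
import org.apache.hadoop.mapred.{FileOutputCommitter, JobConf}

object CommitterConfig {
  // Build a JobConf carrying an explicitly configured output committer.
  def configured(): JobConf = {
    val conf = new JobConf()
    conf.setOutputCommitter(classOf[FileOutputCommitter]) // e.g. a DirectOutputCommitter instead
    conf
  }
}
```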

Author: Ian Hummel <ian@themodernlife.net>

Closes #2450 from themodernlife/spark-3595 and squashes the following commits:

f37a0e5 [Ian Hummel] Update based on comments from pwendell
a11d9f3 [Ian Hummel] Fix formatting
4359664 [Ian Hummel] Add an example showing usage
8b6be94 [Ian Hummel] Add ability to specify OutputCommitter, espcially useful when writing to an S3 bucket from an EMR cluster
2014-09-21 13:04:36 -07:00
WangTao 8e875d2aff [SPARK-3599] Avoid loading properties file frequently
https://issues.apache.org/jira/browse/SPARK-3599
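
The squash history points to caching with a `lazy val`; a minimal sketch of that idea, with an illustrative file path:

```scala
import java.io.FileInputStream
import java.util.Properties

object DefaultProperties {
  // Parsed once on first access and cached thereafter, instead of the file
  // being re-read on every call.
  lazy val props: Properties = {
    val p = new Properties()
    val in = new FileInputStream("conf/spark-defaults.conf")
    try p.load(in) finally in.close()
    p
  }
}
```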

Author: WangTao <barneystinson@aliyun.com>
Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #2454 from WangTaoTheTonic/avoidLoadingFrequently and squashes the following commits:

3681182 [WangTao] do not use clone
7dca036 [WangTao] use lazy val instead
2a79f26 [WangTaoTheTonic] Avoid loaing properties file frequently
2014-09-20 19:07:23 -07:00
Sandy Ryza 7c8ad1c083 SPARK-3574. Shuffle finish time always reported as -1
The included test waits 100 ms after job completion for task completion events to come in so it can verify they have reasonable finish times.  Does anyone know a better way to wait on listener events that are expected to come in?

Author: Sandy Ryza <sandy@cloudera.com>

Closes #2440 from sryza/sandy-spark-3574 and squashes the following commits:

c81439b [Sandy Ryza] Fix test failure
b340956 [Sandy Ryza] SPARK-3574. Remove shuffleFinishTime metric
2014-09-20 16:03:17 -07:00
Davies Liu fce5e251d6 [SPARK-3491] [MLlib] [PySpark] use pickle to serialize data in MLlib
Currently, we serialize the data between the JVM and Python case by case manually; this cannot scale to support so many APIs in MLlib.

This patch tries to address the problem by serializing the data with the pickle protocol, using the Pyrolite library to serialize/deserialize in the JVM. The pickle protocol can be easily extended to support customized classes.

All the modules are refactored to use this protocol.

Known issues: There will be some performance regression (both CPU and memory, since the serialized data is larger).
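
For background, a minimal sketch of the Pyrolite round trip described above; the list payload is illustrative:

```scala
import net.razorvine.pickle.{Pickler, Unpickler}

object PickleRoundTrip {
  def main(args: Array[String]): Unit = {
    // Pickle a JVM object into bytes that Python's pickle.loads can read...
    val bytes = new Pickler().dumps(java.util.Arrays.asList(1, 2, 3))
    // ...and unpickle bytes (e.g. produced by Python) back into a JVM object.
    val back = new Unpickler().loads(bytes)
    println(back)
  }
}
```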

Author: Davies Liu <davies.liu@gmail.com>

Closes #2378 from davies/pickle_mllib and squashes the following commits:

dffbba2 [Davies Liu] Merge branch 'master' of github.com:apache/spark into pickle_mllib
810f97f [Davies Liu] fix equal of matrix
032cd62 [Davies Liu] add more type check and conversion for user_product
bd738ab [Davies Liu] address comments
e431377 [Davies Liu] fix cache of rdd, refactor
19d0967 [Davies Liu] refactor Picklers
2511e76 [Davies Liu] cleanup
1fccf1a [Davies Liu] address comments
a2cc855 [Davies Liu] fix tests
9ceff73 [Davies Liu] test size of serialized Rating
44e0551 [Davies Liu] fix cache
a379a81 [Davies Liu] fix pickle array in python2.7
df625c7 [Davies Liu] Merge commit '154d141' into pickle_mllib
154d141 [Davies Liu] fix autobatchedpickler
44736d7 [Davies Liu] speed up pickling array in Python 2.7
e1d1bfc [Davies Liu] refactor
708dc02 [Davies Liu] fix tests
9dcfb63 [Davies Liu] fix style
88034f0 [Davies Liu] rafactor, address comments
46a501e [Davies Liu] choose batch size automatically
df19464 [Davies Liu] memorize the module and class name during pickleing
f3506c5 [Davies Liu] Merge branch 'master' into pickle_mllib
722dd96 [Davies Liu] cleanup _common.py
0ee1525 [Davies Liu] remove outdated tests
b02e34f [Davies Liu] remove _common.py
84c721d [Davies Liu] Merge branch 'master' into pickle_mllib
4d7963e [Davies Liu] remove muanlly serialization
6d26b03 [Davies Liu] fix tests
c383544 [Davies Liu] classification
f2a0856 [Davies Liu] mllib/regression
d9f691f [Davies Liu] mllib/util
cccb8b1 [Davies Liu] mllib/tree
8fe166a [Davies Liu] Merge branch 'pickle' into pickle_mllib
aa2287e [Davies Liu] random
f1544c4 [Davies Liu] refactor clustering
52d1350 [Davies Liu] use new protocol in mllib/stat
b30ef35 [Davies Liu] use pickle to serialize data for mllib/recommendation
f44f771 [Davies Liu] enable tests about array
3908f5c [Davies Liu] Merge branch 'master' into pickle
c77c87b [Davies Liu] cleanup debugging code
60e4e2f [Davies Liu] support unpickle array.array for Python 2.6
2014-09-19 15:01:11 -07:00
Andrew Or 9306297d1d [Minor Hot Fix] Move a line in SparkSubmit to the right place
This was introduced in #2449

Author: Andrew Or <andrewor14@gmail.com>

Closes #2452 from andrewor14/standalone-hot-fix and squashes the following commits:

d5190ca [Andrew Or] Put that line in the right place
2014-09-18 17:49:28 -07:00
Victsm b3ed37e5ba [SPARK-3560] Fixed setting spark.jars system property in yarn-cluster mode
Author: Victsm <victor.nju@gmail.com>
Author: Min Shen <mshen@linkedin.com>

Closes #2449 from Victsm/SPARK-3560 and squashes the following commits:

918405a [Victsm] Removed the additional space
4502a2a [Min Shen] [SPARK-3560] Fixed setting spark.jars system property in yarn-cluster mode.

(cherry picked from commit 832dff64dd)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
2014-09-18 15:58:29 -07:00
WangTaoTheTonic 471e6a3a47 [SPARK-3589][Minor] Remove redundant code
https://issues.apache.org/jira/browse/SPARK-3589

"export CLASSPATH" in spark-class is redundant since same variable is exported before.
We could reuse defined value "isYarnCluster" in SparkSubmit.scala.

Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #2445 from WangTaoTheTonic/removeRedundant and squashes the following commits:

6fb6872 [WangTaoTheTonic] remove redundant code
2014-09-18 12:07:53 -07:00
WangTaoTheTonic 3447d10090 [SPARK-3547] Using a special exit code instead of 1 to represent ClassNotFoundException

As an improvement over https://github.com/apache/spark/pull/1944, we should use a more specific exit code to represent ClassNotFoundException.
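
A sketch of the idea, with an illustrative wrapper; the squash history below suggests 101 was the value finally chosen, after 127 was tried:

```scala
object ExitCodes {
  // A dedicated code distinguishes "user class not found" from generic failure.
  val ClassNotFound = 101

  def exitFor(e: Throwable): Nothing = e match {
    case _: ClassNotFoundException => sys.exit(ClassNotFound)
    case _                         => sys.exit(1)
  }
}
```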

Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #2421 from WangTaoTheTonic/classnotfoundExitCode and squashes the following commits:

645a22a [WangTaoTheTonic] Serveral typos to trigger Jenkins
d6ae559 [WangTaoTheTonic] use 101 instead
a2d6465 [WangTaoTheTonic] use 127 instead
fbb232f [WangTaoTheTonic] Using a special exit code instead of 1 to represent ClassNotFoundException
2014-09-18 10:17:18 -07:00
WangTaoTheTonic 3f169bfe3c [SPARK-3565] Fix configuration item inconsistent with the documentation
https://issues.apache.org/jira/browse/SPARK-3565

"spark.ports.maxRetries" should be "spark.port.maxRetries". Make the configuration keys in document and code consistent.

Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #2427 from WangTaoTheTonic/fixPortRetries and squashes the following commits:

c178813 [WangTaoTheTonic] Use blank lines trigger Jenkins
646f3fe [WangTaoTheTonic] also in SparkBuild.scala
3700dba [WangTaoTheTonic] Fix configuration item not consistent with document
2014-09-17 21:59:23 -07:00
Kousuke Saruta 1147973f1c [SPARK-3567] appId field in SparkDeploySchedulerBackend should be volatile
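A minimal sketch of the fix, with illustrative accessors:

```scala
object AppIdHolder {
  // @volatile makes a write from one thread (e.g. a cluster event callback)
  // visible to readers on other threads without extra locking.
  @volatile private var appId: String = _

  def set(id: String): Unit = appId = id
  def get: Option[String] = Option(appId)
}
```
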
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2428 from sarutak/appid-volatile-modification and squashes the following commits:

c7d890d [Kousuke Saruta] Added volatile modifier to appId field in SparkDeploySchedulerBackend
2014-09-17 16:52:27 -07:00
Kousuke Saruta 6688a266f2 [SPARK-3564][WebUI] Display App ID on HistoryPage
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2424 from sarutak/display-appid-on-webui and squashes the following commits:

417fe90 [Kousuke Saruta] Added "App ID column" to HistoryPage
2014-09-17 16:31:58 -07:00
Kousuke Saruta cbc065039f [SPARK-3571] Spark standalone cluster mode doesn't work.
I think this issue is caused by #1106.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2436 from sarutak/SPARK-3571 and squashes the following commits:

7a4deea [Kousuke Saruta] Modified Master.scala to use numWorkersVisited and numWorkersAlive instead of stopPos
4e51e35 [Kousuke Saruta] Modified Master to prevent from 0 divide
4817ecd [Kousuke Saruta] Brushed up previous change
71e84b6 [Kousuke Saruta] Modified Master to enable schedule normally
2014-09-17 16:23:50 -07:00
Andrew Or 0a7091e689 [SPARK-3555] Fix UISuite race condition
The test "jetty selects different port under contention" is flaky.

If another process binds to 4040 before the test starts, then the first server we start there will fail, and the subsequent servers we start thereafter may successfully bind to 4040 if it was released between the servers starting. Instead, we should just let Java find a random free port for us and hold onto it for the duration of the test.
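
A minimal sketch of that approach, assuming a simple test helper: binding to port 0 asks the OS for any available port, and keeping the socket open holds the port for the duration of the test body.

```scala
import java.net.ServerSocket

object FreePort {
  def withFreePort[T](body: Int => T): T = {
    // The OS assigns a random free port; the socket stays bound while body runs.
    val socket = new ServerSocket(0)
    try body(socket.getLocalPort)
    finally socket.close()
  }
}
```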

Author: Andrew Or <andrewor14@gmail.com>

Closes #2418 from andrewor14/fix-port-contention and squashes the following commits:

0cd4974 [Andrew Or] Stop them servers
a7071fe [Andrew Or] Pick random port instead of 4040
2014-09-16 16:03:20 -07:00
Kousuke Saruta a9e910430f [SPARK-3546] InputStream of ManagedBuffer is not closed and causes the process to run out of file descriptors
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2408 from sarutak/resolve-resource-leak-issue and squashes the following commits:

074781d [Kousuke Saruta] Modified SuffleBlockFetcherIterator
5f63f67 [Kousuke Saruta] Move metrics increment logic and debug logging outside try block
b37231a [Kousuke Saruta] Modified FileSegmentManagedBuffer#nioByteBuffer to check null or not before invoking channel.close
bf29d4a [Kousuke Saruta] Modified FileSegment to close channel
2014-09-16 12:41:45 -07:00
Aaron Staple 8e7ae477ba [SPARK-2314][SQL] Override collect and take in python library, and count in java library, with optimized versions.
SchemaRDD overrides RDD functions, including collect, count, and take, with optimized versions making use of the query optimizer.  The java and python interface classes wrapping SchemaRDD need to ensure the optimized versions are called as well.  This patch overrides relevant calls in the python and java interfaces with optimized versions.

Adds a new Row serialization pathway between python and java, based on JList[Array[Byte]] versus the existing RDD[Array[Byte]]. I wasn’t overjoyed about doing this, but I noticed that some QueryPlans implement optimizations in executeCollect(), which outputs an Array[Row] rather than the typical RDD[Row] that can be shipped to python using the existing serialization code. To me it made sense to ship the Array[Row] over to python directly instead of converting it back to an RDD[Row] just for the purpose of sending the Rows to python using the existing serialization code.

Author: Aaron Staple <aaron.staple@gmail.com>

Closes #1592 from staple/SPARK-2314 and squashes the following commits:

89ff550 [Aaron Staple] Merge with master.
6bb7b6c [Aaron Staple] Fix typo.
b56d0ac [Aaron Staple] [SPARK-2314][SQL] Override count in JavaSchemaRDD, forwarding to SchemaRDD's count.
0fc9d40 [Aaron Staple] Fix comment typos.
f03cdfa [Aaron Staple] [SPARK-2314][SQL] Override collect and take in sql.py, forwarding to SchemaRDD's collect.
2014-09-16 11:45:35 -07:00
Ye Xianjin febafefa5a [SPARK-3040] pick up a more proper local ip address for Utils.findLocalIpAddress method
Short version: NetworkInterface.getNetworkInterfaces returns interfaces in reverse order compared to ifconfig output, so it may pick up an IP address associated with tun0 or a virtual network interface.
See [SPARK-3040](https://issues.apache.org/jira/browse/SPARK-3040) for more detail.
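
A minimal sketch of the fix as described: reverse the enumeration so the order matches ifconfig output and a physical interface's address is preferred over tun0/virtual interfaces.

```scala
import java.net.NetworkInterface
import scala.collection.JavaConverters._

object LocalIp {
  // Candidate addresses in reversed interface order.
  def candidateAddresses(): Seq[String] =
    NetworkInterface.getNetworkInterfaces.asScala.toSeq.reverse
      .flatMap(_.getInetAddresses.asScala.toSeq)
      .map(_.getHostAddress)
}
```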

Author: Ye Xianjin <advancedxy@gmail.com>

Closes #1946 from advancedxy/SPARK-3040 and squashes the following commits:

f33f6b2 [Ye Xianjin] add windows support
087a785 [Ye Xianjin] reverse the Networkinterface.getNetworkInterfaces output order to get a more proper local ip address.
2014-09-15 21:53:38 -07:00
Davies Liu da33acb8b6 [SPARK-2951] [PySpark] support unpickle array.array for Python 2.6
Pyrolite cannot unpickle array.array objects pickled by Python 2.6; this patch fixes that by extending Pyrolite.

There is a bug in Pyrolite when unpickling arrays of float/double; this patch works around it by reversing the endianness for float/double. The workaround should be removed once Pyrolite publishes a new release that fixes the issue.

I have sent a PR to Pyrolite to fix it: https://github.com/irmen/Pyrolite/pull/11

Author: Davies Liu <davies.liu@gmail.com>

Closes #2365 from davies/pickle and squashes the following commits:

f44f771 [Davies Liu] enable tests about array
3908f5c [Davies Liu] Merge branch 'master' into pickle
c77c87b [Davies Liu] cleanup debugging code
60e4e2f [Davies Liu] support unpickle array.array for Python 2.6
2014-09-15 18:57:25 -07:00
yantangzhai 37d925280c [SPARK-2714] DAGScheduler logs jobid when runJob finishes
DAGScheduler logs jobid when runJob finishes

Author: yantangzhai <tyz0303@163.com>

Closes #1617 from YanTangZhai/SPARK-2714 and squashes the following commits:

0a0243f [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
fbb1150 [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
7aec2a9 [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
fb42f0f [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
090d908 [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
2014-09-15 16:57:38 -07:00
Kousuke Saruta e59fac1f97 [SPARK-3518] Remove wasted statement in JsonProtocol
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2380 from sarutak/SPARK-3518 and squashes the following commits:

8a1464e [Kousuke Saruta] Replaced a variable with simple field reference
c660fbc [Kousuke Saruta] Removed useless statement in JsonProtocol.scala
2014-09-15 16:11:41 -07:00
Davies Liu 4e3fbe8cdb [SPARK-3463] [PySpark] aggregate and show spilled bytes in Python
Aggregate the number of bytes spilled to disk during aggregation or sorting, and show them in the web UI.

![spilled](https://cloud.githubusercontent.com/assets/40902/4209758/4b995562-386d-11e4-97c1-8e838ee1d4e3.png)

This patch is blocked by SPARK-3465. (It includes a fix for that).

Author: Davies Liu <davies.liu@gmail.com>

Closes #2336 from davies/metrics and squashes the following commits:

e37df38 [Davies Liu] remove outdated comments
1245eb7 [Davies Liu] remove the temporary fix
ebd2f43 [Davies Liu] Merge branch 'master' into metrics
7e4ad04 [Davies Liu] Merge branch 'master' into metrics
fbe9029 [Davies Liu] show spilled bytes in Python in web ui
2014-09-13 22:31:21 -07:00
Davies Liu 2aea0da84c [SPARK-3030] [PySpark] Reuse Python worker
Reuse the Python worker to avoid the overhead of forking a Python process for each task. It also tracks the broadcasts sent to each worker, avoiding repeated broadcasts.

This can reduce the time for a dummy task from 22 ms to 13 ms (-40%). It can help reduce latency for Spark Streaming.

For a job with a broadcast (43 MB after compression):
```
    b = sc.broadcast(set(range(30000000)))
    print sc.parallelize(range(24000), 100).filter(lambda x: x in b.value).count()
```
It finishes in 281 s without a reused worker, and in 65 s with a reused worker (4 CPUs). Reusing the worker saves about 9 seconds of broadcast transfer and deserialization per task.

It's enabled by default, and can be disabled with `spark.python.worker.reuse = false`.

Author: Davies Liu <davies.liu@gmail.com>

Closes #2259 from davies/reuse-worker and squashes the following commits:

f11f617 [Davies Liu] Merge branch 'master' into reuse-worker
3939f20 [Davies Liu] fix bug in serializer in mllib
cf1c55e [Davies Liu] address comments
3133a60 [Davies Liu] fix accumulator with reused worker
760ab1f [Davies Liu] do not reuse worker if there are any exceptions
7abb224 [Davies Liu] refactor: sychronized with itself
ac3206e [Davies Liu] renaming
8911f44 [Davies Liu] synchronized getWorkerBroadcasts()
6325fc1 [Davies Liu] bugfix: bid >= 0
e0131a2 [Davies Liu] fix name of config
583716e [Davies Liu] only reuse completed and not interrupted worker
ace2917 [Davies Liu] kill python worker after timeout
6123d0f [Davies Liu] track broadcasts for each worker
8d2f08c [Davies Liu] reuse python worker
2014-09-13 16:22:04 -07:00
Reynold Xin b4dded40fb Proper indent for the previous commit. 2014-09-12 22:51:25 -07:00
Sean Owen feaa3706f1 SPARK-3470 [CORE] [STREAMING] Add Closeable / close() to Java context objects that expose a stop() lifecycle method
This doesn't add `AutoCloseable`, which is Java 7+ only. But it should be possible to use try-with-resources on a `Closeable` in Java 7, as long as the `close()` does not throw a checked exception, and these don't. Q.E.D.
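
A minimal sketch of the pattern, with an illustrative class name:

```scala
import java.io.Closeable

// A context exposing stop() gains Closeable; close() delegates to stop() and
// throws no checked exception, so Java 7 try-with-resources can manage it.
class StoppableContext extends Closeable {
  def stop(): Unit = println("stopped")
  override def close(): Unit = stop()
}
```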

Author: Sean Owen <sowen@cloudera.com>

Closes #2346 from srowen/SPARK-3470 and squashes the following commits:

612c21d [Sean Owen] Add Closeable / close() to Java context objects that expose a stop() lifecycle method
2014-09-12 22:50:37 -07:00
Reynold Xin 2584ea5b23 [SPARK-3469] Make sure all TaskCompletionListener are called even with failures
This is necessary because we rely on this callback interface to clean resources up. The old behavior would lead to resource leaks.

Note that this also changes the fault semantics of TaskCompletionListener. Previously failures in TaskCompletionListeners would result in the task being reported immediately. With this change, we report the exception at the end, and the reported exception is a TaskCompletionListenerException that contains all the exception messages.
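
A sketch of the new fault semantics with simplified listener types (the `TaskCompletionListenerException` name is from the description above):

```scala
class TaskCompletionListenerException(errors: Seq[String])
  extends RuntimeException(errors.mkString("; "))

object TaskCompletion {
  // Every listener runs even if earlier ones throw; failures are collected
  // and reported once at the end in a single aggregate exception.
  def markTaskCompleted(listeners: Seq[() => Unit]): Unit = {
    val errors = listeners.flatMap { l =>
      try { l(); None } catch { case e: Exception => Some(e.getMessage) }
    }
    if (errors.nonEmpty) throw new TaskCompletionListenerException(errors)
  }
}
```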

Author: Reynold Xin <rxin@apache.org>

Closes #2343 from rxin/taskcontext-callback and squashes the following commits:

a3845b2 [Reynold Xin] Mark TaskCompletionListenerException as private[spark].
ac5baea [Reynold Xin] Removed obsolete comment.
aa68ea4 [Reynold Xin] Throw an exception if task completion callback fails.
29b6162 [Reynold Xin] oops compilation failed.
1cb444d [Reynold Xin] [SPARK-3469] Call all TaskCompletionListeners even if some fail.
2014-09-12 21:55:39 -07:00
Marcelo Vanzin af2583826c [SPARK-3217] Add Guava to classpath when SPARK_PREPEND_CLASSES is set.
When that option is used, the compiled classes from the build directory
are prepended to the classpath. Now that we avoid packaging Guava, that
means we have classes referencing the original Guava location in the app's
classpath, so errors happen.

For that case, add Guava manually to the classpath.

Note: if Spark is compiled with "-Phadoop-provided", it's tricky to
make things work with SPARK_PREPEND_CLASSES, because you need to add
the Hadoop classpath using SPARK_CLASSPATH and that means the older
Hadoop Guava overrides the newer one Spark needs. So someone using
SPARK_PREPEND_CLASSES needs to remember to not use that profile.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #2141 from vanzin/SPARK-3217 and squashes the following commits:

b967324 [Marcelo Vanzin] [SPARK-3217] Add Guava to classpath when SPARK_PREPEND_CLASSES is set.
2014-09-12 14:54:42 -07:00
Sandy Ryza 1d767967e9 SPARK-3014. Log more informative messages in a couple of failure scenarios

Author: Sandy Ryza <sandy@cloudera.com>

Closes #1934 from sryza/sandy-spark-3014 and squashes the following commits:

ae19cc1 [Sandy Ryza] SPARK-3014. Log a more informative messages in a couple failure scenarios
2014-09-12 16:48:28 -05:00
Davies Liu 42904b8d01 [SPARK-3465] fix task metrics aggregation in local mode
Before overwriting t.taskMetrics, take a deep copy of it.

Author: Davies Liu <davies.liu@gmail.com>

Closes #2338 from davies/fix_metric and squashes the following commits:

a5cdb63 [Davies Liu] Merge branch 'master' into fix_metric
7c879e0 [Davies Liu] add more comments
754b5b8 [Davies Liu] copy taskMetrics only when isLocal is true
5ca26dc [Davies Liu] fix task metrics aggregation in local mode
2014-09-11 18:53:26 -07:00
witgo 33c7a738ae SPARK-2482: Resolve sbt warnings during build
At the same time, importing both `scala.language.postfixOps` and `org.scalatest.time.SpanSugar._` causes `scala.language.postfixOps` to stop working.

Author: witgo <witgo@qq.com>

Closes #1330 from witgo/sbt_warnings3 and squashes the following commits:

179ba61 [witgo] Resolve sbt warnings during build
2014-09-11 18:44:35 -07:00
Andrew Ash ce59725b87 [SPARK-3429] Don't include the empty string "" as a defaultAclUser
Changes logging from

```
14/09/05 02:01:08 INFO SecurityManager: Changing view acls to: aash,
14/09/05 02:01:08 INFO SecurityManager: Changing modify acls to: aash,
14/09/05 02:01:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash, ); users with modify permissions: Set(aash, )
```
to
```
14/09/05 02:28:28 INFO SecurityManager: Changing view acls to: aash
14/09/05 02:28:28 INFO SecurityManager: Changing modify acls to: aash
14/09/05 02:28:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash); users with modify permissions: Set(aash)
```

Note that the first set of logs has a Set of size 2 containing "aash" and the empty string "".

cc tgravescs

Author: Andrew Ash <andrew@andrewash.com>

Closes #2286 from ash211/empty-default-acl and squashes the following commits:

18cc612 [Andrew Ash] Use .isEmpty instead of ==""
cf973a1 [Andrew Ash] Don't include the empty string "" as a defaultAclUser
2014-09-11 17:28:36 -07:00
Andrew Or 6324eb7b5b [Spark-3490] Disable SparkUI for tests
We currently open many ephemeral ports during the tests, and as a result we occasionally can't bind to new ones. This has caused the `DriverSuite` and the `SparkSubmitSuite` to fail intermittently.

By disabling the `SparkUI` when it's not needed, we already cut down on the number of ports opened significantly, on the order of the number of `SparkContexts` ever created. We must keep it enabled for a few tests for the UI itself, however.

Author: Andrew Or <andrewor14@gmail.com>

Closes #2363 from andrewor14/disable-ui-for-tests and squashes the following commits:

332a7d5 [Andrew Or] No need to set spark.ui.port to 0 anymore
30c93a2 [Andrew Or] Simplify streaming UISuite
a431b84 [Andrew Or] Fix streaming test failures
8f5ae53 [Andrew Or] Fix no new line at the end
29c9b5b [Andrew Or] Disable SparkUI for tests
2014-09-11 17:18:46 -07:00
WangTaoTheTonic 558962a83f [SPARK-3411] Improve load-balancing of concurrently-submitted drivers across workers
If the array of waiting drivers is large, the drivers in it will all be dispatched to the first worker we get (if it has enough resources), with or without randomization.

We should do randomization every time we dispatch a driver, in order to better balance drivers.
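
A minimal sketch of per-dispatch randomization, with simplified types:

```scala
import scala.util.Random

object DriverDispatch {
  // Shuffle the alive workers before each dispatch pass so waiting drivers
  // don't all pile onto the first worker of a fixed iteration order.
  def dispatch[W](aliveWorkers: Seq[W], hasResources: W => Boolean): Option[W] =
    Random.shuffle(aliveWorkers).find(hasResources)
}
```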

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>

Closes #1106 from WangTaoTheTonic/fixBalanceDrivers and squashes the following commits:

d1a928b [WangTaoTheTonic] Minor adjustment
b6560cf [WangTaoTheTonic] solve the shuffle problem for HashSet
f674e59 [WangTaoTheTonic] add comment and minor fix
2835929 [WangTao] solve the failed test and avoid filtering
2ca3091 [WangTao] fix checkstyle
bc91bb1 [WangTao] Avoid shuffle every time we schedule the driver using round robin
bbc7087 [WangTaoTheTonic] Optimize the schedule in Master
2014-09-10 13:06:47 -07:00
Prashant Sharma 02b5ac7191 Minor - Fix trivial compilation warnings.
Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #2331 from ScrapCodes/compilation-warn and squashes the following commits:

44c1e76 [Prashant Sharma] Minor - Fix trivial compilation warnings.
2014-09-09 14:42:28 -07:00
scwf 26862337c9 [SPARK-3193] Output error info when process exit code is not zero in test suite
https://issues.apache.org/jira/browse/SPARK-3193
I noticed that PR tests sometimes fail because the process exit code != 0; refer to
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18688/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19118/consoleFull

[info] SparkSubmitSuite:
[info] - prints usage on empty input
[info] - prints usage with only --help
[info] - prints error with unrecognized options
[info] - handle binary specified but not class
[info] - handles arguments with --key=val
[info] - handles arguments to user program
[info] - handles arguments to user program with name collision
[info] - handles YARN cluster mode
[info] - handles YARN client mode
[info] - handles standalone cluster mode
[info] - handles standalone client mode
[info] - handles mesos client mode
[info] - handles confs with flag equivalents
[info] - launch simple application with spark-submit *** FAILED ***
[info]   org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.SimpleApplicationTest, --name, testApp, --master, local, file:/tmp/1408854098404-0/testJar-1408854098404.jar) exited with code 1
[info]   at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:872)
[info]   at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311)
[info]   at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply$mcV$sp(SparkSubmitSuite.scala:291)
[info]   at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply(SparkSubmitSuite.scala:284)
[info]   at org.apac...
Spark assembly has been built with Hive, including Datanucleus jars on classpath

This PR outputs the process error info when a run fails, which can be helpful for diagnosis.

Author: scwf <wangfei1@huawei.com>

Closes #2108 from scwf/output-test-error-info and squashes the following commits:

0c48082 [scwf] minor fix according to comments
563fde1 [scwf] output errer info when Process exitcode not zero
2014-09-09 11:57:01 -07:00
Sandy Ryza 88547a09fc SPARK-3422. JavaAPISuite.getHadoopInputSplits isn't used anywhere.
Author: Sandy Ryza <sandy@cloudera.com>

Closes #2324 from sryza/sandy-spark-3422 and squashes the following commits:

6446175 [Sandy Ryza] SPARK-3422. JavaAPISuite.getHadoopInputSplits isn't used anywhere.
2014-09-09 10:23:28 -07:00
Mark Hamstra 092e2f152f SPARK-2425 Don't kill a still-running Application because of some misbehaving Executors
Introduces a LOADING -> RUNNING ApplicationState transition and prevents Master from removing an Application with RUNNING Executors.

Two basic changes: 1) Instead of allowing MAX_NUM_RETRY abnormal Executor exits over the entire lifetime of the Application, allow that many since any Executor successfully began running the Application; 2) Don't remove the Application while Master still thinks that there are RUNNING Executors.

This should be fine as long as the ApplicationInfo doesn't believe any Executors are forever RUNNING when they are not.  I think that any non-RUNNING Executors will eventually no longer be RUNNING in Master's accounting, but another set of eyes should confirm that.  This PR also doesn't try to detect which nodes have gone rogue or to kill off bad Workers, so repeatedly failing Executors will continue to fail and fill up log files with failure reports as long as the Application keeps running.

Author: Mark Hamstra <markhamstra@gmail.com>

Closes #1360 from markhamstra/SPARK-2425 and squashes the following commits:

f099c0b [Mark Hamstra] Reuse appInfo
b2b7b25 [Mark Hamstra] Moved 'Application failed' logging
bdd0928 [Mark Hamstra] switched to string interpolation
1dd591b [Mark Hamstra] SPARK-2425 introduce LOADING -> RUNNING ApplicationState transition and prevent Master from removing Application with RUNNING Executors
2014-09-08 20:51:56 -07:00
Reynold Xin 08ce18881e [SPARK-3019] Pluggable block transfer interface (BlockTransferService)
This pull request creates a new BlockTransferService interface for block fetch/upload and refactors the existing ConnectionManager to implement BlockTransferService (NioBlockTransferService).

Most of the changes are simply moving code around. The main class to inspect is ShuffleBlockFetcherIterator.

Review guide:
- Most of the ConnectionManager code is now in network.cm package
- ManagedBuffer is a new buffer abstraction backed by several different implementations (file segment, nio ByteBuffer, Netty ByteBuf); see the sketch after the TODO list below
- BlockTransferService is the main internal interface introduced in this PR
- NioBlockTransferService implements BlockTransferService and replaces the old BlockManagerWorker
- ShuffleBlockFetcherIterator replaces the old BlockFetcherIterator to use the new interface

TODOs that should be separate PRs:
- Implement NettyBlockTransferService
- Finalize the API/semantics for ManagedBuffer.release()
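
A minimal sketch of the `ManagedBuffer` shape described in the review guide, with simplified members:

```scala
import java.nio.ByteBuffer

// One abstraction, several backing representations.
trait ManagedBuffer {
  def size: Long
  def nioByteBuffer(): ByteBuffer
  def release(): Unit // semantics still TBD per the TODO above
}

class NioManagedBuffer(buf: ByteBuffer) extends ManagedBuffer {
  def size: Long = buf.remaining()
  def nioByteBuffer(): ByteBuffer = buf.duplicate()
  def release(): Unit = () // nothing to free for an on-heap ByteBuffer
}
```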

Author: Reynold Xin <rxin@apache.org>

Closes #2240 from rxin/blockTransferService and squashes the following commits:

64cd9d7 [Reynold Xin] Merge branch 'master' into blockTransferService
1dfd3d7 [Reynold Xin] Limit the length of the FileInputStream.
1332156 [Reynold Xin] Fixed style violation from refactoring.
2960c93 [Reynold Xin] Added ShuffleBlockFetcherIteratorSuite.
e29c721 [Reynold Xin] Updated comment for ShuffleBlockFetcherIterator.
8a1046e [Reynold Xin] Code review feedback:
2c6b1e1 [Reynold Xin] Removed println in test cases.
2a907e4 [Reynold Xin] Merge branch 'master' into blockTransferService-merge
07ccf0d [Reynold Xin] Added init check to CMBlockTransferService.
98c668a [Reynold Xin] Added failure handling and fixed unit tests.
ae05fcd [Reynold Xin] Updated tests, although DistributedSuite is hanging.
d8d595c [Reynold Xin] Merge branch 'master' of github.com:apache/spark into blockTransferService
9ef279c [Reynold Xin] Initial refactoring to move ConnectionManager to use the BlockTransferService.
2014-09-08 15:59:20 -07:00