This is largely based on extracting the dynamic allocation parts from tnachen's #3861.
Author: Iulian Dragos <jaguarul@gmail.com>
Closes #4984 from dragos/issue/mesos-coarse-dynamicAllocation and squashes the following commits:
39df8cd [Iulian Dragos] Update tests to latest changes in core.
9d2c9fa [Iulian Dragos] Remove adjustment of executorLimitOption in doKillExecutors.
8b00f52 [Iulian Dragos] Latest round of reviews.
0cd00e0 [Iulian Dragos] Add persistent shuffle directory
15c45c1 [Iulian Dragos] Add dynamic allocation to the Spark coarse-grained scheduler.
(This reopens a patch that was closed in the past: #6248)
When you view the stage page while running the following:
```
sc.parallelize(1 to X, 10000).count()
```
The page never loads, the job is stalled, and you end up running into an OOM:
```
HTTP ERROR 500
Problem accessing /stages/stage/. Reason:
Server Error
Caused by:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
```
This patch compresses Jetty responses with gzip. The correct long-term fix is to add pagination.
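For illustration only, this kind of compression can be wired in by wrapping a handler in Jetty's `GzipHandler`; the following is a hedged sketch of the mechanism (Jetty 9.3+ package name), not the exact change made to Spark's `JettyUtils`:
```scala
import org.eclipse.jetty.server.handler.gzip.GzipHandler
import org.eclipse.jetty.servlet.ServletContextHandler

// Wrap an existing handler so responses are gzip-compressed whenever the
// client sends "Accept-Encoding: gzip".
def gzipped(handler: ServletContextHandler): GzipHandler = {
  val gzip = new GzipHandler()
  gzip.setHandler(handler)
  gzip
}
```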
Author: Andrew Or <andrew@databricks.com>
Closes #7296 from andrewor14/gzip-jetty and squashes the following commits:
a051c64 [Andrew Or] Use GZIP to compress Jetty responses
The configuration ```SPARK_EXECUTOR_CORES``` isn't put into ```SparkConf```, so it has no effect on dynamic executor allocation.
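The gist of the fix can be sketched as copying the environment variable into the conf; this is a minimal sketch using the public `SparkConf` API, not the literal patch:
```scala
import org.apache.spark.SparkConf

// Make the env var visible to consumers such as dynamic executor allocation.
def loadExecutorCores(conf: SparkConf): SparkConf = {
  Option(System.getenv("SPARK_EXECUTOR_CORES")).foreach { cores =>
    conf.setIfMissing("spark.executor.cores", cores)
  }
  conf
}
```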
Author: xutingjun <xutingjun@huawei.com>
Closes #7322 from XuTingjun/SPARK_EXECUTOR_CORES and squashes the following commits:
2cafa89 [xutingjun] make SPARK_EXECUTOR_CORES has effect to dynamicAllocation
Currently, the Mesos scheduler only looks at the 'cpu' and 'mem' resources when trying to determine the usability of a resource offer from a Mesos slave node. It may be preferable for the user to be able to ensure that Spark jobs are only started on a certain set of nodes (based on attributes).
For example, if the user sets `spark.mesos.constraints` to `tachyon=true;us-east-1=false`, then resource offers will be checked against both constraints and only accepted to start new executors if they meet them.
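As a rough sketch (names and the exact matching rules here are illustrative, not the merged implementation), the constraint string can be parsed and checked like this:
```scala
// Parse "tachyon=true;us-east-1=false" into attribute -> required value.
def parseConstraintString(constraints: String): Map[String, String] =
  constraints.split(";").filter(_.nonEmpty).map { kv =>
    kv.split("=") match {
      case Array(k, v) => k -> v
      case _ => throw new IllegalArgumentException(s"Malformed constraint: $kv")
    }
  }.toMap

// An offer is usable only if every constrained attribute matches.
def matchesAttributes(
    required: Map[String, String],
    offerAttributes: Map[String, String]): Boolean =
  required.forall { case (k, v) => offerAttributes.get(k).contains(v) }
```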
Author: Ankur Chauhan <achauhan@brightcove.com>
Closes #5563 from ankurcha/mesos_attribs and squashes the following commits:
902535b [Ankur Chauhan] Fix line length
d83801c [Ankur Chauhan] Update code as per code review comments
8b73f2d [Ankur Chauhan] Fix imports
c3523e7 [Ankur Chauhan] Added docs
1a24d0b [Ankur Chauhan] Expand scope of attributes matching to include all data types
482fd71 [Ankur Chauhan] Update access modifier to private[this] for offer constraints
5ccc32d [Ankur Chauhan] Fix nit pick whitespace
1bce782 [Ankur Chauhan] Fix nit pick whitespace
c0cbc75 [Ankur Chauhan] Use offer id value for debug message
7fee0ea [Ankur Chauhan] Add debug statements
fc7eb5b [Ankur Chauhan] Fix import codestyle
00be252 [Ankur Chauhan] Style changes as per code review comments
662535f [Ankur Chauhan] Incorporate code review comments + use SparkFunSuite
fdc0937 [Ankur Chauhan] Decline offers that did not meet criteria
67b58a0 [Ankur Chauhan] Add documentation for spark.mesos.constraints
63f53f4 [Ankur Chauhan] Update codestyle - uniform style for config values
02031e4 [Ankur Chauhan] Fix scalastyle warnings in tests
c09ed84 [Ankur Chauhan] Fixed the access modifier on offerConstraints val to private[mesos]
0c64df6 [Ankur Chauhan] Rename overhead fractions to memory_*, fix spacing
8cc1e8f [Ankur Chauhan] Make exception message more explicit about the source of the error
addedba [Ankur Chauhan] Added test case for malformed constraint string
ec9d9a6 [Ankur Chauhan] Add tests for parse constraint string
72fe88a [Ankur Chauhan] Fix up tests + remove redundant method override, combine utility class into new mesos scheduler util trait
92b47fd [Ankur Chauhan] Add attributes based constraints support to MesosScheduler
The Spark standalone master web UI shows the total and used cores and the total and used memory of alive workers only, but the JSON API page (http://MASTERURL:8088/json) shows the numbers for all workers. The web UI data is therefore out of sync with the JSON API; the proper fix is to make the two report the same numbers.
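The aggregation the JSON endpoint should perform looks roughly like this (the case class is a stand-in for Spark's internal worker info, not the real type):
```scala
case class WorkerSummary(state: String, cores: Int, coresUsed: Int,
                         memoryMb: Int, memoryUsedMb: Int)

// Sum over ALIVE workers only, matching what the web UI already shows.
def aliveTotals(workers: Seq[WorkerSummary]): (Int, Int, Int, Int) = {
  val alive = workers.filter(_.state == "ALIVE")
  (alive.map(_.cores).sum, alive.map(_.coresUsed).sum,
   alive.map(_.memoryMb).sum, alive.map(_.memoryUsedMb).sum)
}
```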
Author: Wisely Chen <wiselychen@appier.com>
Closes #7038 from thegiive/SPARK-8656 and squashes the following commits:
9e54bf0 [Wisely Chen] Change variable name to camel case
2c8ea89 [Wisely Chen] Change some styling and add local variable
431d2b0 [Wisely Chen] Worker List should contain DEAD node also
8b3b8e8 [Wisely Chen] [SPARK-8656] Fix the webUI and JSON API number is not synced
Latest changes after refactoring to the RPC layer. I rebased against trunk to make sure to pick up any recent changes since it had been a while. I wasn't crazy about the name `ConfigureTimeout`, and `RpcTimeout` seemed to fit better, but I'm open to suggestions!
I ran most of the tests and they pass, but others would get stuck with "WARN TaskSchedulerImpl: Initial job has not accepted any resources". I think it's just my machine, so I thought I would push what I have anyway.
Still left to do:
* I only added a couple of unit tests so far; there are probably more cases to test
* Make sure all uses require an `RpcTimeout`
* Right now, both the `ask` and `Await.result` use the same timeout, should we differentiate between these in the TimeoutException message?
* I wrapped `Await.result` in `RpcTimeout` (a rough sketch follows below), should we also wrap `Await.ready`?
* Proper scoping of classes and methods
hardmettle, feel free to help out with any of these!
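A rough sketch of the wrapper idea (the `timeoutProp` name follows the commit messages; the real class carries more than this):
```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Awaitable}
import scala.concurrent.duration.FiniteDuration

// Carry the originating config property together with the duration so a
// timeout can tell the user which setting to increase.
class RpcTimeout(val duration: FiniteDuration, timeoutProp: String) {
  def awaitResult[T](awaitable: Awaitable[T]): T =
    try {
      Await.result(awaitable, duration)
    } catch {
      case te: TimeoutException =>
        throw new TimeoutException(
          s"${te.getMessage}. This timeout is controlled by $timeoutProp")
    }
}
```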
Author: Bryan Cutler <bjcutler@us.ibm.com>
Author: Harsh Gupta <harsh@Harshs-MacBook-Pro.local>
Author: BryanCutler <cutlerb@gmail.com>
Closes #6205 from BryanCutler/configTimeout-6980 and squashes the following commits:
46c8d48 [Bryan Cutler] [SPARK-6980] Changed RpcEnvSuite test to never reply instead of just sleeping, to avoid possible sync issues
06afa53 [Bryan Cutler] [SPARK-6980] RpcTimeout class extends Serializable, was causing error in MasterSuite
7bb70f1 [Bryan Cutler] Merge branch 'master' into configTimeout-6980
dbd5f73 [Bryan Cutler] [SPARK-6980] Changed RpcUtils askRpcTimeout and lookupRpcTimeout scope to private[spark] and improved deprecation warning msg
4e89c75 [Bryan Cutler] [SPARK-6980] Missed one usage of deprecated RpcUtils.askTimeout in YarnSchedulerBackend although it is not being used, and fixed SparkConfSuite UT to not use deprecated RpcUtils functions
6a1c50d [Bryan Cutler] [SPARK-6980] Minor cleanup of test case
7f4d78e [Bryan Cutler] [SPARK-6980] Fixed scala style checks
287059a [Bryan Cutler] [SPARK-6980] Removed extra import in AkkaRpcEnvSuite
3d8b1ff [Bryan Cutler] [SPARK-6980] Cleaned up imports in AkkaRpcEnvSuite
3a168c7 [Bryan Cutler] [SPARK-6980] Rewrote Akka RpcTimeout UTs in RpcEnvSuite
7636189 [Bryan Cutler] [SPARK-6980] Fixed call to askWithReply in DAGScheduler to use RpcTimeout - this was being compiled by auto-tupling and changing the message type of BlockManagerHeartbeat
be11c4e [Bryan Cutler] Merge branch 'master' into configTimeout-6980
039afed [Bryan Cutler] [SPARK-6980] Corrected import organization
218aa50 [Bryan Cutler] [SPARK-6980] Corrected issues from feedback
fadaf6f [Bryan Cutler] [SPARK-6980] Put back in deprecated RpcUtils askTimeout and lookupTimout to fix MiMa errors
fa6ed82 [Bryan Cutler] [SPARK-6980] Had to increase timeout on positive test case because a processor slowdown could trigger an Future TimeoutException
b05d449 [Bryan Cutler] [SPARK-6980] Changed constructor to use val duration instead of getter function, changed name of string property from conf to timeoutProp for consistency
c6cfd33 [Bryan Cutler] [SPARK-6980] Changed UT ask message timeout to explicitly intercept a SparkException
1394de6 [Bryan Cutler] [SPARK-6980] Moved MessagePrefix to createRpcTimeoutException directly
1517721 [Bryan Cutler] [SPARK-6980] RpcTimeout object scope should be private[spark]
2206b4d [Bryan Cutler] [SPARK-6980] Added unit test for ask then immediat awaitReply
1b9beab [Bryan Cutler] [SPARK-6980] Cleaned up import ordering
08f5afc [Bryan Cutler] [SPARK-6980] Added UT for constructing RpcTimeout with default value
d3754d1 [Bryan Cutler] [SPARK-6980] Added akkaConf to prevent dead letter logging
995d196 [Bryan Cutler] [SPARK-6980] Cleaned up import ordering, comments, spacing from PR feedback
7774d56 [Bryan Cutler] [SPARK-6980] Cleaned up UT imports
4351c48 [Bryan Cutler] [SPARK-6980] Added UT for addMessageIfTimeout, cleaned up UTs
1607a5f [Bryan Cutler] [SPARK-6980] Changed addMessageIfTimeout to PartialFunction, cleanup from PR comments
2f94095 [Bryan Cutler] [SPARK-6980] Added addMessageIfTimeout for when a Future is completed with TimeoutException
235919b [Bryan Cutler] [SPARK-6980] Resolved conflicts after master merge
c07d05c [Bryan Cutler] Merge branch 'master' into configTimeout-6980-tmp
b7fb99f [BryanCutler] Merge pull request #2 from hardmettle/configTimeoutUpdates_6980
4be3a8d [Harsh Gupta] Modifying loop condition to find property match
0ee5642 [Harsh Gupta] Changing the loop condition to halt at the first match in the property list for RpcEnv exception catch
f74064d [Harsh Gupta] Retrieving properties from property list using iterator and while loop instead of chained functions
a294569 [Bryan Cutler] [SPARK-6980] Added creation of RpcTimeout with Seq of property keys
23d2f26 [Bryan Cutler] [SPARK-6980] Fixed await result not being handled by RpcTimeout
49f9f04 [Bryan Cutler] [SPARK-6980] Minor cleanup and scala style fix
5b59a44 [Bryan Cutler] [SPARK-6980] Added some RpcTimeout unit tests
78a2c0a [Bryan Cutler] [SPARK-6980] Using RpcTimeout.awaitResult for future in AppClient now
97523e0 [Bryan Cutler] [SPARK-6980] Akka ask timeout description refactored to RPC layer
The existing test suite has a lot of duplicate code and doesn't even cover the most fundamental feature of the HeartbeatReceiver, which is expiring hosts that have not responded in a while.
This introduces manual clocks in `HeartbeatReceiver` and makes it respond to heartbeats only for registered executors. A few internal messages are moved to `receiveAndReply` to increase determinism of the tests so we don't have to rely on flaky constructs like `eventually`.
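The expiry check that manual clocks make deterministic can be sketched like this (a standalone toy clock for illustration; Spark has its own `ManualClock` utility):
```scala
// A clock advanced by hand, so tests control time instead of sleeping.
class ManualClock(private var now: Long = 0L) {
  def advance(ms: Long): Unit = { now += ms }
  def getTimeMillis: Long = now
}

// Executors that have not heartbeated within the timeout are expired.
def expiredExecutors(
    lastSeenMs: Map[String, Long],
    clock: ManualClock,
    timeoutMs: Long): Seq[String] =
  lastSeenMs.collect {
    case (id, seen) if clock.getTimeMillis - seen > timeoutMs => id
  }.toSeq
```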
Author: Andrew Or <andrew@databricks.com>
Closes #7173 from andrewor14/heartbeat-receiver-tests and squashes the following commits:
4a903d6 [Andrew Or] Increase HeartReceiverSuite coverage and clean up
This patch rewrites the old checkpointing code in a way that is easier to understand. It also adds a guard against an invalid specification of checkpoint directory to provide a clearer error message. Most of the changes here are relatively minor.
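A hedged sketch of the kind of guard involved (the real check and message differ, and per the commit log it warns rather than throws):
```scala
import org.apache.spark.SparkContext

// On a cluster, a local checkpoint path is almost certainly a mistake:
// each executor would write to its own local disk.
def setCheckpointDirGuarded(sc: SparkContext, directory: String): Unit = {
  if (!sc.isLocal && !directory.contains("://")) {
    System.err.println(
      s"Checkpoint directory '$directory' looks like a local path; on a " +
      "cluster it should point to a shared filesystem such as HDFS.")
  }
  sc.setCheckpointDir(directory)
}
```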
Author: Andrew Or <andrew@databricks.com>
Closes #6968 from andrewor14/checkpoint-cleanup and squashes the following commits:
4ef8263 [Andrew Or] Use global synchronized instead
6f6fd84 [Andrew Or] Merge branch 'master' of github.com:apache/spark into checkpoint-cleanup
b1437ad [Andrew Or] Warn instead of throw
5484293 [Andrew Or] Merge branch 'master' of github.com:apache/spark into checkpoint-cleanup
7fb4af5 [Andrew Or] Guard against bad settings of checkpoint directory
691da98 [Andrew Or] Simplify checkpoint code / code style / comments
I've updated default values in comments, documentation, and in the command line builder to be 1g, based on comments in the JIRA. I've also updated most usages to point at a single variable defined in the Utils.scala and JavaUtils.java files. This wasn't possible in all cases (R, shell scripts, etc.), but usage in most code now points at the same place.
Please let me know if I've missed anything.
Will the spark-shell use the value within the command line builder during instantiation?
Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Closes #7132 from ilganeli/SPARK-3071 and squashes the following commits:
4074164 [Ilya Ganelin] String fix
271610b [Ilya Ganelin] Merge branch 'SPARK-3071' of github.com:ilganeli/spark into SPARK-3071
273b6e9 [Ilya Ganelin] Test fix
fd67721 [Ilya Ganelin] Update JavaUtils.java
26cc177 [Ilya Ganelin] test fix
e5db35d [Ilya Ganelin] Fixed test failure
39732a1 [Ilya Ganelin] merge fix
a6f7deb [Ilya Ganelin] Created default value for DRIVER MEM in Utils that's now used in almost all locations instead of setting manually in each
09ad698 [Ilya Ganelin] Update SubmitRestProtocolSuite.scala
19b6f25 [Ilya Ganelin] Missed one doc update
2698a3d [Ilya Ganelin] Updated default value for driver memory
Author: Holden Karau <holden@pigscanfly.ca>
Closes #7171 from holdenk/SPARK-8769-toLocalIterator-documentation-improvement and squashes the following commits:
97ddd99 [Holden Karau] Add note
Author: Holden Karau <holden@pigscanfly.ca>
Closes #7172 from holdenk/SPARK-8771-actor-system-deprecation-tag-uses-deprecated-deprecation-tag and squashes the following commits:
7f1455b [Holden Karau] Add .0s to the versions for the derpecated anotations in SparkEnv.scala
ca13c9d [Holden Karau] Add a version to the deprecated annotation for the actorSystem in SparkEnv
If `fs.hdfs.impl.disable.cache` is `false` (the default), `FileSystem` will use the cached `DFSClient`, which uses the old token.
[AMDelegationTokenRenewer](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala#L196)
```scala
val credentials = UserGroupInformation.getCurrentUser.getCredentials
credentials.writeTokenStorageFile(tempTokenPath, discachedConfiguration)
```
Although the `credentials` contain the new token, the cached client still uses the old one.
So it's better to set `fs.hdfs.impl.disable.cache` to `true` to avoid token expiration.
[Jira](https://issues.apache.org/jira/browse/SPARK-8688)
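A sketch of the workaround using Hadoop's standard `Configuration`/`FileSystem` API (the renamed helper in the patch is along these lines but not identical):
```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Disable the FileSystem cache for this lookup so a fresh DFSClient is
// built, picking up the renewed delegation tokens.
def freshFileSystem(path: Path, hadoopConf: Configuration): FileSystem = {
  val conf = new Configuration(hadoopConf)
  conf.setBoolean("fs.hdfs.impl.disable.cache", true)
  path.getFileSystem(conf)
}
```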
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #7069 from SaintBacchus/SPARK-8688 and squashes the following commits:
f94cd0b [huangzhaowei] modify function parameter
8fb9eb9 [huangzhaowei] explicit the comment
0cd55c9 [huangzhaowei] Rename function name to be an accurate one
cf776a1 [huangzhaowei] [SPARK-8688][YARN]Bug fix: disable the cache fs to gain the HDFS connection.
Otherwise other tests don't log anything useful...
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7140 from vanzin/SPARK-3444 and squashes the following commits:
de14836 [Marcelo Vanzin] Better fix.
6cff13a [Marcelo Vanzin] [SPARK-3444] [core] Restore INFO level after log4j test.
This PR updates the rest Actors in core to RpcEndpoint.
Because there is no `ActorSelection` in RpcEnv, I changed the logic of `registerWithMaster` in Worker and AppClient to avoid blocking the message loop. These changes need to be reviewed carefully.
Author: zsxwing <zsxwing@gmail.com>
Closes #5392 from zsxwing/rpc-rewrite-part3 and squashes the following commits:
2de7bed [zsxwing] Merge branch 'master' into rpc-rewrite-part3
f12d943 [zsxwing] Address comments
9137b82 [zsxwing] Fix the code style
e734c71 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
2d24fb5 [zsxwing] Fix the code style
5a82374 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
fa47110 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
72304f0 [zsxwing] Update the error strategy for AkkaRpcEnv
e56cb16 [zsxwing] Always send failure back to the sender
a7b86e6 [zsxwing] Use JFuture for java.util.concurrent.Future
aa34b9b [zsxwing] Fix the code style
bd541e7 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
25a84d8 [zsxwing] Use ThreadUtils
060ff31 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
dbfc916 [zsxwing] Improve the docs and comments
837927e [zsxwing] Merge branch 'master' into rpc-rewrite-part3
5c27f97 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
fadbb9e [zsxwing] Fix the code style
6637e3c [zsxwing] Merge remote-tracking branch 'origin/master' into rpc-rewrite-part3
7fdee0e [zsxwing] Fix the return type to ExecutorService and ScheduledExecutorService
e8ad0a5 [zsxwing] Fix the code style
6b2a104 [zsxwing] Log error and use SparkExitCode.UNCAUGHT_EXCEPTION exit code
fbf3194 [zsxwing] Add Utils.newDaemonSingleThreadExecutor and newDaemonSingleThreadScheduledExecutor
b776817 [zsxwing] Update Master, Worker, Client, AppClient and related classes to use RpcEndpoint
This issue was reported by saurfang. Thanks!
There is the following code in StagePage.scala.
```
|width="$serializationTimeProportion%"></rect>
|<rect class="getting-result-time-proportion"
|x="$gettingResultTimeProportionPos%" y="0px" height="26px"
|width="$gettingResultTimeProportion%"></rect></svg>',
|'start': new Date($launchTime),
|'end': new Date($finishTime)
|}
|""".stripMargin.replaceAll("\n", " ")
```
The last `replaceAll("\n", " ")` doesn't work when we check out and build the source code on Windows and deploy on Linux.
That's because when we check out the source code on Windows, the new-line code becomes `"\r\n"`, and `replaceAll("\n", " ")` matches only `"\n"`, leaving the `"\r"` behind.
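The portable fix is to treat `"\r\n"` (and a stray `"\r"`) the same as `"\n"`; whether the patch used exactly this regex is an assumption:
```scala
// Normalize all three line-ending styles in one pass.
val markup = """'start': new Date(0),
               |'end': new Date(1)
               |""".stripMargin.replaceAll("\r\n|\r|\n", " ")
```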
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #7133 from sarutak/SPARK-8739 and squashes the following commits:
17fb044 [Kousuke Saruta] Fixed a new-line-code issue
Because `System.currentTimeMillis()` is not accurate for tasks that take only several milliseconds, sometimes `totalExecutionTime` in `makeTimeline` will be 0. If `totalExecutionTime` is 0, the following error appears in the console.
![screen shot 2015-06-29 at 7 08 55 pm](https://cloud.githubusercontent.com/assets/1000778/8406776/5cd38e04-1e92-11e5-89f2-0c5134fe4b6b.png)
This PR fixes it by using an empty svg tag when `totalExecutionTime` is 0. This is a screenshot of a task whose `totalExecutionTime` is 0 after the fix.
![screen shot 2015-06-30 at 12 26 52 am](https://cloud.githubusercontent.com/assets/1000778/8412896/7b33b4be-1ebf-11e5-9100-d6d656af3747.png)
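The guard reduces to something like this (element names abbreviated, not the real markup):
```scala
def timelineSvg(launchTime: Long, finishTime: Long, runTime: Long): String = {
  val total = finishTime - launchTime
  if (total == 0) {
    // Task finished within the clock's resolution: nothing to draw.
    "<svg></svg>"
  } else {
    val proportion = runTime.toDouble / total * 100
    s"""<svg><rect width="$proportion%" height="26px"></rect></svg>"""
  }
}
```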
Author: zsxwing <zsxwing@gmail.com>
Closes #7088 from zsxwing/SPARK-8705 and squashes the following commits:
9ee4ef5 [zsxwing] Address comments
ef2ecfa [zsxwing] Don't display rects when totalExecutionTime is 0
Showing these applications may lead to weird behavior in the History Server. For old logs, if
the app ID is recorded later, you may end up with a duplicate entry. For new logs, the app might
be listed with a ".inprogress" suffix.
So ignore those, but still allow old applications that don't record app IDs at all (1.0 and 1.1) to be shown.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: Carson Wang <carson.wang@intel.com>
Closes #7097 from vanzin/SPARK-8372 and squashes the following commits:
a24eab2 [Marcelo Vanzin] Feedback.
112ae8f [Marcelo Vanzin] Merge branch 'master' into SPARK-8372
7b91b74 [Marcelo Vanzin] Handle logs generated by 1.0 and 1.1.
1eca3fe [Carson Wang] [SPARK-8372] History server shows incorrect information for application not started
When the ```taskEnd.reason``` is ```Resubmitted```, the task shouldn't be counted in the statistics, because it already had a ```SUCCESS``` taskEnd earlier.
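A sketch of the listener-side guard using the public `SparkListener` API (the patched class is the UI's stage listener; the counter here is illustrative):
```scala
import org.apache.spark.Resubmitted
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class StageStatsListener extends SparkListener {
  var completedTasks = 0

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // A Resubmitted task already contributed a SUCCESS event; skip it.
    if (taskEnd.reason == Resubmitted) return
    completedTasks += 1
  }
}
```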
Author: xutingjun <xutingjun@huawei.com>
Closes #6950 from XuTingjun/pageError and squashes the following commits:
af35dc3 [xutingjun] When taskEnd is Resubmitted, don't do statistics
Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/' (scaladoc now fixed by using an HTML entity for *)
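For illustration, the two path forms side by side (assuming an active `SparkContext` named `sc`):
```scala
// In some Hadoop FS implementations the glob avoids an extra listing step.
val viaGlob = sc.textFile("hdfs:///logs/2015/*") // can be more efficient
val viaDir  = sc.textFile("hdfs:///logs/2015/")
```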
Author: Sean Owen <sowen@cloudera.com>
Closes #7126 from srowen/SPARK-8437.2 and squashes the following commits:
7bb45da [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/' (now fixed scaladoc by using HTML entity for *)
See the details of this issue at [SPARK-8592](https://issues.apache.org/jira/browse/SPARK-8592)
**CoarseGrainedExecutorBackend** should exit when **RegisterExecutor** fails
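The intended behavior, sketched with stand-in message types (the real backend uses Spark's RPC messages, logging, and exit codes):
```scala
// Illustrative stand-ins for the actual RPC messages.
sealed trait CoarseGrainedMessage
case class RegisteredExecutor(hostname: String) extends CoarseGrainedMessage
case class RegisterExecutorFailed(message: String) extends CoarseGrainedMessage

def receive(msg: CoarseGrainedMessage): Unit = msg match {
  case RegisteredExecutor(host) =>
    println(s"Successfully registered with driver on $host")
  case RegisterExecutorFailed(message) =>
    // Exiting avoids the NPE from carrying on without a driver reference.
    System.err.println(s"Slave registration failed: $message")
    System.exit(1)
}
```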
Author: xuchenCN <chenxu198511@gmail.com>
Closes #7110 from xuchenCN/SPARK-8592 and squashes the following commits:
71e0077 [xuchenCN] [SPARK-8592] [CORE] CoarseGrainedExecutorBackend: Cannot register with driver => NPE
Subset the enabled algorithms in an SSLOptions to the elements that are supported by the protocol provider.
Update the list of ciphers in the sample config to include modern algorithms, and specify both Oracle and IBM names. In practice the user would either specify their own chosen cipher suites, or specify none, and delegate the decision to the provider.
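The subsetting step sketched against the standard JSSE API (method names here are illustrative, not the patched `SSLOptions` members):
```scala
import javax.net.ssl.SSLContext

// Keep only the cipher suites this JVM's provider actually supports, so a
// config listing both Oracle and IBM names works on either JVM.
def supportedSubset(enabled: Set[String]): Set[String] = {
  val factory = SSLContext.getDefault.getServerSocketFactory
  enabled.intersect(factory.getSupportedCipherSuites.toSet)
}
```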
Author: Tim Ellison <t.p.ellison@gmail.com>
Closes #7043 from tellison/SSLEnhancements and squashes the following commits:
034efa5 [Tim Ellison] Ensure Java imports are grouped and ordered by package.
3797f8b [Tim Ellison] Remove unnecessary use of Option to improve clarity, and fix import style ordering.
4b5c89f [Tim Ellison] More robust SSL options processing.
This is a simple change to add a new configuration property "spark.sparkr.r.command" that specifies the command that SparkR will use when creating an R engine process. If this is not specified, "Rscript" will be used by default.
I did not add any documentation, since I couldn't find any place where such properties (for example "spark.sparkr.use.daemon") are documented.
I also did not add a unit test. The only test that would work generally would be one starting SparkR with sparkR.init(sparkEnvir=list(spark.sparkr.r.command="Rscript")), just using the default value. I think that this is a low-risk change.
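On the JVM side the lookup presumably reduces to a conf read with a fallback; a sketch (the property name is from the patch, the surrounding code is hypothetical):
```scala
import org.apache.spark.SparkConf

// Fall back to "Rscript" when no explicit command is configured.
def rCommand(conf: SparkConf): String =
  conf.get("spark.sparkr.r.command", "Rscript")
```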
Likely committers: shivaram
Author: Michael Sannella x268 <msannell@tibco.com>
Closes #6557 from msannell/altR and squashes the following commits:
7eac142 [Michael Sannella x268] add spark.sparkr.r.command config parameter
This PR also changes the order in which repositories are consulted when resolving packages: user-provided repositories are now prioritized.
cc andrewor14
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #7089 from brkyvz/delete-prev-ivy-resolution and squashes the following commits:
a21f95a [Burak Yavuz] remove previous ivy resolution when using spark-submit
Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/'
Author: Sean Owen <sowen@cloudera.com>
Closes #7036 from srowen/SPARK-8437 and squashes the following commits:
0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/'
Hopefully, this suite will not be flaky anymore.
Author: Yin Huai <yhuai@databricks.com>
Closes #7027 from yhuai/SPARK-8567 and squashes the following commits:
c0167e2 [Yin Huai] Add sc.stop().
This patch excludes `hadoop-client`'s dependency on `mockito-all`. As of #7061, Spark depends on `mockito-core` instead of `mockito-all`, so the dependency from Hadoop was leading to test compilation failures for some of the Hadoop 2 SBT builds.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #7090 from JoshRosen/SPARK-8709 and squashes the following commits:
e190122 [Josh Rosen] [SPARK-8709] Exclude hadoop-client's mockito-all dependency.
This is a follow-up of #6404; the ScriptTransformation prints the error msg into stderr directly, which can be a disaster for the application log.
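The `CircularBuffer` named in the commits can be sketched as an `OutputStream` that retains only the most recent bytes (capacity and charset are assumptions):
```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets

// Keep at most `capacity` bytes of child-process stderr for error reporting
// instead of streaming all of it into the application log.
class CircularBuffer(capacity: Int = 10240) extends OutputStream {
  private val buf = new Array[Byte](capacity)
  private var pos = 0
  private var wrapped = false

  override def write(b: Int): Unit = {
    buf(pos) = b.toByte
    pos = (pos + 1) % capacity
    if (pos == 0) wrapped = true
  }

  override def toString: String = {
    val (newer, older) = buf.splitAt(pos)
    val bytes = if (wrapped) older ++ newer else newer
    new String(bytes, StandardCharsets.UTF_8)
  }
}
```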
Author: Cheng Hao <hao.cheng@intel.com>
Closes #6882 from chenghao-intel/verbose and squashes the following commits:
bfedd77 [Cheng Hao] revert the write
76ff46b [Cheng Hao] update the CircularBuffer
692b19e [Cheng Hao] check the process exitValue for ScriptTransform
47e0970 [Cheng Hao] Use the RedirectThread instead
1de771d [Cheng Hao] naming the threads in ScriptTransformation
8536e81 [Cheng Hao] disable the error message redirection for stderr
Use case: we want to log the applicationId (YARN in our case) to request help with troubleshooting from DevOps
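Usage on the Scala side, which mirrors what the new PySpark property exposes (assuming an active `SparkContext` named `sc`; the log line is illustrative):
```scala
// The cluster manager's ID for this app, e.g. a YARN ID such as
// "application_1433865536131_34483".
val appId = sc.applicationId
println(s"Ask DevOps to look at application $appId")
```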
Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com>
Closes #6936 from smartkiwi/master and squashes the following commits:
870338b [Vladimir Vladimirov] this would make doctest to run in python3
0eae619 [Vladimir Vladimirov] Scala doesn't use u'...' for unicode literals
14d77a8 [Vladimir Vladimirov] stop using ELLIPSIS
b4ebfc5 [Vladimir Vladimirov] addressed PR feedback - updated docstring
223a32f [Vladimir Vladimirov] fixed test - applicationId is property that returns the string
3221f5a [Vladimir Vladimirov] [SPARK-8528] added documentation for Scala
2cff090 [Vladimir Vladimirov] [SPARK-8528] add applicationId property for SparkContext object in pyspark
When there is a massive number of tasks, for example from `sc.parallelize(1 to 100000, 10000).count()`, the generated JS code in the stage page contains a lot of string concatenations, nearly 40 per task.
We can generate the whole string for a task instead of executing string concatenations in the browser.
Before this patch, the load time of the page is about 21 seconds.
![screen shot 2015-06-29 at 6 44 04 pm](https://cloud.githubusercontent.com/assets/1000778/8406644/eb55ed18-1e90-11e5-9ad5-50d27ad1dff1.png)
After this patch, it reduces to about 17 seconds.
![screen shot 2015-06-29 at 6 47 34 pm](https://cloud.githubusercontent.com/assets/1000778/8406665/087003ca-1e91-11e5-80a8-3485aa9adafa.png)
One disadvantage is that the generated JS code becomes harder to read.
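The idea, sketched: interpolate once on the server instead of emitting chains of `+` into the page (markup heavily abbreviated):
```scala
// One server-side interpolation per task instead of ~40 client-side concats.
def taskTimelineJs(taskId: Long, launchTime: Long, finishTime: Long): String =
  s"""{'content': '<div class="task" data-task-id="$taskId"></div>',
     |'start': new Date($launchTime),
     |'end': new Date($finishTime)}""".stripMargin.replaceAll("\r\n|\r|\n", " ")
```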
Author: zsxwing <zsxwing@gmail.com>
Closes #7082 from zsxwing/js-string and squashes the following commits:
b29231d [zsxwing] Avoid massive concating strings in Javascript
Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers.
See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.
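In sbt terms (version illustrative), the change amounts to:
```scala
// mockito-core declares Hamcrest and Objenesis as ordinary dependencies,
// whereas mockito-all bundles (and can shadow) their classes.
libraryDependencies += "org.mockito" % "mockito-core" % "1.10.19" % "test"
```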
Author: Josh Rosen <joshrosen@databricks.com>
Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits:
70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
If `RDD.getPreferredLocations()` throws an exception it may crash the DAGScheduler and SparkContext. This patch addresses this by adding a try-catch block.
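The shape of the guard (simplified; per the regression tests in the commits, the real scheduler surfaces the failure as a clean job failure rather than swallowing it):
```scala
import org.apache.spark.Partition
import org.apache.spark.rdd.RDD
import scala.util.control.NonFatal

// getPreferredLocations() runs user code, so treat it as untrusted.
def preferredLocationsSafe(rdd: RDD[_], split: Partition): Seq[String] =
  try {
    rdd.preferredLocations(split)
  } catch {
    case NonFatal(e) =>
      System.err.println(s"getPreferredLocations failed for $split: $e")
      Nil // sketch only: the DAGScheduler fails the job instead
  }
```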
Author: Josh Rosen <joshrosen@databricks.com>
Closes #7023 from JoshRosen/SPARK-8606 and squashes the following commits:
770b169 [Josh Rosen] Fix getPreferredLocations() DAGScheduler crash with try block.
44a9b55 [Josh Rosen] Add test of a buggy getPartitions() method
19aa9f7 [Josh Rosen] Add (failing) regression test for getPreferredLocations() DAGScheduler crash
Author: Sandy Ryza <sandy@cloudera.com>
Closes #7050 from sryza/sandy-spark-8623 and squashes the following commits:
58a8079 [Sandy Ryza] SPARK-8623. Hadoop RDDs fail to properly serialize configuration
Add `getStaticClass` method in SparkR's `RBackendHandler`
This is a fix for the problem referenced in [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185).
cc shivaram
Author: cafreeman <cfreeman@alteryx.com>
Closes #7001 from cafreeman/branch-1.4 and squashes the following commits:
8f81194 [cafreeman] Add missing license
31aedcf [cafreeman] Refactor test to call an external R script
2c22073 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
0bea809 [cafreeman] Fixed relative path issue and added smaller JAR
ee25e60 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
9a5c362 [cafreeman] test for including JAR when launching sparkContext
9101223 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
5a80844 [cafreeman] Fix style nits
7c6bd0c [cafreeman] [SPARK-8607] SparkR
(cherry picked from commit 2579948bf5)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
This commit adds a new metric, `messageProcessingTime`, to the DAGScheduler metrics source. This metric tracks the time taken to process messages in the scheduler's event processing loop, which is a helpful debugging aid for diagnosing performance issues in the scheduler (such as SPARK-4961).
In order to do this, I moved the creation of the DAGSchedulerSource metrics source into DAGScheduler itself, similar to how MasterSource is created and registered in Master.
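A sketch with the Codahale metrics API that Spark's metrics system wraps (registry wiring simplified):
```scala
import com.codahale.metrics.MetricRegistry

val registry = new MetricRegistry()
// A Timer exposes both throughput and latency percentiles.
val messageProcessingTime = registry.timer("messageProcessingTime")

def process(event: AnyRef, handle: AnyRef => Unit): Unit = {
  val timerContext = messageProcessingTime.time()
  try handle(event) finally timerContext.stop()
}
```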
Author: Josh Rosen <joshrosen@databricks.com>
Closes #7002 from JoshRosen/SPARK-8344 and squashes the following commits:
57f914b [Josh Rosen] Fix import ordering
7d6bb83 [Josh Rosen] Add message processing time metrics to DAGScheduler
Fix for incorrect memory in Spark UI as per SPARK-5768
Author: Joshi <rekhajoshm@gmail.com>
Author: Rekha Joshi <rekhajoshm@gmail.com>
Closes #6972 from rekhajoshm/SPARK-5768 and squashes the following commits:
b678a91 [Joshi] Fix for incorrect memory in Spark UI
2fe53d9 [Joshi] Fix for incorrect memory in Spark UI
eb823b8 [Joshi] SPARK-5768: Fix for incorrect memory in Spark UI
0be142d [Rekha Joshi] Merge pull request #3 from apache/master
106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
This commit updates the shuffle read path to give ShuffleReader implementations more control over the deserialization process.
The BlockStoreShuffleFetcher.fetch() method has been renamed to BlockStoreShuffleFetcher.fetchBlockStreams(). Previously, this method returned a record iterator; now, it returns an iterator of (BlockId, InputStream). Deserialization of records is now handled in the ShuffleReader.read() method.
This change creates a cleaner separation of concerns and allows implementations of ShuffleReader more flexibility in how records are retrieved.
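The reshaped boundary, reduced to types (the case class and stubs here are illustrative, not the Spark interfaces verbatim):
```scala
import java.io.InputStream

case class BlockId(name: String)

// fetch now hands back raw streams...
def fetchBlockStreams(): Iterator[(BlockId, InputStream)] =
  Iterator.empty // stub

// ...and read() owns deserialization, so a custom ShuffleReader can decide
// how (or whether) to deserialize each block.
def read[T](deserialize: InputStream => Iterator[T]): Iterator[T] =
  fetchBlockStreams().flatMap { case (_, in) => deserialize(in) }
```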
Author: Matt Massie <massie@cs.berkeley.edu>
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #6423 from massie/shuffle-api-cleanup and squashes the following commits:
8b0632c [Matt Massie] Minor Scala style fixes
d0a1b39 [Matt Massie] Merge pull request #1 from kayousterhout/massie_shuffle-api-cleanup
290f1eb [Kay Ousterhout] Added test for HashShuffleReader.read()
5186da0 [Kay Ousterhout] Revert "Add test to ensure HashShuffleReader is freeing resources"
f98a1b9 [Matt Massie] Add test to ensure HashShuffleReader is freeing resources
a011bfa [Matt Massie] Use PrivateMethodTester on check that delegate stream is closed
4ea1712 [Matt Massie] Small code cleanup for readability
7429a98 [Matt Massie] Update tests to check that BufferReleasingStream is closing delegate InputStream
f458489 [Matt Massie] Remove unnecessary map() on return Iterator
4abb855 [Matt Massie] Consolidate metric code. Make it clear why InterrubtibleIterator is needed.
5c30405 [Matt Massie] Return visibility of BlockStoreShuffleFetcher to private[hash]
7eedd1d [Matt Massie] Small Scala import cleanup
28f8085 [Matt Massie] Small import nit
f93841e [Matt Massie] Update shuffle read metrics in ShuffleReader instead of BlockStoreShuffleFetcher.
7e8e0fe [Matt Massie] Minor Scala style fixes
01e8721 [Matt Massie] Explicitly cast iterator in branches for type clarity
7c8f73e [Matt Massie] Close Block InputStream immediately after all records are read
208b7a5 [Matt Massie] Small code style changes
b70c945 [Matt Massie] Make BlockStoreShuffleFetcher visible to shuffle package
19135f2 [Matt Massie] [SPARK-7884] Allow Spark shuffle APIs to be more customizable
Author: Holden Karau <holden@pigscanfly.ca>
Closes #6918 from holdenk/SPARK-8498-fix-npe-in-errorhandling-path-in-unsafeshuffle-writer and squashes the following commits:
f807832 [Holden Karau] Log error if we can't throw it
855f9aa [Holden Karau] Spelling - not my strongest suite. Fix Propegates to Propagates.
039d620 [Holden Karau] Add missing closeandwriteoutput
30e558d [Holden Karau] go back to try/finally
e503b8c [Holden Karau] Improve the test to ensure we aren't masking the underlying exception
ae0b7a7 [Holden Karau] Fix the test
2e6abf7 [Holden Karau] Be more cautious when cleaning up during failed write and re-throw user exceptions
This patch also reenables the tests. Now that we have access to the log4j logs it should be easier to debug the flakiness.
yhuai brkyvz
Author: Andrew Or <andrew@databricks.com>
Closes #6886 from andrewor14/spark-submit-suite-fix and squashes the following commits:
3f99ff1 [Andrew Or] Move destroy to finally block
9a62188 [Andrew Or] Re-enable ignored tests
2382672 [Andrew Or] Check for exit code
This PR solves three SerializationDebugger issues.
* SPARK-7180 - SerializationDebugger fails with ArrayOutOfBoundsException
* SPARK-8090 - SerializationDebugger does not handle classes with writeReplace correctly
* SPARK-8091 - SerializationDebugger does not handle classes with writeObject method
The solutions for each are explained as follows
* SPARK-7180 - The wrong slot desc was used for getting the value of the fields in the object being tested.
* SPARK-8090 - Test the type of the replaced object.
* SPARK-8091 - Use a dummy ObjectOutputStream to collect all the objects written by the writeObject() method, and then test those objects as usual.
I also added more tests to the test suite to increase code coverage. For example, I added tests for cases where there are no serializability issues.
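The SPARK-8091 trick can be sketched as a sink `ObjectOutputStream` that records, rather than serializes, whatever a `writeObject(out)` method emits (details simplified; the protected no-arg `ObjectOutputStream` constructor is what enables `writeObjectOverride`):
```scala
import java.io.ObjectOutputStream
import scala.collection.mutable.ArrayBuffer

class ListObjectOutputStream extends ObjectOutputStream {
  private val output = new ArrayBuffer[Any]

  // Called instead of real serialization; just remember the object so the
  // debugger can visit it afterwards.
  override protected def writeObjectOverride(o: Object): Unit = {
    output += o
  }

  def outputArray: Array[Any] = output.toArray
}
```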
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #6625 from tdas/SPARK-7180 and squashes the following commits:
c7cb046 [Tathagata Das] Addressed comments on docs
ae212c8 [Tathagata Das] Improved docs
304c97b [Tathagata Das] Fixed build error
26b5179 [Tathagata Das] more tests.....92% line coverage
7e2fdcf [Tathagata Das] Added more tests
d1967fb [Tathagata Das] Added comments.
da75d34 [Tathagata Das] Removed unnecessary lines.
50a608d [Tathagata Das] Fixed bugs and added support for writeObject
This is a follow-up of [SPARK-3288](https://issues.apache.org/jira/browse/SPARK-3288).
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #6896 from ueshin/issues/SPARK-8476 and squashes the following commits:
89251d8 [Takuya UESHIN] Make inc/decDiskBytesSpilled in TaskMetrics private[spark].
This is a follow-up PR for #6456 to make AppendOnlyMap consistent with OpenHashSet.
/cc srowen andrewor14
Author: zsxwing <zsxwing@gmail.com>
Closes #6879 from zsxwing/append-only-map and squashes the following commits:
912c0ad [zsxwing] Fix the doc
dd4385b [zsxwing] Make AppendOnlyMap use the same growth strategy of OpenHashSet and consistent exception message
Author: Sandy Ryza <sandy@cloudera.com>
Closes #6679 from sryza/sandy-spark-8135 and squashes the following commits:
c5554ff [Sandy Ryza] SPARK-8135. In SerializableWritable, don't load defaults when instantiating Configuration
Dependencies of artifacts in the local ivy cache were not being resolved properly. The dependencies were not being picked up. Now they should be.
cc andrewor14
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #6788 from brkyvz/local-ivy-fix and squashes the following commits:
2875bf4 [Burak Yavuz] fix temp dir bug
48cc648 [Burak Yavuz] improve deletion
a69e3e6 [Burak Yavuz] delete cache before test as well
0037197 [Burak Yavuz] fix merge conflicts
f60772c [Burak Yavuz] use different folder for m2 cache during testing
b6ef038 [Burak Yavuz] [SPARK-8095] Resolve dependencies of Spark Packages in local ivy cache
```scala
def getAllNodes: Seq[RDDOperationNode] = {
  _childNodes ++ _childClusters.flatMap(_.childNodes)
}
```
When ```_childClusters``` has many nodes, the process will hang. I think we can improve the efficiency here.
Author: xutingjun <xutingjun@huawei.com>
Closes #6839 from XuTingjun/DAGImprove and squashes the following commits:
53b03ea [xutingjun] change code to more concise and easier to read
f98728b [xutingjun] fix words: node -> nodes
f87c663 [xutingjun] put the filter inside
81f9fd2 [xutingjun] put the filter inside
This is a follow-up PR to remove unused `PythonRDD.emptyRDD` added by #6826
Author: zsxwing <zsxwing@gmail.com>
Closes #6867 from zsxwing/remove-PythonRDD-emptyRDD and squashes the following commits:
b66d363 [zsxwing] Remove PythonRDD.emptyRDD
The previous growth strategy was always to double the capacity.
This PR adjusts it: double the capacity, but on overflow use the maximum capacity as the new capacity. This increases the maximum capacity of PartitionedPairBuffer from `2 ^ 29` to `2 ^ 30 - 1`, the maximum capacity of PartitionedSerializedPairBuffer from `2 ^ 28` to `(2 ^ 29) - 1`, and the maximum capacity of AppendOnlyMap from `0.7 * (2 ^ 29)` to `(2 ^ 29)`.
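The adjusted strategy in isolation (a sketch; each collection applies it to its own maximum):
```scala
// Double the capacity, but clamp at the maximum instead of overflowing Int.
def nextCapacity(current: Int, maxCapacity: Int): Int = {
  require(current < maxCapacity, "Can't grow past the maximum capacity")
  math.min(current.toLong * 2, maxCapacity.toLong).toInt
}
```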
Author: zsxwing <zsxwing@gmail.com>
Closes #6456 from zsxwing/SPARK-7913 and squashes the following commits:
abcb932 [zsxwing] Address comments
e30b61b [zsxwing] Increase the maximum capacity of AppendOnlyMap
05b6420 [zsxwing] Update the exception message
64fe227 [zsxwing] Increase the maximum capacity of PartitionedPairBuffer and PartitionedSerializedPairBuffer