Commit graph

7764 commits

Author SHA1 Message Date
yi.wu 125cbe3ae0 [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl
### What changes were proposed in this pull request?

The motivation of this PR is to avoid caching the removed decommissioned executors in `TaskSchedulerImpl`. The cache is introduced in https://github.com/apache/spark/pull/29422. The cache will hold the `isHostDecommissioned` info for a while. So if the task `FetchFailure` event comes after the executor loss event, `DAGScheduler` can still get the `isHostDecommissioned` from the cache and unregister the host shuffle map status when the host is decommissioned too.

This PR tries to achieve the same goal without the cache. Instead of saving the `workerLost` in `ExecutorUpdated` / `ExecutorDecommissionInfo` / `ExecutorDecommissionState`, we could save the `hostOpt` directly. When the host is decommissioned or lost too, the `hostOpt` can be a specific host address. Otherwise, it's `None` to indicate that only the executor is decommissioned or lost.

Now that we have the host info, we can also unregister the host shuffle map status when `executorLost` is triggered for the decommissioned executor.

Besides, this PR also includes a few cleanups around the touched code.

### Why are the changes needed?

It helps to unregister the shuffle map status earlier for both decommission and normal executor lost cases.

It also saves memory in  `TaskSchedulerImpl` and simplifies the code a little bit.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR only refactor the code. The original behaviour should be covered by `DecommissionWorkerSuite`.

Closes #29579 from Ngone51/impr-decom.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-09-08 04:40:13 +00:00
yi.wu e6fec33f18 [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
### What changes were proposed in this pull request?

This PR adds support to read host-local shuffle data from disk directly when external shuffle service is disabled.

Similar to #25299, we first try to get local disk directories for the shuffle data, which is located at the same host with the current executor. The only difference is, in #25299, it gets the directories from the external shuffle service while in this PR, it gets the directory from the executors.

To implement the feature, this PR extends the `HostLocalDirManager ` for both `ExternalBlockStoreClient` and `NettyBlockTransferService`. Also, this PR adds `getHostLocalDirs` for `NettyBlockTransferService` as `ExternalBlockStoreClient` does, in order to send the get-dir-request to the corresponding executor. And this PR resued the request message`GetLocalDirsForExecutors` for simple.

### Why are the changes needed?

After SPARK-27651 / #25299, Spark can read host-local shuffle data directly from disk when external shuffle service is enabled. To extend the future, we can also support it when the external shuffle service is disabled.

### Does this PR introduce _any_ user-facing change?

Yes. Before this PR, to use the host-local shuffle reading feature, users should not only enable `spark.shuffle.readHostLocalDisk` but also `spark.shuffle.service.enabled`. After this PR, enable `spark.shuffle.readHostLocalDisk` should be enough, and external shuffle service is no longer a pre-requirement.

### How was this patch tested?

Added test and tested manually.

Closes #28911 from Ngone51/support_node_local_shuffle.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-09-02 13:03:44 -07:00
angerszhu 55ce49ed28 [SPARK-32400][SQL][TEST][FOLLOWUP][TEST-MAVEN] Fix resource loading error in HiveScripTransformationSuite
### What changes were proposed in this pull request?
#29401 move `test_script.py` from sql/hive module to sql/core module, cause HiveScripTransformationSuite load resource issue.

### Why are the changes needed?
This issue cause jenkins test failed in mvn

spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/
spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11/
spark-master-test-maven-hadoop-3.2-hive-2.3:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3/
![image](https://user-images.githubusercontent.com/46485123/91681585-71285a80-eb81-11ea-8519-99fc9783d6b9.png)

![image](https://user-images.githubusercontent.com/46485123/91681010-aaf86180-eb7f-11ea-8dbb-61365a3b0ab4.png)

Error as below:
```
 Exception thrown while executing Spark plan:
 HiveScriptTransformation [a#349299, b#349300, c#349301, d#349302, e#349303], python /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/sql/hive/file:/home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/sql/core/target/spark-sql_2.12-3.1.0-SNAPSHOT-tests.jar!/test_script.py, [a#349309, b#349310, c#349311, d#349312, e#349313], ScriptTransformationIOSchema(List(),List(),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),List((field.delim, )),List((field.delim, )),Some(org.apache.hadoop.hive.ql.exec.TextRecordReader),Some(org.apache.hadoop.hive.ql.exec.TextRecordWriter),false)
+- Project [_1#349288 AS a#349299, _2#349289 AS b#349300, _3#349290 AS c#349301, _4#349291 AS d#349302, _5#349292 AS e#349303]
   +- LocalTableScan [_1#349288, _2#349289, _3#349290, _4#349291, _5#349292]

 == Exception ==
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 18021.0 failed 1 times, most recent failure: Lost task 0.0 in stage 18021.0 (TID 37324) (192.168.10.31 executor driver): org.apache.spark.SparkException: Subprocess exited with status 2. Error: python: can't open file '/home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/sql/hive/file:/home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/sql/core/target/spark-sql_2.12-3.1.0-SNAPSHOT-tests.jar!/test_script.py': [Errno 2] No such file or directory

 at org.apache.spark.sql.execution.BaseScriptTransformationExec.checkFailureAndPropagate(BaseScriptTransformationExec.scala:180)
 at org.apache.spark.sql.execution.BaseScriptTransformationExec.checkFailureAndPropagate$(BaseScriptTransformationExec.scala:157)
 at org.apache.spark.sql.hive.execution.HiveScriptTransformationExec.checkFailureAndPropagate(HiveScriptTransformationExec.scala:49)
 at org.apache.spark.sql.hive.execution.HiveScriptTransformationExec$$anon$1.hasNext(HiveScriptTransformationExec.scala:110)
 at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
 at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:127)
 at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:480)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1426)
 at o
```
### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Existed UT

Closes #29588 from AngersZhuuuu/SPARK-32400-FOLLOWUP.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-09-02 18:27:29 +09:00
Udbhav30 065f17386d [SPARK-32481][CORE][SQL] Support truncate table to move data to trash
### What changes were proposed in this pull request?
Instead of deleting the data, we can move the data to trash.
Based on the configuration provided by the user it will be deleted permanently from the trash.

### Why are the changes needed?
Instead of directly deleting the data, we can provide flexibility to move data to the trash and then delete it permanently.

### Does this PR introduce _any_ user-facing change?
Yes, After truncate table the data is not permanently deleted now.
It is first moved to the trash and then after the given time deleted permanently;

### How was this patch tested?
new UTs added

Closes #29552 from Udbhav30/truncate.

Authored-by: Udbhav30 <u.agrawal30@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-30 10:25:32 -07:00
Devesh Agrawal b786f31a42 [SPARK-32643][CORE][K8S] Consolidate state decommissioning in the TaskSchedulerImpl realm
### What changes were proposed in this pull request?
The decommissioning state is a bit fragment across two places in the TaskSchedulerImpl:

https://github.com/apache/spark/pull/29014/ stored the incoming decommission info messages in TaskSchedulerImpl.executorsPendingDecommission.
While https://github.com/apache/spark/pull/28619/ was storing just the executor end time in the map TaskSetManager.tidToExecutorKillTimeMapping (which in turn is contained in TaskSchedulerImpl).
While the two states are not really overlapping, it's a bit of a code hygiene concern to save this state in two places.

With https://github.com/apache/spark/pull/29422, TaskSchedulerImpl is emerging as the place where all decommissioning book keeping is kept within the driver. So consolidate the information in _tidToExecutorKillTimeMapping_ into _executorsPendingDecommission_.

However, in order to do so, we need to walk away from keeping the raw ExecutorDecommissionInfo messages and instead keep another class ExecutorDecommissionState. This decoupling will allow the RPC message class ExecutorDecommissionInfo to evolve independently from the book keeping ExecutorDecommissionState.

### Why are the changes needed?

This is just a code cleanup. These two features were added independently and its time to consolidate their state for good hygiene.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #29452 from agrawaldevesh/consolidate_decom_state.

Authored-by: Devesh Agrawal <devesh.agrawal@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-08-26 15:16:47 -07:00
Dongjoon Hyun 2dee4352a0 Revert "[SPARK-32481][CORE][SQL] Support truncate table to move data to trash"
This reverts commit 5c077f0580.
2020-08-26 11:24:35 -07:00
Udbhav30 5c077f0580 [SPARK-32481][CORE][SQL] Support truncate table to move data to trash
### What changes were proposed in this pull request?
Instead of deleting the data, we can move the data to trash.
Based on the configuration provided by the user it will be deleted permanently from the trash.

### Why are the changes needed?
Instead of directly deleting the data, we can provide flexibility to move data to the trash and then delete it permanently.

### Does this PR introduce _any_ user-facing change?
Yes, After truncate table the data is not permanently deleted now.
It is first moved to the trash and then after the given time deleted permanently;

### How was this patch tested?
new UTs added

Closes #29387 from Udbhav30/tuncateTrash.

Authored-by: Udbhav30 <u.agrawal30@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-25 23:38:43 -07:00
Thomas Graves 2feab4ef4f [SPARK-32287][TESTS][FOLLOWUP] Add debugging for flaky ExecutorAllocationManagerSuite
### What changes were proposed in this pull request?

fixing flaky test in ExecutorAllocationManagerSuite. The issue is that there is a timing issue when we do a reset as to when the numExecutorsToAddPerResourceProfileId gets reset. The fix is to just always set those values back to 1 when we call reset().

### Why are the changes needed?

fixing flaky test in ExecutorAllocationManagerSuite

### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

ran the unit test via this PR a bunch of times and the fix seems to be working.

Closes #29508 from tgravescs/debugExecAllocTest.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-08-25 23:05:35 +09:00
Daniel Moore a3179a78b6 [SPARK-32664][CORE] Switch the log level from info to debug at BlockManager.getLocalBlockData
### What changes were proposed in this pull request?
Changing an info log to a debug log based on SPARK-32664

### Why are the changes needed?
It is outlined in SPARK-32664

### Does this PR introduce _any_ user-facing change?
There are changes to the debug and info logs

### How was this patch tested?
Tested by looking at the logs

Closes #29527 from dmoore62/SPARK-32664.

Authored-by: Daniel Moore <moore@knights.ucf.edu>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-08-25 22:45:44 +09:00
Baohe Zhang 9151a589a7 [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
### What changes were proposed in this pull request?
This pull request adds 2 test suites for 2 new classes HybridStore and HistoryServerMemoryManager, which were created in https://github.com/apache/spark/pull/28412. This pull request also did some minor changes in these 2 classes to expose some variables for testing. Besides 2 suites, this pull request adds a unit test in FsHistoryProviderSuite to test parsing logs with HybridStore.

### Why are the changes needed?
Unit tests are needed for new features.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests.

Closes #29509 from baohe-zhang/SPARK-31608-UT.

Authored-by: Baohe Zhang <baohe.zhang@verizonmedia.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
2020-08-25 12:07:49 +09:00
Michael Munday bc23bb7882 [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests
In order to produce consistent results from SizeEstimator the tests
override some system properties that are used during SizeEstimator
initialization. However there were several places where either the
compressed references property wasn't set or the system properties
were set but the SizeEstimator not re-initialized.

This caused failures when running the tests with a large heap build
of OpenJ9 because it does not use compressed references unlike most
environments.

### What changes were proposed in this pull request?
Initialize SizeEstimator class explicitly in the tests where required to avoid relying on a particular environment.

### Why are the changes needed?
Test failures can be seen when compressed references are disabled (e.g. using an OpenJ9 large heap build or Hotspot with a large heap).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tests run on machine running OpenJ9 large heap build.

Closes #29407 from mundaym/fix-sizeestimator.

Authored-by: Michael Munday <mike.munday@ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-08-24 10:19:30 -05:00
Smith Cruise a4d785dadc [MINOR] Typo in ShuffleMapStage.scala
Simple typo

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #29486 from Smith-Cruise/patch-1.

Authored-by: Smith Cruise <chendingchao1@126.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-08-22 13:26:59 -05:00
yangjie01 25c7d0fe6a [SPARK-32526][SQL] Pass all test of sql/catalyst module in Scala 2.13
### What changes were proposed in this pull request?
The purpose of this pr is to resolve [SPARK-32526](https://issues.apache.org/jira/browse/SPARK-32526), all remaining failed cases are fixed.

The main change of this pr as follow:

- Change of `ExecutorAllocationManager.scala` for core module compilation in Scala 2.13, it's a blocking problem

- Change `Seq[_]` to `scala.collection.Seq[_]` refer to failed cases

- Added different expected plan of `Test 4: Star with several branches` of StarJoinCostBasedReorderSuite  for Scala 2.13 because the candidates plans:

```
Join Inner, (d1_pk#5 = f1_fk1#0)
:- Join Inner, (f1_fk2#1 = d2_pk#8)
:  :- Join Inner, (f1_fk3#2 = d3_pk#11)
```
and

```
Join Inner, (f1_fk2#1 = d2_pk#8)
:- Join Inner, (d1_pk#5 = f1_fk1#0)
:  :- Join Inner, (f1_fk3#2 = d3_pk#11)
```

have same cost `Cost(200,9200)`, but `HashMap` is rewritten in scala 2.13 and The order of iterations leads to different results.

This pr fix test cases as follow:

- LiteralExpressionSuite (1 FAILED -> PASS)
- StarJoinCostBasedReorderSuite ( 1 FAILED-> PASS)
- ObjectExpressionsSuite( 2 FAILED-> PASS)
- ScalaReflectionSuite (1 FAILED-> PASS)
- RowEncoderSuite (10 FAILED-> PASS)
- ExpressionEncoderSuite  (ABORTED-> PASS)

### Why are the changes needed?
We need to support a Scala 2.13 build.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
<!--
- Scala 2.12: Pass the Jenkins or GitHub Action

- Scala 2.13: Do the following:

```
dev/change-scala-version.sh 2.13
mvn clean install -DskipTests  -pl sql/catalyst -Pscala-2.13 -am
mvn test -pl sql/catalyst -Pscala-2.13
```

**Before**
```
Tests: succeeded 4035, failed 17, canceled 0, ignored 6, pending 0
*** 1 SUITE ABORTED ***
*** 15 TESTS FAILED ***
```

**After**

```
Tests: succeeded 4338, failed 0, canceled 0, ignored 6, pending 0
All tests passed.
```

Closes #29434 from LuciferYang/sql-catalyst-tests.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-08-22 09:24:16 -05:00
Brandon Jiang 1450b5e095 [MINOR][DOCS] fix typo for docs,log message and comments
### What changes were proposed in this pull request?
Fix typo for docs, log messages and comments

### Why are the changes needed?
typo fix to increase readability

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
manual test has been performed to test the updated

Closes #29443 from brandonJY/spell-fix-doc.

Authored-by: Brandon Jiang <Brandon.jiang.a@outlook.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-08-22 06:45:35 +09:00
yi.wu 44a288fc41 [SPARK-32653][CORE] Decommissioned host/executor should be considered as inactive in TaskSchedulerImpl
### What changes were proposed in this pull request?

Add decommissioning status checking for a host or executor while checking it's active or not. And a decommissioned host or executor should be considered as inactive.

### Why are the changes needed?

First of all, this PR is not a correctness bug fix but gives improvement indeed. And the main problem here we want to fix is that a decommissioned host or executor should be considered as inactive.

`TaskSetManager.computeValidLocalityLevels` depends on `TaskSchedulerImpl.isExecutorAlive/hasExecutorsAliveOnHost` to calculate the locality levels. Therefore, the `TaskSetManager` could also get corresponding locality levels of those decommissioned hosts or executors if they're not considered as inactive. However, on the other side,  `CoarseGrainedSchedulerBackend` won't construct the `WorkerOffer` for those decommissioned executors. That also means `TaskSetManager` might never have a chance to launch tasks at certain locality levels but only suffers the unnecessary delay because of delay scheduling. So, this PR helps to reduce this kind of unnecessary delay by making decommissioned host/executor inactive in `TaskSchedulerImpl`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests

Closes #29468 from Ngone51/fix-decom-alive.

Lead-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: Devesh Agrawal <devesh.agrawal@gmail.com>
Co-authored-by: wuyi <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-20 12:00:32 +00:00
Xingbo Jiang f793977e9a [SPARK-32658][CORE] Fix PartitionWriterStream partition length overflow
### What changes were proposed in this pull request?

The `count` in `PartitionWriterStream` should be a long value, instead of int. The issue is introduced by apache/sparkabef84a . When the overflow happens, the shuffle index file would record wrong index of a reduceId, thus lead to `FetchFailedException: Stream is corrupted` error.

Besides the fix, I also added some debug logs, so in the future it's easier to debug similar issues.

### Why are the changes needed?

This is a regression and bug fix.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A Spark user reported this issue when migrating their workload to 3.0. One of the jobs fail deterministically on Spark 3.0 without the patch, and the job succeed after applied the fix.

Closes #29474 from jiangxb1987/fixPartitionWriteStream.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-20 07:08:30 +00:00
yi.wu a1a32d2eb5 [SPARK-32600][CORE] Unify task name in some logs between driver and executor
### What changes were proposed in this pull request?

This PR replaces some arbitrary task names in logs with the widely used task name (e.g. "task 0.0 in stage 1.0 (TID 1)") among driver and executor. This will change the task name in `TaskDescription` by appending TID.

### Why are the changes needed?

Some logs are still using TID(a.k.a `taskId`) only as the task name, e.g.,

7f275ee597/core/src/main/scala/org/apache/spark/executor/Executor.scala (L786)

7f275ee597/core/src/main/scala/org/apache/spark/executor/Executor.scala (L632-L635)

And the task thread name also only has the `taskId`:

7f275ee597/core/src/main/scala/org/apache/spark/executor/Executor.scala (L325)

As mentioned in https://github.com/apache/spark/pull/1259, TID itself does not capture stage or retries, making it harder to correlate with the application. It's inconvenient when debugging applications.

Actually, task name like "task name (e.g. "task 0.0 in stage 1.0 (TID 1)")" has already been used widely after https://github.com/apache/spark/pull/1259. We'd better follow the naming convention.

### Does this PR introduce _any_ user-facing change?

Yes. Users will see the more consistent task names in the log.

### How was this patch tested?

Manually checked.

Closes #29418 from Ngone51/unify-task-name.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-19 08:44:49 +00:00
yi.wu 3092527f75 [SPARK-32651][CORE] Decommission switch configuration should have the highest hierarchy
### What changes were proposed in this pull request?

Rename `spark.worker.decommission.enabled` to `spark.decommission.enabled` and move it from `org.apache.spark.internal.config.Worker` to `org.apache.spark.internal.config.package`.

### Why are the changes needed?

Decommission has been supported in Standalone and k8s yet and may be supported in Yarn(https://github.com/apache/spark/pull/27636) in the future. Therefore, the switch configuration should have the highest hierarchy rather than belongs to Standalone's Worker. In other words, it should be independent of the cluster managers.

### Does this PR introduce _any_ user-facing change?

No, as the decommission feature hasn't been released.

### How was this patch tested?

Pass existed tests.

Closes #29466 from Ngone51/fix-decom-conf.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-19 06:53:06 +00:00
yi.wu 70964e741a [SPARK-21040][CORE][FOLLOW-UP] Only calculate executorKillTime when speculation is enabled
### What changes were proposed in this pull request?

Only calculate `executorKillTime` in `TaskSetManager.executorDecommission()` when speculation is enabled.

### Why are the changes needed?

Avoid unnecessary operations to save time/memory.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass existed tests.

Closes #29464 from Ngone51/followup-SPARK-21040.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-18 13:50:57 +00:00
Devesh Agrawal 1ac23dea52 [SPARK-32613][CORE] Fix regressions in DecommissionWorkerSuite
### What changes were proposed in this pull request?

The DecommissionWorkerSuite started becoming flaky and it revealed a real regression. Recently closed #29211 necessitates remembering the decommissioning shortly beyond the removal of the executor.

In addition to fixing this issue, ensure that DecommissionWorkerSuite continues to pass when executors haven't had a chance to exit eagery. That is the old behavior before #29211 also still works.

Added some more tests to TaskSchedulerImpl to ensure that the decommissioning information is indeed purged after a timeout.

Hardened the test DecommissionWorkerSuite to make it wait for successful job completion.

### Why are the changes needed?

First, let me describe the intended behavior of decommissioning: If a fetch failure happens where the source executor was decommissioned, we want to treat that as an eager signal to clear all shuffle state associated with that executor. In addition if we know that the host was decommissioned, we want to forget about all map statuses from all other executors on that decommissioned host. This is what the test "decommission workers ensure that fetch failures lead to rerun" is trying to test. This invariant is important to ensure that decommissioning a host does not lead to multiple fetch failures that might fail the job. This fetch failure can happen before the executor is truly marked "lost" because of heartbeat delays.

- However, #29211 eagerly exits the executors when they are done decommissioning. This removal of the executor was racing with the fetch failure. By the time the fetch failure is triggered the executor is already removed and thus has forgotten its decommissioning information. (I tested this by delaying the decommissioning). The fix is to keep the decommissioning information around for some time after removal with some extra logic to finally purge it after a timeout.

- In addition the executor loss can also bump up `shuffleFileLostEpoch` (added in #28848). This happens because when the executor is lost, it forgets the shuffle state about just that executor and increments the `shuffleFileLostEpoch`. This incrementing precludes the clearing of state of the entire host when the fetch failure happens because the failed task is still reusing the old epoch. The fix here is also simple: Ignore the `shuffleFileLostEpoch` when the shuffle status is being cleared due to a fetch failure resulting from host decommission.

I am strategically making both of these fixes be very local to decommissioning to avoid other regressions. Especially the version stuff is tricky (it hasn't been fundamentally changed since it was first introduced in 2013).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually ran DecommissionWorkerSuite several times using a script and ensured it all passed.

### (Internal) Configs added
I added two configs, one of which is sort of meant for testing only:
- `spark.test.executor.decommission.initial.sleep.millis`: Initial delay by the decommissioner shutdown thread. Default is same as before of 1 second. This is used for testing only. This one is kept "hidden" (ie not added as a constant to avoid config bloat)
- `spark.executor.decommission.removed.infoCacheTTL`: Number of seconds to keep the removed executors decom entries around. It defaults to 5 minutes. It should be around the average time it takes for all of the shuffle data to be fetched from the mapper to the reducer, but I think that can take a while since the reducers also do a multistep sort.

Closes #29422 from agrawaldevesh/decom_fixes.

Authored-by: Devesh Agrawal <devesh.agrawal@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-18 06:47:31 +00:00
Cheng Su 8f0fef1843 [SPARK-32399][SQL] Full outer shuffled hash join
### What changes were proposed in this pull request?

Add support for full outer join inside shuffled hash join. Currently if the query is a full outer join, we only use sort merge join as the physical operator. However it can be CPU and IO intensive in case input table is large for sort merge join. Shuffled hash join on the other hand saves the sort CPU and IO compared to sort merge join, especially when table is large.

This PR implements the full outer join as followed:
* Process rows from stream side by looking up hash relation, and mark the matched rows from build side by:
  * for joining with unique key, a `BitSet` is used to record matched rows from build side (`key index` to represent each row)
  * for joining with non-unique key, a `HashSet[Long]` is  used to record matched rows from build side (`key index` + `value index` to represent each row).
`key index` is defined as the index into key addressing array `longArray` in `BytesToBytesMap`.
`value index` is defined as the iterator index of values for same key.

* Process rows from build side by iterating hash relation, and filter out rows from build side being looked up already (done in `ShuffledHashJoinExec.fullOuterJoin`)

For context, this PR was originally implemented as followed (up to commit e3322766d4):
1. Construct hash relation from build side, with extra boolean value at the end of row to track look up information (done in `ShuffledHashJoinExec.buildHashedRelation` and `UnsafeHashedRelation.apply`).
2. Process rows from stream side by looking up hash relation, and mark the matched rows from build side be looked up (done in `ShuffledHashJoinExec.fullOuterJoin`).
3. Process rows from build side by iterating hash relation, and filter out rows from build side being looked up already (done in `ShuffledHashJoinExec.fullOuterJoin`).

See discussion of pros and cons between these two approaches [here](https://github.com/apache/spark/pull/29342#issuecomment-672275450), [here](https://github.com/apache/spark/pull/29342#issuecomment-672288194) and [here](https://github.com/apache/spark/pull/29342#issuecomment-672640531).

TODO: codegen for full outer shuffled hash join can be implemented in another followup PR.

### Why are the changes needed?

As implementation in this PR, full outer shuffled hash join will have overhead to iterate build side twice (once for building hash map, and another for outputting non-matching rows), and iterate stream side once. However, full outer sort merge join needs to iterate both sides twice, and sort the large table can be more CPU and IO intensive. So full outer shuffled hash join can be more efficient than sort merge join when stream side is much more larger than build side.

For example query below, full outer SHJ saved 30% wall clock time compared to full outer SMJ.

```
def shuffleHashJoin(): Unit = {
    val N: Long = 4 << 22
    withSQLConf(
      SQLConf.SHUFFLE_PARTITIONS.key -> "2",
      SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "20000000") {
      codegenBenchmark("shuffle hash join", N) {
        val df1 = spark.range(N).selectExpr(s"cast(id as string) as k1")
        val df2 = spark.range(N / 10).selectExpr(s"cast(id * 10 as string) as k2")
        val df = df1.join(df2, col("k1") === col("k2"), "full_outer")
        df.noop()
    }
  }
}
```

```
Running benchmark: shuffle hash join
  Running case: shuffle hash join off
  Stopped after 2 iterations, 16602 ms
  Running case: shuffle hash join on
  Stopped after 5 iterations, 31911 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15.4
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
shuffle hash join:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
shuffle hash join off                              7900           8301         567          2.1         470.9       1.0X
shuffle hash join on                               6250           6382          95          2.7         372.5       1.3X
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit test in `JoinSuite.scala`, `AbstractBytesToBytesMapSuite.java` and `HashedRelationSuite.scala`.

Closes #29342 from c21/full-outer-shj.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-08-17 08:06:19 +09:00
Kousuke Saruta 1a4c8f718f [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars
### What changes were proposed in this pull request?

This PR changes Executor to load jars and files added by --jars and --files on Executor initialization.
To avoid downloading those jars/files twice, they are assosiated with `startTime` as their uploaded timestamp.

### Why are the changes needed?

ExecutorPlugin can't work with Standalone Cluster and Kubernetes
when a jar which contains plugins and files used by the plugins are added by --jars and --files option with spark-submit.

This is because jars and files added by --jars and --files are not loaded on Executor initialization.
I confirmed it works with YARN because jars/files are distributed as distributed cache.

### Does this PR introduce _any_ user-facing change?

Yes. jars/files added by --jars and --files are downloaded on each executor on initialization.

### How was this patch tested?

Added a new testcase.

Closes #28939 from sarutak/fix-plugin-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
2020-08-14 17:10:22 -05:00
Holden Karau 548ac7c4af [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
### What changes were proposed in this pull request?

If graceful decommissioning is enabled, Spark's dynamic scaling uses this instead of directly killing executors.

### Why are the changes needed?

When scaling down Spark we should avoid triggering recomputes as much as possible.

### Does this PR introduce _any_ user-facing change?

Hopefully their jobs run faster or at the same speed. It also enables experimental shuffle service free dynamic scaling when graceful decommissioning is enabled (using the same code as the shuffle tracking dynamic scaling).

### How was this patch tested?

For now I've extended the ExecutorAllocationManagerSuite for both core & streaming.

Closes #29367 from holdenk/SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-08-12 17:07:18 -07:00
yi.wu c6ea98323f [SPARK-32250][SPARK-27510][CORE][TEST] Fix flaky MasterSuite.test(...) in Github Actions
### What changes were proposed in this pull request?

Set more dispatcher threads for the flaky test.

### Why are the changes needed?

When running test on Github Actions machine, the available processors in JVM  is only 2, while on Jenkins it's 32. For this specific test, 2 available processors, which also decides the number of threads in Dispatcher, are not enough to consume the messages. In the worst situation, `MockExecutorLaunchFailWorker` would occupy these 2 threads for handling messages `LaunchDriver`, `LaunchExecutor` at the same time but leave no thread for the driver to handle the message `RegisteredApplication`. At the end, it results in a deadlock situation and causes the test failure.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

We can check whether the test is still flaky in Github Actions after this fix.

Closes #29408 from Ngone51/spark-32250.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-08-12 21:05:50 +09:00
Venkata krishnan Sowrirajan 2d6eb00256 [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
<!--
Thanks for sending a pull request!  Here are some tips for you:
  1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
  2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
  3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
  4. Be sure to keep the PR description updated to reflect all changes.
  5. Please write your PR title to summarize what this PR proposes.
  6. If possible, provide a concise example to reproduce the issue for a faster review.
  7. If you want to add a new configuration, please read the guideline first for naming configurations in
     'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
-->

### What changes were proposed in this pull request?
Clear Ivy resolution files as part of finally block if not failures while artifacts resolution can leave the resolution files around.
Use tempIvyPath for SparkSubmitUtils.buildIvySettings in tests. This why the test
"SPARK-10878: test resolution files cleaned after resolving artifact" did not capture these issues.

### Why are the changes needed?
This is a bug

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests

Closes #29411 from venkata91/SPARK-32596.

Authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
2020-08-12 00:22:22 -05:00
Maryann Xue 9587277717 [SPARK-32470][CORE] Remove task result size check for shuffle map stage
### What changes were proposed in this pull request?

This PR removes the total task result size check for shuffle map stage tasks, as these tasks return map status and metrics, which will not be cached on the driver and thus will not crash the driver.

### Why are the changes needed?

Checking total task result size for shuffle map stage tasks would lead to erroring normal jobs which create a big number of tasks even if the job eventually does not return a large dataset.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #29276 from maryannxue/spark-32470.

Authored-by: Maryann Xue <maryann.xue@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-11 06:29:32 +00:00
Dongjoon Hyun b421bf0196 [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3
### What changes were proposed in this pull request?

This PR aims to add `StorageLevel.DISK_ONLY_3` as a built-in `StorageLevel`.

### Why are the changes needed?

In a YARN cluster, HDFS uaually provides storages with replication factor 3. So, we can save the result to HDFS to get `StorageLevel.DISK_ONLY_3` technically. However, disaggregate clusters or clusters without storage services are rising. Previously, in that situation, the users were able to use similar `MEMORY_AND_DISK_2` or a user-created `StorageLevel`. This PR aims to support those use cases officially for better UX.

### Does this PR introduce _any_ user-facing change?

Yes. This provides a new built-in option.

### How was this patch tested?

Pass the GitHub Action or Jenkins with the revised test cases.

Closes #29331 from dongjoon-hyun/SPARK-32517.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-10 07:33:06 -07:00
Devesh Agrawal 34c61b9548 [SPARK-32575][CORE][TESTS] Bump up timeouts in BlockManagerDecommissionIntegrationSuite to reduce flakyness
### What changes were proposed in this pull request?

As reported by HyukjinKwon, BlockManagerDecommissionIntegrationSuite test is apparently still flaky (even after https://github.com/apache/spark/pull/29226): https://github.com/apache/spark/pull/29226#issuecomment-670286829.

The new flakyness is because the executors are not launching in the 6 seconds time out I had given them when run under github checks.

Bumped up the timeouts.

### Why are the changes needed?

To make this test not flaky so that it can give us high signal if decommissioning regresses.

### Does this PR introduce _any_ user-facing change?

No, unit test only check.

### How was this patch tested?

No new tests. Just github and jenkins.

Closes #29388 from agrawaldevesh/more_bm_harden.

Authored-by: Devesh Agrawal <devesh.agrawal@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-09 20:00:39 -07:00
Yan Xiaole cf37cd518e [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server
### What changes were proposed in this pull request?
This PR adds a try catch wrapping the History server scan logic to log and swallow the exception per entry.

### Why are the changes needed?
As discussed in #29350 , one entry failure shouldn't affect others.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually tested.

Closes #29374 from yanxiaole/SPARK-32557.

Authored-by: Yan Xiaole <xiaole.yan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-09 16:47:31 -07:00
Kousuke Saruta 34d9f1cf4c [SPARK-32462][WEBUI] Reset previous search text for datatable
### What changes were proposed in this pull request?

This PR proposes to change the behavior of DataTable for stage-page and executors-page not to save the previous search text.

### Why are the changes needed?

DataTable is used in stage-page and executors-page for pagination and filter tasks/executors by search text.
In the current implementation, search text is saved so if we visit stage-page for a job, the previous search text is filled in the textbox and the task table is filtered.
I'm sometimes surprised by this behavior as the stage-page lists no tasks because tasks are filtered by the previous search text.
I think, it's not useful.

### Does this PR introduce _any_ user-facing change?

Yes. Search text is no longer saved.

### How was this patch tested?

New testcase with the following command.
```
$ build/sbt -Dguava.version=27.0-jre -Dtest.default.exclude.tags= -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite -- -z Search"
```

Closes #29265 from sarutak/fix-search-box.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-08 22:14:29 -07:00
JoeyValentine dc3fac8184 [MINOR][DOCS] Fix typos at ExecutorAllocationManager.scala
### What changes were proposed in this pull request?

This PR fixes some typos in <code>core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala</code> file.

### Why are the changes needed?

<code>spark.dynamicAllocation.sustainedSchedulerBacklogTimeout</code> (N) is used only after the <code>spark.dynamicAllocation.schedulerBacklogTimeout</code> (M) is exceeded.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No test needed.

Closes #29351 from JoeyValentine/master.

Authored-by: JoeyValentine <rlaalsdn0506@naver.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-08 12:36:07 -07:00
Kousuke Saruta 4e267f3eb9 [SPARK-32538][CORE][TEST] Use local time zone for the timestamp logged in unit-tests.log
### What changes were proposed in this pull request?

This PR lets the logger log timestamp based on local time zone during test.
`SparkFunSuite` fixes the default time zone to America/Los_Angeles so the timestamp logged in unit-tests.log is also based on the fixed time zone.

### Why are the changes needed?

It's confusable for developers whose time zone is not America/Los_Angeles.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Run existing tests and confirmed uint-tests.log.
If your local time zone is America/Los_Angeles, you can test by setting the environment variable `TZ` like as follows.
```
$ TZ=Asia/Tokyo build/sbt "testOnly org.apache.spark.executor.ExecutorSuite"
$ tail core/target/unit-tests.log
```

Closes #29356 from sarutak/fix-unit-test-log-timezone.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-08-07 11:29:18 +09:00
Gengliang Wang e93b8f02cd [SPARK-32539][INFRA] Disallow FileSystem.get(Configuration conf) in style check by default
### What changes were proposed in this pull request?

Disallow `FileSystem.get(Configuration conf)` in Scala style check by default and suggest developers use `FileSystem.get(URI uri, Configuration conf)` or `Path.getFileSystem()` instead.

### Why are the changes needed?

The method `FileSystem.get(Configuration conf)` will return a default FileSystem instance if the conf `fs.file.impl` is not set. This can cause file not found exception on reading a target path of non-default file system, e.g. S3. It is hard to discover such a mistake via unit tests.
If we disallow it in Scala style check by default and suggest developers use `FileSystem.get(URI uri, Configuration conf)` or `Path.getFileSystem(Configuration conf)`, we can reduce potential regression and PR review effort.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually run scala style check and test.

Closes #29357 from gengliangwang/newStyleRule.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-06 05:56:59 +00:00
yi.wu 7f275ee597 [SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
### What changes were proposed in this pull request?

1.  Make `CoarseGrainedSchedulerBackend.maxNumConcurrentTasks()` considers all kinds of resources when calculating the max concurrent tasks

2. Refactor `calculateAvailableSlots()` to make it be able to be used for both `CoarseGrainedSchedulerBackend` and `TaskSchedulerImpl`

### Why are the changes needed?

Currently, `CoarseGrainedSchedulerBackend.maxNumConcurrentTasks()` only considers the CPU for the max concurrent tasks. This can cause the application to hang when a barrier stage requires extra custom resources but the cluster doesn't have enough corresponding resources. Because, without the checking for other custom resources in `maxNumConcurrentTasks`, the barrier stage can be submitted to the `TaskSchedulerImpl`. But the `TaskSchedulerImpl` won't launch tasks for the barrier stage due to the insufficient task slots calculated by `TaskSchedulerImpl.calculateAvailableSlots` (which does check all kinds of resources).

The application hang issue can be reproduced by the added unit test.

### Does this PR introduce _any_ user-facing change?

Yes. In case of a barrier stage requires more custom resources than the cluster has, the application can get hang before this PR but can fail due to insufficient resources at the end after this PR.

### How was this patch tested?

Added a unit test.

Closes #29332 from Ngone51/fix-slots.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-06 05:39:47 +00:00
Holden Karau 375d348a83 [SPARK-31197][CORE] Shutdown executor once we are done decommissioning
### What changes were proposed in this pull request?

Exit the executor when it has been asked to decommission and there is nothing left for it to do.

This is a rebase of https://github.com/apache/spark/pull/28817

### Why are the changes needed?

If we want to use decommissioning in Spark's own scale down we should terminate the executor once finished.
Furthermore, in graceful shutdown it makes sense to release resources we no longer need if we've been asked to shutdown by the cluster manager instead of always holding the resources as long as possible.

### Does this PR introduce _any_ user-facing change?

The decommissioned executors will exit and the end of decommissioning. This is sort of a user facing change, however decommissioning hasn't been in any releases yet.

### How was this patch tested?

I changed the unit test to not send the executor exit message and still wait on the executor exited message.

Closes #29211 from holdenk/SPARK-31197-exit-execs-redone.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-08-05 16:28:14 -07:00
Yan Xiaole c1d17df826 [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
# What changes were proposed in this pull request?
This PR adds a `FileNotFoundException` try catch block while adding a new entry to history server application listing to skip the non-existing path.

### Why are the changes needed?
If there are a large number (>100k) of applications log dir, listing the log dir will take a few seconds. After getting the path list some applications might have finished already, and the filename will change from `foo.inprogress` to `foo`.

It leads to a problem when adding an entry to the listing, querying file status like `fileSizeForLastIndex` will throw out a `FileNotFoundException` exception if the application was finished. And the exception will abort current loop, in a busy cluster, it will make history server couldn't list and load any application log.

```
20/08/03 15:17:23 ERROR FsHistoryProvider: Exception in checking for event log updates
 java.io.FileNotFoundException: File does not exist: hdfs://xx/logs/spark/application_11111111111111.lz4.inprogress
 at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1527)
 at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1520)
 at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1520)
 at org.apache.spark.deploy.history.SingleFileEventLogFileReader.status$lzycompute(EventLogFileReaders.scala:170)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. setup another script keeps changing the filename of applications under history log dir
2. launch the history server
3. check whether the `File does not exist` error log was gone.

Closes #29350 from yanxiaole/SPARK-32529.

Authored-by: Yan Xiaole <xiaole.yan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-05 10:57:11 -07:00
Terry Kim 171b7d5d71 [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API
### What changes were proposed in this pull request?

Note that this PR is forked from #23340 originally written by edwinalu.

This PR proposes to expose the peak executor metrics at the stage level via the REST APIs:
* `/applications/<application_id>/stages/`: peak values of executor metrics for **each stage**
* `/applications/<application_id>/stages/<stage_id>/< stage_attempt_id >`: peak values of executor metrics for **each executor** for the stage, followed by peak values of executor metrics for the stage

### Why are the changes needed?

The stage level peak executor metrics can help better understand your application's resource utilization.

### Does this PR introduce _any_ user-facing change?

1. For the `/applications/<application_id>/stages/` API, you will see the following new info for **each stage**:
```JSON
  "peakExecutorMetrics" : {
    "JVMHeapMemory" : 213367864,
    "JVMOffHeapMemory" : 189011656,
    "OnHeapExecutionMemory" : 0,
    "OffHeapExecutionMemory" : 0,
    "OnHeapStorageMemory" : 2133349,
    "OffHeapStorageMemory" : 0,
    "OnHeapUnifiedMemory" : 2133349,
    "OffHeapUnifiedMemory" : 0,
    "DirectPoolMemory" : 282024,
    "MappedPoolMemory" : 0,
    "ProcessTreeJVMVMemory" : 0,
    "ProcessTreeJVMRSSMemory" : 0,
    "ProcessTreePythonVMemory" : 0,
    "ProcessTreePythonRSSMemory" : 0,
    "ProcessTreeOtherVMemory" : 0,
    "ProcessTreeOtherRSSMemory" : 0,
    "MinorGCCount" : 13,
    "MinorGCTime" : 115,
    "MajorGCCount" : 4,
    "MajorGCTime" : 339
  }
```

2. For the `/applications/<application_id>/stages/<stage_id>/<stage_attempt_id>` API, you will see the following new info for **each executor** under `executorSummary`:
```JSON
  "peakMemoryMetrics" : {
    "JVMHeapMemory" : 0,
    "JVMOffHeapMemory" : 0,
    "OnHeapExecutionMemory" : 0,
    "OffHeapExecutionMemory" : 0,
    "OnHeapStorageMemory" : 0,
    "OffHeapStorageMemory" : 0,
    "OnHeapUnifiedMemory" : 0,
    "OffHeapUnifiedMemory" : 0,
    "DirectPoolMemory" : 0,
    "MappedPoolMemory" : 0,
    "ProcessTreeJVMVMemory" : 0,
    "ProcessTreeJVMRSSMemory" : 0,
    "ProcessTreePythonVMemory" : 0,
    "ProcessTreePythonRSSMemory" : 0,
    "ProcessTreeOtherVMemory" : 0,
    "ProcessTreeOtherRSSMemory" : 0,
    "MinorGCCount" : 0,
    "MinorGCTime" : 0,
    "MajorGCCount" : 0,
    "MajorGCTime" : 0
  }
```
, and the following at the stage level:
```JSON
"peakExecutorMetrics" : {
    "JVMHeapMemory" : 213367864,
    "JVMOffHeapMemory" : 189011656,
    "OnHeapExecutionMemory" : 0,
    "OffHeapExecutionMemory" : 0,
    "OnHeapStorageMemory" : 2133349,
    "OffHeapStorageMemory" : 0,
    "OnHeapUnifiedMemory" : 2133349,
    "OffHeapUnifiedMemory" : 0,
    "DirectPoolMemory" : 282024,
    "MappedPoolMemory" : 0,
    "ProcessTreeJVMVMemory" : 0,
    "ProcessTreeJVMRSSMemory" : 0,
    "ProcessTreePythonVMemory" : 0,
    "ProcessTreePythonRSSMemory" : 0,
    "ProcessTreeOtherVMemory" : 0,
    "ProcessTreeOtherRSSMemory" : 0,
    "MinorGCCount" : 13,
    "MinorGCTime" : 115,
    "MajorGCCount" : 4,
    "MajorGCTime" : 339
  }
```

### How was this patch tested?

Added tests.

Closes #29020 from imback82/metrics.

Lead-authored-by: Terry Kim <yuminkim@gmail.com>
Co-authored-by: edwinalu <edwina.lu@gmail.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-08-04 21:11:00 +08:00
Takuya UESHIN 7deb67c28f [SPARK-32160][CORE][PYSPARK][FOLLOWUP] Change the config name to switch allow/disallow SparkContext in executors
### What changes were proposed in this pull request?

This is a follow-up of #29278.
This PR changes the config name to switch allow/disallow `SparkContext` in executors as per the comment https://github.com/apache/spark/pull/29278#pullrequestreview-460256338.

### Why are the changes needed?

The config name `spark.executor.allowSparkContext` is more reasonable.

### Does this PR introduce _any_ user-facing change?

Yes, the config name is changed.

### How was this patch tested?

Updated tests.

Closes #29340 from ueshin/issues/SPARK-32160/change_config_name.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-08-04 12:45:06 +09:00
Takeshi Yamamuro c6109ba918 [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command
### What changes were proposed in this pull request?

This PR modified the parser code to handle invalid usages of a SET/RESET command.
For example;
```
SET spark.sql.ansi.enabled true
```
The above SQL command does not change the configuration value and it just tries to display the value of the configuration
`spark.sql.ansi.enabled true`. This PR disallows using special characters including spaces in the configuration name and reports a user-friendly error instead. In the error message, it tells users a workaround to use quotes or a string literal if they still needs to specify a configuration with them. 

Before this PR:
```
scala> sql("SET spark.sql.ansi.enabled true").show(1, -1)
+---------------------------+-----------+
|key                        |value      |
+---------------------------+-----------+
|spark.sql.ansi.enabled true|<undefined>|
+---------------------------+-----------+
```

After this PR:
```
scala> sql("SET spark.sql.ansi.enabled true")
org.apache.spark.sql.catalyst.parser.ParseException:
Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, please use quotes, e.g., SET `ke y`=value.(line 1, pos 0)

== SQL ==
SET spark.sql.ansi.enabled true
^^^
```

### Why are the changes needed?

For better user-friendly errors.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests in `SparkSqlParserSuite`.

Closes #29146 from maropu/SPARK-32257.

Lead-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-03 13:00:07 +00:00
Gengliang Wang 71aea02e9f [SPARK-32467][UI] Avoid encoding URL twice on https redirect
### What changes were proposed in this pull request?

When https is enabled for Spark UI, an HTTP request will be redirected as an encoded HTTPS URL: https://github.com/apache/spark/pull/10238/files#diff-f79a5ead735b3d0b34b6b94486918e1cR312

When we create the redirect url, we will call getRequestURI and getQueryString. Both two methods may return an encoded string. However, we pass them directly to the following URI constructor
```
URI(String scheme, String authority, String path, String query, String fragment)
```
As this URI constructor assumes both path and query parameters are decoded strings, it will encode them again. This makes the redirect URL encoded twice.

This problem is on stage page with HTTPS enabled. The URL of "/taskTable" contains query parameter `order%5B0%5D%5Bcolumn%5D`. After encoded it becomes  `order%255B0%255D%255Bcolumn%255D` and it will be decoded as `order%5B0%5D%5Bcolumn%5D` instead of `order[0][dir]`.  When the parameter `order[0][dir]` is missing, there will be an excetpion from:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
and the stage page fail to load.

To fix the problem, we can try decoding the query parameters before encoding it. This is to make sure we encode the URL

### Why are the changes needed?

Fix a UI issue when HTTPS is enabled

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A new Unit test + manually test on a cluster

Closes #29271 from gengliangwang/urlEncode.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-08-01 13:09:26 +08:00
Venkata krishnan Sowrirajan 4eaf3a0a23 [SPARK-31418][CORE][FOLLOW-UP][MINOR] Fix log messages to print stage id instead of the object name
### What changes were proposed in this pull request?
Just few log lines fixes which are logging the object name instead of the stage IDs

### Why are the changes needed?
This would make it easier later for debugging.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Just log messages. Existing tests should be enough

Closes #29279 from venkata91/SPARK-31418-follow-up.

Authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-07-31 22:12:24 +09:00
Takuya UESHIN 8014b0b5d6 [SPARK-32160][CORE][PYSPARK] Add a config to switch allow/disallow to create SparkContext in executors
### What changes were proposed in this pull request?

This is a follow-up of #28986.
This PR adds a config to switch allow/disallow to create `SparkContext` in executors.

- `spark.driver.allowSparkContextInExecutors`

### Why are the changes needed?

Some users or libraries actually create `SparkContext` in executors.
We shouldn't break their workloads.

### Does this PR introduce _any_ user-facing change?

Yes, users will be able to create `SparkContext` in executors with the config enabled.

### How was this patch tested?

More tests are added.

Closes #29278 from ueshin/issues/SPARK-32160/add_configs.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-07-31 17:28:35 +09:00
Kousuke Saruta 9d7b1d935f [SPARK-32175][SPARK-32175][FOLLOWUP] Remove flaky test added in
### What changes were proposed in this pull request?

This PR removes a test added in SPARK-32175(#29002).

### Why are the changes needed?

That test is flaky. It can be mitigated by increasing the timeout but it would rather be simpler to remove the test.
See also the [discussion](https://github.com/apache/spark/pull/29002#issuecomment-666746857).

### Does this PR introduce _any_ user-facing change?

No.

Closes #29314 from sarutak/remove-flaky-test.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
2020-07-31 10:37:05 +09:00
Devesh Agrawal 6032c5b032 [SPARK-32417] Fix flakyness of BlockManagerDecommissionIntegrationSuite
### What changes were proposed in this pull request?

This test tries to fix the flakyness of BlockManagerDecommissionIntegrationSuite.

### Description of the problem

Make the block manager decommissioning test be less flaky

An interesting failure happens when migrateDuring = true (and persist or shuffle is true):
- We schedule the job with tasks on executors 0, 1, 2.
- We wait 300 ms and decommission executor 0.
- If the task is not yet done on executor 0, it will now fail because
   the block manager won't be able to save the block. This condition is
   easy to trigger on a loaded machine where the github checks run.
- The task with retry on a different executor (1 or 2) and its shuffle
   blocks will land there.
- No actual block migration happens here because the decommissioned
   executor technically failed before it could even produce a block.

To remove the above race, this change replaces the fixed wait for 300 ms to wait for an actual task to succeed. When a task has succeeded, we know its blocks would have been written for sure and thus its executor would certainly be forced to migrate those blocks when it is decommissioned.

The change always decommissions an executor on which a real task finished successfully instead of picking the first executor. Because the system may choose to schedule nothing on the first executor and instead run the two tasks on one executor.

### Why are the changes needed?

I have had bad luck with BlockManagerDecommissionIntegrationSuite and it has failed several times on my PRs. So fixing it.

### Does this PR introduce _any_ user-facing change?

No, unit test only change.

### How was this patch tested?

Github checks. Ran this test 100 times, 10 at a time in parallel in a script.

Closes #29226 from agrawaldevesh/block-manager-decom-flaky.

Authored-by: Devesh Agrawal <devesh.agrawal@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-07-30 12:00:19 -07:00
Devesh Agrawal 366a178933 [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
### What changes were proposed in this pull request?

This PR reduces the prospect of a job loss during decommissioning. It
fixes two holes in the current decommissioning framework:

- (a) Loss of decommissioned executors is not treated as a job failure:
We know that the decommissioned executor would be dying soon, so its death is
clearly not caused by the application.

- (b) Shuffle files on the decommissioned host are cleared when the
first fetch failure is detected from a decommissioned host: This is a
bit tricky in terms of when to clear the shuffle state ? Ideally you
want to clear it the millisecond before the shuffle service on the node
dies (or the executor dies when there is no external shuffle service) --
too soon and it could lead to some wastage and too late would lead to
fetch failures.

  The approach here is to do this clearing when the very first fetch
failure is observed on the decommissioned block manager, without waiting for
other blocks to also signal a failure.

### Why are the changes needed?

Without them decommissioning a lot of executors at a time leads to job failures.

### Code overview

The task scheduler tracks the executors that were decommissioned along with their
`ExecutorDecommissionInfo`. This information is used by: (a) For handling a `ExecutorProcessLost` error, or (b) by the `DAGScheduler` when handling a fetch failure.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a new unit test `DecommissionWorkerSuite` to test the new behavior by exercising the Master-Worker decommissioning. I chose to add a new test since the setup logic was quite different from the existing `WorkerDecommissionSuite`. I am open to changing the name of the newly added test suite :-)

### Questions for reviewers
- Should I add a feature flag to guard these two behaviors ? They seem safe to me that they should only get triggered by decommissioning, but you never know :-).

Closes #29014 from agrawaldevesh/decom_harden.

Authored-by: Devesh Agrawal <devesh.agrawal@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-07-30 11:58:11 -07:00
Dongjoon Hyun 7cf3b54a2a [SPARK-32489][CORE] Pass core module UTs in Scala 2.13
### What changes were proposed in this pull request?

So far, we fixed many stuffs in `core` module. This PR fixes the remaining UT failures in Scala 2.13.

- `OneApplicationResource.environmentInfo` will return a deterministic result for `sparkProperties`, `hadoopProperties`, `systemProperties`, and `classpathEntries`.
- `SubmitRestProtocolSuite` has Scala 2.13 answer in addition to the existing Scala 2.12 answer, and uses the expected answer based on the Scala runtime version.

### Why are the changes needed?

To support Scala 2.13.

### Does this PR introduce _any_ user-facing change?

Yes, `environmentInfo` is changed, but this fixes the indeterministic behavior.

### How was this patch tested?

- Scala 2.12: Pass the Jenkins or GitHub Action
- Scala 2.13: Do the following.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13
```

**BEFORE**
```
Tests: succeeded 2612, failed 3, canceled 1, ignored 8, pending 0
*** 3 TESTS FAILED ***
```

**AFTER**
```
Tests: succeeded 2615, failed 0, canceled 1, ignored 8, pending 0
All tests passed.
```

Closes #29298 from dongjoon-hyun/SPARK-32489.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-30 10:59:26 -07:00
Dongjoon Hyun 163867435a [SPARK-32487][CORE] Remove j.w.r.NotFoundException from import in [Stages|OneApplication]Resource
### What changes were proposed in this pull request?

This PR aims to remove `java.ws.rs.NotFoundException` from two problematic `import` statements. All the other use cases are correct.

### Why are the changes needed?

In `StagesResource` and `OneApplicationResource`, there exist two `NotFoundException`s.
- javax.ws.rs.NotFoundException
- org.apache.spark.status.api.v1.NotFoundException

To use `org.apache.spark.status.api.v1.NotFoundException` correctly, we should not import `java.ws.rs.NotFoundException`. This causes UT failures in Scala 2.13 environment.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Scala 2.12: Pass the GitHub Action or Jenkins.
- Scala 2.13: Do the following manually.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.deploy.history.HistoryServerSuite
```

**BEFORE**
```
*** 4 TESTS FAILED ***
```

**AFTER**
```
*** 1 TEST FAILED ***
```

Closes #29293 from dongjoon-hyun/SPARK-32487.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-29 17:57:46 -07:00
Dongjoon Hyun 9dc0237851 [SPARK-32476][CORE] ResourceAllocator.availableAddrs should be deterministic
### What changes were proposed in this pull request?

This PR aims to make `ResourceAllocator.availableAddrs` deterministic.

### Why are the changes needed?

Currently, this function returns indeterministically due to the underlying `HashMap`. So, the test case itself is creating a list `[0, 1, 2]` initially, but ends up with comparing `[2, 1, 0]`.

Not only this happens in the 3.0.0, but also this causes UT failures on Scala 2.13 environment.

### Does this PR introduce _any_ user-facing change?

Yes, but this fixes the in-deterministic behavior.

### How was this patch tested?

- Scala 2.12: This should pass the UT with the modified test case.
- Scala 2.13: This can be tested like the following (at least `JsonProtocolSuite`)

```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.deploy.JsonProtocolSuite
```

**BEFORE**
```
*** 2 TESTS FAILED ***
```

**AFTER**
```
All tests passed.
```

Closes #29281 from dongjoon-hyun/SPARK-32476.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-29 10:47:32 -07:00
Dongjoon Hyun 5eab8d27e6 [SPARK-32477][CORE] JsonProtocol.accumulablesToJson should be deterministic
### What changes were proposed in this pull request?

This PR aims to make `JsonProtocol.accumulablesToJson` deterministic.

### Why are the changes needed?

Currently, `JsonProtocol.accumulablesToJson` is indeterministic. So, `JsonProtocolSuite` itself is also using mixed test cases in terms of `"Accumulables": [ ... ]`.

Not only this is indeterministic, but also this causes a UT failure in `JsonProtocolSuite` in Scala 2.13.

### Does this PR introduce _any_ user-facing change?

Yes. However, this is a fix on indeterministic behavior.

### How was this patch tested?

- Scala 2.12: Pass the GitHub Action or Jenkins.
- Scala 2.13: Do the following.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.util.JsonProtocolSuite
```

**BEFORE**
```
*** 1 TEST FAILED ***
```

**AFTER**
```
All tests passed.
```

Closes #29282 from dongjoon-hyun/SPARK-32477.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-29 07:48:23 -07:00
Kousuke Saruta 9be088357e [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
### What changes were proposed in this pull request?

This PR changes the order between initialization for ExecutorPlugin and starting heartbeat thread in Executor.

### Why are the changes needed?

In the current master, heartbeat thread in a executor starts after plugin initialization so if the initialization takes long time, heartbeat is not sent to driver and the executor will be removed from cluster.

### Does this PR introduce _any_ user-facing change?

Yes. Plugins for executors will be allowed to take long time for initialization.

### How was this patch tested?

New testcase.

Closes #29002 from sarutak/fix-heartbeat-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-07-29 08:44:56 -05:00
Dongjoon Hyun 77987a222c [SPARK-32473][CORE][TESTS] Use === instead IndexSeqView
### What changes were proposed in this pull request?

This PR aims to fix `SorterSuite` and `RadixSortSuite` in Scala 2.13 by using `===` instead of `IndexSeqView`.
```
$ git grep "\.view =="
core/src/test/scala/org/apache/spark/util/collection/SorterSuite.scala:    assert(data0.view === data1.view)
core/src/test/scala/org/apache/spark/util/collection/SorterSuite.scala:    assert(data0.view === data2.view)
core/src/test/scala/org/apache/spark/util/collection/unsafe/sort/RadixSortSuite.scala:      assert(ref.view == result.view)
core/src/test/scala/org/apache/spark/util/collection/unsafe/sort/RadixSortSuite.scala:      assert(res1.view == res2.view)
core/src/test/scala/org/apache/spark/util/collection/unsafe/sort/RadixSortSuite.scala:      assert(ref.view == result.view)
core/src/test/scala/org/apache/spark/util/collection/unsafe/sort/RadixSortSuite.scala:      assert(res1.view == res2.view)
```

### Why are the changes needed?

Scala 2.13 reimplements `IndexSeqView` and the behavior is different.
- https://docs.scala-lang.org/overviews/core/collections-migration-213.html

**Scala 2.12**
```scala
Welcome to Scala 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_262).
Type in expressions for evaluation. Or try :help.

scala> Seq(1,2,3).toArray.view == Seq(1,2,3).toArray.view
res0: Boolean = true
```

**Scala 2.13**
```scala
Welcome to Scala 2.13.3 (OpenJDK 64-Bit Server VM, Java 1.8.0_262).
Type in expressions for evaluation. Or try :help.

scala> Seq(1,2,3).toArray.view == Seq(1,2,3).toArray.view
val res0: Boolean = false
```

### Does this PR introduce _any_ user-facing change?

No. This is a test-only fix.

### How was this patch tested?

- Scala 2.12: Pass the GitHub Action or Jenkins.
- Scala 2.13: Manually test the following.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.util.collection.unsafe.sort.RadixSortSuite
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.util.collection.SorterSuite
```

**BEFORE**
```
Tests: succeeded 9, failed 36, canceled 0, ignored 0, pending 0
*** 36 TESTS FAILED ***
Tests: succeeded 3, failed 1, canceled 0, ignored 2, pending 0
*** 1 TEST FAILED ***
```

**AFTER**
```
Tests: succeeded 45, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
Tests: succeeded 4, failed 0, canceled 0, ignored 2, pending 0
All tests passed.
```

Closes #29280 from dongjoon-hyun/SPARK-32473.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-28 22:31:39 -07:00
LantaoJin 26e6574d58 [SPARK-32283][CORE] Kryo should support multiple user registrators
### What changes were proposed in this pull request?
`spark.kryo.registrator` in 3.0 has a regression problem. From [SPARK-12080](https://issues.apache.org/jira/browse/SPARK-12080), it supports multiple user registrators by
```scala
private val userRegistrators = conf.get("spark.kryo.registrator", "")
    .split(',').map(_.trim)
    .filter(!_.isEmpty)
```
But it donsn't work in 3.0. Fix it by `toSequence` in `Kryo.scala`

### Why are the changes needed?
In previous Spark version (2.x), it supported multiple user registrators by
```scala
private val userRegistrators = conf.get("spark.kryo.registrator", "")
    .split(',').map(_.trim)
    .filter(!_.isEmpty)
```
But it doesn't work in 3.0. It's should be a regression.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existed unit tests.

Closes #29123 from LantaoJin/SPARK-32283.

Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-07-29 03:58:03 +00:00
HyukjinKwon c1140661bf [SPARK-32443][CORE] Use POSIX-compatible command -v in testCommandAvailable
### What changes were proposed in this pull request?

This PR aims to use `command -v` in non-Window operating systems instead of executing the given command.

### Why are the changes needed?

1. `command` is POSIX-compatible
    - **POSIX.1-2017**:  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/command.html
2. `command` is faster and safer than the direct execution
    - `command` doesn't invoke another process.
```scala
scala> sys.process.Process("ls").run().exitValue()
LICENSE
NOTICE
bin
doc
lib
man
res1: Int = 0
```

3. The existing way behaves inconsistently.
    - `rm` cannot be checked.

**AS-IS**
```scala
scala> sys.process.Process("rm").run().exitValue()
usage: rm [-f | -i] [-dPRrvW] file ...
       unlink file
res0: Int = 64
```

**TO-BE**
```
Welcome to Scala 2.13.3 (OpenJDK 64-Bit Server VM, Java 1.8.0_262).
Type in expressions for evaluation. Or try :help.
scala> sys.process.Process(Seq("sh", "-c", s"command -v ls")).run().exitValue()
/bin/ls
val res1: Int = 0
```

4. The existing logic is already broken in Scala 2.13 environment because it hangs like the following.
```scala
$ bin/scala
Welcome to Scala 2.13.3 (OpenJDK 64-Bit Server VM, Java 1.8.0_262).
Type in expressions for evaluation. Or try :help.

scala> sys.process.Process("cat").run().exitValue() // hang here.
```

### Does this PR introduce _any_ user-facing change?

No. Although this is inside `main` source directory, this is used for testing purpose.

```
$ git grep testCommandAvailable | grep -v 'def testCommandAvailable'
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(TestUtils.testCommandAvailable("cat"))
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(TestUtils.testCommandAvailable("wc"))
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(TestUtils.testCommandAvailable("cat"))
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(TestUtils.testCommandAvailable("cat"))
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(TestUtils.testCommandAvailable("cat"))
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(TestUtils.testCommandAvailable(envCommand))
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(!TestUtils.testCommandAvailable("some_nonexistent_command"))
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(TestUtils.testCommandAvailable("cat"))
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(TestUtils.testCommandAvailable("cat"))
core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:    assume(TestUtils.testCommandAvailable(envCommand))
sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala:  private lazy val isPythonAvailable: Boolean = TestUtils.testCommandAvailable(pythonExec)
sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala:    if (TestUtils.testCommandAvailable(pythonExec)) {
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:    skip = !TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:    skip = !TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:    skip = !TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:    skip = !TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:    skip = !TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:    skip = !TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:    assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:    skip = !TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:    skip = !TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala:    assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala:    assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala:    assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala:    assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala:    assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala:    assume(TestUtils.testCommandAvailable("python"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala:    assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala:    assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala:      assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala:      assume(TestUtils.testCommandAvailable("echo | sed"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala:      assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala:      assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala:      assume(TestUtils.testCommandAvailable("/bin/bash"))
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala:      assume(TestUtils.testCommandAvailable("/bin/bash"))
```

### How was this patch tested?

- **Scala 2.12**: Pass the Jenkins with the existing tests and one modified test.
- **Scala 2.13**: Do the following manually. It should pass instead of `hang`.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.rdd.PipedRDDSuite
...
Tests: succeeded 12, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #29241 from dongjoon-hyun/SPARK-32443.

Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-27 12:02:43 -07:00
Warren Zhu 998086c9a1 [SPARK-30794][CORE] Stage Level scheduling: Add ability to set off heap memory
### What changes were proposed in this pull request?
Support set off heap memory in `ExecutorResourceRequests`

### Why are the changes needed?
Support stage level scheduling

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT in `ResourceProfileSuite` and `DAGSchedulerSuite`

Closes #28972 from warrenzhu25/30794.

Authored-by: Warren Zhu <zhonzh@microsoft.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-07-27 08:16:13 -05:00
Dongjoon Hyun 7e0c5b3b53 [SPARK-32442][CORE][TESTS] Fix TaskSetManagerSuite by hiding o.a.s.FakeSchedulerBackend
### What changes were proposed in this pull request?

There exists two `FakeSchedulerBackend` classes.
```
$ git grep "class FakeSchedulerBackend"
core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala:private class FakeSchedulerBackend(
core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala:class FakeSchedulerBackend extends SchedulerBackend {
```

This PR aims to hide the following at `TaskSetManagerSuite`.
```scala
import org.apache.spark.{FakeSchedulerBackend => _, _}
```

### Why are the changes needed?

Although `TaskSetManagerSuite` is inside `org.apache.spark.scheduler` package, `import org.apache.spark._` makes Scala 2.13 confused and causes 4 UT failures.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.scheduler.TaskSetManagerSuite
...
Tests: succeeded 48, failed 4, canceled 0, ignored 0, pending 0
*** 4 TESTS FAILED ***
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- **Scala 2.12**: Pass the Jenkins or GitHub Action
- **Scala 2.13**: Pass the following manually.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.scheduler.TaskSetManagerSuite
...
Tests: succeeded 52, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #29240 from dongjoon-hyun/SPARK-32442.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-26 07:54:30 -07:00
Dongjoon Hyun 147022a5c6 [SPARK-32440][CORE][TESTS] Make BlockManagerSuite robust from Scala object size difference
### What changes were proposed in this pull request?

This PR aims to increase the memory parameter in `BlockManagerSuite`'s worker decommission test cases.

### Why are the changes needed?

Scala 2.13 generates different Java objects and this affects Spark's `SizeEstimator/SizeTracker/SizeTrackingVector`. This causes UT failures like the following. If we decrease the values, those test cases fails in Scala 2.12, too.

```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.storage.BlockManagerSuite
...
- test decommission block manager should not be part of peers *** FAILED ***
  0 did not equal 2 (BlockManagerSuite.scala:1869)
- test decommissionRddCacheBlocks should offload all cached blocks *** FAILED ***
  0 did not equal 2 (BlockManagerSuite.scala:1884)
...
Tests: succeeded 81, failed 2, canceled 0, ignored 0, pending 0
*** 2 TESTS FAILED ***
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.storage.BlockManagerSuite
...
Tests: succeeded 83, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #29238 from dongjoon-hyun/SPARK-32440.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-25 15:54:21 -07:00
Dongjoon Hyun 80e8898158 [SPARK-32438][CORE][TESTS] Use HashMap.withDefaultValue in RDDSuite
### What changes were proposed in this pull request?

Since Scala 2.13, `HashMap` is changed to become a final in the future and `.withDefault` is recommended. This PR aims to use `HashMap.withDefaultValue` instead of overriding manually in the test case.

- https://www.scala-lang.org/api/current/scala/collection/mutable/HashMap.html

```scala
deprecatedInheritance(message =
"HashMap wil be made final; use .withDefault for the common use case of computing a default value",
since = "2.13.0")
```

### Why are the changes needed?

In Scala 2.13, the existing code causes a failure because the default value function doesn't work correctly.

```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.rdd.RDDSuite
- aggregate *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 61.0 failed 1 times, most recent failure: Lost task 0.0 in stage 61.0 (TID 198, localhost, executor driver):
java.util.NoSuchElementException: key not found: a
```

### Does this PR introduce _any_ user-facing change?

No. This is a test case change.

### How was this patch tested?

1. **Scala 2.12:** Pass the Jenkins or GitHub with the existing tests.
2. **Scala 2.13**: Manually do the following.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.rdd.RDDSuite
...
Tests: succeeded 72, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #29235 from dongjoon-hyun/SPARK-32438.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-25 10:52:55 -07:00
Dongjoon Hyun f9f18673dc [SPARK-32436][CORE] Initialize numNonEmptyBlocks in HighlyCompressedMapStatus.readExternal
### What changes were proposed in this pull request?

This PR aims to initialize `numNonEmptyBlocks` in `HighlyCompressedMapStatus.readExternal`.

In Scala 2.12, this is initialized to `-1` via the following.
```scala
protected def this() = this(null, -1, null, -1, null, -1)  // For deserialization only
```

### Why are the changes needed?

In Scala 2.13, this causes several UT failures because `HighlyCompressedMapStatus.readExternal` doesn't initialize this field. The following is one example.

- org.apache.spark.scheduler.MapStatusSuite
```
MapStatusSuite:
- compressSize
- decompressSize
*** RUN ABORTED ***
  java.lang.NoSuchFieldError: numNonEmptyBlocks
  at org.apache.spark.scheduler.HighlyCompressedMapStatus.<init>(MapStatus.scala:181)
  at org.apache.spark.scheduler.HighlyCompressedMapStatus$.apply(MapStatus.scala:281)
  at org.apache.spark.scheduler.MapStatus$.apply(MapStatus.scala:73)
  at org.apache.spark.scheduler.MapStatusSuite.$anonfun$new$8(MapStatusSuite.scala:64)
  at scala.runtime.java8.JFunction1$mcVD$sp.apply(JFunction1$mcVD$sp.scala:18)
  at scala.collection.immutable.List.foreach(List.scala:333)
  at org.apache.spark.scheduler.MapStatusSuite.$anonfun$new$7(MapStatusSuite.scala:61)
  at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.scala:18)
  at scala.collection.immutable.List.foreach(List.scala:333)
  at org.apache.spark.scheduler.MapStatusSuite.$anonfun$new$6(MapStatusSuite.scala:60)
  ...
```

### Does this PR introduce _any_ user-facing change?

No. This is a private class.

### How was this patch tested?

1. Pass the GitHub Action or Jenkins with the existing tests.
2. Test with Scala-2.13 with `MapStatusSuite`.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.scheduler.MapStatusSuite
...
MapStatusSuite:
- compressSize
- decompressSize
- MapStatus should never report non-empty blocks' sizes as 0
- large tasks should use org.apache.spark.scheduler.HighlyCompressedMapStatus
- HighlyCompressedMapStatus: estimated size should be the average non-empty block size
- SPARK-22540: ensure HighlyCompressedMapStatus calculates correct avgSize
- RoaringBitmap: runOptimize succeeded
- RoaringBitmap: runOptimize failed
- Blocks which are bigger than SHUFFLE_ACCURATE_BLOCK_THRESHOLD should not be underestimated.
- SPARK-21133 HighlyCompressedMapStatus#writeExternal throws NPE
Run completed in 7 seconds, 971 milliseconds.
Total number of tests run: 10
Suites: completed 2, aborted 0
Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #29231 from dongjoon-hyun/SPARK-32436.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-25 10:16:01 -07:00
Dongjoon Hyun f642234d85 [SPARK-32437][CORE] Improve MapStatus deserialization speed with RoaringBitmap 0.9.0
### What changes were proposed in this pull request?

This PR aims to speed up `MapStatus` deserialization by 5~18% with the latest RoaringBitmap `0.9.0` and new APIs. Note that we focus on `deserialization` time because `serialization` occurs once while `deserialization` occurs many times.

### Why are the changes needed?

The current version is too old. We had better upgrade it to get the performance improvement and bug fixes.
Although `MapStatusesSerDeserBenchmark` is synthetic, the benchmark result is updated with this patch.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins or GitHub Action.

Closes #29233 from dongjoon-hyun/SPARK-ROAR.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-25 08:07:28 -07:00
Gabor Somogyi b890fdc8df [SPARK-32387][SS] Extract UninterruptibleThread runner logic from KafkaOffsetReader
### What changes were proposed in this pull request?
`UninterruptibleThread` running functionality is baked into `KafkaOffsetReader` which can be extracted into a class. The main intention is to simplify `KafkaOffsetReader` in order to make easier to solve SPARK-32032. In this PR I've made this extraction without functionality change.

### Why are the changes needed?
`UninterruptibleThread` running functionality is baked into `KafkaOffsetReader`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing + additional unit tests.

Closes #29187 from gaborgsomogyi/SPARK-32387.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-24 11:41:42 -07:00
Thomas Graves e6ef27be52 [SPARK-32287][TESTS] Flaky Test: ExecutorAllocationManagerSuite.add executors default profile
### What changes were proposed in this pull request?

I wasn't able to reproduce the failure but the best I can tell is that the allocation manager timer triggers and call doRequest. The timeout is 10s so try to increase that to 30seconds.

### Why are the changes needed?

test failure

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

unit test

Closes #29225 from tgravescs/SPARK-32287.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-24 11:12:28 -07:00
Sean Owen be2eca22e9 [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
### What changes were proposed in this pull request?

Updates to scalatest 3.2.0. Though it looks large, it is 99% changes to the new location of scalatest classes.

### Why are the changes needed?

3.2.0+ has a fix that is required for Scala 2.13.3+ compatibility.

### Does this PR introduce _any_ user-facing change?

No, only affects tests.

### How was this patch tested?

Existing tests.

Closes #29196 from srowen/SPARK-32398.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-23 16:20:17 -07:00
Venkata krishnan Sowrirajan e7fb67cd88 [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's blacklisting feature
### What changes were proposed in this pull request?
In this change, when dynamic allocation is enabled instead of aborting immediately when there is an unschedulable taskset due to blacklisting, pass an event saying `SparkListenerUnschedulableTaskSetAdded` which will be handled by `ExecutorAllocationManager` and request more executors needed to schedule the unschedulable blacklisted tasks. Once the event is sent, we start the abortTimer similar to [SPARK-22148][SPARK-15815] to abort in the case when no new executors launched either due to max executors reached or cluster manager is out of capacity.

### Why are the changes needed?
This is an improvement. In the case when dynamic allocation is enabled, this would request more executors to schedule the unschedulable tasks instead of aborting the stage without even retrying upto spark.task.maxFailures times (in some cases not retrying at all). This is a potential issue with respect to Spark's Fault tolerance.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added unit tests both in ExecutorAllocationManagerSuite and TaskSchedulerImplSuite

Closes #28287 from venkata91/SPARK-31418.

Authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-07-23 12:33:22 -05:00
Devesh Agrawal f8d29d371c [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
### What changes were proposed in this pull request?

This PR is a giant plumbing PR that plumbs an `ExecutorDecommissionInfo` along
with the DecommissionExecutor message.

### Why are the changes needed?

The primary motivation is to know whether a decommissioned executor
would also be loosing shuffle files -- and thus it is important to know
whether the host would also be decommissioned.

In the absence of this PR, the existing code assumes that decommissioning an executor does not loose the whole host with it, and thus does not clear the shuffle state if external shuffle service is enabled. While this may hold in some cases (like K8s decommissioning an executor pod, or YARN container preemption), it does not hold in others like when the cluster is managed by a Standalone Scheduler (Master). This is similar to the existing `workerLost` field in the `ExecutorProcessLost` message.

In the future, this `ExecutorDecommissionInfo` can be embellished for
knowing how long the executor has to live for scenarios like Cloud spot
kills (or Yarn preemption) and the like.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tweaked an existing unit test in `AppClientSuite`

Closes #29032 from agrawaldevesh/plumb_decom_info.

Authored-by: Devesh Agrawal <devesh.agrawal@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-07-22 21:04:06 -07:00
Wing Yew Poon e8c06af7d1 [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
### What changes were proposed in this pull request?

If an executor is lost, the `DAGScheduler` handles the executor loss by removing the executor but does not unregister its outputs if the external shuffle service is used. However, if the node on which the executor runs is lost, the shuffle service may not be able to serve the shuffle files.
In such a case, when fetches from the executor's outputs fail in the same stage, the `DAGScheduler` again removes the executor and by right, should unregister its outputs. It doesn't because the epoch used to track the executor failure has not increased.

We track the epoch for failed executors that result in lost file output separately, so we can unregister the outputs in this scenario. The idea to track a second epoch is due to Attila Zsolt Piros.

### Why are the changes needed?

Without the changes, the loss of a node could require two stage attempts to recover instead of one.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit test. This test fails without the change and passes with it.

Closes #28848 from wypoon/SPARK-32003.

Authored-by: Wing Yew Poon <wypoon@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2020-07-22 09:53:16 -05:00
Max Gekk feca9edbdd [MINOR][SQL][TESTS] Create tables once in JDBC tests
### What changes were proposed in this pull request?
In PR, I propose to create input tables once before executing tests in `JDBCSuite` and `JdbcRDDSuite`. Currently, the table are created before every test in the test suites.

### Why are the changes needed?
This speed up the test suites up 30-40%.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Run the modified test suites

Closes #29176 from MaxGekk/jdbc-suite-before-all.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-07-22 08:32:01 +00:00
yi.wu b4a9606890 [SPARK-31922][CORE] logDebug "RpcEnv already stopped" error on LocalSparkCluster shutdown
### What changes were proposed in this pull request?

Catch the `RpcEnvStoppedException` and log debug it when stop is called for a `LocalSparkCluster`.

This PR also contains two small changes to fix the potential issues.

### Why are the changes needed?

Currently, there's always "RpcEnv already stopped" error if we exit spark-shell with local-cluster mode:

```
20/06/07 14:54:18 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:167)
        at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:150)
        at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:691)
        at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:253)
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

```

When we call stop on `StandaloneSchedulerBackend`, the backend will firstly send `UnregisterApplication` to `Master` and then call stop on `LocalSparkCluster` immediately. On the other side, `Master` will send messages to `Worker` when it receives `UnregisterApplication`.  However, the rpcEnv of the `Worker` has been already stoped by the backend. Therefore, the error message shows when the `Worker` tries to handle the messages.

It's only an error on shutdown, users would not like to care about it. So we could hide it in debug log and this is also what we've done previously in #18547.

### Does this PR introduce _any_ user-facing change?

Yes, users will not see the error message after this PR.

### How was this patch tested?

Tested manually.

Closes #28746 from Ngone51/fix-spark-31922.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-21 23:16:38 -07:00
Baohe Zhang 7b9d7551a6 [SPARK-32350][CORE] Add batch-write on LevelDB to improve performance of HybridStore
### What changes were proposed in this pull request?
The idea is to improve the performance of HybridStore by adding batch write support to LevelDB. #28412  introduces HybridStore. HybridStore will write data to InMemoryStore at first and use a background thread to dump data to LevelDB once the writing to InMemoryStore is completed. In the comments section of #28412 , mridulm mentioned using batch writing can improve the performance of this dumping process and he wrote the code of writeAll().

### Why are the changes needed?
I did the comparison of the HybridStore switching time between one-by-one write and batch write on an HDD disk. When the disk is free, the batch-write has around 25% improvement, and when the disk is 100% busy, the batch-write has 7x - 10x improvement.

when the disk is at 0% utilization:
| log size, jobs and tasks per job   | original switching time, with write() | switching time with writeAll() |
| ---------------------------------- | ------------------------------------- | ------------------------------ |
| 133m, 400 jobs, 100 tasks per job  | 16s                                   | 13s                            |
| 265m, 400 jobs, 200 tasks per job  | 30s                                   | 23s                            |
| 1.3g, 1000 jobs, 400 tasks per job | 136s                                  | 108s                           |

when the disk is at 100% utilization:
| log size, jobs and tasks per job  | original switching time, with write() | switching time with writeAll() |
| --------------------------------- | ------------------------------------- | ------------------------------ |
| 133m, 400 jobs, 100 tasks per job | 116s                                  | 17s                            |
| 265m, 400 jobs, 200 tasks per job | 251s                                  | 26s                            |

I also ran some write related benchmarking tests on LevelDBBenchmark.java and measured the total time of writing 1024 objects. The tests were conducted when the disk is at 0% utilization.

| Benchmark test           | with write(), ms | with writeAll(), ms |
| ------------------------ | ---------------- | ------------------- |
| randomUpdatesIndexed     | 213.06           | 157.356             |
| randomUpdatesNoIndex     | 57.869           | 35.439              |
| randomWritesIndexed      | 298.854          | 229.274             |
| randomWritesNoIndex      | 66.764           | 38.361              |
| sequentialUpdatesIndexed | 87.019           | 56.219              |
| sequentialUpdatesNoIndex | 61.851           | 41.942              |
| sequentialWritesIndexed  | 94.044           | 56.534              |
| sequentialWritesNoIndex  | 118.345          | 66.483              |

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested.

Closes #29149 from baohe-zhang/SPARK-32350.

Authored-by: Baohe Zhang <baohe.zhang@verizonmedia.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
2020-07-22 13:27:34 +09:00
Holden Karau a4ca355af8 [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
### What is changed?

This pull request adds the ability to migrate shuffle files during Spark's decommissioning. The design document associated with this change is at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE .

To allow this change the `MapOutputTracker` has been extended to allow the location of shuffle files to be updated with `updateMapOutput`. When a shuffle block is put, a block update message will be sent which triggers the `updateMapOutput`.

Instead of rejecting remote puts of shuffle blocks `BlockManager` delegates the storage of shuffle blocks to it's shufflemanager's resolver (if supported). A new, experimental, trait is added for shuffle resolvers to indicate they handle remote putting of blocks.

The existing block migration code is moved out into a separate file, and a producer/consumer model is introduced for migrating shuffle files from the host as quickly as possible while not overwhelming other executors.

### Why are the changes needed?

Recomputting shuffle blocks can be expensive, we should take advantage of our decommissioning time to migrate these blocks.

### Does this PR introduce any user-facing change?

This PR introduces two new configs parameters, `spark.storage.decommission.shuffleBlocks.enabled` & `spark.storage.decommission.rddBlocks.enabled` that control which blocks should be migrated during storage decommissioning.

### How was this patch tested?

New unit test & expansion of the Spark on K8s decom test to assert that decommisioning with shuffle block migration means that the results are not recomputed even when the original executor is terminated.

This PR is a cleaned-up version of the previous WIP PR I made https://github.com/apache/spark/pull/28331 (thanks to attilapiros for his very helpful reviewing on it :)).

Closes #28708 from holdenk/SPARK-20629-copy-shuffle-data-when-nodes-are-being-shutdown-cleaned-up.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Co-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: Attila Zsolt Piros <attilazsoltpiros@apiros-mbp16.lan>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-07-19 21:33:13 -07:00
Prakhar Jain 0678afe393 [SPARK-21040][CORE] Speculate tasks which are running on decommission executors
### What changes were proposed in this pull request?
This PR adds functionality to consider the running tasks on decommission executors based on some config.
In spark-on-cloud , we sometimes already know that an executor won't be alive for more than fix amount of time. Ex- In AWS Spot nodes, once we get the notification, we know that a node will be gone in 120 seconds.
So if the running tasks on the decommissioning executors may run beyond currentTime+120 seconds, then they are candidate for speculation.

### Why are the changes needed?
Currently when an executor is decommission, we stop scheduling new tasks on those executors but the already running tasks keeps on running on them. Based on the cloud, we might know beforehand that an executor won't be alive for more than a preconfigured time. Different cloud providers gives different timeouts before they take away the nodes. For Ex- In case of AWS spot nodes, an executor won't be alive for more than 120 seconds. We can utilize this information in cloud environments and take better decisions about speculating the already running tasks on decommission executors.

### Does this PR introduce _any_ user-facing change?
Yes. This PR adds a new config "spark.executor.decommission.killInterval" which they can explicitly set based on the cloud environment where they are running.

### How was this patch tested?
Added UT.

Closes #28619 from prakharjain09/SPARK-21040-speculate-decommission-exec-tasks.

Authored-by: Prakhar Jain <prakharjain09@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-07-17 16:11:02 -07:00
Devesh Agrawal ffdbbae1d4 [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
### What changes were proposed in this pull request?

This PR allows an external agent to inform the Master that certain hosts
are being decommissioned.

### Why are the changes needed?

The current decommissioning is triggered by the Worker getting getting a SIGPWR
(out of band possibly by some cleanup hook), which then informs the Master
about it. This approach may not be feasible in some environments that cannot
trigger a clean up hook on the Worker. In addition, when a large number of
worker nodes are being decommissioned then the master will get a flood of
messages.

So we add a new post endpoint `/workers/kill` on the MasterWebUI that allows an
external agent to inform the master about all the nodes being decommissioned in
bulk. The list of nodes is specified by providing a list of hostnames. All workers on those
hosts will be decommissioned.

This API is merely a new entry point into the existing decommissioning
logic. It does not change how the decommissioning request is handled in
its core.

### Does this PR introduce _any_ user-facing change?

Yes, a new endpoint `/workers/kill` is added to the MasterWebUI. By default only
requests originating from an IP address local to the MasterWebUI are allowed.

### How was this patch tested?

Added unit tests

Closes #29015 from agrawaldevesh/master_decom_endpoint.

Authored-by: Devesh Agrawal <devesh.agrawal@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-07-17 06:04:34 +00:00
Warren Zhu db47c6e340 [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API
### What changes were proposed in this pull request?
Support fetching taskList by status as below:
```
/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskList?status=failed
```

### Why are the changes needed?

When there're large number of tasks in one stage, current api is hard to get taskList by status

### Does this PR introduce _any_ user-facing change?
Yes. Updated monitoring doc.

### How was this patch tested?
Added tests in `HistoryServerSuite`

Closes #28942 from warrenzhu25/SPARK-32125.

Authored-by: Warren Zhu <zhonzh@microsoft.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-07-16 11:31:24 +08:00
Erik Krogen cf22d947fb [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature
### What changes were proposed in this pull request?

This PR will remove references to these "blacklist" and "whitelist" terms besides the blacklisting feature as a whole, which can be handled in a separate JIRA/PR.

This touches quite a few files, but the changes are straightforward (variable/method/etc. name changes) and most quite self-contained.

### Why are the changes needed?

As per discussion on the Spark dev list, it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world.

### Does this PR introduce _any_ user-facing change?

In the test file `HiveQueryFileTest`, a developer has the ability to specify the system property `spark.hive.whitelist` to specify a list of Hive query files that should be tested. This system property has been renamed to `spark.hive.includelist`. The old property has been kept for compatibility, but will log a warning if used. I am open to feedback from others on whether keeping a deprecated property here is unnecessary given that this is just for developers running tests.

### How was this patch tested?

Existing tests should be suitable since no behavior changes are expected as a result of this PR.

Closes #28874 from xkrogen/xkrogen-SPARK-32036-rename-blacklists.

Authored-by: Erik Krogen <ekrogen@linkedin.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-07-15 11:40:55 -05:00
Baohe Zhang 90b0c26b22 [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
### What changes were proposed in this pull request?
Add a new class HybridStore to make the history server faster when loading event files. When rebuilding the application state from event logs, HybridStore will write data to InMemoryStore at first and use a background thread to dump data to LevelDB once the writing to InMemoryStore is completed. HybridStore is to make content serving faster by using more memory. It's only safe to enable it when the cluster is not having a heavy load.

### Why are the changes needed?
HybridStore can greatly reduce the event logs loading time, especially for large log files. In general, it has 4x - 6x UI loading speed improvement for large log files. The detailed result is shown in comments.

### Does this PR introduce any user-facing change?
This PR adds new configs `spark.history.store.hybridStore.enabled` and `spark.history.store.hybridStore.maxMemoryUsage`.

### How was this patch tested?
A test suite for HybridStore is added. I also manually tested it on 3.1.0 on mac os.

This is a follow-up for the work done by Hieu Huynh in 2019.

Closes #28412 from baohe-zhang/SPARK-31608.

Authored-by: Baohe Zhang <baohe.zhang@verizonmedia.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
2020-07-15 07:51:13 +09:00
HyukjinKwon 4ad9bfd53b [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
### What changes were proposed in this pull request?

This PR aims to drop Python 2.7, 3.4 and 3.5.

Roughly speaking, it removes all the widely known Python 2 compatibility workarounds such as `sys.version` comparison, `__future__`. Also, it removes the Python 2 dedicated codes such as `ArrayConstructor` in Spark.

### Why are the changes needed?

 1. Unsupport EOL Python versions
 2. Reduce maintenance overhead and remove a bit of legacy codes and hacks for Python 2.
 3. PyPy2 has a critical bug that causes a flaky test, SPARK-28358 given my testing and investigation.
 4. Users can use Python type hints with Pandas UDFs without thinking about Python version
 5. Users can leverage one latest cloudpickle, https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also leverage C pickle.

### Does this PR introduce _any_ user-facing change?

Yes, users cannot use Python 2.7, 3.4 and 3.5 in the upcoming Spark version.

### How was this patch tested?

Manually tested and also tested in Jenkins.

Closes #28957 from HyukjinKwon/SPARK-32138.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-07-14 11:22:44 +09:00
Holden Karau 90ac9f975b [SPARK-32004][ALL] Drop references to slave
### What changes were proposed in this pull request?

This change replaces the world slave with alternatives matching the context.

### Why are the changes needed?

There is no need to call things slave, we might as well use better clearer names.

### Does this PR introduce _any_ user-facing change?

Yes, the ouput JSON does change. To allow backwards compatibility this is an additive change.
The shell scripts for starting & stopping workers are renamed, and for backwards compatibility old scripts are added to call through to the new ones while printing a deprecation message to stderr.

### How was this patch tested?

Existing tests.

Closes #28864 from holdenk/SPARK-32004-drop-references-to-slave.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-07-13 14:05:33 -07:00
angerszhu 09789ff725 [SPARK-31226][CORE][TESTS] SizeBasedCoalesce logic will lose partition
### What changes were proposed in this pull request?

When last partition's splitFile's split size is larger then  maxSize, this partition will be lost

Origin logic error like below as 1, 2, 3, 4, 5
```scala
// 5. since index = partition.size now,  jump out of the loop , then the last partition is lost since we won't call updatePartition() again.
while (index < partitions.size) {
     //  1. we assume that when index = partitions.length -1(the last partition)
      val partition = partitions(index)
      val fileSplit =
        partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
      val splitSize = fileSplit.getLength
     // 2.  if  this partition's  splitSize > maxSize
      if (currentSum + splitSize < maxSize) {
        addPartition(partition, splitSize)
        index += 1
        if (index == partitions.size) {
          updateGroups
        }
      } else {
       //  3. if currentGroup.partitions.size  >0, this situation is possiable
        if (currentGroup.partitions.size == 0) {
          addPartition(partition, splitSize)
          index += 1
        } else {
        //   4. then will just call updateGroups() here first, and index won't update in group
          updateGroups
        }
      }
    }
    groups.toArray
  }
}
```
### Why are the changes needed?
Fix bug

### Does this PR introduce any user-facing change?
NO

### How was this patch tested?

Manual code review.

Closes #27988 from AngersZhuuuu/SPARK-31226.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-11 14:48:23 -07:00
Sean Owen 3ad4863673 [SPARK-29292][SPARK-30010][CORE] Let core compile for Scala 2.13
### What changes were proposed in this pull request?

The purpose of this PR is to partly resolve SPARK-29292, and fully resolve SPARK-30010, which should allow Spark to compile vs Scala 2.13 in Spark Core and up through GraphX (not SQL, Streaming, etc).

Note that we are not trying to determine here whether this makes Spark work on 2.13 yet, just compile, as a prerequisite for assessing test outcomes. However, of course, we need to ensure that the change does not break 2.12.

The changes are, in the main, adding .toSeq and .toMap calls where mutable collections / maps are returned as Seq / Map, which are immutable by default in Scala 2.13. The theory is that it should be a no-op for Scala 2.12 (these return themselves), and required for 2.13.

There are a few non-trivial changes highlighted below.
In particular, to get Core to compile, we need to resolve SPARK-30010 which removes a deprecated SparkConf method

### Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

### Does this PR introduce _any_ user-facing change?

Yes, removal of the deprecated SparkConf.setAll overload, which isn't legal in Scala 2.13 anymore.

### How was this patch tested?

Existing tests. (2.13 was not _tested_; this is about getting it to compile without breaking 2.12)

Closes #28971 from srowen/SPARK-29292.1.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-11 14:34:02 -07:00
Kousuke Saruta ceaa3924cb [SPARK-32200][WEBUI] Redirect to the history page when accessed to /history on the HistoryServer without appliation id
### What changes were proposed in this pull request?

This PR proposes to change the HistoryServer to redirect to the history page when we access to /history without application id.

### Why are the changes needed?

In the current master, status code 400 will be returned when we access to /history.
So I wonder it's better to redirect to the history page for the better UX.

### Does this PR introduce _any_ user-facing change?

Yes. In the current master, if we access to /history without application id, we will see like the following page.
![history-400](https://user-images.githubusercontent.com/4736016/86649650-e9105380-c01c-11ea-93bb-78fd8d2e6f7b.png)
After this change applied, we will be redirected to the history page.

### How was this patch tested?

New test added.

Closes #29016 from sarutak/history-redirect.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-11 13:46:59 -07:00
HyukjinKwon b84ed4146d [SPARK-32245][INFRA] Run Spark tests in Github Actions
### What changes were proposed in this pull request?

This PR aims to run the Spark tests in Github Actions.

To briefly explain the main idea:

- Reuse `dev/run-tests.py` with SBT build
- Reuse the modules in `dev/sparktestsupport/modules.py` to test each module
- Pass the modules to test into `dev/run-tests.py` directly via `TEST_ONLY_MODULES` environment variable. For example, `pyspark-sql,core,sql,hive`.
- `dev/run-tests.py` _does not_ take the dependent modules into account but solely the specified modules to test.

Another thing to note might be `SlowHiveTest` annotation. Running the tests in Hive modules takes too much so the slow tests are extracted and it runs as a separate job. It was extracted from the actual elapsed time in Jenkins:

![Screen Shot 2020-07-09 at 7 48 13 PM](https://user-images.githubusercontent.com/6477701/87050238-f6098e80-c238-11ea-9c4a-ab505af61381.png)

So, Hive tests are separated into to jobs. One is slow test cases, and the other one is the other test cases.

_Note that_ the current GitHub Actions build virtually copies what the default PR builder on Jenkins does (without other profiles such as JDK 11, Hadoop 2, etc.). The only exception is Kinesis https://github.com/apache/spark/pull/29057/files#diff-04eb107ee163a50b61281ca08f4e4c7bR23

### Why are the changes needed?

Last week and onwards, the Jenkins machines became very unstable for many reasons:
  - Apparently, the machines became extremely slow. Almost all tests can't pass.
  - One machine (worker 4) started to have the corrupt `.m2` which fails the build.
  - Documentation build fails time to time for an unknown reason in Jenkins machine specifically. This is disabled for now at https://github.com/apache/spark/pull/29017.
  - Almost all PRs are basically blocked by this instability currently.

The advantages of using Github Actions:
  - To avoid depending on few persons who can access to the cluster.
  - To reduce the elapsed time in the build - we could split the tests (e.g., SQL, ML, CORE), and run them in parallel so the total build time will significantly reduce.
  - To control the environment more flexibly.
  - Other contributors can test and propose to fix Github Actions configurations so we can distribute this build management cost.

Note that:
- The current build in Jenkins takes _more than 7 hours_. With Github actions it takes _less than 2 hours_
- We can now control the environments especially for Python easily.
- The test and build look more stable than the Jenkins'.

### Does this PR introduce _any_ user-facing change?

No, dev-only change.

### How was this patch tested?

Tested at https://github.com/HyukjinKwon/spark/pull/4

Closes #29057 from HyukjinKwon/migrate-to-github-actions.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-11 13:09:06 -07:00
Pavithraramachandran d7d5bdfd79 [SPARK-32103][CORE] Support IPv6 host/port in core module
### What changes were proposed in this pull request?
In IPv6 scenario, the current logic to split hostname and port is not correct.

### Why are the changes needed?
to support IPV6 deployment scenario

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT and IPV6 spark deployment with yarn

Closes #28931 from PavithraRamachandran/ipv6_issue.

Authored-by: Pavithraramachandran <pavi.rams@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-10 13:55:20 -07:00
yi.wu 578b90cdec [SPARK-32091][CORE] Ignore timeout error when remove blocks on the lost executor
### What changes were proposed in this pull request?

This PR adds the check to see whether the executor is lost (by asking the `CoarseGrainedSchedulerBackend`) after timeout error raised in `BlockManagerMasterEndponit` due to removing blocks(e.g. RDD, broadcast, shuffle). If the executor is lost, we will ignore the error. Otherwise, throw the error.

### Why are the changes needed?

When removing blocks(e.g. RDD, broadcast, shuffle), `BlockManagerMaserEndpoint` will make RPC calls to each known `BlockManagerSlaveEndpoint` to remove the specific blocks. The PRC call sometimes could end in a timeout when the executor has been lost, but only notified the `BlockManagerMasterEndpoint` after the removing call has already happened. The timeout error could therefore fail the whole job.

In this case, we actually could just ignore the error since those blocks on the lost executor could be considered as removed already.

### Does this PR introduce _any_ user-facing change?

Yes. In case of users hits this issue, they will have the job executed successfully instead of throwing the exception.

### How was this patch tested?

Added unit tests.

Closes #28924 from Ngone51/ignore-timeout-error-for-inactive-executor.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-07-10 13:36:29 +00:00
Takuya UESHIN cfecc2030d [SPARK-32160][CORE][PYSPARK] Disallow to create SparkContext in executors
### What changes were proposed in this pull request?

This PR proposes to disallow to create `SparkContext` in executors, e.g., in UDFs.

### Why are the changes needed?

Currently executors can create SparkContext, but shouldn't be able to create it.

```scala
sc.range(0, 1).foreach { _ =>
  new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
}
```

### Does this PR introduce _any_ user-facing change?

Yes, users won't be able to create `SparkContext` in executors.

### How was this patch tested?

Addes tests.

Closes #28986 from ueshin/issues/SPARK-32160/disallow_spark_context_in_executors.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-07-09 15:51:56 +09:00
Jungtaek Lim (HeartSaVioR) 161cf2a126 [SPARK-32024][WEBUI][FOLLOWUP] Quick fix on test failure on missing when statements
### What changes were proposed in this pull request?

This patch fixes the test failure due to the missing when statements for destination path. Note that it didn't fail on master branch, because 245aee9 got rid of size call in destination path, but still good to not depend on 245aee9.

### Why are the changes needed?

The build against branch-3.0 / branch-2.4 starts to fail after merging SPARK-32024 (#28859) and this patch will fix it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran modified UT against master / branch-3.0 / branch-2.4.

Closes #29046 from HeartSaVioR/QUICKFIX-SPARK-32024.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-07-09 15:26:38 +09:00
Warren Zhu d1d16d14bc [SPARK-31723][CORE][TEST] Reenable one test case in HistoryServerSuite
### What changes were proposed in this pull request?
Enable test("static relative links are prefixed with uiRoot (spark.ui.proxyBase)")

### Why are the changes needed?
In Jira, the failed test is another one test("ajax rendered relative links are prefixed with uiRoot (spark.ui.proxyBase)"). This test has been fixed in 6a895d0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Fix UT

Closes #28970 from warrenzhu25/31723.

Authored-by: Warren Zhu <zhonzh@microsoft.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-08 16:45:36 -07:00
Zhen Li 8e7fc04637 [SPARK-32024][WEBUI] Update ApplicationStoreInfo.size during HistoryServerDiskManager initializing
### What changes were proposed in this pull request?

Update ApplicationStoreInfo.size to real size during HistoryServerDiskManager initializing.

### Why are the changes needed?

This PR is for fixing bug [32024](https://issues.apache.org/jira/browse/SPARK-32024). We found after history server restart, below error would randomly happen: "java.lang.IllegalStateException: Disk usage tracker went negative (now = -***, delta = -***)" from `HistoryServerDiskManager`.
![Capture](https://user-images.githubusercontent.com/10524738/85034468-fda4ae80-b136-11ea-9011-f0c3e6508002.JPG)

**Cause**: Reading data from level db would trigger table file compaction, which may also trigger size of level db directory changes.  This size change may not be recorded in LevelDB (`ApplicationStoreInfo` in `listing`). When service restarts, `currentUsage` is calculated from real directory size, but `ApplicationStoreInfo` are loaded from leveldb, then `currentUsage` may be less then sum of `ApplicationStoreInfo.size`. In `makeRoom()` function, `ApplicationStoreInfo.size` is used to update usage. Then `currentUsage` becomes negative after several round of `release()` and `lease()` (`makeRoom()`).
**Reproduce**: we can reproduce this issue in dev environment by reducing config value of "spark.history.retainedApplications" and "spark.history.store.maxDiskUsage" to some small values. Here are steps: 1. start history server, load some applications and access some pages (maybe "stages" page to trigger leveldb compaction). 2. restart HS, and refresh pages.
I also added an UT to simulate this case in `HistoryServerDiskManagerSuite`.
**Benefit**: this change would help improve history server reliability.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Add unit test and manually tested it.

Closes #28859 from zhli1142015/update-ApplicationStoreInfo.size-during-disk-manager-initialize.

Authored-by: Zhen Li <zhli@microsoft.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
2020-07-08 21:58:45 +09:00
Yuanjian Li 365961155a [SPARK-32124][CORE][FOLLOW-UP] Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version
### What changes were proposed in this pull request?
Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version.

### Why are the changes needed?
Follow up PR for #28941.

### Does this PR introduce _any_ user-facing change?
When we use the Spark version 3.0 history server reading the event log written by the old Spark version, we use the invalid value -2 to fill the map index.

### How was this patch tested?
Existing UT.

Closes #28965 from xuanyuanking/follow-up.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
2020-07-08 09:36:06 +09:00
sidedoorleftroad 3fe3365292 [SPARK-32172][CORE] Use createDirectory instead of mkdir
### What changes were proposed in this pull request?

Use Files.createDirectory() to create local directory instead of File.mkdir() in DiskBlockManager.
Many times, we will see such error log information like "Failed to create local dir in xxxxxx". But there is no clear information indicating why the directory creation failed.
When Files.createDirectory() fails to create a local directory, it can give specific error information for subsequent troubleshooting(also throws IOException).

### Why are the changes needed?

Throw clear error message when creating directory fails.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`DiskBlockManagerSuite`

Closes #28997 from sidedoorleftroad/SPARK-32172.

Authored-by: sidedoorleftroad <sidedoorleftroad@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-06 09:20:31 -07:00
Dongjoon Hyun dea7bc464d [SPARK-32100][CORE][TESTS][FOLLOWUP] Reduce the required test resources
### What changes were proposed in this pull request?

This PR aims to reduce the required test resources in WorkerDecommissionExtendedSuite.

### Why are the changes needed?

When Jenkins farms is crowded, the following failure happens currently [here](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2-hive-2.3/890/testReport/junit/org.apache.spark.scheduler/WorkerDecommissionExtendedSuite/Worker_decommission_and_executor_idle_timeout/)
```
java.util.concurrent.TimeoutException: Can't find 20 executors before 60000 milliseconds elapsed
	at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:326)
	at org.apache.spark.scheduler.WorkerDecommissionExtendedSuite.$anonfun$new$2(WorkerDecommissionExtendedSuite.scala:45)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #29001 from dongjoon-hyun/SPARK-32100-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-05 20:12:41 -07:00
Kousuke Saruta 3726aab640 [SPARK-32177][WEBUI] Remove the weird line from near the Spark logo on mouseover in the WebUI
### What changes were proposed in this pull request?

This PR changes `webui.css` to fix a style issue on moving mouse cursor on the Spark logo.

### Why are the changes needed?

In the webui, the Spark logo is on the top right side.
When we move mouse cursor on the logo, a weird underline appears near the logo.
<img width="209" alt="logo_with_line" src="https://user-images.githubusercontent.com/4736016/86542828-3c6a9f00-bf54-11ea-9b9d-cc50c12c2c9b.png">

### Does this PR introduce _any_ user-facing change?

Yes. After this change applied, no more weird line shown even if mouse cursor moves on the logo.
<img width="207" alt="removed-line-from-logo" src="https://user-images.githubusercontent.com/4736016/86542877-98cdbe80-bf54-11ea-8695-ee39689673ab.png">

### How was this patch tested?

By moving mouse cursor on the Spark logo and confirmed no more weird line there.

Closes #29003 from sarutak/fix-logo-underline.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-05 19:09:04 -07:00
Eren Avsarogullari f843a5bf7c [SPARK-32026][CORE][TEST] Add PrometheusServletSuite
### What changes were proposed in this pull request?

This PR aims to add `PrometheusServletSuite`.

### Why are the changes needed?

This improves the test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the newly added test suite.

Closes #28865 from erenavsarogullari/spark_driver_prometheus_metrics_improvement.

Authored-by: Eren Avsarogullari <erenavsarogullari@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-01 11:33:18 -07:00
TJX2014 165c948e32 [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab
### What changes were proposed in this pull request?
`formatDate` in utils.js `org/apache/spark/ui/static/utils.js` is partly refactored.

### Why are the changes needed?
In branch-2.4,task launch time is returned as html string from driver,
while in branch-3.x,this is returned in JSON Object as`Date`type  from `org.apache.spark.status.api.v1.TaskData`
Due to:
LaunchTime from jersey server in spark driver is correct, which will be converted to date string like `2020-06-28T02:57:42.605GMT` in json object, then the formatDate in utils.js treat it as date.split(".")[0].replace("T", " ").
So `2020-06-28T02:57:42.605GMT` will be converted to `2020-06-28 02:57:42`, but correct is `2020-06-28 10:57:42` in GMT+8 timezone.
![选区_071](https://user-images.githubusercontent.com/7149304/85937186-b6d36780-b933-11ea-8382-80a3891f1c2a.png)
![选区_070](https://user-images.githubusercontent.com/7149304/85937190-bcc94880-b933-11ea-8860-2083c97ea269.png)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Closes #28918 from TJX2014/master-SPARK-32068-ui-task-lauch-time-tz.

Authored-by: TJX2014 <xiaoxingstack@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-06-30 08:56:59 -05:00
yi.wu 6fcb70e0ca [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
### What changes were proposed in this pull request?

This PR tries to unify the method `getReader` and `getReaderForRange` in `ShuffleManager`.

### Why are the changes needed?

Reduce the duplicate codes, simplify the implementation, and for better maintenance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Covered by existing tests.

Closes #28895 from Ngone51/unify-getreader.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-06-29 11:37:03 +00:00
Warren Zhu 197ac3b132 [SPARK-32124][CORE][SHS] Fix taskEndReasonFromJson to handle event logs from old Spark versions
### What changes were proposed in this pull request?
Fix bug of exception when parse event log of fetch failed task end reason without `Map Index`

### Why are the changes needed?
When Spark history server read event log produced by older version of spark 2.4 (which don't have `Map Index` field), parsing of TaskEndReason will fail. This will cause TaskEnd event being ignored.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
JsonProtocolSuite.test("FetchFailed Map Index backwards compatibility")

Closes #28941 from warrenzhu25/shs-task.

Authored-by: Warren Zhu <zhonzh@microsoft.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-28 21:06:45 -07:00
gengjiaan 7445c7534b [SPARK-31845][CORE][TESTS] Refactor DAGSchedulerSuite by introducing completeAndCheckAnswer and using completeNextStageWithFetchFailure
### What changes were proposed in this pull request?
**First**
`DAGSchedulerSuite` provides `completeNextStageWithFetchFailure` to make all tasks in non first stage occur `FetchFailed`.
But many test case uses complete directly as follows:
```scala
 complete(taskSets(1), Seq(
     (FetchFailed(makeBlockManagerId("hostA"),
        shuffleDep1.shuffleId, 0L, 0, 0, "ignored"), null)))
```
We need to reuse `completeNextStageWithFetchFailure`.

**Second**
`DAGSchedulerSuite` also check the results show below:
```scala
complete(taskSets(0), Seq((Success, 42)))
assert(results === Map(0 -> 42))
```
We can extract it as a generic method of `checkAnswer`.

### Why are the changes needed?
Reuse `completeNextStageWithFetchFailure`

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test

Closes #28866 from beliefer/reuse-completeNextStageWithFetchFailure.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-26 19:36:06 -07:00
Dongjoon Hyun 594cb56075 [SPARK-32100][CORE][TESTS] Add WorkerDecommissionExtendedSuite
### What changes were proposed in this pull request?

This PR aims to add `WorkerDecomissionExtendedSuite` for various worker decommission combinations.

### Why are the changes needed?

This will improve the test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #28929 from dongjoon-hyun/SPARK-WD-TEST.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-25 16:21:14 -07:00
Max Gekk 045106e29d [SPARK-32072][CORE][TESTS] Fix table formatting with benchmark results
### What changes were proposed in this pull request?
Set column width w/ benchmark names to maximum of either
1. 40 (before this PR) or
2. The length of benchmark name or
3. Maximum length of cases names

### Why are the changes needed?
To improve readability of benchmark results. For example, `MakeDateTimeBenchmark`.

Before:
```
make_timestamp():                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
prepare make_timestamp()                           3636           3673          38          0.3        3635.7       1.0X
make_timestamp(2019, 1, 2, 3, 4, 50.123456)             94             99           4         10.7          93.8      38.8X
make_timestamp(2019, 1, 2, 3, 4, 60.000000)             68             80          13         14.6          68.3      53.2X
make_timestamp(2019, 12, 31, 23, 59, 60.00)             65             79          19         15.3          65.3      55.7X
make_timestamp(*, *, *, 3, 4, 50.123456)            271            280          14          3.7         270.7      13.4X
```

After:
```
make_timestamp():                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
prepare make_timestamp()                              3694           3745          82          0.3        3694.0       1.0X
make_timestamp(2019, 1, 2, 3, 4, 50.123456)             82             90           9         12.2          82.3      44.9X
make_timestamp(2019, 1, 2, 3, 4, 60.000000)             72             77           5         13.9          71.9      51.4X
make_timestamp(2019, 12, 31, 23, 59, 60.00)             67             71           5         15.0          66.8      55.3X
make_timestamp(*, *, *, 3, 4, 50.123456)               273            289          14          3.7         273.2      13.5X
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By re-generating benchmark results for `MakeDateTimeBenchmark`:
```
$ SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.MakeDateTimeBenchmark"
```
in the environment:

| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_252 and OpenJDK 64-Bit Server VM 11.0.7+10 |

Closes #28906 from MaxGekk/benchmark-table-formatting.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-06-24 04:43:53 +00:00
Zhen Li eedc6cc37d [SPARK-32028][WEBUI] fix app id link for multi attempts app in history summary page
### What changes were proposed in this pull request?

Fix app id link for multi attempts application in history summary page
If attempt id is available (yarn), app id link url will contain correct attempt id, like `/history/application_1561589317410_0002/1/jobs/`.
If attempt id is not available (standalone), app id link url will not contain fake attempt id, like `/history/app-20190404053606-0000/jobs/`.

### Why are the changes needed?

This PR is for fixing [32028](https://issues.apache.org/jira/browse/SPARK-32028). App id link use application attempt count as attempt id. this would cause link url wrong for below cases:
1. there are multi attempts, all links point to last attempt
![multi_same](https://user-images.githubusercontent.com/10524738/85098505-c45c5500-b1af-11ea-8912-fa5fd72ce064.JPG)

2. if there is one attempt, but attempt id is not 1 (before attempt maybe crash or fail to gerenerate event file). link url points to worng attempt (1) here.
![wrong_attemptJPG](https://user-images.githubusercontent.com/10524738/85098513-c9b99f80-b1af-11ea-8cbc-fd7f745c1080.JPG)

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tested this manually.

Closes #28867 from zhli1142015/fix-appid-link-in-history-page.

Authored-by: Zhen Li <zhli@microsoft.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-06-23 21:43:02 -05:00
mcheah aa4c10025a [SPARK-31798][SHUFFLE][API] Shuffle Writer API changes to return custom map output metadata
Introduces the concept of a `MapOutputMetadata` opaque object that can be returned from map output writers.

Note that this PR only proposes the API changes on the shuffle writer side. Following patches will be proposed for actually accepting the metadata on the driver and persisting it in the driver's shuffle metadata storage plugin.

### Why are the changes needed?

For a more complete design discussion on this subject as a whole, refer to [this design document](https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit#).

### Does this PR introduce any user-facing change?

Enables additional APIs for the shuffle storage plugin tree. Usage will become more apparent as the API evolves.

### How was this patch tested?

No tests here, since this is only an API-side change that is not consumed by core Spark itself.

Closes #28616 from mccheah/return-map-output-metadata.

Authored-by: mcheah <mcheah@palantir.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-21 19:47:24 -07:00
yi.wu 4badef38a5 [SPARK-32000][CORE][TESTS] Fix the flaky test for partially launched task in barrier-mode
### What changes were proposed in this pull request?

This PR changes the test to get an active executorId and set it as preferred location instead of setting a fixed preferred location.

### Why are the changes needed?

The test is flaky. After checking the [log](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124086/artifact/core/), I find the root cause is:

Two test cases from different test suites got submitted at the same time because of concurrent execution. In this particular case, the two test cases (from DistributedSuite and BarrierTaskContextSuite) both launch under local-cluster mode. The two applications are submitted at the SAME time so they have the same applications(app-20200615210132-0000). Thus, when the cluster of BarrierTaskContextSuite is launching executors, it failed to create the directory for the executor 0, because the path (/home/jenkins/workspace/work/app-app-20200615210132-0000/0) has been used by the cluster of DistributedSuite. Therefore, it has to launch executor 1 and 2 instead, that lead to non of the tasks can get preferred locality thus they got scheduled together and lead to the test failure.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The test can not be reproduced locally. We can only know it's been fixed when it's no longer flaky on Jenkins.

Closes #28849 from Ngone51/fix-spark-32000.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-06-17 13:28:47 +00:00