### What changes were proposed in this pull request?
`PipedRDD` will invoke `stdinWriterThread.interrupt()` at task completion, and `obj.wait` will get `InterruptedException`. However, there exists a possibility which the thread termination gets delayed because the thread starts from `obj.wait()` with that exception. To prevent test flakiness, we need to use `eventually`. Also, This PR fixes the typo in code comment and variable name.
### Why are the changes needed?
```
- stdin writer thread should be exited when task is finished *** FAILED ***
Some(Thread[stdin writer for List(cat),5,]) was not empty (PipedRDDSuite.scala:107)
```
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/6867/testReport/junit/org.apache.spark.rdd/PipedRDDSuite/stdin_writer_thread_should_be_exited_when_task_is_finished/
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manual.
We can reproduce the same failure like Jenkins if we catch `InterruptedException` and sleep longer than the `eventually` timeout inside the test code. The following is the example to reproduce it.
```scala
val nums = sc.makeRDD(Array(1, 2, 3, 4), 1).map { x =>
try {
obj.synchronized {
obj.wait() // make the thread waits here.
}
} catch {
case ie: InterruptedException =>
Thread.sleep(15000)
throw ie
}
x
}
```
Closes#25808 from dongjoon-hyun/SPARK-29104.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
I fix the test "barrier task killed" which is flaky:
* Split interrupt/no interrupt test into separate sparkContext. Prevent them to influence each other.
* only check exception on partiton-0. partition-1 is hang on sleep which may throw other exception.
### Why are the changes needed?
Make test robust.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes#25799 from WeichenXu123/oss_fix_barrier_test.
Authored-by: WeichenXu <weichen.xu@databricks.com>
Signed-off-by: WeichenXu <weichen.xu@databricks.com>
### What changes were proposed in this pull request?
Log levels in Executor.scala are changed from DEBUG to INFO.
### Why are the changes needed?
Logging level DEBUG is too low here. These messages are simple acknowledgement for successful loading and initialization of plugins. So its better to keep them in INFO level.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually tested.
Closes#25634 from iRakson/ExecutorPlugin.
Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
In the PR, I propose to create an instance of `TimestampFormatter` only once at the initialization, and reuse it inside of `nullSafeEval()` and `doGenCode()` in the case when the `fmt` parameter is foldable.
### Why are the changes needed?
The changes improve performance of the `date_format()` function.
Before:
```
format date: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
format date wholestage off 7180 / 7181 1.4 718.0 1.0X
format date wholestage on 7051 / 7194 1.4 705.1 1.0X
```
After:
```
format date: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
format date wholestage off 4787 / 4839 2.1 478.7 1.0X
format date wholestage on 4736 / 4802 2.1 473.6 1.0X
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
By existing test suites `DateExpressionsSuite` and `DateFunctionsSuite`.
Closes#25782 from MaxGekk/date_format-foldable.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This patch proposes to add new tests to test the username of HiveClient to prevent changing the semantic unintentionally. The owner of Hive table has been changed back-and-forth, principal -> username -> principal, and looks like the change is not intentional. (Please refer [SPARK-28996](https://issues.apache.org/jira/browse/SPARK-28996) for more details.) This patch intends to prevent this.
This patch also renames previous HiveClientSuite(s) to HivePartitionFilteringSuite(s) as it was commented as TODO, as well as previous tests are too narrowed to test only partition filtering.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Newly added UTs.
Closes#25696 from HeartSaVioR/SPARK-28996.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Bucketizer support multi-column in the python side
### Why are the changes needed?
Bucketizer should support multi-column like the scala side.
### Does this PR introduce any user-facing change?
yes, this PR add new Python API
### How was this patch tested?
added testsuites
Closes#25801 from zhengruifeng/20542_py.
Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
### What changes were proposed in this pull request?
When InSet generates Java switch-based code, if the input set is empty, we don't generate switch condition, but a simple expression that is default case of original switch.
### Why are the changes needed?
SPARK-26205 adds an optimization to InSet that generates Java switch condition for certain cases. When the given set is empty, it is possibly that codegen causes compilation error:
```
[info] - SPARK-29100: InSet with empty input set *** FAILED *** (58 milliseconds)
[info] Code generation of input[0, int, true] INSET () failed:
[info] org.codehaus.janino.InternalCompilerException: failed to compile: org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass" in "generated.java": Compiling "apply(java.lang.Object _i)"; apply(java.lang.Object _i): Operand stack inconsistent at offset 45: Previous size 0, now 1
[info] org.codehaus.janino.InternalCompilerException: failed to compile: org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass" in "generated.java": Compiling "apply(java.lang.Object _i)"; apply(java.lang.Object _i): Operand stack inconsistent at offset 45: Previous size 0, now 1
[info] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1308)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1386)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1383)
```
### Does this PR introduce any user-facing change?
Yes. Previously, when users have InSet against an empty set, generated code causes compilation error. This patch fixed it.
### How was this patch tested?
Unit test added.
Closes#25806 from viirya/SPARK-29100.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This pr proposes to define an individual method for each common subexpression in HashAggregateExec. In the current master, the common subexpr elimination code in HashAggregateExec is expanded in a single method; 4664a082c2/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (L397)
The method size can be too big for JIT compilation, so I believe splitting it is beneficial for performance. For example, in a query `SELECT SUM(a + b), AVG(a + b + c) FROM VALUES (1, 1, 1) t(a, b, c)`,
the current master generates;
```
/* 098 */ private void agg_doConsume_0(InternalRow localtablescan_row_0, int agg_expr_0_0, int agg_expr_1_0, int agg_expr_2_0) throws java.io.IOException {
/* 099 */ // do aggregate
/* 100 */ // common sub-expressions
/* 101 */ int agg_value_6 = -1;
/* 102 */
/* 103 */ agg_value_6 = agg_expr_0_0 + agg_expr_1_0;
/* 104 */
/* 105 */ int agg_value_5 = -1;
/* 106 */
/* 107 */ agg_value_5 = agg_value_6 + agg_expr_2_0;
/* 108 */ boolean agg_isNull_4 = false;
/* 109 */ long agg_value_4 = -1L;
/* 110 */ if (!false) {
/* 111 */ agg_value_4 = (long) agg_value_5;
/* 112 */ }
/* 113 */ int agg_value_10 = -1;
/* 114 */
/* 115 */ agg_value_10 = agg_expr_0_0 + agg_expr_1_0;
/* 116 */ // evaluate aggregate functions and update aggregation buffers
/* 117 */ agg_doAggregate_sum_0(agg_value_10);
/* 118 */ agg_doAggregate_avg_0(agg_value_4, agg_isNull_4);
/* 119 */
/* 120 */ }
```
On the other hand, this pr generates;
```
/* 121 */ private void agg_doConsume_0(InternalRow localtablescan_row_0, int agg_expr_0_0, int agg_expr_1_0, int agg_expr_2_0) throws java.io.IOException {
/* 122 */ // do aggregate
/* 123 */ // common sub-expressions
/* 124 */ long agg_subExprValue_0 = agg_subExpr_0(agg_expr_2_0, agg_expr_0_0, agg_expr_1_0);
/* 125 */ int agg_subExprValue_1 = agg_subExpr_1(agg_expr_0_0, agg_expr_1_0);
/* 126 */ // evaluate aggregate functions and update aggregation buffers
/* 127 */ agg_doAggregate_sum_0(agg_subExprValue_1);
/* 128 */ agg_doAggregate_avg_0(agg_subExprValue_0);
/* 129 */
/* 130 */ }
```
I run some micro benchmarks for this pr;
```
(base) maropu~:$system_profiler SPHardwareDataType
Hardware:
Hardware Overview:
Processor Name: Intel Core i5
Processor Speed: 2 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 4 MB
Memory: 8 GB
(base) maropu~:$java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
(base) maropu~:$ /bin/spark-shell --master=local[1] --conf spark.driver.memory=8g --conf spark.sql.shurtitions=1 -v
val numCols = 40
val colExprs = "id AS key" +: (0 until numCols).map { i => s"id AS _c$i" }
spark.range(3000000).selectExpr(colExprs: _*).createOrReplaceTempView("t")
val aggExprs = (2 until numCols).map { i =>
(0 until i).map(d => s"_c$d")
.mkString("AVG(", " + ", ")")
}
// Drops the time of a first run then pick that of a second run
timer { sql(s"SELECT ${aggExprs.mkString(", ")} FROM t").write.format("noop").save() }
// the master
maxCodeGen: 12957
Elapsed time: 36.309858661s
// this pr
maxCodeGen=4184
Elapsed time: 2.399490285s
```
### Why are the changes needed?
To avoid the too-long-function issue in JVMs.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added tests in `WholeStageCodegenSuite`
Closes#25710 from maropu/SplitSubexpr.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This patch adds new UT to ensure existing query (before Spark 3.0.0) with checkpoint doesn't break with default configuration of "includeHeaders" being introduced via SPARK-23539.
This patch also modifies existing test which checks type of columns to also check headers column as well.
### Why are the changes needed?
The patch adds missing tests which guarantees backward compatibility of the change of SPARK-23539.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
UT passed.
Closes#25792 from HeartSaVioR/SPARK-23539-FOLLOWUP.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
…create table through spark-sql or beeline
## What changes were proposed in this pull request?
fix table owner use user instead of principal when create table through spark-sql
private val userName = conf.getUser will get ugi's userName which is principal info, and i copy the source code into HiveClientImpl, and use ugi.getShortUserName() instead of ugi.getUserName(). The owner display correctly.
## How was this patch tested?
1. create a table in kerberos cluster
2. use "desc formatted tbName" check owner
Please review http://spark.apache.org/contributing.html before opening a pull request.
Closes#23952 from hddong/SPARK-26929-fix-table-owner.
Lead-authored-by: hongdd <jn_hdd@163.com>
Co-authored-by: hongdongdong <hongdongdong@cmss.chinamobile.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
The previous refactors of the shuffle writers using the shuffle writer plugin resulted in shuffle write metric updates - particularly write times - being lost in particular situations. This patch restores the lost metric updates.
### Why are the changes needed?
This fixes a regression. I'm pretty sure that without this, the Spark UI will lose shuffle write time information.
### Does this PR introduce any user-facing change?
No change from Spark 2.4. Without this, there would be a user-facing bug in Spark 3.0.
### How was this patch tested?
Existing unit tests.
Closes#25780 from mccheah/fix-write-metrics.
Authored-by: mcheah <mcheah@palantir.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
### What changes were proposed in this pull request?
This pr proposes to print bytecode statistics (max class bytecode size, max method bytecode size, max constant pool size, and # of inner classes) for generated classes in debug prints, `debugCodegen`. Since these metrics are critical for codegen framework developments, I think its worth printing there. This pr intends to enable `debugCodegen` to print these metrics as following;
```
scala> sql("SELECT sum(v) FROM VALUES(1) t(v)").debugCodegen
Found 2 WholeStageCodegen subtrees.
== Subtree 1 / 2 (maxClassCodeSize:2693; maxMethodCodeSize:124; maxConstantPoolSize:130(0.20% used); numInnerClasses:0) ==
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*(1) HashAggregate(keys=[], functions=[partial_sum(cast(v#0 as bigint))], output=[sum#5L])
+- *(1) LocalTableScan [v#0]
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
...
```
### Why are the changes needed?
For efficient developments
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually tested
Closes#25766 from maropu/PrintBytecodeStats.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This is a follow-up of https://github.com/apache/spark/pull/25638 to switch `scala-library` from `test` dependency to `compile` dependency in `network-common` module.
### Why are the changes needed?
Previously, we added `scala-library` as a test dependency to resolve the followings, but it was insufficient to resolve. This PR aims to switch it to compile dependency.
```
$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.3+7)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.3+7, mixed mode)
$ mvn clean install -pl common/network-common -DskipTests
...
[INFO] --- scala-maven-plugin:4.2.0:doc-jar (attach-scaladocs) spark-network-common_2.12 ---
error: fatal error: object scala in compiler mirror not found.
one error found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually, run the following on JDK11.
```
$ mvn clean install -pl common/network-common -DskipTests
```
Closes#25800 from dongjoon-hyun/SPARK-28932-2.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
It's more clear to only do table lookup in `ResolveTables` rule (for v2 tables) and `ResolveRelations` rule (for v1 tables). `ResolveInsertInto` should only resolve the `InsertIntoStatement` with resolved relations.
### Why are the changes needed?
to make the code simpler
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
existing tests
Closes#25774 from cloud-fan/simplify.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Currently, there is no migration section for PySpark, SparkCore and Structured Streaming.
It is difficult for users to know what to do when they upgrade.
This PR proposes to create create a "Migration Guide" tap at Spark documentation.
![Screen Shot 2019-09-11 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/64688126-ad712f80-d4c6-11e9-8672-9a2c56c05bf8.png)
![Screen Shot 2019-09-11 at 7 27 15 PM](https://user-images.githubusercontent.com/6477701/64689915-389ff480-d4ca-11e9-8c54-7f46095d0d23.png)
This page will contain migration guides for Spark SQL, PySpark, SparkR, MLlib, Structured Streaming and Core. Basically it is a refactoring.
There are some new information added, which I will leave a comment inlined for easier review.
1. **MLlib**
Merge [ml-guide.html#migration-guide](https://spark.apache.org/docs/latest/ml-guide.html#migration-guide) and [ml-migration-guides.html](https://spark.apache.org/docs/latest/ml-migration-guides.html)
```
'docs/ml-guide.md'
↓ Merge new/old migration guides
'docs/ml-migration-guide.md'
```
2. **PySpark**
Extract PySpark specific items from https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html
```
'docs/sql-migration-guide-upgrade.md'
↓ Extract PySpark specific items
'docs/pyspark-migration-guide.md'
```
3. **SparkR**
Move [sparkr.html#migration-guide](https://spark.apache.org/docs/latest/sparkr.html#migration-guide) into a separate file, and extract from [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html)
```
'docs/sparkr.md' 'docs/sql-migration-guide-upgrade.md'
Move migration guide section ↘ ↙ Extract SparkR specific items
docs/sparkr-migration-guide.md
```
4. **Core**
Newly created at `'docs/core-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.
5. **Structured Streaming**
Newly created at `'docs/ss-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.
6. **SQL**
Merged [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html) and [sql-migration-guide-hive-compatibility.html](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)
```
'docs/sql-migration-guide-hive-compatibility.md' 'docs/sql-migration-guide-upgrade.md'
Move Hive compatibility section ↘ ↙ Left over after filtering PySpark and SparkR items
'docs/sql-migration-guide.md'
```
### Why are the changes needed?
In order for users in production to effectively migrate to higher versions, and detect behaviour or breaking changes before upgrading and/or migrating.
### Does this PR introduce any user-facing change?
Yes, this changes Spark's documentation at https://spark.apache.org/docs/latest/index.html.
### How was this patch tested?
Manually build the doc. This can be verified as below:
```bash
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```
Closes#25757 from HyukjinKwon/migration-doc.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR add the `namespaces` keyword to `TableIdentifierParserSuite`.
### Why are the changes needed?
Improve the test.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
N/A
Closes#25758 from highmoutain/3.0bugfix.
Authored-by: changchun.wang <251922566@qq.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This patch fixes the bug regarding NPE in SQLConf.get, which is only possible when SparkContext._dagScheduler is null due to stopping SparkContext. The logic doesn't seem to consider active SparkContext could be in progress of stopping.
Note that it can't be encountered easily as SparkContext.stop() blocks the main thread, but there're many cases which SQLConf.get is accessed concurrently while SparkContext.stop() is executing - users run another threads, or listener is accessing SQLConf.get after dagScheduler is set to null (this is the case what I encountered.)
### Why are the changes needed?
The bug brings NPE.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Previous patch #25753 was tested with new UT, and due to disruption with other tests in concurrent test run, the test is excluded in this patch.
Closes#25790 from HeartSaVioR/SPARK-29046-v2.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
In the PR, I propose to fix comments of date-time expressions, and replace the `yyyy` pattern by `uuuu` when the implementation supposes the former one.
### Why are the changes needed?
To make comments consistent to implementations.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By running Scala Style checker.
Closes#25796 from MaxGekk/year-pattern-uuuu-followup.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
[SPARK-27122](https://github.com/apache/spark/pull/24088) fixes `ClassCastException` at `yarn` module by introducing `DelegatingServletContextHandler`. Initially, this was discovered with JDK9+, but the class path issues affected JDK8 environment, too. After [SPARK-28709](https://github.com/apache/spark/pull/25439), I also hit the similar issue at `streaming` module.
This PR aims to fix `streaming` module by adding `getContextPath` to `DelegatingServletContextHandler` and using it.
### Why are the changes needed?
Currently, when we test `streaming` module independently, it fails like the following.
```
$ build/mvn test -pl streaming
...
UISeleniumSuite:
- attaching and detaching a Streaming tab *** FAILED ***
java.lang.ClassCastException: org.sparkproject.jetty.servlet.ServletContextHandler cannot be cast to org.eclipse.jetty.servlet.ServletContextHandler
...
Tests: succeeded 337, failed 1, canceled 0, ignored 1, pending 0
*** 1 TEST FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins with the modified tests. And do the following manually.
Since you can observe this when you run `streaming` module test only (instead of running all), you need to install the changed `core` module and use it.
```
$ java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.222-b10, mixed mode)
$ build/mvn install -DskipTests
$ build/mvn test -pl streaming
```
Closes#25791 from dongjoon-hyun/SPARK-29087.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Replace use of `SerializationUtils.clone` with new `Utils.cloneProperties` method
Add benchmark + results showing dramatic speed up for effectively equivalent functionality.
### What changes were proposed in this pull request?
While I am not sure that SerializationUtils.clone is a performance issue in production, I am sure that it is overkill for the task it is doing (providing a distinct copy of a `Properties` object).
This PR provides a benchmark showing the dramatic improvement over the clone operation and replaces uses of `SerializationUtils.clone` on `Properties` with the more specialized `Utils.cloneProperties`.
### Does this PR introduce any user-facing change?
Strings are immutable so there is no reason to serialize and deserialize them, it just creates extra garbage.
The only functionality that would be changed is the unsupported insertion of non-String objects into the spark local properties.
### How was this patch tested?
1. Pass the Jenkins with the existing tests.
2. Since this is a performance improvement PR, manually run the benchmark.
Closes#25787 from databricks-david-lewis/SPARK-29081.
Authored-by: David Lewis <david.lewis@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Make r file extension check case insensitive for spark-submit.
### Why are the changes needed?
spark-submit does not accept `.r` files as R scripts. Some codebases have r files that end with lowercase file extensions. It is inconvenient to use spark-submit with lowercase extension R files. The error is not very clear (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L232).
```
$ ./bin/spark-submit examples/src/main/r/dataframe.r
Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR file:/Users/dongjoon/APACHE/spark-release/spark-2.4.4-bin-hadoop2.7/examples/src/main/r/dataframe.r
```
### Does this PR introduce any user-facing change?
Yes. spark-submit can now be used to run R scripts with `.r` file extension.
### How was this patch tested?
Manual.
```
$ mv examples/src/main/r/dataframe.R examples/src/main/r/dataframe.r
$ ./bin/spark-submit examples/src/main/r/dataframe.r
```
Closes#25778 from Loquats/r-case.
Authored-by: Andy Zhang <yue.zhang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
During Spark History Server startup, there are two things happening simultaneously that call into `java.nio.file.FileSystems.getDefault()` and we sometime hit [JDK-8194653](https://bugs.openjdk.java.net/browse/JDK-8194653).
1) start jetty server
2) start ApplicationHistoryProvider (which reads files from HDFS)
We should do these two things sequentially instead of in parallel.
We introduce a start() method in ApplicationHistoryProvider (and its subclass FsHistoryProvider), and we do initialize inside the start() method instead of the constructor.
In HistoryServer, we explicitly call provider.start() after we call bind() which starts the Jetty server.
### Why are the changes needed?
It is a bug that occasionally starting Spark History Server results in process hang due to deadlock among threads.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
I stress tested this PR with a bash script to stop and start Spark History Server more than 1000 times, it worked fine. Previously I can only do the stop/start loop less than 10 times before I hit the deadlock issue.
Closes#25705 from shanyu/shanyu-29003.
Authored-by: Shanyu Zhao <shzhao@microsoft.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR enables GitHub Action on PRs.
### Why are the changes needed?
So far, we detect JDK11 compilation error after merging.
This PR aims to prevent JDK11 compilation error at PR stage.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manual. See the GitHub Action on this PR.
Closes#25786 from dongjoon-hyun/SPARK-29079.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request?
This PR aims to add a new enforcer rule to ban duplicated pom dependency during build stage.
### Why are the changes needed?
This will help us by preventing the extra effort like the followings.
```
e63098b287 [SPARK-29007][MLLIB][FOLLOWUP] Remove duplicated dependency
39e044e3d8 [MINOR][BUILD] Remove duplicate test-jar:test spark-sql dependency from Hive module
d8fefab4d8 [HOTFIX][BUILD][TEST-MAVEN] Remove duplicate dependency
e9445b187e [SPARK-6866][Build] Remove duplicated dependency in launcher/pom.xml
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually.
If we have something like e63098b287, it will fail at building phase at PR like the following.
```
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.BanDuplicatePomDependencyVersions failed with message:
Found 1 duplicate dependency declaration in this project:
- dependencies.dependency[org.apache.spark:spark-streaming_${scala.binary.version}:test-jar] ( 2 times )
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-no-duplicate-dependencies) on project spark-mllib_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
```
Closes#25784 from dongjoon-hyun/SPARK-29075.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
We already have found and fixed the correctness issue before when RDD output is INDETERMINATE. One missing part is sampling-based RDD. This kind of RDDs is order sensitive to its input. A sampling-based RDD with unordered input, should be INDETERMINATE.
### Why are the changes needed?
A sampling-based RDD with unordered input is just like MapPartitionsRDD with isOrderSensitive parameter as true. The RDD output can be different after a rerun.
It is a problem in ML applications.
In ML, sample is used to prepare training data. ML algorithm fits the model based on the sampled data. If rerun tasks of sample produce different output during model fitting, ML results will be unreliable and also buggy.
Each sample is random output, but once you sampled, the output should be determinate.
### Does this PR introduce any user-facing change?
Previously, a sampling-based RDD can possibly come with different output after a rerun.
After this patch, sampling-based RDD is INDETERMINATE. For an INDETERMINATE map stage, currently Spark scheduler will re-try all the tasks of the failed stage.
### How was this patch tested?
Added test.
Closes#25751 from viirya/sample-order-sensitive.
Authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
### What changes were proposed in this pull request?
This removes the duplicated dependency which is added by [SPARK-29007](b62ef8f793/mllib/pom.xml (L58-L64)).
### Why are the changes needed?
Maven complains this kind of duplications. We had better be safe in the future Maven versions.
```
$ cd mllib
$ mvn clean package -DskipTests
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.spark:spark-mllib_2.12🫙3.0.0-SNAPSHOT
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.apache.spark:spark-streaming_${scala.binary.version}:test-jar -> duplicate declaration of version ${project.version} line 119, column 17
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
...
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manual check since this is a warning.
```
$ cd mllib
$ mvn clean package -DskipTests
```
Closes#25783 from dongjoon-hyun/SPARK-29007.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This update adds support for Kafka Headers functionality in Structured Streaming.
## How was this patch tested?
With following unit tests:
- KafkaRelationSuite: "default starting and ending offsets with headers" (new)
- KafkaSinkSuite: "batch - write to kafka" (updated)
Closes#22282 from dongjinleekr/feature/SPARK-23539.
Lead-authored-by: Lee Dongjin <dongjin@apache.org>
Co-authored-by: Jungtaek Lim <kabhwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
### What changes were proposed in this pull request?
Follow the scala ```OneVsRestParams``` implementation, move ```setClassifier``` from ```OneVsRestParams``` to ```OneVsRest``` in Pyspark
### Why are the changes needed?
1. Maintain the parity between scala and python code.
2. ```Classifier``` can only be set in the estimator.
### Does this PR introduce any user-facing change?
Yes.
Previous behavior: ```OneVsRestModel``` has method ```setClassifier```
Current behavior: ```setClassifier``` is removed from ```OneVsRestModel```. ```classifier``` can only be set in ```OneVsRest```.
### How was this patch tested?
Use existing tests
Closes#25715 from huaxingao/spark-28969.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
### What changes were proposed in this pull request?
ThriftServerSessionPage displays timestamp 0 (1970/01/01) instead of nothing if query finish time and close time are not set.
![image](https://user-images.githubusercontent.com/25019163/64711118-6d578000-d4b9-11e9-9b11-2e3616319a98.png)
Change it to display nothing, like ThriftServerPage.
### Why are the changes needed?
Obvious bug.
### Does this PR introduce any user-facing change?
Finish time and Close time will be displayed correctly on ThriftServerSessionPage in JDBC/ODBC Spark UI.
### How was this patch tested?
Manual test.
Closes#25762 from juliuszsompolski/SPARK-29056.
Authored-by: Juliusz Sompolski <julek@databricks.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
### What changes were proposed in this pull request?
Added document for CREATE VIEW command.
### Why are the changes needed?
As a reference to syntax and examples of CREATE VIEW command.
### How was this patch tested?
Documentation update. Verified manually.
Closes#25543 from amanomer/spark-28795.
Lead-authored-by: aman_omer <amanomer1996@gmail.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Co-authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
### What changes were proposed in this pull request?
Document DROP DATABASE statement in SQL Reference
### Why are the changes needed?
Currently from spark there is no complete sql guide is present, so it is better to document all the sql commands, this jira is sub part of this task.
### Does this PR introduce any user-facing change?
Yes, Before there was no documentation about drop database syntax
After Fix
![image](https://user-images.githubusercontent.com/35216143/64787097-977a7200-d58d-11e9-911c-d2ff6f3ccff5.png)
![image](https://user-images.githubusercontent.com/35216143/64787122-a6612480-d58d-11e9-978c-9455baff007f.png)
### How was this patch tested?
tested with jenkyll build
Closes#25554 from sandeep-katta/dropDbDoc.
Authored-by: sandeep katta <sandeep.katta2007@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
### What changes were proposed in this pull request?
Document REFRESH TABLE statement in the SQL Reference Guide.
### Why are the changes needed?
Currently there is no documentation in the SPARK SQL to describe how to use this command, it is to address this issue.
### Does this PR introduce any user-facing change?
Yes.
#### Before:
There is no documentation for this.
#### After:
<img width="826" alt="Screen Shot 2019-09-12 at 11 39 21 AM" src="https://user-images.githubusercontent.com/7550280/64811385-01752600-d552-11e9-876d-91ebb005b851.png">
### How was this patch tested?
Using jykll build --serve
Closes#25549 from kevinyu98/spark-28828-refreshTable.
Authored-by: Kevin Yu <qyu@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
### What changes were proposed in this pull request?
The `Column.isInCollection()` with a large size collection will generate an expression with large size children expressions. This make analyzer and optimizer take a long time to run.
In this PR, in `isInCollection()` function, directly generate `InSet` expression, avoid generating too many children expressions.
### Why are the changes needed?
`Column.isInCollection()` with a large size collection sometimes become a bottleneck when running sql.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually benchmark it in spark-shell:
```
def testExplainTime(collectionSize: Int) = {
val df = spark.range(10).withColumn("id2", col("id") + 1)
val list = Range(0, collectionSize).toList
val startTime = System.currentTimeMillis()
df.where(col("id").isInCollection(list)).where(col("id2").isInCollection(list)).explain()
val elapsedTime = System.currentTimeMillis() - startTime
println(s"cost time: ${elapsedTime}ms")
}
```
Then test on collection size 5, 10, 100, 1000, 10000, test result is:
collection size | explain time (before) | explain time (after)
------ | ------ | ------
5 | 26ms | 29ms
10 | 30ms | 48ms
100 | 104ms | 50ms
1000 | 1202ms | 58ms
10000 | 10012ms | 523ms
Closes#25754 from WeichenXu123/improve_in_collection.
Lead-authored-by: WeichenXu <weichen.xu@databricks.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
### What changes were proposed in this pull request?
This PR adds a utility class `AdaptiveSparkPlanHelper` which provides methods related to tree traversal of an `AdaptiveSparkPlanExec` plan. Unlike their counterparts in `TreeNode` or
`QueryPlan`, these methods traverse down leaf nodes of adaptive plans, i.e., `AdaptiveSparkPlanExec` and `QueryStageExec`.
### Why are the changes needed?
This utility class can greatly simplify tree traversal code for adaptive spark plans.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Refined `AdaptiveQueryExecSuite` with the help of the new utility methods.
Closes#25764 from maryannxue/aqe-utils.
Authored-by: maryannxue <maryannxue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In the PR, I propose to extend `ExtractBenchmark` and add new ones for:
- `EXTRACT` and `DATE` as input column
- the `DATE_PART` function and `DATE`/`TIMESTAMP` input column
### Why are the changes needed?
The `EXTRACT` expression is rebased on the `DATE_PART` expression by the PR https://github.com/apache/spark/pull/25410 where some of sub-expressions take `DATE` column as the input (`Millennium`, `Year` and etc.) but others require `TIMESTAMP` column (`Hour`, `Minute`). Separate benchmarks for `DATE` should exclude overhead of implicit conversions `DATE` <-> `TIMESTAMP`.
### Does this PR introduce any user-facing change?
No, it doesn't.
### How was this patch tested?
- Regenerated results of `ExtractBenchmark`
Closes#25772 from MaxGekk/date_part-benchmark.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
reorganize the packages of DS v2 interfaces/classes:
1. `org.spark.sql.connector.catalog`: put `TableCatalog`, `Table` and other related interfaces/classes
2. `org.spark.sql.connector.expression`: put `Expression`, `Transform` and other related interfaces/classes
3. `org.spark.sql.connector.read`: put `ScanBuilder`, `Scan` and other related interfaces/classes
4. `org.spark.sql.connector.write`: put `WriteBuilder`, `BatchWrite` and other related interfaces/classes
### Why are the changes needed?
Data Source V2 has evolved a lot. It's a bit weird that `Expression` is in `org.spark.sql.catalog.v2` and `Table` is in `org.spark.sql.sources.v2`.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
existing tests
Closes#25700 from cloud-fan/package.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In method `SQLMetricsTestUtils.testMetricsDynamicPartition()`, there is a CREATE TABLE sentence without `withTable` block. It causes test failure if use same table name in other unit tests.
### Why are the changes needed?
To avoid "table already exists" in tests.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Exist UT
Closes#25752 from LantaoJin/SPARK-29045.
Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
# What changes were proposed in this pull request?
This patch fixes the bug regarding NPE in SQLConf.get, which is only possible when SparkContext._dagScheduler is null due to stopping SparkContext. The logic doesn't seem to consider active SparkContext could be in progress of stopping.
Note that it can't be encountered easily as `SparkContext.stop()` blocks the main thread, but there're many cases which SQLConf.get is accessed concurrently while SparkContext.stop() is executing - users run another threads, or listener is accessing SQLConf.get after dagScheduler is set to null (this is the case what I encountered.)
### Why are the changes needed?
The bug brings NPE.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added new UT to verify NPE doesn't occur. Without patch, the test fails with throwing NPE.
Closes#25753 from HeartSaVioR/SPARK-29046.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
JIRA :https://issues.apache.org/jira/browse/SPARK-29050
'a hdfs' change into 'an hdfs'
'an unique' change into 'a unique'
'an url' change into 'a url'
'a error' change into 'an error'
Closes#25756 from dengziming/feature_fix_typos.
Authored-by: dengziming <dengziming@growingio.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Remove `InsertIntoTable` and replace it's usage by `InsertIntoStatement`
### Why are the changes needed?
`InsertIntoTable` and `InsertIntoStatement` are almost identical (except some namings). It doesn't make sense to keep 2 identical plans. After the removal of `InsertIntoTable`, the analysis process becomes:
1. parser creates `InsertIntoStatement`
2. v2 rule `ResolveInsertInto` converts `InsertIntoStatement` to v2 commands.
3. v1 rules like `DataSourceAnalysis` and `HiveAnalysis` convert `InsertIntoStatement` to v1 commands.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
existing tests
Closes#25763 from cloud-fan/remove.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to allow `bytes` as an acceptable type for binary type for `createDataFrame`.
### Why are the changes needed?
`bytes` is a standard type for binary in Python. This should be respected in PySpark side.
### Does this PR introduce any user-facing change?
Yes, _when specified type is binary_, we will allow `bytes` as a binary type. Previously this was not allowed in both Python 2 and Python 3 as below:
```python
spark.createDataFrame([[b"abcd"]], "col binary")
```
in Python 3
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
rdd, schema = self._createFromLocal(map(prepare, data), schema)
File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
data = list(data)
File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
verify_func(obj)
File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify
verify_value(obj)
File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
verifier(v)
File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
verify_value(obj)
File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
verify_acceptable_types(obj)
File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
% (dataType, obj, type(obj))))
TypeError: field col: BinaryType can not accept object b'abcd' in type <class 'bytes'>
```
in Python 2:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
rdd, schema = self._createFromLocal(map(prepare, data), schema)
File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
data = list(data)
File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
verify_func(obj)
File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
verify_value(obj)
File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
verifier(v)
File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
verify_value(obj)
File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
verify_acceptable_types(obj)
File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
% (dataType, obj, type(obj))))
TypeError: field col: BinaryType can not accept object 'abcd' in type <type 'str'>
```
So, it won't break anything.
### How was this patch tested?
Unittests were added and also manually tested as below.
```bash
./run-tests --python-executables=python2,python3 --testnames "pyspark.sql.tests.test_serde"
```
Closes#25749 from HyukjinKwon/SPARK-29041.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This patch fixes the flaky test failure from StreamingContextSuite "stop slow receiver gracefully", via putting flag whether initializing slow receiver is completed, and wait for such flag to be true. As receiver should be submitted via job and initialized in executor, 500ms might not be enough for covering all cases.
### Why are the changes needed?
We got some reports for test failure on this test. Please refer [SPARK-24663](https://issues.apache.org/jira/browse/SPARK-24663)
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Modified UT. I've artificially made delay on handling job submission via adding below code in `DAGScheduler.submitJob`:
```
if (rdd != null && rdd.name != null && rdd.name.startsWith("Receiver")) {
println(s"Receiver Job! rdd name: ${rdd.name}")
Thread.sleep(1000)
}
```
and the test "stop slow receiver gracefully" failed on current master and passed on the patch.
Closes#25725 from HeartSaVioR/SPARK-24663.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
This patch enforces tests to prevent leaking newly created SparkContext while is created via initializing StreamingContext. Leaking SparkContext in test would make most of following tests being failed as well, so this patch applies defensive programming, trying its best to ensure SparkContext is cleaned up.
### Why are the changes needed?
We got some case in CI build where SparkContext is being leaked and other tests are affected by leaked SparkContext. Ideally we should isolate the environment among tests if possible.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Modified UTs.
Closes#25709 from HeartSaVioR/SPARK-29007.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
This patch ensures accessing recorded values in listener is always after letting listeners fully process all events. To ensure this, this patch adds new class to hide these values and access with methods which will ensure above condition. Without this guard, two threads are running concurrently - 1) listeners process thread 2) test main thread - and race condition would occur.
That's why we also see very odd thing, error message saying condition is met but test failed:
```
- Barrier task failures from the same stage attempt don't trigger multiple stage retries *** FAILED ***
ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656)
```
which means verification failed, and condition is met just before constructing error message.
The guard is properly placed in many spots, but missed in some places. This patch enforces that it can't be missed.
### Why are the changes needed?
UT fails intermittently and this patch will address the flakyness.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Modified UT.
Also made the flaky tests artificially failing via applying 50ms of sleep on each onXXX method.
![Screen Shot 2019-09-07 at 7 44 15 AM](https://user-images.githubusercontent.com/1317309/64465178-1747ad00-d146-11e9-92f6-f4ed4a1f4b08.png)
I found 3 methods being failed. (They've marked as X. Just ignore ! as they failed on waiting listener in given timeout and these tests don't deal with these recorded values - it uses other timeout value 1000ms than 10000ms for this listener so affected via side-effect.)
When I applied same in this patch all tests marked as X passed.
Closes#25706 from HeartSaVioR/SPARK-26989.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
PR #22112 fixed the todo added by PR #20393(SPARK-23207). We can remove it now.
### Why are the changes needed?
In order not to confuse developers.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
no need to test
Closes#25755 from LinhongLiu/remove-todo.
Authored-by: Liu,Linhong <liulinhong@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>