Commit graph

11738 commits

Author SHA1 Message Date
Gengliang Wang 5d45a415f3 Preparing Spark release v3.2.0-rc7 2021-10-06 11:45:26 +00:00
tianhanhu 9760c8ab60 [SPARK-36919][SQL] Make BadRecordException fields transient
### What changes were proposed in this pull request?
While migrating a Spark application from 2.4.x to 3.1.x, we found a difference in the exception chaining behavior. When parsing a malformed CSV, where the root cause exception should be `Caused by: java.lang.RuntimeException: Malformed CSV record`, only the top-level exception is kept, and all lower-level exceptions and the root cause are lost. Thus, when we call `ExceptionUtils.getRootCause` on the exception, we get the exception itself.
The reason for the difference is that the RuntimeException is wrapped in BadRecordException, which has unserializable fields. When we serialize the exception in tasks and deserialize it in the scheduler, the exception is lost.
This PR makes unserializable fields of BadRecordException transient, so the rest of the exception could be serialized and deserialized properly.
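The effect of marking a field `transient` can be sketched with plain Java serialization. This is a minimal illustration, not Spark's code: `BadRecord` and `RawRecord` are hypothetical stand-ins for `BadRecordException` and its unserializable fields, and `roundTrip` only imitates the task-to-scheduler hop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class TransientCauseDemo {
    // Hypothetical field type that does NOT implement Serializable.
    static class RawRecord {
        final String text;
        RawRecord(String text) { this.text = text; }
    }

    // Hypothetical stand-in for BadRecordException.
    static class BadRecord extends RuntimeException {
        // transient: skipped by Java serialization, so it is null after a round trip.
        final transient RawRecord record;
        BadRecord(RawRecord record, Throwable cause) {
            super("bad record", cause);
            this.record = record;
        }
    }

    // Serialize and deserialize a throwable in memory.
    static Throwable roundTrip(Throwable t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(t);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Throwable) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    // Walk the cause chain to its end.
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null) {
            cur = cur.getCause();
        }
        return cur;
    }

    public static void main(String[] args) {
        Throwable original = new BadRecord(
            new RawRecord("a,b,\"c"), new RuntimeException("Malformed CSV record"));
        Throwable copy = roundTrip(original);
        // The cause chain survives; only the transient field is dropped.
        System.out.println(rootCause(copy).getMessage());
    }
}
```

Without the `transient` modifier, `writeObject` in this sketch would throw `NotSerializableException` for `RawRecord`, and the cause chain could not be transferred at all.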

### Why are the changes needed?
Make BadRecordException serializable

### Does this PR introduce _any_ user-facing change?
Users can now get the root cause of a BadRecordException.

### How was this patch tested?
Unit testing

Closes #34167 from tianhanhu/master.

Authored-by: tianhanhu <adrianhu96@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit aed977c468)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-10-06 19:06:22 +09:00
Wenchen Fan 688808900d [SPARK-36926][3.2][SQL] Decimal average mistakenly overflow
backport https://github.com/apache/spark/pull/34180

### What changes were proposed in this pull request?

This bug was introduced by https://github.com/apache/spark/pull/33177

When checking overflow of the sum value in the average function, we should use the `sumDataType` instead of the input decimal type.
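Why the choice of type matters can be sketched with `java.math.BigDecimal`. This is an illustration only: the `fits` helper is hypothetical, and the assumption that the sum buffer is 10 digits wider than the input type (here `DECIMAL(13, 1)` for a `DECIMAL(3, 1)` input) is not stated in this commit message.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class DecimalAvgOverflowDemo {
    // True if the value can be stored as DECIMAL(precision, scale).
    static boolean fits(BigDecimal value, int precision, int scale) {
        BigDecimal scaled = value.setScale(scale, RoundingMode.HALF_UP);
        return scaled.precision() <= precision;
    }

    public static void main(String[] args) {
        // Two DECIMAL(3, 1) inputs whose average, 65.0, is a valid DECIMAL(3, 1).
        BigDecimal sum = new BigDecimal("60.0").add(new BigDecimal("70.0")); // 130.0

        // Checking the intermediate sum against the input type reports a
        // spurious overflow, although the final average would fit.
        System.out.println("fits input type DECIMAL(3,1): " + fits(sum, 3, 1));

        // Checking against the (assumed) wider sum type does not.
        System.out.println("fits sum type DECIMAL(13,1):  " + fits(sum, 13, 1));
    }
}
```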

### Why are the changes needed?

fix a regression

### Does this PR introduce _any_ user-facing change?

Yes, the result was wrong before this PR.

### How was this patch tested?

a new test

Closes #34193 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
2021-10-06 13:31:03 +08:00
Kousuke Saruta 8ffe00e745 [SPARK-36874][SQL] DeduplicateRelations should copy dataset_id tag to avoid ambiguous self join
### What changes were proposed in this pull request?

This PR fixes an issue where an ambiguous self join isn't detected when the left and right DataFrames are swapped.
This is an example.
```
val df1 = Seq((1, 2, "A1"),(2, 1, "A2")).toDF("key1", "key2", "value")
val df2 = df1.filter($"value" === "A2")

df1.join(df2, df1("key1") === df2("key2")) // Ambiguous self join is detected and AnalysisException is thrown.

df2.join(df1, df1("key1") === df2("key2")) // Ambiguous self join is not detected.
```

The root cause seems to be that an inner function `collectConflictPlans` in `DeduplicateRelations` doesn't copy the `dataset_id` tag when it copies a `LogicalPlan`.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes #34172 from sarutak/fix-deduplication-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit fa1805db48)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-10-05 11:17:12 +08:00
Gengliang Wang 4bd358474b Preparing development version 3.2.1-SNAPSHOT 2021-09-28 10:53:42 +00:00
Gengliang Wang dde73e2e1c Preparing Spark release v3.2.0-rc6 2021-09-28 10:53:35 +00:00
Richard Chen 493aad03ab [SPARK-36836][SQL] Fix incorrect result in sha2 expression
### What changes were proposed in this pull request?

`sha2(input, bit_length)` returns incorrect results when `bit_length == 224` for all inputs.
This error can be reproduced by running `spark.sql("SELECT sha2('abc', 224)").show()`, for instance, in spark-shell.

Spark currently returns
```
#\t}"4�"�B�w��U�*��你���l��
```
while the expected result is
```
23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
```

This appears to happen because `MessageDigest.digest()` returns bytes intended to be interpreted as a `BigInt` rather than a string. Thus, the output of `MessageDigest.digest()` must first be interpreted as a `BigInt` and then transformed into a hex string rather than being interpreted directly as a hex string.
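The expected value can be reproduced outside Spark with the JDK alone. The sketch below is not the patch itself: it hex-encodes each digest byte rather than going through a `BigInt`, a swapped-in approach chosen because it also preserves leading zero bytes. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha224HexDemo {
    // Compute SHA-224 of a UTF-8 string and return it as lowercase hex.
    static String sha224Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-224");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes do not sign-extend.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-224 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Prints 23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
        System.out.println(sha224Hex("abc"));
    }
}
```

Treating the raw digest bytes as character data, instead of hex-encoding them, produces unreadable output like the one shown above.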

### Why are the changes needed?

`sha2(input, bit_length)` with a `bit_length` input of `224` would previously return the incorrect result.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a new test to `HashExpressionsSuite.scala`, which previously failed and now passes.

Closes #34086 from richardc-db/sha224.

Authored-by: Richard Chen <r.chen@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 6c6291b3f6)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
2021-09-28 18:38:33 +08:00
Yuming Wang 8f0c846b1d Revert "[SPARK-32855][SQL] Improve the cost model in pruningHasBenefit for filtering side can not build broadcast by join type"
### What changes were proposed in this pull request?

This reverts commit aaa0d2a66b.

### Why are the changes needed?

This approach has 2 disadvantages:
1. It needs to disable `spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly`.
2. The filtering side will be evaluated 2 times. For example: https://github.com/apache/spark/pull/29726#issuecomment-780266596

Instead, we can use bloom filter join pruning in the future.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes #34116 from wangyum/revert-SPARK-32855.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>

(cherry picked from commit e024bdc306)

Closes #34124 from wangyum/revert-SPARK-32855-branch-3.2.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-28 15:38:35 +08:00
Gengliang Wang 0c57bb8f7f Preparing development version 3.2.1-SNAPSHOT 2021-09-27 08:24:50 +00:00
Gengliang Wang 49aea14c5a Preparing Spark release v3.2.0-rc5 2021-09-27 08:24:44 +00:00
Gengliang Wang 2348cce37e Preparing development version 3.2.1-SNAPSHOT 2021-09-26 12:28:46 +00:00
Gengliang Wang 2ed8c08c5b Preparing Spark release v3.2.0-rc5 2021-09-26 12:28:40 +00:00
PengLei eb794a4f58 [SPARK-36851][SQL] Incorrect parsing of negative ANSI typed interval literals
### What changes were proposed in this pull request?
Handle incorrect parsing of negative ANSI typed interval literals
[SPARK-36851](https://issues.apache.org/jira/browse/SPARK-36851)

### Why are the changes needed?
Incorrect result:
```
spark-sql> select interval -'1' year;
1-0
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a unit test case.

Closes #34107 from Peng-Lei/SPARK-36851.

Authored-by: PengLei <peng.8lei@gmail.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 0fdca1f0df)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
2021-09-26 18:43:38 +08:00
Gengliang Wang da722d43cb Preparing development version 3.2.1-SNAPSHOT 2021-09-24 10:03:23 +00:00
Gengliang Wang 9e35703211 Preparing Spark release v3.2.0-rc5 2021-09-24 10:03:16 +00:00
Angerszhuuuu b7174188e5 [SPARK-36792][SQL] InSet should handle NaN
### What changes were proposed in this pull request?
InSet should handle NaN
```
InSet(Literal(Double.NaN), Set(Double.NaN, 1d)) should return true, but returns false.
```
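The general pitfall can be shown without Spark: under IEEE 754 comparison, `NaN == NaN` is false, so a membership test built on `==` never finds `NaN`. The sketch below is not the `InSet` implementation; both helper methods are hypothetical and only contrast the two behaviors.

```java
class NanMembershipDemo {
    // Membership test using primitive ==, under which NaN never matches.
    static boolean containsPrimitive(double[] values, double needle) {
        for (double v : values) {
            if (v == needle) {
                return true;
            }
        }
        return false;
    }

    // NaN-aware membership: any NaN is treated as equal to any other NaN.
    static boolean containsNanAware(double[] values, double needle) {
        for (double v : values) {
            if (v == needle || (Double.isNaN(v) && Double.isNaN(needle))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[] set = {Double.NaN, 1d};
        System.out.println(containsPrimitive(set, Double.NaN)); // false
        System.out.println(containsNanAware(set, Double.NaN));  // true
    }
}
```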
### Why are the changes needed?
InSet should handle NaN

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT

Closes #34033 from AngersZhuuuu/SPARK-36792.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 64f4bf47af)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-24 16:19:47 +08:00
allisonwang-db d0c97d6ed9 [SPARK-36747][SQL][3.2] Do not collapse Project with Aggregate when correlated subqueries are present in the project list
### What changes were proposed in this pull request?

This PR adds a check in the optimizer rule `CollapseProject` to avoid combining Project with Aggregate when the project list contains one or more correlated scalar subqueries that reference the output of the aggregate. Combining Project with Aggregate can lead to an invalid plan after correlated subquery rewrite. This is because correlated scalar subqueries' references are used as join conditions, which cannot host aggregate expressions.

For example
```sql
select (select sum(c2) from t where c1 = cast(s as int)) from (select sum(c2) s from t)
```

```
== Optimized Logical Plan ==
Aggregate [sum(c2)#10L AS scalarsubquery(s)#11L] <--- Aggregate has neither grouping nor aggregate expressions.
+- Project [sum(c2)#10L]
   +- Join LeftOuter, (c1#2 = cast(sum(c2#3) as int))  <--- Aggregate expression in join condition
      :- LocalRelation [c2#3]
      +- Aggregate [c1#2], [sum(c2#3) AS sum(c2)#10L, c1#2]
         +- LocalRelation [c1#2, c2#3]

java.lang.UnsupportedOperationException: Cannot generate code for expression: sum(input[0, int, false])
```
Currently, we only allow a correlated scalar subquery in Aggregate if it is also in the grouping expressions.
079a9c5292/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala (L661-L666)

### Why are the changes needed?

To fix an existing optimizer issue.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 4a8dc5f7a3)
Signed-off-by: allisonwang-db <allison.wang@databricks.com>

Closes #34081 from allisonwang-db/cp-spark-36747.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-24 16:14:49 +08:00
Gengliang Wang 0fb7127f85 Preparing development version 3.2.1-SNAPSHOT 2021-09-23 08:46:28 +00:00
Gengliang Wang b609f2fe0c Preparing Spark release v3.2.0-rc4 2021-09-23 08:46:22 +00:00
Michael Chen 89894a4b1d [SPARK-36795][SQL] Explain Formatted has Duplicate Node IDs
Fixed explain formatted mode so it doesn't have duplicate node IDs when InMemoryRelation is present in query plan.

Having duplicated node IDs in the plan makes it confusing.

Yes, explain formatted string will change.
Notice how `ColumnarToRow` and `InMemoryRelation` have node id of 2.
Before changes =>
```
== Physical Plan ==
AdaptiveSparkPlan (14)
+- == Final Plan ==
   * BroadcastHashJoin Inner BuildLeft (9)
   :- BroadcastQueryStage (5)
   :  +- BroadcastExchange (4)
   :     +- * Filter (3)
   :        +- * ColumnarToRow (2)
   :           +- InMemoryTableScan (1)
   :                 +- InMemoryRelation (2)
   :                       +- * ColumnarToRow (4)
   :                          +- Scan parquet default.t1 (3)
   +- * Filter (8)
      +- * ColumnarToRow (7)
         +- Scan parquet default.t2 (6)
+- == Initial Plan ==
   BroadcastHashJoin Inner BuildLeft (13)
   :- BroadcastExchange (11)
   :  +- Filter (10)
   :     +- InMemoryTableScan (1)
   :           +- InMemoryRelation (2)
   :                 +- * ColumnarToRow (4)
   :                    +- Scan parquet default.t1 (3)
   +- Filter (12)
      +- Scan parquet default.t2 (6)

(1) InMemoryTableScan
Output [1]: [k#x]
Arguments: [k#x], [isnotnull(k#x)]

(2) InMemoryRelation
Arguments: [k#x], CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@401788d5,StorageLevel(disk, memory, deserialized, 1 replicas),*(1) ColumnarToRow
+- FileScan parquet default.t1[k#x] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/mike.chen/code/apacheSpark/spark/spark-warehouse/org.apach..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<k:int>
,None)

(3) Scan parquet default.t1
Output [1]: [k#x]
Batched: true
Location: InMemoryFileIndex [file:/Users/mike.chen/code/apacheSpark/spark/spark-warehouse/org.apache.spark.sql.ExplainSuiteAE/t1]
ReadSchema: struct<k:int>

(4) ColumnarToRow [codegen id : 1]
Input [1]: [k#x]

(5) BroadcastQueryStage
Output [1]: [k#x]
Arguments: 0

(6) Scan parquet default.t2
Output [1]: [key#x]
Batched: true
Location: InMemoryFileIndex [file:/Users/mike.chen/code/apacheSpark/spark/spark-warehouse/org.apache.spark.sql.ExplainSuiteAE/t2]
PushedFilters: [IsNotNull(key)]
ReadSchema: struct<key:int>

(7) ColumnarToRow
Input [1]: [key#x]

(8) Filter
Input [1]: [key#x]
Condition : isnotnull(key#x)

(9) BroadcastHashJoin [codegen id : 2]
Left keys [1]: [k#x]
Right keys [1]: [key#x]
Join condition: None

(10) Filter
Input [1]: [k#x]
Condition : isnotnull(k#x)

(11) BroadcastExchange
Input [1]: [k#x]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#x]

(12) Filter
Input [1]: [key#x]
Condition : isnotnull(key#x)

(13) BroadcastHashJoin
Left keys [1]: [k#x]
Right keys [1]: [key#x]
Join condition: None

(14) AdaptiveSparkPlan
Output [2]: [k#x, key#x]
Arguments: isFinalPlan=true
```

After Changes =>
```
== Physical Plan ==
AdaptiveSparkPlan (17)
+- == Final Plan ==
   * BroadcastHashJoin Inner BuildLeft (12)
   :- BroadcastQueryStage (8)
   :  +- BroadcastExchange (7)
   :     +- * Filter (6)
   :        +- * ColumnarToRow (5)
   :           +- InMemoryTableScan (1)
   :                 +- InMemoryRelation (2)
   :                       +- * ColumnarToRow (4)
   :                          +- Scan parquet default.t1 (3)
   +- * Filter (11)
      +- * ColumnarToRow (10)
         +- Scan parquet default.t2 (9)
+- == Initial Plan ==
   BroadcastHashJoin Inner BuildLeft (16)
   :- BroadcastExchange (14)
   :  +- Filter (13)
   :     +- InMemoryTableScan (1)
   :           +- InMemoryRelation (2)
   :                 +- * ColumnarToRow (4)
   :                    +- Scan parquet default.t1 (3)
   +- Filter (15)
      +- Scan parquet default.t2 (9)

(1) InMemoryTableScan
Output [1]: [k#x]
Arguments: [k#x], [isnotnull(k#x)]

(2) InMemoryRelation
Arguments: [k#x], CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@3ccb12d,StorageLevel(disk, memory, deserialized, 1 replicas),*(1) ColumnarToRow
+- FileScan parquet default.t1[k#x] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/mike.chen/code/apacheSpark/spark/spark-warehouse/org.apach..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<k:int>
,None)

(3) Scan parquet default.t1
Output [1]: [k#x]
Batched: true
Location: InMemoryFileIndex [file:/Users/mike.chen/code/apacheSpark/spark/spark-warehouse/org.apache.spark.sql.ExplainSuiteAE/t1]
ReadSchema: struct<k:int>

(4) ColumnarToRow [codegen id : 1]
Input [1]: [k#x]

(5) ColumnarToRow [codegen id : 1]
Input [1]: [k#x]

(6) Filter [codegen id : 1]
Input [1]: [k#x]
Condition : isnotnull(k#x)

(7) BroadcastExchange
Input [1]: [k#x]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#x]

(8) BroadcastQueryStage
Output [1]: [k#x]
Arguments: 0

(9) Scan parquet default.t2
Output [1]: [key#x]
Batched: true
Location: InMemoryFileIndex [file:/Users/mike.chen/code/apacheSpark/spark/spark-warehouse/org.apache.spark.sql.ExplainSuiteAE/t2]
PushedFilters: [IsNotNull(key)]
ReadSchema: struct<key:int>

(10) ColumnarToRow
Input [1]: [key#x]

(11) Filter
Input [1]: [key#x]
Condition : isnotnull(key#x)

(12) BroadcastHashJoin [codegen id : 2]
Left keys [1]: [k#x]
Right keys [1]: [key#x]
Join condition: None

(13) Filter
Input [1]: [k#x]
Condition : isnotnull(k#x)

(14) BroadcastExchange
Input [1]: [k#x]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#x]

(15) Filter
Input [1]: [key#x]
Condition : isnotnull(key#x)

(16) BroadcastHashJoin
Left keys [1]: [k#x]
Right keys [1]: [key#x]
Join condition: None

(17) AdaptiveSparkPlan
Output [2]: [k#x, key#x]
Arguments: isFinalPlan=true
```

add test

Closes #34036 from ChenMichael/SPARK-36795-Duplicate-node-id-with-inMemoryRelation.

Authored-by: Michael Chen <mike.chen@workday.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 6d7ab7b52b)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-09-23 15:55:15 +09:00
Hyukjin Kwon af569d1b0a [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation
### What changes were proposed in this pull request?

This PR fixes the 'options' description on `UnresolvedRelation`. This comment was added in https://github.com/apache/spark/pull/29535 but is no longer valid, because V1 also uses these `options` (and merges them with the table properties) per https://github.com/apache/spark/pull/29712.

This PR can go through from `master` to `branch-3.1`.

### Why are the changes needed?

To make `UnresolvedRelation.options`'s description clearer.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Scala linter by `dev/linter-scala`.

Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Huaxin Gao <huaxin_gao@apple.com>
(cherry picked from commit 0076eba8d0)
Signed-off-by: Huaxin Gao <huaxin_gao@apple.com>
2021-09-22 23:00:35 -07:00
Angerszhuuuu 2ff038a7b3 [SPARK-36753][SQL] ArrayExcept handle duplicated Double.NaN and Float.NaN
### What changes were proposed in this pull request?
For query
```
select array_except(array(cast('nan' as double), 1d), array(cast('nan' as double)))
```
This returns [NaN, 1d], but it should return [1d].
This issue arises because `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` either.
This PR fixes it based on https://github.com/apache/spark/pull/33955

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
ArrayExcept no longer returns a `NaN` value that is also present in the second array.

### How was this patch tested?
Added UT

Closes #33994 from AngersZhuuuu/SPARK-36753.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a7cbe69986)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-22 23:51:58 +08:00
Ivan Sadikov fc0b85fb26 [SPARK-36803][SQL] Fix ArrayType conversion when reading Parquet files written in legacy mode
### What changes were proposed in this pull request?

This PR fixes an issue where reading a Parquet file written in legacy mode would fail due to an incorrect Parquet LIST to ArrayType conversion.

The issue arises when using schema evolution and utilising the parquet-mr reader. 2-level LIST annotated types could be parsed incorrectly as 3-level LIST annotated types because their underlying element type does not match the full inferred Catalyst schema.

### Why are the changes needed?

It appears to be a long-standing issue with the legacy mode due to the imprecise check in ParquetRowConverter that was trying to determine Parquet backward compatibility using Catalyst schema: `DataType.equalsIgnoreCompatibleNullability(guessedElementType, elementType)` in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L606.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Added a new test case in ParquetInteroperabilitySuite.scala.

Closes #34044 from sadikovi/parquet-legacy-write-mode-list-issue.

Authored-by: Ivan Sadikov <ivan.sadikov@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit ec26d94eac)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-22 17:40:55 +08:00
Chao Sun a28d8d9b0e [SPARK-36820][3.2][SQL] Disable tests related to LZ4 for Hadoop 2.7 profile
### What changes were proposed in this pull request?

Disable tests related to LZ4 in `FileSourceCodecSuite` and `FileSuite` when using `hadoop-2.7` profile.
### Why are the changes needed?

At the moment, parquet-mr uses the LZ4 compression codec provided by Hadoop, and only since HADOOP-17292 (in 3.3.1/3.4.0) has Hadoop included `lz4-java`, removing the restriction that the codec can only run with the native library. As a consequence, the tests fail when using the `hadoop-2.7` profile.

### Does this PR introduce _any_ user-facing change?

No, it is a test-only change.

### How was this patch tested?

Existing test

Closes #34066 from sunchao/SpARK-36820-3.2.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
2021-09-22 00:14:45 -07:00
Gengliang Wang affd7a4d47 [SPARK-36670][FOLLOWUP][TEST] Remove brotli-codec dependency
### What changes were proposed in this pull request?

Remove `com.github.rdblue:brotli-codec:0.1.1` dependency.

### Why are the changes needed?

As Stephen Coy pointed out in the dev list, we should not have `com.github.rdblue:brotli-codec:0.1.1` dependency which is not available on Maven Central. This is to avoid possible artifact changes on `Jitpack.io`.
Also, the dependency is for tests only. I suggest that we remove it now to unblock the 3.2.0 release ASAP.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA tests.

Closes #34059 from gengliangwang/removeDeps.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ba5708d944)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-09-21 10:57:34 -07:00
Max Gekk 7fa88b28a5 [SPARK-36807][SQL] Merge ANSI interval types to a tightest common type
### What changes were proposed in this pull request?
In the PR, I propose to modify `StructType` to support merging of ANSI interval types with different fields.

### Why are the changes needed?
This will allow merging of schemas from different datasource files.

### Does this PR introduce _any_ user-facing change?
No, the ANSI interval types haven't been released yet.

### How was this patch tested?
Added new test to `StructTypeSuite`.

Closes #34049 from MaxGekk/merge-ansi-interval-types.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit d2340f8e1c)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-09-21 10:20:27 +03:00
Angerszhuuuu 337a1979d2 [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN
### What changes were proposed in this pull request?
For query
```
select array_intersect(array(cast('nan' as double), 1d), array(cast('nan' as double)))
```
This returns [], but it should return [NaN].
This issue arises because `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` either.
This PR fixes it based on https://github.com/apache/spark/pull/33955

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
ArrayIntersect now treats `NaN` values as equal, so a `NaN` present in both arrays is returned.

### How was this patch tested?
Added UT

Closes #33995 from AngersZhuuuu/SPARK-36754.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 2fc7f2f702)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-20 16:51:31 +08:00
Gengliang Wang b0249851f6 Preparing development version 3.2.1-SNAPSHOT 2021-09-18 11:30:12 +00:00
Gengliang Wang 96044e9735 Preparing Spark release v3.2.0-rc3 2021-09-18 11:30:06 +00:00
Liang-Chi Hsieh 275ad6bd0b [SPARK-36673][SQL][FOLLOWUP] Remove duplicate test in DataFrameSetOperationsSuite
### What changes were proposed in this pull request?

As a followup of #34025 to remove duplicate test.

### Why are the changes needed?

To remove duplicate test.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing test.

Closes #34032 from viirya/remove.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit f9644cc253)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
2021-09-17 11:52:26 -07:00
Angerszhuuuu 61d7f1da1b [SPARK-36767][SQL] ArrayMin/ArrayMax/SortArray/ArraySort add comment and Unit test
### What changes were proposed in this pull request?
Add comment about how ArrayMin/ArrayMax/SortArray/ArraySort handle NaN and add Unit test for this

### Why are the changes needed?
Add Unit test

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT

Closes #34008 from AngersZhuuuu/SPARK-36740.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 69e006dd53)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-17 21:42:21 +08:00
Liang-Chi Hsieh 895218996a [SPARK-36673][SQL] Fix incorrect schema of nested types of union
### What changes were proposed in this pull request?

This patch proposes to fix incorrect schema of `union`.

### Why are the changes needed?

The current `union` result for nested struct columns is incorrect. By the definition of the `union` API, it should resolve columns by position, not by name. Right now, when determining the `output` (i.e. the schema) of a union plan, we use the `merge` API, which actually merges two structs (think of it as concatenating the fields of the two structs when they do not overlap). This merging behavior doesn't match the `union` definition.

So currently we get incorrect schema but the query result is correct. We should fix the incorrect schema.
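The difference between positional resolution and struct merging can be sketched on field names alone. This is a simplified illustration that ignores field types and nullability; `unionByPosition` and `mergeByName` are hypothetical helpers, not Spark APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnionSchemaDemo {
    // Positional resolution: the result keeps the left side's field names;
    // the right side only has to line up position by position.
    static List<String> unionByPosition(List<String> left, List<String> right) {
        if (left.size() != right.size()) {
            throw new IllegalArgumentException("struct arity mismatch");
        }
        return new ArrayList<>(left);
    }

    // Name-based merge: left fields followed by right fields not already present.
    static List<String> mergeByName(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        for (String field : right) {
            if (!out.contains(field)) {
                out.add(field);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b");
        List<String> right = Arrays.asList("c", "d");
        System.out.println(unionByPosition(left, right)); // [a, b]
        System.out.println(mergeByName(left, right));     // [a, b, c, d]
    }
}
```

With non-overlapping field names, the merged struct reports four fields while each row of the union still carries two, which is the schema mismatch described above.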

### Does this PR introduce _any_ user-facing change?

Yes, fixing a bug of incorrect schema.

### How was this patch tested?

Added unit test.

Closes #34025 from viirya/SPARK-36673.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit cdd7ae937d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-17 21:37:40 +08:00
Angerszhuuuu a78c4c44ed [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan
### What changes were proposed in this pull request?
For query
```
select array_distinct(array(cast('nan' as double), cast('nan' as double)))
```
This returns [NaN, NaN], but it should return [NaN].
This issue arises because `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` either.
This PR fixes it based on https://github.com/apache/spark/pull/33955
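One way to deduplicate `NaN` is to track it separately from the ordinary values. The sketch below illustrates that idea and makes no claim about how the referenced PR implements it; both methods are hypothetical, and the quadratic scan stands in for a hash set to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

class NanDistinctDemo {
    // Naive distinct: relies on primitive ==, so every NaN looks like a new value.
    static List<Double> distinctNaive(double[] values) {
        List<Double> out = new ArrayList<>();
        for (double v : values) {
            boolean seen = false;
            for (double kept : out) {
                if (kept == v) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                out.add(v);
            }
        }
        return out;
    }

    // NaN-aware distinct: remember whether a NaN has already been emitted.
    static List<Double> distinctNanAware(double[] values) {
        List<Double> out = new ArrayList<>();
        boolean storedNaN = false;
        for (double v : values) {
            if (Double.isNaN(v)) {
                if (!storedNaN) {
                    out.add(v);
                    storedNaN = true;
                }
                continue;
            }
            boolean seen = false;
            for (double kept : out) {
                if (kept == v) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] input = {Double.NaN, Double.NaN};
        System.out.println(distinctNaive(input));    // [NaN, NaN]
        System.out.println(distinctNanAware(input)); // [NaN]
    }
}
```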

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
ArrayDistinct won't show duplicated `NaN` value

### How was this patch tested?
Added UT

Closes #33993 from AngersZhuuuu/SPARK-36741.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e356f6aa11)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-17 20:48:39 +08:00
Wenchen Fan 16215755b7 [SPARK-36789][SQL] Use the correct constant type as the null value holder in array functions
### What changes were proposed in this pull request?

In array functions, we use constant 0 as the placeholder when adding a null value to an array buffer. This PR makes sure the constant 0 matches the type of the array element.

### Why are the changes needed?

Fix a potential bug. Somehow we can hit this bug sometimes after https://github.com/apache/spark/pull/33955 .

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

Closes #34029 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 4145498826)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-09-17 16:50:01 +09:00
Hyukjin Kwon 7d7c9915bb [SPARK-36788][SQL] Change log level of AQE for non-supported plans from warning to debug
### What changes were proposed in this pull request?

This PR suppresses the warnings for plans where AQE is not supported. Currently we show the warnings such as:

```
org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: Sort [a#324881 DESC NULLS FIRST], true, 23
+- Scan ExistingRDD[a#324881]
```

for every plan for which AQE is not supported.

### Why are the changes needed?

It's too noisy now. Below is the example of `SortSuite` run:

```
14:51:40.675 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: Sort [a#324881 DESC NULLS FIRST], true, 23
+- Scan ExistingRDD[a#324881]
.
[info] - sorting on DayTimeIntervalType(0,1) with nullable=true, sortOrder=List('a DESC NULLS FIRST) (785 milliseconds)
14:51:41.416 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: ReferenceSort [a#324884 ASC NULLS FIRST], true
+- Scan ExistingRDD[a#324884]
.
14:51:41.467 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: Sort [a#324884 ASC NULLS FIRST], true, 23
+- Scan ExistingRDD[a#324884]
.
[info] - sorting on DayTimeIntervalType(0,1) with nullable=false, sortOrder=List('a ASC NULLS FIRST) (796 milliseconds)
14:51:42.210 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: ReferenceSort [a#324887 ASC NULLS LAST], true
+- Scan ExistingRDD[a#324887]
.
14:51:42.259 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: Sort [a#324887 ASC NULLS LAST], true, 23
+- Scan ExistingRDD[a#324887]
.
[info] - sorting on DayTimeIntervalType(0,1) with nullable=false, sortOrder=List('a ASC NULLS LAST) (797 milliseconds)
14:51:43.009 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: ReferenceSort [a#324890 DESC NULLS LAST], true
+- Scan ExistingRDD[a#324890]
.
14:51:43.061 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: Sort [a#324890 DESC NULLS LAST], true, 23
+- Scan ExistingRDD[a#324890]
.
[info] - sorting on DayTimeIntervalType(0,1) with nullable=false, sortOrder=List('a DESC NULLS LAST) (848 milliseconds)
14:51:43.857 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: ReferenceSort [a#324893 DESC NULLS FIRST], true
+- Scan ExistingRDD[a#324893]
.
14:51:43.903 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: Sort [a#324893 DESC NULLS FIRST], true, 23
+- Scan ExistingRDD[a#324893]
.
[info] - sorting on DayTimeIntervalType(0,1) with nullable=false, sortOrder=List('a DESC NULLS FIRST) (827 milliseconds)
14:51:44.682 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: ReferenceSort [a#324896 ASC NULLS FIRST], true
+- Scan ExistingRDD[a#324896]
.
14:51:44.748 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: Sort [a#324896 ASC NULLS FIRST], true, 23
+- Scan ExistingRDD[a#324896]
.
[info] - sorting on YearMonthIntervalType(0,1) with nullable=true, sortOrder=List('a ASC NULLS FIRST) (565 milliseconds)
14:51:45.248 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: ReferenceSort [a#324899 ASC NULLS LAST], true
+- Scan ExistingRDD[a#324899]
.
14:51:45.312 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: Sort [a#324899 ASC NULLS LAST], true, 23
+- Scan ExistingRDD[a#324899]
.
[info] - sorting on YearMonthIntervalType(0,1) with nullable=true, sortOrder=List('a ASC NULLS LAST) (591 milliseconds)
14:51:45.841 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: ReferenceSort [a#324902 DESC NULLS LAST], true
+- Scan ExistingRDD[a#324902]
.
14:51:45.905 WARN org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan: spark.sql.adaptive.enabled is enabled but is not supported for query: Sort [a#324902 DESC NULLS LAST], true, 23
+- Scan ExistingRDD[a#324902]
.
```

### Does this PR introduce _any_ user-facing change?

Yes, it will show fewer warnings to users. Note that AQE is enabled by default from Spark 3.2; see SPARK-33679.

### How was this patch tested?

Manually tested via unit tests.

Closes #34026 from HyukjinKwon/minor-log-level.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 917d7dad4d)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-09-17 12:01:53 +09:00
Wenchen Fan c1bfe1a5c4 [SPARK-36783][SQL] ScanOperation should not push Filter through nondeterministic Project
### What changes were proposed in this pull request?

`ScanOperation` collects adjacent Projects and Filters. The caller side always assumes that the collected Filters should run before the collected Projects, which means `ScanOperation` effectively pushes Filter through Project.

Following `PushPredicateThroughNonJoin`, we should not push Filter through nondeterministic Project. This PR fixes `ScanOperation` to follow this rule.
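
To illustrate why the order matters, here is a plain-Scala model rather than Spark code. The names `FilterPushdownModel`, `projectWithCounter`, and `isEven` are assumptions made for this sketch; the counter stands in for a nondeterministic expression whose value depends on which rows it is evaluated on.

```scala
object FilterPushdownModel {
  // Stateful stand-in for a nondeterministic expression: the value it
  // produces depends on how many rows it has already been evaluated on.
  def projectWithCounter(rows: Seq[Int]): Seq[(Int, Int)] = {
    var next = 0
    rows.map { x =>
      val id = next
      next += 1
      (x, id)
    }
  }

  def isEven(x: Int): Boolean = x % 2 == 0

  // Plan as written: the Filter sits on top of the Project.
  def projectThenFilter(rows: Seq[Int]): Seq[(Int, Int)] =
    projectWithCounter(rows).filter { case (x, _) => isEven(x) }

  // Plan after pushing the Filter below the Project.
  def filterThenProject(rows: Seq[Int]): Seq[(Int, Int)] =
    projectWithCounter(rows.filter(isEven))
}
```

For the input rows 1 to 5, the first plan yields `(2,1), (4,3)` while the pushed-down plan yields `(2,0), (4,1)`: the same rows survive, but the nondeterministic column changes.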

### Why are the changes needed?

Fix a bug that violates the semantics of nondeterministic expressions.

### Does this PR introduce _any_ user-facing change?

Most likely no change, but in some cases, this is a correctness bug fix which changes the query result.

### How was this patch tested?

existing tests

Closes #34023 from cloud-fan/scan.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit dfd5237c0c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-17 10:51:29 +08:00
Josh Rosen 3502fda783 [SPARK-36774][CORE][TESTS] Move SparkSubmitTestUtils to core module and use it in SparkSubmitSuite
### What changes were proposed in this pull request?

This PR refactors test code in order to improve the debuggability of `SparkSubmitSuite`.

The `sql/hive` module contains a `SparkSubmitTestUtils` helper class which launches `spark-submit` and captures its output in order to display better error messages when tests fail. This helper is currently used by `HiveSparkSubmitSuite` and `HiveExternalCatalogVersionsSuite`, but isn't used by `SparkSubmitSuite`.

In this PR, I moved `SparkSubmitTestUtils` and `ProcessTestUtils` into the `core` module and updated `SparkSubmitSuite`, `BufferHolderSparkSubmitSuite`, and `WholestageCodegenSparkSubmitSuite` to use the relocated helper classes. This required me to change `SparkSubmitTestUtils` to make its timeouts configurable and to generalize its method for locating the `spark-submit` binary.

### Why are the changes needed?

Previously, `SparkSubmitSuite` tests would fail with messages like:

```
[info] - launch simple application with spark-submit *** FAILED *** (1 second, 832 milliseconds)
[info]   Process returned with exit code 101. See the log4j logs for more detail. (SparkSubmitSuite.scala:1551)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
```

which require the Spark developer to hunt in log4j logs in order to view the logs from the failed `spark-submit` command.

After this change, those tests will fail with detailed error messages that include the text of the failed command plus timestamped logs captured from the failed process:

```
[info] - launch simple application with spark-submit *** FAILED *** (2 seconds, 800 milliseconds)
[info]   spark-submit returned with exit code 101.
[info]   Command line: '/Users/joshrosen/oss-spark/bin/spark-submit' '--class' 'invalidClassName' '--name' 'testApp' '--master' 'local' '--conf' 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' 'file:/Users/joshrosen/oss-spark/target/tmp/spark-0a8a0c93-3aaf-435d-9cf3-b97abd318d91/testJar-1631768004882.jar'
[info]
[info]   2021-09-15 21:53:26.041 - stderr> SLF4J: Class path contains multiple SLF4J bindings.
[info]   2021-09-15 21:53:26.042 - stderr> SLF4J: Found binding in [jar:file:/Users/joshrosen/oss-spark/assembly/target/scala-2.12/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[info]   2021-09-15 21:53:26.042 - stderr> SLF4J: Found binding in [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.30/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[info]   2021-09-15 21:53:26.042 - stderr> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[info]   2021-09-15 21:53:26.042 - stderr> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[info]   2021-09-15 21:53:26.619 - stderr> Error: Failed to load class invalidClassName. (SparkSubmitTestUtils.scala:97)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually ran the affected test suites.

Closes #34013 from JoshRosen/SPARK-36774-move-SparkSubmitTestUtils-to-core.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
(cherry picked from commit 3ae6e6775b)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
2021-09-16 14:31:22 -07:00
Dongjoon Hyun 63b8417794 [SPARK-36732][SQL][BUILD] Upgrade ORC to 1.6.11
### What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC to 1.6.11 to bring the latest bug fixes.

### Why are the changes needed?

Apache ORC 1.6.11 has the following fixes.
- https://issues.apache.org/jira/projects/ORC/versions/12350499

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #33971 from dongjoon-hyun/SPARK-36732.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c217797297)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-09-15 23:36:36 -07:00
Chao Sun a7dc8242ea [SPARK-36726] Upgrade Parquet to 1.12.1
### What changes were proposed in this pull request?

Upgrade Apache Parquet to 1.12.1

### Why are the changes needed?

Parquet 1.12.1 contains the following bug fixes:
- PARQUET-2064: Make Range public accessible in RowRanges
- PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream`
- PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding
- PARQUET-1633: Fix integer overflow
- PARQUET-2054: fix TCP leaking when calling ParquetFileWriter.appendFile
- PARQUET-2072: Do Not Determine Both Min/Max for Binary Stats
- PARQUET-2073: Fix estimate remaining row count in ColumnWriteStoreBase
- PARQUET-2078: Failed to read parquet file after writing with the same

In particular PARQUET-2078 is a blocker for the upcoming Apache Spark 3.2.0 release.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests + a new test for the issue in SPARK-36696

Closes #33969 from sunchao/upgrade-parquet-12.1.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
(cherry picked from commit a927b0836b)
Signed-off-by: DB Tsai <d_tsai@apple.com>
2021-09-15 19:17:49 +00:00
Angerszhuuuu 75bffd972d [SPARK-36755][SQL] ArraysOverlap should handle duplicated Double.NaN and Float.NaN
### What changes were proposed in this pull request?
For query
```
select arrays_overlap(array(cast('nan' as double), 1d), array(cast('nan' as double)))
```
This returns [false], but it should return [true].
This issue occurs because `scala.collection.mutable.HashSet` can't handle `Double.NaN` and `Float.NaN`.
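
The underlying comparison behavior can be shown without Spark. The sketch below is not the code from this PR; `NaNOverlap`, `ieeeEquals`, `sameValue`, and `overlaps` are hypothetical names, and the nested scan is chosen for clarity, not speed.

```scala
object NaNOverlap {
  // IEEE 754 comparison: NaN is not equal to anything, itself included.
  def ieeeEquals(x: Double, y: Double): Boolean = x == y

  // Equality as this sketch wants it: two NaN values count as the same value.
  def sameValue(x: Double, y: Double): Boolean =
    (x.isNaN && y.isNaN) || x == y

  // True if any element of `left` matches any element of `right`.
  def overlaps(left: Seq[Double], right: Seq[Double]): Boolean =
    left.exists(l => right.exists(r => sameValue(l, r)))
}
```

With `sameValue`, the two arrays from the query above overlap because both contain `NaN`, whereas a lookup based only on `ieeeEquals` would miss that match.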

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
Yes, `arrays_overlap` now treats `NaN` values as equal.

### How was this patch tested?
Added UT

Closes #34006 from AngersZhuuuu/SPARK-36755.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b665782f0d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-15 22:32:18 +08:00
Angerszhuuuu e64155691f [SPARK-36702][SQL][FOLLOWUP] ArrayUnion handle duplicated Double.NaN and Float.NaN
### What changes were proposed in this pull request?
According to https://github.com/apache/spark/pull/33955#discussion_r708570515, use normalized NaN.

### Why are the changes needed?
Use normalized NaN for duplicated NaN value

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes #34003 from AngersZhuuuu/SPARK-36702-FOLLOWUP.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 638085953f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-15 22:04:24 +08:00
Angerszhuuuu a472612eb8 [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.NaN
### What changes were proposed in this pull request?
For query
```
select array_union(array(cast('nan' as double), cast('nan' as double)), array())
```
This returns [NaN, NaN], but it should return [NaN].
This issue occurs because `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` either.
In this PR we add a wrapper for `OpenHashSet` that can handle `null`, `Double.NaN`, and `Float.NaN` together.
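
The idea of tracking `NaN` outside the hash set can be sketched in plain Scala. This is an illustration under assumptions, not the wrapper added by this PR: `NaNUnion` and `unionDistinct` are hypothetical names, a standard `mutable.HashSet` stands in for `OpenHashSet`, and `null` handling is left out.

```scala
import scala.collection.mutable

object NaNUnion {
  // Order-preserving union of two arrays that keeps at most one NaN.
  def unionDistinct(left: Seq[Double], right: Seq[Double]): Seq[Double] = {
    val seen = mutable.HashSet.empty[Double] // holds non-NaN values only
    var seenNaN = false                      // NaN is tracked outside the set
    val out = Seq.newBuilder[Double]
    (left ++ right).foreach { v =>
      if (v.isNaN) {
        // Emit NaN only the first time it is encountered.
        if (!seenNaN) {
          seenNaN = true
          out += v
        }
      } else if (seen.add(v)) {
        // `add` returns true only when the value was not present yet.
        out += v
      }
    }
    out.result()
  }
}
```

Keeping a separate flag means the set never has to compare `NaN` with `NaN`, which is the comparison that fails under IEEE 754 rules.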

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
ArrayUnion won't show duplicated `NaN` values

### How was this patch tested?
Added UT

Closes #33955 from AngersZhuuuu/SPARK-36702-WrapOpenHashSet.

Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit f71f37755d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-14 18:26:02 +08:00
Fu Chen 303590b3e9 [SPARK-36715][SQL] InferFiltersFromGenerate should not infer filter for udf
### What changes were proposed in this pull request?

Fix a bug in `InferFiltersFromGenerate`: the rule should not infer a filter for a `Generate` whose children contain an expression that is an instance of `org.apache.spark.sql.catalyst.expressions.UserDefinedExpression`.
Before this PR, the following case threw an exception.

```scala
spark.udf.register("vec", (i: Int) => (0 until i).toArray)
sql("select explode(vec(8)) as c1").show
```

```
Once strategy's idempotence is broken for batch Infer Filters
 GlobalLimit 21                                                        GlobalLimit 21
 +- LocalLimit 21                                                      +- LocalLimit 21
    +- Project [cast(c1#3 as string) AS c1#12]                            +- Project [cast(c1#3 as string) AS c1#12]
       +- Generate explode(vec(8)), false, [c1#3]                            +- Generate explode(vec(8)), false, [c1#3]
          +- Filter ((size(vec(8), true) > 0) AND isnotnull(vec(8)))            +- Filter ((size(vec(8), true) > 0) AND isnotnull(vec(8)))
!            +- OneRowRelation                                                     +- Filter ((size(vec(8), true) > 0) AND isnotnull(vec(8)))
!                                                                                     +- OneRowRelation

java.lang.RuntimeException:
Once strategy's idempotence is broken for batch Infer Filters
 GlobalLimit 21                                                        GlobalLimit 21
 +- LocalLimit 21                                                      +- LocalLimit 21
    +- Project [cast(c1#3 as string) AS c1#12]                            +- Project [cast(c1#3 as string) AS c1#12]
       +- Generate explode(vec(8)), false, [c1#3]                            +- Generate explode(vec(8)), false, [c1#3]
          +- Filter ((size(vec(8), true) > 0) AND isnotnull(vec(8)))            +- Filter ((size(vec(8), true) > 0) AND isnotnull(vec(8)))
!            +- OneRowRelation                                                     +- Filter ((size(vec(8), true) > 0) AND isnotnull(vec(8)))
!                                                                                     +- OneRowRelation

	at org.apache.spark.sql.errors.QueryExecutionErrors$.onceStrategyIdempotenceIsBrokenForBatchError(QueryExecutionErrors.scala:1200)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.checkBatchIdempotence(RuleExecutor.scala:168)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:254)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:138)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:134)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:130)
	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:148)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:166)
	at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:163)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:163)
	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:214)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:259)
	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:228)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3731)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:2755)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2962)
	at org.apache.spark.sql.Dataset.getRows(Dataset.scala:288)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:327)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:807)
```

### Does this PR introduce _any_ user-facing change?

No, only bug fix.

### How was this patch tested?

Unit test.

Closes #33956 from cfmcgrady/SPARK-36715.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 52c5ff20ca)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-09-14 09:26:21 +09:00
Lukas Rytz 2e7583799e [SPARK-36712][BUILD] Make scala-parallel-collections in 2.13 POM a direct dependency (not in maven profile)
As [reported on `dev@spark.apache.org`](https://lists.apache.org/thread.html/r84cff66217de438f1389899e6d6891b573780159cd45463acf3657aa%40%3Cdev.spark.apache.org%3E), the published POMs when building with Scala 2.13 have the `scala-parallel-collections` dependency only in the `scala-2.13` profile of the pom.

### What changes were proposed in this pull request?

This PR works around this by un-commenting the `scala-parallel-collections` dependency when switching to 2.13 using the `change-scala-version.sh` script.

I included an upgrade to scala-parallel-collections version 1.0.3; the changes compared to 0.2.0 are minor:
  - removed OSGi metadata
  - renamed some internal inner classes
  - added `Automatic-Module-Name`

### Why are the changes needed?

According to the posts, this solves issues for developers that write unit tests for their applications.

Stephen Coy suggested using the https://www.mojohaus.org/flatten-maven-plugin. While this sounds like a more principled solution, it is possibly too risky to do at this specific point in time?

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally

Closes #33948 from lrytz/parCollDep.

Authored-by: Lukas Rytz <lukas.rytz@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 1a62e6a2c1)
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-09-13 11:06:58 -05:00
Yuto Akutsu b043ee4de7 [SPARK-36738][SQL][DOC] Fixed the wrong documentation on Cot API
### What changes were proposed in this pull request?

Fixed wrong documentation on Cot API

### Why are the changes needed?

[Doc](https://spark.apache.org/docs/latest/api/sql/index.html#cot) says `1/java.lang.Math.cot` but it should be `1/java.lang.Math.tan`.
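
`java.lang.Math` provides `tan` but no `cot` method, so the cotangent has to be written as a reciprocal. A minimal sketch of that identity, where the `Cot` object is a hypothetical name used only for illustration:

```scala
object Cot {
  // Cotangent as the reciprocal of the tangent (argument in radians).
  def cot(x: Double): Double = 1.0 / math.tan(x)
}
```

For example, `Cot.cot(math.Pi / 4)` is approximately 1, up to floating-point rounding.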

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual check.

Closes #33978 from yutoacts/SPARK-36738.

Authored-by: Yuto Akutsu <yuto.akutsu@nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 3747cfdb40)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-09-13 21:51:43 +09:00
Kousuke Saruta b8a23e9ccc [SPARK-36725][SQL][TESTS] Ensure HiveThriftServer2Suites to stop Thrift JDBC server on exit
### What changes were proposed in this pull request?

This PR aims to ensure that HiveThriftServer2Suites (e.g. `thriftserver.UISeleniumSuite`) stop the Thrift JDBC server on exit using a shutdown hook.
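
The general pattern can be sketched with the standard library alone. This is not the code from this PR: `StopOnExit`, `ServerHandle`, and `stopOnJvmExit` are hypothetical names, and the sketch uses Scala's `sys.addShutdownHook` instead of any Spark-internal utility.

```scala
object StopOnExit {
  // Hypothetical stand-in for a handle to a launched server process.
  final class ServerHandle {
    @volatile private var running = true
    def isRunning: Boolean = running
    // Safe to call more than once, e.g. from afterAll and from the hook.
    def stop(): Unit = { running = false }
  }

  // Registers a JVM shutdown hook that stops the server when the JVM exits.
  def stopOnJvmExit(server: ServerHandle): scala.sys.ShutdownHookThread =
    sys.addShutdownHook(server.stop())
}
```

JVM shutdown hooks run on normal exit and on interrupts such as Ctrl-C (SIGINT) or SIGTERM, but not when the process is killed with SIGKILL. The returned `ShutdownHookThread` can be deregistered with `remove()` once the regular cleanup path has run.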

### Why are the changes needed?

Normally, HiveThriftServer2Suites stop the Thrift JDBC server via the `afterAll` method.
But if they are killed by a signal (e.g. Ctrl-C), the Thrift JDBC server remains running.
```
$ jps
2792969 SparkSubmit
```
### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Killed `thriftserver.UISeleniumSuite` with Ctrl-C and confirmed with `jps` that no Thrift JDBC server remains.

Closes #33967 from sarutak/stop-thrift-on-exit.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c36d70836d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-09-11 15:54:48 -07:00
Liang-Chi Hsieh b52fbeee2d [SPARK-36669][SQL] Add Lz4 wrappers for Hadoop Lz4 codec
### What changes were proposed in this pull request?

This patch proposes to add a few LZ4 wrapper classes for Parquet Lz4 compression output that uses Hadoop Lz4 codec.

### Why are the changes needed?

Currently we use Hadoop 3.3.1's shaded client libraries. Lz4 is a provided dependency in Hadoop Common 3.3.1 for Lz4Codec, but it isn't excluded from relocation in these libraries. So when using lz4 as the Parquet codec, we hit the following exception even if we include lz4 as a dependency.

```
[info]   Cause: java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/net/jpountz/lz4/LZ4Factory
[info]   at org.apache.hadoop.io.compress.lz4.Lz4Compressor.<init>(Lz4Compressor.java:66)
[info]   at org.apache.hadoop.io.compress.Lz4Codec.createCompressor(Lz4Codec.java:119)
[info]   at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:152)
[info]   at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
```

Until the issue is fixed in a new Hadoop release, we can add a few wrapper classes for the Lz4 codec.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Modified test.

Closes #33940 from viirya/lz4-wrappers.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 6bcf330191)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
2021-09-09 09:31:11 -07:00
Andrew Liu 6cb23c163c [SPARK-36686][SQL] Fix SimplifyConditionalsInPredicate to be null-safe
### What changes were proposed in this pull request?

Fix `SimplifyConditionalsInPredicate` to be null-safe.

Reproduction:

```
import org.apache.spark.sql.types.{StructField, BooleanType, StructType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col

val schema = List(
  StructField("b", BooleanType, true)
)
val data = Seq(
  Row(true),
  Row(false),
  Row(null)
)
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  StructType(schema)
)

// cartesian product of true / false / null
val df2 = df.select(col("b") as "cond").crossJoin(df.select(col("b") as "falseVal"))
df2.createOrReplaceTempView("df2")

spark.sql("SELECT * FROM df2 WHERE IF(cond, FALSE, falseVal)").show()
// actual:
// +-----+--------+
// | cond|falseVal|
// +-----+--------+
// |false|    true|
// +-----+--------+
spark.sql("SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.SimplifyConditionalsInPredicate")
spark.sql("SELECT * FROM df2 WHERE IF(cond, FALSE, falseVal)").show()
// expected:
// +-----+--------+
// | cond|falseVal|
// +-----+--------+
// |false|    true|
// | null|    true|
// +-----+--------+
```
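
The PR text does not spell out the faulty rewrite. One rewrite that would reproduce exactly this discrepancy is turning `IF(cond, FALSE, falseVal)` into `NOT cond AND falseVal`; treating that as the cause is an assumption. The plain-Scala sketch below models SQL three-valued logic with `Option[Boolean]` (`None` for NULL). All names in it (`ThreeValued`, `naive`, `nullSafe`, `kept`) are hypothetical, and `nullSafe` is just one possible null-safe variant, not necessarily the one used in this PR.

```scala
object ThreeValued {
  type B3 = Option[Boolean] // None models SQL NULL

  def not(a: B3): B3 = a.map(!_)

  def and(a: B3, b: B3): B3 = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // IF takes the else branch unless the condition is TRUE.
  def ifElse(cond: B3, thenVal: B3, elseVal: B3): B3 =
    if (cond.contains(true)) thenVal else elseVal

  // The predicate as written in the query.
  def original(cond: B3, falseVal: B3): B3 = ifElse(cond, Some(false), falseVal)

  // Assumed unsafe rewrite: NOT cond AND falseVal.
  def naive(cond: B3, falseVal: B3): B3 = and(not(cond), falseVal)

  // One null-safe variant: treat a NULL condition like FALSE before negating.
  def nullSafe(cond: B3, falseVal: B3): B3 =
    and(not(Some(cond.contains(true))), falseVal)

  // A WHERE clause keeps a row only when the predicate is TRUE.
  def kept(p: B3): Boolean = p.contains(true)
}
```

For the row `cond = NULL, falseVal = true`, `original` evaluates to TRUE and the row is kept, while `naive` evaluates to NULL and the row is dropped, matching the missing `null | true` row in the output above.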

### Why are the changes needed?

This is a regression that leads to incorrect results.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #33928 from hypercubestart/fix-SimplifyConditionalsInPredicate.

Authored-by: Andrew Liu <andrewlliu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 9b633f2075)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-09 11:32:59 +08:00
Huaxin Gao 7e8860751c [SPARK-34952][SQL][FOLLOWUP] Change column type to be NamedReference
### What changes were proposed in this pull request?
Currently, the aggregate column type is `FieldReference`; it should be `NamedReference` instead.

### Why are the changes needed?
`FieldReference` is a private class, so `NamedReference` should be used instead.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing tests

Closes #33927 from huaxingao/agg_followup.

Authored-by: Huaxin Gao <huaxin_gao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 23794fb303)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-08 14:06:26 +08:00
yangjie01 c4332c7bf0 [SPARK-36684][SQL][TESTS] Add Jackson test dependencies to sql/core module at hadoop-2.7 profile
### What changes were proposed in this pull request?
SPARK-26346 upgraded the Parquet-related modules from 1.10.1 to 1.11.1, and `parquet-jackson 1.11.1` uses `com.fasterxml.jackson` instead of `org.codehaus.jackson`.

So, there are warning logs such as

```
17:12:17.605 WARN org.apache.hadoop.fs.FileSystem: Cannot load filesystem
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.WebHdfsFileSystem could not be instantiated
...
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.ObjectMapper
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
...
```

when testing the `sql/core` module with the `hadoop-2.7` profile.

This pr adds test dependencies related to `org.codehaus.jackson` in `sql/core` module when `hadoop-2.7` profile is activated.

### Why are the changes needed?
Clean up test warning logs that shouldn't exist.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

- Pass GA or Jenkins Tests.
- Manual test `mvn clean test -pl sql/core -am -DwildcardSuites=none -Phadoop-2.7`

**Before**

No tests failed, but warning logs appeared as follows:

```
[INFO] Running test.org.apache.spark.sql.JavaBeanDeserializationSuite
22:42:45.211 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22:42:46.827 WARN org.apache.hadoop.fs.FileSystem: Cannot load filesystem
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.WebHdfsFileSystem could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650)
	at org.apache.hadoop.fs.FsUrlStreamHandlerFactory.<init>(FsUrlStreamHandlerFactory.java:62)
	at org.apache.spark.sql.internal.SharedState$.liftedTree1$1(SharedState.scala:181)
	at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$setFsUrlStreamHandlerFactory(SharedState.scala:180)
	at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:54)
	at org.apache.spark.sql.SparkSession.$anonfun$sharedState$1(SparkSession.scala:135)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:135)
	at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:134)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:335)
	at org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
	at org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
	at org.apache.spark.sql.SparkSession.$anonfun$new$3(SparkSession.scala:109)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.sql.SparkSession.$anonfun$new$1(SparkSession.scala:109)
	at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:194)
	at org.apache.spark.sql.types.DataType.sameType(DataType.scala:97)
	at org.apache.spark.sql.catalyst.analysis.TypeCoercion$.$anonfun$haveSameType$1(TypeCoercion.scala:291)
	at org.apache.spark.sql.catalyst.analysis.TypeCoercion$.$anonfun$haveSameType$1$adapted(TypeCoercion.scala:291)
	at scala.collection.LinearSeqOptimized.forall(LinearSeqOptimized.scala:85)
	at scala.collection.LinearSeqOptimized.forall$(LinearSeqOptimized.scala:82)
	at scala.collection.immutable.List.forall(List.scala:89)
	at org.apache.spark.sql.catalyst.analysis.TypeCoercion$.haveSameType(TypeCoercion.scala:291)
	at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck(Expression.scala:1074)
	at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck$(Expression.scala:1069)
	at org.apache.spark.sql.catalyst.expressions.If.dataTypeCheck(conditionalExpressions.scala:37)
	at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.org$apache$spark$sql$catalyst$expressions$ComplexTypeMergingExpression$$internalDataType(Expression.scala:1080)
	at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.org$apache$spark$sql$catalyst$expressions$ComplexTypeMergingExpression$$internalDataType$(Expression.scala:1079)
	at org.apache.spark.sql.catalyst.expressions.If.org$apache$spark$sql$catalyst$expressions$ComplexTypeMergingExpression$$internalDataType$lzycompute(conditionalExpressions.scala:37)
	at org.apache.spark.sql.catalyst.expressions.If.org$apache$spark$sql$catalyst$expressions$ComplexTypeMergingExpression$$internalDataType(conditionalExpressions.scala:37)
	at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType(Expression.scala:1084)
	at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType$(Expression.scala:1084)
	at org.apache.spark.sql.catalyst.expressions.If.dataType(conditionalExpressions.scala:37)
	at org.apache.spark.sql.catalyst.expressions.objects.MapObjects.$anonfun$dataType$4(objects.scala:815)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.catalyst.expressions.objects.MapObjects.dataType(objects.scala:815)
	at org.apache.spark.sql.catalyst.expressions.CreateNamedStruct.$anonfun$dataType$9(complexTypeCreator.scala:416)
	at scala.collection.immutable.List.map(List.scala:290)
	at org.apache.spark.sql.catalyst.expressions.CreateNamedStruct.dataType$lzycompute(complexTypeCreator.scala:410)
	at org.apache.spark.sql.catalyst.expressions.CreateNamedStruct.dataType(complexTypeCreator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.CreateNamedStruct.dataType(complexTypeCreator.scala:398)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.isSerializedAsStruct(ExpressionEncoder.scala:309)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.isSerializedAsStructForTopLevel(ExpressionEncoder.scala:319)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:248)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
	at org.apache.spark.sql.Encoders$.bean(Encoders.scala:154)
	at org.apache.spark.sql.Encoders.bean(Encoders.scala)
	at test.org.apache.spark.sql.JavaBeanDeserializationSuite.testBeanWithArrayFieldDeserialization(JavaBeanDeserializationSuite.java:75)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:364)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:237)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:158)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
	at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.<clinit>(WebHdfsFileSystem.java:129)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
	... 81 more
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.ObjectMapper
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	... 88 more
```

**After**

There are no more warning logs like the above.

Closes #33926 from LuciferYang/SPARK-36684.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit acd9c92fa8)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-09-07 21:40:53 -07:00