### What changes were proposed in this pull request?
Instead of returning NULL, the next_day function throws runtime IllegalArgumentException when ansiMode is enable and receiving invalid input of the dayOfWeek parameter.
### Why are the changes needed?
For ansiMode.
### Does this PR introduce _any_ user-facing change?
Yes.
When spark.sql.ansi.enabled = true, the next_day function will throw IllegalArgumentException when receiving invalid input of the dayOfWeek parameter.
When spark.sql.ansi.enabled = false, same behaviour as before.
### How was this patch tested?
Ansi mode is tested with existing tests.
End-to-end tests have been added.
Closes#30807 from chongguang/SPARK-33794.
Authored-by: Chongguang LIU <chongguang.liu@laposte.fr>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Added the `RemoveNoopOperators` rule to optimization batch `Union`. Also made sure that the `RemoveNoopOperators` would be idempotent.
### Why are the changes needed?
In several TPCDS queries the `CombineUnions` rule does not manage to combine unions, because they have noop `Project`s between them.
The `Project`s will be removed by `RemoveNoopOperators`, but by then `ReplaceDistinctWithAggregate` has been applied and there are aggregates between the unions. Adding a copy of `RemoveNoopOperators` earlier in the optimization chain allows `CombineUnions` to work on more queries.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
New UTs and the output of `PlanStabilitySuite`
Closes#30996 from tanelk/SPARK-33964_combine_unions.
Authored-by: tanel.kiis@gmail.com <tanel.kiis@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Remove partition data by `ALTER TABLE .. DROP PARTITION` in V2 table catalog used in tests.
### Why are the changes needed?
This is a bug fix. Before the fix, `ALTER TABLE .. DROP PARTITION` does not remove the data belongs to the dropped partition. As a consequence of that, the `select` query returns removed data.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running tests suites for v1 and v2 catalogs:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```
Closes#31014 from MaxGekk/fix-drop-partition-v2.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes to implement `DESCRIBE COLUMN` for v2 tables.
Note that `isExnteded` option is not implemented in this PR.
### Why are the changes needed?
Parity with v1 tables.
### Does this PR introduce _any_ user-facing change?
Yes, now, `DESCRIBE COLUMN` works for v2 tables.
```scala
sql("CREATE TABLE testcat.tbl (id bigint, data string COMMENT 'hello') USING foo")
sql("DESCRIBE testcat.tbl data").show
```
```
+---------+----------+
|info_name|info_value|
+---------+----------+
| col_name| data|
|data_type| string|
| comment| hello|
+---------+----------+
```
Before this PR, the command would fail with: `Describing columns is not supported for v2 tables.`
### How was this patch tested?
Added new test.
Closes#30881 from imback82/describe_col_v2.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This pr fix some operator missing rowCount when enable CBO, e.g.:
```scala
spark.range(1000).selectExpr("id as a", "id as b").write.saveAsTable("t1")
spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
spark.sql("set spark.sql.cbo.enabled=true")
spark.sql("set spark.sql.cbo.planStats.enabled=true")
spark.sql("select * from (select * from t1 distribute by a limit 100) distribute by b").explain("cost")
```
Before this pr:
```
== Optimized Logical Plan ==
RepartitionByExpression [b#2129L], Statistics(sizeInBytes=2.3 KiB)
+- GlobalLimit 100, Statistics(sizeInBytes=2.3 KiB, rowCount=100)
+- LocalLimit 100, Statistics(sizeInBytes=23.4 KiB)
+- RepartitionByExpression [a#2128L], Statistics(sizeInBytes=23.4 KiB)
+- Relation[a#2128L,b#2129L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
```
After this pr:
```
== Optimized Logical Plan ==
RepartitionByExpression [b#2129L], Statistics(sizeInBytes=2.3 KiB, rowCount=100)
+- GlobalLimit 100, Statistics(sizeInBytes=2.3 KiB, rowCount=100)
+- LocalLimit 100, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
+- RepartitionByExpression [a#2128L], Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
+- Relation[a#2128L,b#2129L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
```
### Why are the changes needed?
[`JoinEstimation.estimateInnerOuterJoin`](d6a68e0b67/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala (L55-L156)) need the row count.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes#30987 from wangyum/SPARK-33954.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
The error messages for specifying filter and distinct for the aggregate function are mixed together and should be separated. This can increase readability and ease of use.
### Why are the changes needed?
increase readability and ease of use.
### Does this PR introduce _any_ user-facing change?
'Yes'.
### How was this patch tested?
Jenkins test
Closes#30982 from beliefer/SPARK-33951.
Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This patch proposes to add latest offset to source progress for streaming queries.
### Why are the changes needed?
Currently we record start and end offsets per source in streaming process. Latest offset is an important information for streaming process but the progress lacks of this info. We can use it to track the process lag and adjust streaming queries. We should add latest offset to source progress.
### Does this PR introduce _any_ user-facing change?
Yes, for new metric about latest source offset in source progress.
### How was this patch tested?
Unit test. Manually test in Spark cluster:
```
"description" : "KafkaV2[Subscribe[page_view_events]]",
"startOffset" : {
"page_view_events" : {
"2" : 582370921,
"4" : 391910836,
"1" : 631009201,
"3" : 406601346,
"0" : 195799112
}
},
"endOffset" : {
"page_view_events" : {
"2" : 583764414,
"4" : 392338002,
"1" : 632183480,
"3" : 407101489,
"0" : 197304028
}
},
"latestOffset" : {
"page_view_events" : {
"2" : 589852545,
"4" : 394204277,
"1" : 637313869,
"3" : 409286602,
"0" : 203878962
}
},
"numInputRows" : 4999997,
"inputRowsPerSecond" : 29287.70501405811,
```
Closes#30988 from viirya/latest-offset.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Skip table stats in canonicalizing of `HiveTableRelation`.
### Why are the changes needed?
The changes fix a regression comparing to Spark 3.0, see SPARK-33963.
### Does this PR introduce _any_ user-facing change?
Yes. After changes Spark behaves as in the version 3.0.1.
### How was this patch tested?
By running new UT:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *CachedTableSuite"
```
Closes#30995 from MaxGekk/fix-caching-hive-table.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This pr improve the statistics estimation of the `Tail`:
```scala
spark.sql("set spark.sql.cbo.enabled=true")
spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as e").write.saveAsTable("t1")
println(Tail(Literal(5), spark.sql("SELECT * FROM t1").queryExecution.logical).queryExecution.stringWithStats)
```
Before this pr:
```
== Optimized Logical Plan ==
Tail 5, Statistics(sizeInBytes=3.8 KiB)
+- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
```
After this pr:
```
== Optimized Logical Plan ==
Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5)
+- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
```
### Why are the changes needed?
Import statistics estimation.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes#30991 from wangyum/SPARK-33959.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This pr add rowCount for `Range` operator:
```scala
spark.sql("set spark.sql.cbo.enabled=true")
spark.sql("select id from range(100)").explain("cost")
```
Before this pr:
```
== Optimized Logical Plan ==
Range (0, 100, step=1, splits=None), Statistics(sizeInBytes=800.0 B)
```
After this pr:
```
== Optimized Logical Plan ==
Range (0, 100, step=1, splits=None), Statistics(sizeInBytes=800.0 B, rowCount=100)
```
### Why are the changes needed?
[`JoinEstimation.estimateInnerOuterJoin`](d6a68e0b67/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala (L55-L156)) need the row count.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes#30989 from wangyum/SPARK-33956.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
As a follow-up task to SPARK-32958, this patch takes safer approach to only prune columns from JsonToStructs if the parsing option is empty. It is to avoid unexpected behavior change regarding parsing.
This patch also adds a few e2e tests to make sure failfast parsing behavior is not changed.
### Why are the changes needed?
It is to avoid unexpected behavior change regarding parsing.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test.
Closes#30970 from viirya/SPARK-33907-3.2.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
### What changes were proposed in this pull request?
In the `saveAsTable()` and `insertInto()` methods of `DataFrameWriter`, recognize `spark_catalog` as the default session catalog in table names.
### Why are the changes needed?
1. To simplify writing of unified v1 and v2 tests
2. To improve Spark SQL user experience. `insertInto()` should have feature parity with the `INSERT INTO` sql command. Currently, `insertInto()` fails on a table from a namespace in `spark_catalog`:
```scala
scala> sql("CREATE NAMESPACE spark_catalog.ns")
scala> Seq(0).toDF().write.saveAsTable("spark_catalog.ns.tbl")
org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the identifier spark_catalog.ns.tbl.
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:629)
... 47 elided
scala> Seq(0).toDF().write.insertInto("spark_catalog.ns.tbl")
org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the identifier spark_catalog.ns.tbl.
at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:498)
... 47 elided
```
but `INSERT INTO` succeed:
```sql
spark-sql> create table spark_catalog.ns.tbl (c int);
spark-sql> insert into spark_catalog.ns.tbl select 0;
spark-sql> select * from spark_catalog.ns.tbl;
0
```
### Does this PR introduce _any_ user-facing change?
Yes. After the changes for the example above:
```scala
scala> Seq(0).toDF().write.saveAsTable("spark_catalog.ns.tbl")
scala> Seq(1).toDF().write.insertInto("spark_catalog.ns.tbl")
scala> spark.table("spark_catalog.ns.tbl").show(false)
+-----+
|value|
+-----+
|0 |
|1 |
+-----+
```
### How was this patch tested?
By running the affected test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.ShowPartitionsSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.FileFormatWriterSuite"
```
Closes#30919 from MaxGekk/insert-into-spark_catalog.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
The current implement of trim/trimleft/trimright have somewhat redundant.
### Why are the changes needed?
Improve the implement of trim/trimleft/trimright
### Does this PR introduce _any_ user-facing change?
'No'.
### How was this patch tested?
Jenkins test
Closes#30905 from beliefer/SPARK-33890.
Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Add the `since` tag to methods and interfaces added recently.
### Why are the changes needed?
1. To follow the existing convention for Spark API.
2. To inform devs when Spark API was changed.
### Does this PR introduce _any_ user-facing change?
Should not.
### How was this patch tested?
`dev/scalastyle`
Closes#30966 from MaxGekk/spark-23889-interfaces-followup.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This pr fix remove the `CaseWhen` if elseValue is empty and other outputs are null because of we should consider deterministic.
### Why are the changes needed?
Fix bug.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes#30960 from wangyum/SPARK-33847-2.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Add the version 3.2.0 to new method `renamePartition()` in the `SupportsPartitionManagement` interface.
### Why are the changes needed?
To inform Spark devs when the method appears in the interface.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
`./dev/scalastyle`
Closes#30964 from MaxGekk/alter-table-rename-partition-v2-followup.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Introduce allowList push into (if / case) branches to fix potential bug.
### Why are the changes needed?
Fix potential bug.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing test.
Closes#30955 from wangyum/SPARK-33848-2.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Move seed is legal check to `CheckAnalysis`.
### Why are the changes needed?
It's better to check seed expression is legal at analyzer side instead of execution, and user can get exception as soon as possible.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Add test.
Closes#30923 from ulysses-you/SPARK-33909.
Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
1. Add `renamePartition()` to the `SupportsPartitionManagement`
2. Implement `renamePartition()` in `InMemoryPartitionTable`
3. Add v2 execution node `AlterTableRenamePartitionExec`
4. Resolve the logical node `AlterTableRenamePartition` to `AlterTableRenamePartitionExec` for v2 tables that support `SupportsPartitionManagement`
5. Move v1 tests to the base suite `org.apache.spark.sql.execution.command.AlterTableRenamePartitionSuiteBase` to run them for v2 table catalogs.
### Why are the changes needed?
To have feature parity with Datasource V1.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
By running the unified tests:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionSuite"
```
Closes#30935 from MaxGekk/alter-table-rename-partition-v2.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This patch proposes to do column pruning for CsvToStructs expression if we only require some fields from it.
### Why are the changes needed?
`CsvToStructs` takes a schema parameter used to tell CSV Parser what fields are needed to parse. If `CsvToStructs` is followed by GetStructField. We can prune the schema to only parse certain field.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
Closes#30912 from viirya/SPARK-32968.
Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This pr simplify `CaseWhen`clauses with (true and false) and (false and true):
Expression | cond.nullable | After simplify
-- | -- | --
case when cond then true else false end | true | cond <=> true
case when cond then true else false end | false | cond
case when cond then false else true end | true | !(cond <=> true)
case when cond then false else true end | false | !cond
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes#30898 from wangyum/SPARK-33884.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
For `InMemoryPartitionTable` used in tests, set empty partition metadata only when a partition doesn't exists.
### Why are the changes needed?
This bug fix is needed to use `INSERT INTO .. PARTITION` in other tests.
### Does this PR introduce _any_ user-facing change?
No. It affects only the v2 table catalog used in tests.
### How was this patch tested?
Added new UT to `DataSourceV2SQLSuite`, and run the affected test suite by:
```
$ build/sbt -Phive -Phive-thriftserver "test:testOnly org.apache.spark.sql.connector.DataSourceV2SQLSuite"
```
Closes#30952 from MaxGekk/fix-insert-into-partition-v2.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This is a followup of https://github.com/apache/spark/pull/30849, to fix a correctness issue caused by null value handling.
### Why are the changes needed?
Fix a correctness issue. `If(null, true, false)` should return false, not true.
### Does this PR introduce _any_ user-facing change?
Yes, but the bug only exist in the master branch.
### How was this patch tested?
updated tests.
Closes#30953 from cloud-fan/bug.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
After CTAS / CREATE TABLE LIKE / CVAS/ alter table add columns, the target tables will display string instead of char/varchar
### Why are the changes needed?
bugfix
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
new tests
Closes#30918 from yaooqinn/SPARK-33901.
Lead-authored-by: Kent Yao <yao@apache.org>
Co-authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
There are total 15 compilation warnings about `Unicode escapes in triple quoted strings are deprecated` in Spark code now:
```
[WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2930: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2931: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2932: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2933: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2934: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2935: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2936: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2937: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala:82: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala:32: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala:79: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala:97: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala:101: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala:76: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
[WARNING] /spark-source/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala:83: Unicode escapes in triple quoted strings are deprecated, use the literal character instead
```
This pr try to fix these warnnings.
### Why are the changes needed?
Cleanup compilation warnings about `Unicode escapes in triple quoted strings are deprecated`
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass the Jenkins or GitHub Action
Closes#30926 from LuciferYang/SPARK-33801.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Currently, there are many DDL commands where the position of the unresolved identifiers are incorrect:
```
scala> sql("DROP VIEW unknown")
org.apache.spark.sql.AnalysisException: View not found: unknown; line 1 pos 0;
```
, whereas the `pos` should be `10`.
This PR proposes to fix this issue for commands using `UnresolvedTable`:
```
DROP VIEW v
ALTER VIEW v SET TBLPROPERTIES ('k'='v')
ALTER VIEW v UNSET TBLPROPERTIES ('k')
ALTER VIEW v AS SELECT 1
```
### Why are the changes needed?
To fix a bug.
### Does this PR introduce _any_ user-facing change?
Yes, now the above example will print the following:
```
org.apache.spark.sql.AnalysisException: View not found: unknown; line 1 pos 10;
```
### How was this patch tested?
Add a new suite of tests.
Closes#30936 from imback82/position_view_fix.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
[The PySpark documentation](https://spark.apache.org/docs/3.0.1/api/python/pyspark.sql.html#pyspark.sql.DataFrame.join) says "Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti."
However, I get the following error when I set the cross option.
```
scala> val df1 = spark.createDataFrame(Seq((1,"a"),(2,"b")))
df1: org.apache.spark.sql.DataFrame = [_1: int, _2: string]
scala> val df2 = spark.createDataFrame(Seq((1,"A"),(2,"B"), (3, "C")))
df2: org.apache.spark.sql.DataFrame = [_1: int, _2: string]
scala> df1.join(right = df2, usingColumns = Seq("_1"), joinType = "cross").show()
java.lang.IllegalArgumentException: requirement failed: Unsupported using join type Cross
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.sql.catalyst.plans.UsingJoin.<init>(joinTypes.scala:106)
at org.apache.spark.sql.Dataset.join(Dataset.scala:1025)
... 53 elided
```
### Why are the changes needed?
The documentation says cross option can be set, but when I try to set it, I get an java.lang.IllegalArgumentException.
### Does this PR introduce _any_ user-facing change?
Accepting this PR fix will behave the same as the documentation.
### How was this patch tested?
There is already a test for [JoinTypes](1b9fd67904/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/JoinTypesTest.scala), but I can't find a test for the join option itself.
Closes#30803 from kozakana/allow_cross_option.
Authored-by: kozakana <goki727@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Unify the seed of random functions
1. Add a hold place expression `UnresolvedSeed ` as the defualt seed.
2. Change `Rand`,`Randn`,`Uuid`,`Shuffle` default seed to `UnresolvedSeed `.
3. Replace `UnresolvedSeed ` to real seed at `ResolveRandomSeed` rule.
### Why are the changes needed?
`Uuid` and `Shuffle` use the `ResolveRandomSeed` rule to set the seed if user doesn't give a seed value. `Rand` and `Randn` do this at constructing.
It's better to unify the default seed at Analyzer side since we have used `ExpressionWithRandomSeed` at streaming query.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass exists test and add test.
Closes#30864 from ulysses-you/SPARK-33857.
Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This pr simplify conditional in predicate, after this change we can push down the filter to datasource:
Expression | After simplify
-- | --
IF(cond, trueVal, false) | AND(cond, trueVal)
IF(cond, trueVal, true) | OR(NOT(cond), trueVal)
IF(cond, false, falseVal) | AND(NOT(cond), elseVal)
IF(cond, true, falseVal) | OR(cond, elseVal)
CASE WHEN cond THEN trueVal ELSE false END | AND(cond, trueVal)
CASE WHEN cond THEN trueVal END | AND(cond, trueVal)
CASE WHEN cond THEN trueVal ELSE null END | AND(cond, trueVal)
CASE WHEN cond THEN trueVal ELSE true END | OR(NOT(cond), trueVal)
CASE WHEN cond THEN false ELSE elseVal END | AND(NOT(cond), elseVal)
CASE WHEN cond THEN false END | false
CASE WHEN cond THEN true ELSE elseVal END | OR(cond, elseVal)
CASE WHEN cond THEN true END | cond
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes#30865 from wangyum/SPARK-33861.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Currently, there are many DDL commands where the position of the unresolved identifiers are incorrect:
```
scala> sql("MSCK REPAIR TABLE unknown")
org.apache.spark.sql.AnalysisException: Table not found: unknown; line 1 pos 0;
```
, whereas the `pos` should be 18.
This PR proposes to fix this issue for commands using `UnresolvedTable`:
```
MSCK REPAIR TABLE t
LOAD DATA LOCAL INPATH 'filepath' INTO TABLE t
TRUNCATE TABLE t
SHOW PARTITIONS t
ALTER TABLE t RECOVER PARTITIONS
ALTER TABLE t ADD PARTITION (p=1)
ALTER TABLE t PARTITION (p=1) RENAME TO PARTITION (p=2)
ALTER TABLE t DROP PARTITION (p=1)
ALTER TABLE t SET SERDEPROPERTIES ('a'='b')
COMMENT ON TABLE t IS 'hello'"
```
### Why are the changes needed?
To fix a bug.
### Does this PR introduce _any_ user-facing change?
Yes, now the above example will print the following:
```
org.apache.spark.sql.AnalysisException: Table not found: unknown; line 1 pos 18;
```
### How was this patch tested?
Add a new suite of tests.
Closes#30900 from imback82/position_Fix.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
1. Enhance `ReplaceNullWithFalseInPredicate` to replace None of elseValue inside `CaseWhen` with `FalseLiteral` if all branches are `FalseLiteral` . The use case is:
```sql
create table t1 using parquet as select id from range(10);
explain select id from t1 where (CASE WHEN id = 1 THEN 'a' WHEN id = 3 THEN 'b' end) = 'c';
```
Before this pr:
```
== Physical Plan ==
*(1) Filter CASE WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END
+- *(1) ColumnarToRow
+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [CASE WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
```
After this pr:
```
== Physical Plan ==
LocalTableScan <empty>, [id#1L]
```
2. Enhance `SimplifyConditionals` if elseValue is None and all outputs are null.
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes#30852 from wangyum/SPARK-33847.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
1. Move the `ALTER TABLE .. RENAME PARTITION` parsing tests to `AlterTableRenamePartitionParserSuite`
2. Place the v1 tests for `ALTER TABLE .. RENAME PARTITION` from `DDLSuite` to `v1.AlterTableRenamePartitionSuite` and v2 tests from `AlterTablePartitionV2SQLSuite` to `v2.AlterTableRenamePartitionSuite`, so, the tests will run for V1, Hive V1 and V2 DS.
### Why are the changes needed?
- The unification will allow to run common `ALTER TABLE .. RENAME PARTITION` tests for both DSv1 and Hive DSv1, DSv2
- We can detect missing features and differences between DSv1 and DSv2 implementations.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running new test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionParserSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionSuite"
```
Closes#30863 from MaxGekk/unify-rename-partition-tests.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR aims to override maxRows method in these follow `LogicalPlan`:
* `ReturnAnswer`
* `Join`
* `Range`
* `Sample`
* `RepartitionOperation`
* `Deduplicate`
* `LocalRelation`
* `Window`
### Why are the changes needed?
1. Logically, we know the max rows info with these `LogicalPlan`.
2. Before this PR, we already have some max rows with `LogicalPlan`, so we can eliminate limit with more case if we expand more.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Add test.
Closes#30443 from ulysses-you/SPARK-33497.
Lead-authored-by: ulysses-you <youxiduo@weidian.com>
Co-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
1. Add new methods `purgePartition()`/`purgePartitions()` to the interfaces `SupportsPartitionManagement`/`SupportsAtomicPartitionManagement`.
2. Default implementation of new methods throw the exception `UnsupportedOperationException`.
3. Add tests for new methods to `SupportsPartitionManagementSuite`/`SupportsAtomicPartitionManagementSuite`.
4. Add `ALTER TABLE .. DROP PARTITION` tests for DS v1 and v2.
Closes#30776Closes#30821
### Why are the changes needed?
Currently, the `PURGE` option that user can set in `ALTER TABLE .. DROP PARTITION` is completely ignored. We should pass this flag to the catalog implementation, so, the catalog should decide how to handle the flag.
### Does this PR introduce _any_ user-facing change?
The changes can impact on behavior of `ALTER TABLE .. DROP PARTITION` for v2 tables.
### How was this patch tested?
By running the affected test suites, for instance:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```
Closes#30886 from MaxGekk/purge-partition.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
```sql
spark-sql> select * from t10 where c0='abcd';
20/12/22 15:43:38 ERROR SparkSQLDriver: Failed in [select * from t10 where c0='abcd']
scala.MatchError: CharType(10) (of class org.apache.spark.sql.types.CharType)
at org.apache.spark.sql.catalyst.expressions.CastBase.cast(Cast.scala:815)
at org.apache.spark.sql.catalyst.expressions.CastBase.cast$lzycompute(Cast.scala:842)
at org.apache.spark.sql.catalyst.expressions.CastBase.cast(Cast.scala:842)
at org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:844)
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
at org.apache.spark.sql.catalyst.catalog.CatalogTablePartition.$anonfun$toRow$2(interface.scala:164)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:102)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at org.apache.spark.sql.types.StructType.map(StructType.scala:102)
at org.apache.spark.sql.catalyst.catalog.CatalogTablePartition.toRow(interface.scala:158)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$prunePartitionsByFilter$3(ExternalCatalogUtils.scala:157)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$prunePartitionsByFilter$3$adapted(ExternalCatalogUtils.scala:156)
```
c0 is a partition column, it fails in the partition pruning rule
In this PR, we relace char/varchar w/ string type before the CAST happends
### Why are the changes needed?
bugfix, see the case above
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
yes, new tests
Closes#30887 from yaooqinn/SPARK-33879.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Make test stable and fix docs.
### Why are the changes needed?
Query timeout sometime since we set an another config after set query timeout.
```
sbt.ForkMain$ForkError: java.sql.SQLTimeoutException: Query timed out after 0 seconds
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$13(ThriftServerWithSparkContextSuite.scala:107)
at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$13$adapted(ThriftServerWithSparkContextSuite.scala:106)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$12(ThriftServerWithSparkContextSuite.scala:106)
at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$12$adapted(ThriftServerWithSparkContextSuite.scala:89)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$withJdbcStatement$4(SharedThriftServer.scala:95)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$withJdbcStatement$4$adapted(SharedThriftServer.scala:95)
```
The reason is:
1. we execute `set spark.sql.thriftServer.queryTimeout = 1`, then all the option will be limited in 1s.
2. we execute `set spark.sql.thriftServer.interruptOnCancel = false/true`. This sql will get timeout exception if there is something hung within 1s. It's not our expected.
Reset the timeout before we do the step2 can avoid this problem.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Fix test.
Closes#30897 from ulysses-you/SPARK-33526-followup.
Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This is a followup of https://github.com/apache/spark/pull/30267
Inspired by https://github.com/apache/spark/pull/30886, it's better to have 2 methods `def dropTable` and `def purgeTable`, than `def dropTable(ident)` and `def dropTable(ident, purge)`.
### Why are the changes needed?
1. make the APIs orthogonal. Previously, `def dropTable(ident, purge)` calls `def dropTable(ident)` and is a superset.
2. simplifies the catalog implementation a little bit. Now the `if (purge) ... else ...` check is done at the Spark side.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
existing tests
Closes#30890 from cloud-fan/purgeTable.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Add support for Java Enums (`java.lang.Enum`) from the Scala typed Dataset APIs. This involves adding an implicit for `Encoder` creation in `SQLImplicits`, and updating `ScalaReflection` to handle Java Enums on the serialization and deserialization pathways.
Enums are mapped to a `StringType` which is just the name of the Enum value.
### Why are the changes needed?
In [SPARK-21255](https://issues.apache.org/jira/browse/SPARK-21255), support for (de)serialization of Java Enums was added, but only when called from Java code. It is common for Scala code to rely on Java libraries that are out of control of the Scala developer. Today, if there is a dependency on some Java code which defines an Enum, it would be necessary to define a corresponding Scala class. This change brings closer feature parity between Scala and Java APIs.
### Does this PR introduce _any_ user-facing change?
Yes, previously something like:
```
val ds = Seq(MyJavaEnum.VALUE1, MyJavaEnum.VALUE2).toDS
// or
val ds = Seq(CaseClass(MyJavaEnum.VALUE1), CaseClass(MyJavaEnum.VALUE2)).toDS
```
would fail. Now, it will succeed.
### How was this patch tested?
Additional unit tests are added in `DatasetSuite`. Tests include validating top-level enums, enums inside of case classes, enums inside of arrays, and validating that the Enum is stored as the expected string.
Closes#30877 from xkrogen/xkrogen-SPARK-23862-scalareflection-java-enums.
Lead-authored-by: Erik Krogen <xkrogen@apache.org>
Co-authored-by: Fangshi Li <fli@linkedin.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR adds the length check to the existing ApplyCharPadding rule. Tables will have external locations when users execute
SET LOCATION or CREATE TABLE ... LOCATION. If the location contains over length values we should FAIL ON READ.
### Why are the changes needed?
```sql
spark-sql> INSERT INTO t2 VALUES ('1', 'b12345');
Time taken: 0.141 seconds
spark-sql> alter table t set location '/tmp/hive_one/t2';
Time taken: 0.095 seconds
spark-sql> select * from t;
1 b1234
```
the above case should fail rather than implicitly applying truncation
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
new tests
Closes#30882 from yaooqinn/SPARK-33876.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
```scala
val nestedStruct = new StructType()
.add(StructField("b", StringType).withComment("Nested comment"))
val struct = new StructType()
.add(StructField("a", nestedStruct).withComment("comment"))
struct.toDDL
```
Currently, returns:
```
`a` STRUCT<`b`: STRING> COMMENT 'comment'`
```
With this PR, the code above returns:
```
`a` STRUCT<`b`: STRING COMMENT 'Nested comment'> COMMENT 'comment'`
```
### Why are the changes needed?
My team is using nested columns as first citizens, and I thought it would be nice to have comments for nested columns.
### Does this PR introduce _any_ user-facing change?
Now, when users call something like this,
```scala
spark.table("foo.bar").schema.fields.map(_.toDDL).mkString(", ")
```
they will get comments for the nested columns.
### How was this patch tested?
I added unit tests under `org.apache.spark.sql.types.StructTypeSuite`. They test if nested StructType's comment is included in the DDL string.
Closes#30851 from jacobhjkim/structtype-toddl.
Authored-by: Jacob Kim <me@jacobkim.io>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR tries to rename `dataSourceRewriteRules` into something more generic.
### Why are the changes needed?
These changes are needed to address the post-review discussion [here](https://github.com/apache/spark/pull/30558#discussion_r533885837).
### Does this PR introduce _any_ user-facing change?
Yes but the changes haven't been released yet.
### How was this patch tested?
Existing tests.
Closes#30808 from aokolnychyi/spark-33784.
Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR adds logic to build logical writes introduced in SPARK-33779.
Note: This PR contains a subset of changes discussed in PR #29066.
### Why are the changes needed?
These changes are the next step as discussed in the [design doc](https://docs.google.com/document/d/1X0NsQSryvNmXBY9kcvfINeYyKC-AahZarUqg3nS1GQs/edit#) for SPARK-23889.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes#30806 from aokolnychyi/spark-33808.
Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Add some case to match Array whose element type is primitive.
### Why are the changes needed?
We will get exception when use `Literal.create(Array(1, 2, 3), ArrayType(IntegerType))` .
```
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Literal must have a corresponding value to array<int>, but class int[] found.
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.sql.catalyst.expressions.Literal$.validateLiteralValue(literals.scala:215)
at org.apache.spark.sql.catalyst.expressions.Literal.<init>(literals.scala:292)
at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:140)
```
And same problem with other array whose element is primitive.
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
Add test.
Closes#30868 from ulysses-you/SPARK-33860.
Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Verify ALTER TABLE CHANGE COLUMN with Char and Varchar and avoid unexpected change
For v1 table, changing type is not allowed, we fix a regression that uses the replaced string instead of the original char/varchar type when altering char/varchar columns
For v2 table,
char/varchar to string,
char(x) to char(x),
char(x)/varchar(x) to varchar(y) if x <=y are valid cases,
other changes are invalid
### Why are the changes needed?
Verify ALTER TABLE CHANGE COLUMN with Char and Varchar and avoid unexpected change
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
new test
Closes#30833 from yaooqinn/SPARK-33834.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
* Implement `SparkScriptTransformationExec` based on `BaseScriptTransformationExec`
* Implement `SparkScriptTransformationWriterThread` based on `BaseScriptTransformationWriterThread` of writing data
* Add rule `SparkScripts` to support convert script LogicalPlan to SparkPlan in Spark SQL (without hive mode)
* Add `SparkScriptTransformationSuite` test spark spec case
* add test in `SQLQueryTestSuite`
And we will close#29085 .
### Why are the changes needed?
Support user use Script Transform without Hive
### Does this PR introduce _any_ user-facing change?
User can use Script Transformation without hive in no serde mode.
Such as :
**default no serde **
```
SELECT TRANSFORM(a, b, c)
USING 'cat' AS (a int, b string, c long)
FROM testData
```
**no serde with spec ROW FORMAT DELIMITED**
```
SELECT TRANSFORM(a, b, c)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
LINES TERMINATED BY '\n'
NULL DEFINED AS 'null'
USING 'cat' AS (a, b, c)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '\u0004'
MAP KEYS TERMINATED BY '\u0005'
LINES TERMINATED BY '\n'
NULL DEFINED AS 'NULL'
FROM testData
```
### How was this patch tested?
Added UT
Closes#29414 from AngersZhuuuu/SPARK-32106-MINOR.
Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This pr push the `UnaryExpression` into (if / case) branches. The use case is:
```sql
create table t1 using parquet as select id from range(10);
explain select id from t1 where (CASE WHEN id = 1 THEN '1' WHEN id = 3 THEN '2' end) > 3;
```
Before this pr:
```
== Physical Plan ==
*(1) Filter (cast(CASE WHEN (id#1L = 1) THEN 1 WHEN (id#1L = 3) THEN 2 END as int) > 3)
+- *(1) ColumnarToRow
+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [(cast(CASE WHEN (id#1L = 1) THEN 1 WHEN (id#1L = 3) THEN 2 END as int) > 3)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
```
After this pr:
```
== Physical Plan ==
LocalTableScan <empty>, [id#1L]
```
This change can also improve this case:
a78d6ce376/sql/core/src/test/resources/tpcds/q62.sql (L5-L22)
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes#30853 from wangyum/SPARK-33848.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>