## What changes were proposed in this pull request?
This PR proposes to respect `SessionConfigSupport` in Structured Streaming data sources as well. Currently, session configs are only respected in batch sources:
e06da95cd9/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (L198-L203)
e06da95cd9/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (L244-L249)
If a developer implements a Data Source V2 that supports both structured streaming and batch jobs, batch jobs respect a session configuration, say a URL to connect to and fetch data from (which end users might not be aware of); however, structured streaming ends up not supporting it, so the value must be set explicitly in the options.
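For context, a minimal sketch (class name and prefix are illustrative, assuming the 2.4-era `org.apache.spark.sql.sources.v2` API) of a V2 source that opts into session configs:
```scala
import org.apache.spark.sql.sources.v2.{DataSourceV2, SessionConfigSupport}

// Session configs of the form `spark.datasource.example.<key>` are extracted
// and merged into this source's options wherever SessionConfigSupport is
// respected.
class ExampleSource extends DataSourceV2 with SessionConfigSupport {
  override def keyPrefix(): String = "example"
}
```
With this change, both `spark.read.format(...)` and `spark.readStream.format(...)` would pick up such session configs.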
## How was this patch tested?
Unit tests were added.
Closes #22462 from HyukjinKwon/SPARK-25460.
Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
This reverts a sequence of PRs based on the discussion and comments at https://github.com/apache/spark/pull/16677#issuecomment-422650759:
#22344, #22330, #22239, #16677
## How was this patch tested?
Existing tests.
Closes #22481 from viirya/revert-SPARK-19355-1.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
Refactor `FilterPushdownBenchmark` to use a `main` method. We can now run this benchmark in three ways:
1. `bin/spark-submit --class org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark spark-sql_2.11-2.5.0-SNAPSHOT-tests.jar`
2. `build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark"`
3. `SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark"`
Methods 2 and 3 do not require building the `spark-sql_*-tests.jar` package first, so these two methods are mainly for developers to run benchmarks quickly.
## How was this patch tested?
manual tests
Closes #22443 from wangyum/SPARK-25339.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
Addition of numeric and list hints for SparkR.
## How was this patch tested?
Added tests in `test_sparkSQL.R`.
Author: Huaxin Gao <huaxing@us.ibm.com>
Closes #21649 from huaxingao/spark-23648.
## What changes were proposed in this pull request?
This patch adds an "optimizer" prefix to nested schema pruning.
## How was this patch tested?
Should be covered by existing tests.
Closes #22475 from rxin/SPARK-4502.
Authored-by: Reynold Xin <rxin@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?
The PR proposes to return the data type of the operands as the result type of the `div` operator. Before the PR, `bigint` was always returned. It also introduces a `spark.sql.legacy.integralDivide.returnBigint` config to let users restore the legacy behavior.
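A hedged illustration of the new behavior (`spark` is a `SparkSession`; values are arbitrary):
```scala
// `div` on two INT operands now yields an INT result instead of BIGINT.
spark.sql("SELECT 7 div 2").printSchema()
// Restore the legacy behavior, where `div` always returned BIGINT:
spark.conf.set("spark.sql.legacy.integralDivide.returnBigint", "true")
spark.sql("SELECT 7 div 2").printSchema()
```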
## How was this patch tested?
added UTs
Closes #22465 from mgaido91/SPARK-25457.
Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
The PR updates the migration guide to explain the changes introduced in the behavior of the IN operator with subqueries, in particular the improved handling of struct attributes.
## How was this patch tested?
NA
Closes #22469 from mgaido91/SPARK-24341_followup.
Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
This patch changes the config option `spark.sql.streaming.noDataMicroBatchesEnabled` to `spark.sql.streaming.noDataMicroBatches.enabled` to be more consistent with the rest of the configs. Unfortunately there is one streaming config called `spark.sql.streaming.metricsEnabled`. For that one we should just use a fallback config and change it in a separate patch.
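For reference, a sketch of the fallback-config idea (the new key name here is hypothetical, and the builder usage assumes Spark's internal `ConfigBuilder`):
```scala
import org.apache.spark.internal.config.ConfigBuilder

// Reads of the new key fall back to the old key when the new one is unset,
// so existing user configurations keep working.
val OLD = ConfigBuilder("spark.sql.streaming.metricsEnabled")
  .booleanConf
  .createWithDefault(false)
val NEW = ConfigBuilder("spark.sql.streaming.metrics.enabled")
  .fallbackConf(OLD)
```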
## How was this patch tested?
Made sure no other references to this config are in the code base:
```
> git grep "noDataMicro"
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: buildConf("spark.sql.streaming.noDataMicroBatches.enabled")
```
Closes #22476 from rxin/SPARK-24157.
Authored-by: Reynold Xin <rxin@databricks.com>
Signed-off-by: Reynold Xin <rxin@databricks.com>
## What changes were proposed in this pull request?
Fix a test that constructs a Pandas DataFrame by specifying the column order. Previously this test assumed the columns would be sorted alphabetically; however, when using Python 3.6 with Pandas 0.23 or higher, the original column order is maintained. This causes the columns to get mixed up and the test fails.
## How was this patch tested?
Manually tested with `python/run-tests` using Python 3.6.6 and Pandas 0.23.4.
Closes #22477 from BryanCutler/pyspark-tests-py36-pd23-SPARK-25471.
Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Improve the test case "image datasource test: read non image" to tolerate different schema representations, because `file:/path` and `file:///path` are both valid URI forms and the test case fails in some environments.
## How was this patch tested?
Manual.
Closes #22449 from WeichenXu123/image_url.
Authored-by: WeichenXu <weichen.xu@databricks.com>
Signed-off-by: Xiangrui Meng <meng@databricks.com>
## What changes were proposed in this pull request?
For self-join/self-union, Spark will produce a physical plan which has multiple `DataSourceV2ScanExec` instances referring to the same `ReadSupport` instance. In this case, the streaming source is indeed scanned multiple times, and the `numInputRows` metrics should be counted for each scan.
Actually we already have 2 test cases to verify the behavior:
1. `StreamingQuerySuite.input row calculation with same V2 source used twice in self-join`
2. `KafkaMicroBatchSourceSuiteBase.ensure stream-stream self-join generates only one offset in log and correct metrics`.
However, in these 2 tests the expected results differ, which is super confusing. It turns out that the first test doesn't trigger exchange reuse, so the source is scanned twice, while the second test triggers exchange reuse, so the source is scanned only once.
This PR proposes to improve these 2 tests, to test with/without exchange reuse.
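A sketch of how both paths can be exercised (query elided; `withSQLConf` is the SQL test helper):
```scala
// Exchange reuse controls whether the self-joined source is scanned once or
// twice, so run the same assertion under both settings.
Seq("true", "false").foreach { reuse =>
  withSQLConf("spark.sql.exchange.reuse" -> reuse) {
    // run the self-join streaming query and check `numInputRows`:
    // one scan with reuse enabled, two scans without it
  }
}
```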
## How was this patch tested?
test only change
Closes #22402 from cloud-fan/bug.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
In SPARK-23711, `UnsafeProjection` gained support for falling back to an interpreted mode. This PR fixes the code to support the same fallback mode in `MutableProjection`, based on `CodeGeneratorWithInterpretedFallback`.
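A simplified sketch of the pattern (factory and helper names assumed to mirror the `UnsafeProjection` counterpart):
```scala
object MutableProjection
    extends CodeGeneratorWithInterpretedFallback[Seq[Expression], MutableProjection] {

  // Codegen path: compile the expressions into a projection.
  override protected def createCodeGeneratedObject(in: Seq[Expression]): MutableProjection =
    GenerateMutableProjection.generate(in)

  // Fallback path: evaluate the expressions interpretively.
  override protected def createInterpretedObject(in: Seq[Expression]): MutableProjection =
    new InterpretedMutableProjection(in)
}
```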
## How was this patch tested?
Added tests in `CodeGeneratorWithInterpretedFallbackSuite`.
Closes #22355 from maropu/SPARK-25358.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
This is a follow-up for #22441.
1. Remove flag "-Pkafka-0-8" for Scala 2.12 build.
2. Clean up the script, simpler logic.
3. Switch the Scala version back to 2.11 before the script exits.
## How was this patch tested?
Manual test.
Closes #22454 from gengliangwang/revise_release_build.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
To be more consistent with other statistics-based configs.
## How was this patch tested?
N/A - straightforward rename of a config option. Used `git grep` to make sure there are no remaining mentions of it.
Closes #22457 from rxin/SPARK-24626.
Authored-by: Reynold Xin <rxin@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?
`spark.sql.fromJsonForceNullableSchema` -> `spark.sql.function.fromJson.forceNullable`
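A hedged example of what the flag controls (data and schema are illustrative; `spark` is a `SparkSession`):
```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StructType}
import spark.implicits._

// With the flag enabled (the default), the user-supplied schema is forced
// nullable before parsing, so a missing field yields null instead of
// violating a non-null contract.
spark.conf.set("spark.sql.function.fromJson.forceNullable", "true")
val schema = new StructType().add("a", IntegerType, nullable = false)
val parsed = Seq("""{"b": 1}""").toDF("json")
  .select(from_json(col("json"), schema).as("parsed"))
parsed.printSchema()  // parsed.a is reported as nullable
```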
## How was this patch tested?
Made sure there are no more references to `spark.sql.fromJsonForceNullableSchema`.
Closes #22459 from rxin/SPARK-23173.
Authored-by: Reynold Xin <rxin@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Several configuration parameters related to Kubernetes need to be
reset, as they are changed with each invocation of spark-submit and
thus prevent recovery of Spark Streaming tasks.
## What changes were proposed in this pull request?
When using the Kubernetes cluster manager and spawning a streaming workload, it is important to reset many `spark.kubernetes.*` properties that are generated by spark-submit, because restoring a checkpoint would otherwise overwrite them with stale values. This is because the spark-submit code path creates Kubernetes resources, such as a ConfigMap, a Secret, and other resources, whose names are auto-generated, so the previous ones will no longer resolve.
In short, this change enables checkpoint restoration for streaming workloads, and thus enables Spark Streaming workloads in Kubernetes, which were not possible to restore from a checkpoint before if the workload went down.
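A sketch of the idea (the k8s property names here are illustrative examples of auto-generated settings, not the exact list in this PR): extend the checkpoint's reload list the same way the YARN entries are handled.
```scala
// Properties re-read from the new spark-submit environment on checkpoint
// restore, instead of being restored with stale auto-generated values.
val propertiesToReload = List(
  "spark.yarn.app.id",                       // existing YARN-style entry
  "spark.kubernetes.driver.pod.name",        // k8s resources are recreated,
  "spark.kubernetes.executor.podNamePrefix"  // so their names change per submit
)
```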
## How was this patch tested?
This patch would benefit from testing in different k8s clusters.
This is similar to the YARN-related code for resetting a Spark Streaming workload, but for the Kubernetes scheduler. This PR also removes the init-container properties that existed before, because they have since been removed in master.
For a previous discussion, see the non-rebased work at: apache-spark-on-k8s#516
Closes #22392 from ssaavedra/fix-checkpointing-master.
Authored-by: Santiago Saavedra <santiagosaavedra@gmail.com>
Signed-off-by: Yinan Li <ynli@google.com>
PythonForeachWriterSuite was failing because RowQueue now needs to have a handle on a SparkEnv with a SerializerManager, so added a mock env with a serializer manager.
Also fixed a typo in the `finally` that was hiding the real exception.
Tested PythonForeachWriterSuite locally, full tests via jenkins.
Closes #22452 from squito/SPARK-25456.
Authored-by: Imran Rashid <irashid@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
## What changes were proposed in this pull request?
Added a fix for flakiness in the PySpark tests w.r.t. executors not being tested.
Also an important fix to `executorConf`, which was failing tests when executors *were* tested.
## How was this patch tested?
Unit and Integration tests
Closes #22415 from ifilonenko/SPARK-25291.
Authored-by: Ilan Filonenko <if56@cornell.edu>
Signed-off-by: Yinan Li <ynli@google.com>
## What changes were proposed in this pull request?
Update `tuning.md` and `rdd-programming-guide.md` to use JDK8.
## How was this patch tested?
manual tests
Closes #22446 from wangyum/java8.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Update the package and publish steps to support Scala 2.12.
## How was this patch tested?
manual test
Closes #22441 from cloud-fan/scala.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
SPARK-22333 introduced a regression in the resolution of `CURRENT_DATE` and `CURRENT_TIMESTAMP`. Before that ticket, these 2 functions were resolved in a case insensitive way. After, this depends on the value of `spark.sql.caseSensitive`.
The PR restores the previous behavior, making their resolution case insensitive regardless of that setting. The PR takes over #21217, therefore it closes #21217 and credit for this patch should be given to jamesthomp.
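An illustration of the restored behavior (assuming a `spark` session):
```scala
// Both spellings resolve even with case-sensitive analysis enabled.
spark.conf.set("spark.sql.caseSensitive", "true")
spark.sql("SELECT CURRENT_DATE, current_timestamp").show()
```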
## How was this patch tested?
added UT
Closes #22440 from mgaido91/SPARK-24151.
Lead-authored-by: James Thompson <jamesthomp@users.noreply.github.com>
Co-authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?
This PR makes `GenArrayData.genCodeToCreateArrayData` method simple by using `ArrayData.createArrayData` method.
Before this PR, the `genCodeToCreateArrayData` method was complicated:
* Generated a temporary Java array to create `ArrayData`
* Had separate code generation path to assign values for `GenericArrayData` and `UnsafeArrayData`
After this PR, the method
* Directly generates `GenericArrayData` or `UnsafeArrayData` without a temporary array
* Has a single code generation path to assign values
## How was this patch tested?
Existing UTs
Closes #22439 from kiszk/SPARK-25444.
Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
## What changes were proposed in this pull request?
These 2 changes are required to build the docs for Spark 2.4.0 RC1:
1. install `mkdocs` in the docker image
2. set locale to C.UTF-8. Otherwise jekyll fails to build the doc.
## How was this patch tested?
tested manually when doing the 2.4.0 RC1
Closes #22438 from cloud-fan/infra.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
The PR takes over #14036; it introduces a new expression, `IntegralDivide`, in order to avoid the several unneeded casts added previously.
In order to prove the performance gain, the following benchmark has been run:
```
test("Benchmark IntegralDivide") {
val r = new scala.util.Random(91)
val nData = 1000000
val testDataInt = (1 to nData).map(_ => (r.nextInt(), r.nextInt()))
val testDataLong = (1 to nData).map(_ => (r.nextLong(), r.nextLong()))
val testDataShort = (1 to nData).map(_ => (r.nextInt().toShort, r.nextInt().toShort))
// old code
val oldExprsInt = testDataInt.map(x =>
Cast(Divide(Cast(Literal(x._1), DoubleType), Cast(Literal(x._2), DoubleType)), LongType))
val oldExprsLong = testDataLong.map(x =>
Cast(Divide(Cast(Literal(x._1), DoubleType), Cast(Literal(x._2), DoubleType)), LongType))
val oldExprsShort = testDataShort.map(x =>
Cast(Divide(Cast(Literal(x._1), DoubleType), Cast(Literal(x._2), DoubleType)), LongType))
// new code
val newExprsInt = testDataInt.map(x => IntegralDivide(x._1, x._2))
val newExprsLong = testDataLong.map(x => IntegralDivide(x._1, x._2))
val newExprsShort = testDataShort.map(x => IntegralDivide(x._1, x._2))
Seq(("Long", "old", oldExprsLong),
("Long", "new", newExprsLong),
("Int", "old", oldExprsInt),
("Int", "new", newExprsShort),
("Short", "old", oldExprsShort),
("Short", "new", oldExprsShort)).foreach { case (dt, t, ds) =>
val start = System.nanoTime()
ds.foreach(e => e.eval(EmptyRow))
val endNoCodegen = System.nanoTime()
println(s"Running $nData op with $t code on $dt (no-codegen): ${(endNoCodegen - start) / 1000000} ms")
}
}
```
The results on my laptop are:
```
Running 1000000 op with old code on Long (no-codegen): 600 ms
Running 1000000 op with new code on Long (no-codegen): 112 ms
Running 1000000 op with old code on Int (no-codegen): 560 ms
Running 1000000 op with new code on Int (no-codegen): 135 ms
Running 1000000 op with old code on Short (no-codegen): 317 ms
Running 1000000 op with new code on Short (no-codegen): 153 ms
```
This shows a 2-5X improvement. The benchmark doesn't include code generation, as it is hard to measure performance there: for such simple operations, most of the time is spent in the code generation/compilation process.
## How was this patch tested?
added UTs
Closes #22395 from mgaido91/SPARK-16323.
Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request?
Output `dataFilters` in `DataSourceScanExec.metadata`.
## How was this patch tested?
unit tests
Closes #22435 from wangyum/SPARK-25423.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request?
Fix location of licenses-binary in binary release, and remove binary items from source release
## How was this patch tested?
N/A
Closes #22436 from srowen/SPARK-24654.2.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
There are some mistakes in the examples of newly added functions. Also, the format of the example results is not unified. We should fix them.
## How was this patch tested?
Manually executed the examples.
Closes #22437 from ueshin/issues/SPARK-25431/fix_examples_2.
Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Spark supports BloomFilter creation for ORC files. This PR aims to add test coverages to prevent accidental regressions like [SPARK-12417](https://issues.apache.org/jira/browse/SPARK-12417).
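A sketch of the kind of usage such tests cover (path and option values illustrative; `orc.bloom.filter.columns` and `orc.bloom.filter.fpp` are ORC writer properties):
```scala
// Write ORC with a Bloom filter on the `id` column, then read it back.
spark.range(1000).write
  .option("orc.bloom.filter.columns", "id")
  .option("orc.bloom.filter.fpp", "0.05")
  .orc("/tmp/orc_bloom")
spark.read.orc("/tmp/orc_bloom").count()
```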
## How was this patch tested?
Pass the Jenkins with newly added test cases.
Closes #22418 from dongjoon-hyun/SPARK-25427.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
Updated the migration guide for the behavior changes made in SPARK-23425.
## How was this patch tested?
Manually verified.
Closes #22396 from sujith71955/master_newtest.
Authored-by: s71955 <sujithchacko.2010@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
The `without-hadoop` profile doesn't exist in Maven; the name should be `hadoop-provided` instead. This is a regression introduced by SPARK-24685, so this PR fixes it.
## How was this patch tested?
Local test.
Closes #22434 from jerryshao/SPARK-24685-followup.
Authored-by: jerryshao <sshao@hortonworks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Continuation of #22370. Summary of discussion there:
There is some inconsistency in the R manual w.r.t. superseding functions linking back to deprecated functions.
- `createOrReplaceTempView` and `createTable` both link back to functions which are deprecated (`registerTempTable` and `createExternalTable`, respectively)
- `sparkR.session` and `dropTempView` do _not_ link back to deprecated functions
This PR takes the view that it is preferable _not_ to link back to deprecated functions, and removes these references from `?createOrReplaceTempView` and `?createTable`.
As `registerTempTable` was included in the `SparkDataFrame functions` `family` of functions, other documentation pages which included a link to `?registerTempTable` will similarly be altered.
Author: Michael Chirico <michael.chirico@grabtaxi.com>
Author: Michael Chirico <michaelchirico4@gmail.com>
Closes #22393 from MichaelChirico/axe_deprecated_doc_refs.
## What changes were proposed in this pull request?
Fixes the TPCH DDL data type of `customer.c_nationkey` in `TPCHQuerySuite.scala` from `STRING` to `BIGINT`, according to the spec and consistent with `nation.nationkey`. The rest of the keys are OK.
Note, this will lead to **non-comparable previous results** to new runs involving the customer table.
## How was this patch tested?
Manual tests
Author: npoggi <npmnpm@gmail.com>
Closes #22430 from npoggi/SPARK-25439_Fix-TPCH-customer-c_nationkey.
## What changes were proposed in this pull request?
This PR aims to fix three things in `FilterPushdownBenchmark`.
**1. Use the same memory assumption.**
The following configurations are used in ORC and Parquet:
- Memory buffer for writing
  - `parquet.block.size` (default: 128MB)
  - `orc.stripe.size` (default: 64MB)
- Compression chunk size
  - `parquet.page.size` (default: 1MB)
  - `orc.compress.size` (default: 256KB)
SPARK-24692 used 1MB, the default value of `parquet.page.size`, for both `parquet.block.size` and `orc.stripe.size`, but it did not also match `orc.compress.size`. So the current benchmark compares ORC using 256KB of memory for compression with Parquet using 1MB. To compare correctly, we need to be consistent; see the sketch below.
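A sketch of the consistent setup (`df` and the output paths are assumed): match both the memory buffer and the compression chunk size across formats, not just the former.
```scala
// Parquet: 1MB block (memory buffer) and 1MB page (compression chunk).
df.write
  .option("parquet.block.size", "1048576")
  .option("parquet.page.size", "1048576")
  .parquet(parquetDir)
// ORC: 1MB stripe (memory buffer) and a matching 1MB compression chunk.
df.write
  .option("orc.stripe.size", "1048576")
  .option("orc.compress.size", "1048576")
  .orc(orcDir)
```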
**2. Dictionary encoding should not be enforced for all cases.**
SPARK-24206 enforced dictionary encoding for all test cases. This PR recovers the default behavior in general and enforces dictionary encoding only in case of `prepareStringDictTable`.
**3. Generate test result on AWS r3.xlarge**
SPARK-24206 generated the result on AWS in order to reproduce and compare easily. This PR also updates the result on the same machine for the same reason. Specifically, AWS r3.xlarge with Instance Store is used.
## How was this patch tested?
Manual. Enable the test cases and run `FilterPushdownBenchmark` on `AWS r3.xlarge`. It takes about 4 hours 15 minutes.
Closes #22427 from dongjoon-hyun/SPARK-25438.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request?
In the PR, I propose overriding session options with extra options in Data Source V2. Extra options are more specific and set via `.option()`, so they should overwrite the more generic session options. Entries from the second map overwrite entries with the same key from the first map, for example:
```Scala
scala> Map("option" -> false) ++ Map("option" -> true)
res0: scala.collection.immutable.Map[String,Boolean] = Map(option -> true)
```
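A usage sketch (format and key names illustrative) of the intended precedence:
```scala
// Session-level default for sources whose SessionConfigSupport prefix is "example".
spark.conf.set("spark.datasource.example.url", "host1")
// The more specific .option() value wins over the session config.
val df = spark.read.format("example")
  .option("url", "host2")  // effective value: host2
  .load()
```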
## How was this patch tested?
Added a test for checking which option is propagated to a data source in `load()`.
Closes #22413 from MaxGekk/session-options.
Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request?
We can still discuss on the dev list whether the next version is 2.5.0 or 3.0.0. Let us first bump the master branch version to `2.5.0-SNAPSHOT`.
## How was this patch tested?
N/A
Closes #22426 from gatorsmile/bumpVersionMaster.
Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?
This PR removes the duplicate fallback logic in `UnsafeProjection`.
It is split out from #22355.
## How was this patch tested?
Added tests in `CodeGeneratorWithInterpretedFallbackSuite`.
Closes #22417 from maropu/SPARK-25426.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?
There are some mistakes in the examples of newly added functions. Also, the format of the example results is not unified. We should fix and unify them.
## How was this patch tested?
Manually executed the examples.
Closes #22421 from ueshin/issues/SPARK-25431/fix_examples.
Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?
When Hive support enabled, Hive catalog puts extra storage properties into table metadata even for DataSource tables, but we should not have them.
## How was this patch tested?
Modified a test.
Closes #22410 from ueshin/issues/SPARK-25418/hive_metadata.
Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
We've seen some flakiness in Jenkins in `SchedulerIntegrationSuite`, which looks like it just needs a longer timeout.
Closes #22385 from squito/SPARK-25400.
Authored-by: Imran Rashid <irashid@cloudera.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This PR ensures that test suites call `super.afterAll()` in their overridden `afterAll()` methods (a sketch of the pattern follows this list):
* Some suites did not call `super.afterAll()`
* Some suites called `super.afterAll()` only under certain conditions
* Others never called `super.afterAll()`
This PR also ensures that `super.beforeAll()` is called in overridden `beforeAll()` methods.
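A sketch of the enforced pattern, assuming a ScalaTest suite:
```scala
override def afterAll(): Unit = {
  try {
    // suite-specific cleanup (stop contexts, delete temp dirs, ...)
  } finally {
    super.afterAll()  // always reached, even if the cleanup above throws
  }
}
```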
## How was this patch tested?
Existing UTs
Closes #22337 from kiszk/SPARK-25338.
Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(Link to Jira: https://issues.apache.org/jira/browse/SPARK-25406)
## What changes were proposed in this pull request?
The current use of `withSQLConf` in `ParquetSchemaPruningSuite.scala` is incorrect. The desired configuration settings are not being set when running the test cases.
This PR fixes that defective usage and addresses the test failures that were previously masked by that defect.
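A sketch of the defect and the fix (test bodies elided; `SQLConf.CASE_SENSITIVE` is used only for illustration):
```scala
// Broken: withSQLConf wraps test *registration*, so the conf is set while the
// suite is constructed and is already reset by the time the test body runs.
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
  test("some pruning case") { /* conf NOT in effect here */ }
}

// Fixed: set the conf inside the test body, where it must be in effect.
test("some pruning case") {
  withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { /* conf in effect */ }
}
```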
## How was this patch tested?
I added code to relevant test cases to print the expected SQL configuration settings and found that the settings were not being set as expected. When I changed the order of calls to `test` and `withSQLConf` I found that the configuration settings were being set as expected.
Closes #22394 from mallman/spark-25406-fix_broken_schema_pruning_tests.
Authored-by: Michael Allman <msa@allman.ms>
Signed-off-by: DB Tsai <d_tsai@apple.com>