spark-instrumented-optimizer/sql/hive
Steven Aerts 109247f02e [SPARK-35985][SQL] push partitionFilters for empty readDataSchema
this commit makes sure that for File Source V2 partition filters are
also taken into account when the readDataSchema is empty.
This is the case for queries like:

    SELECT count(*) FROM tbl WHERE partition=foo
    SELECT input_file_name() FROM tbl WHERE partition=foo

### What changes were proposed in this pull request?

As described in SPARK-35985 there is bug in the File Datasource V2 which prevents it to push down to the FileScanner for queries like the ones listed above.

### Why are the changes needed?

If partitions filters are not pushed down, the whole dataset will be scanned while only one partition is interesting.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

An extra test was added which relies on the output of explain, as is done in other places.

Closes #33191 from steven-aerts/SPARK-35985.

Authored-by: Steven Aerts <steven.aerts@airties.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit f06aa4a3f3)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-07-16 04:53:23 +00:00
..
benchmarks [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution Revert "[SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow" 2021-03-16 13:56:50 +08:00
src [SPARK-35985][SQL] push partitionFilters for empty readDataSchema 2021-07-16 04:53:23 +00:00
pom.xml [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs 2021-06-15 14:43:30 -07:00