spark-instrumented-optimizer

Steven Aerts 109247f02e [SPARK-35985][SQL] push partitionFilters for empty readDataSchema		2021-07-16 04:53:23 +00:00

This commit makes sure that, for File Source V2, partition filters are also taken into account when the readDataSchema is empty. This is the case for queries like:

    SELECT count(*) FROM tbl WHERE partition=foo
    SELECT input_file_name() FROM tbl WHERE partition=foo

### What changes were proposed in this pull request?

As described in SPARK-35985, there is a bug in the File Datasource V2 which prevents it from pushing partition filters down to the FileScanner for queries like the ones listed above.

### Why are the changes needed?

If partition filters are not pushed down, the whole dataset will be scanned even though only one partition is of interest.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

An extra test was added which relies on the output of explain, as is done in other places.

Closes #33191 from steven-aerts/SPARK-35985.

Authored-by: Steven Aerts <steven.aerts@airties.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit `f06aa4a3f3`)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
benchmarks	[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines	2021-04-03 23:02:56 +03:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution	Revert "[SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow"	2021-03-16 13:56:50 +08:00
src	[SPARK-35985][SQL] push partitionFilters for empty readDataSchema	2021-07-16 04:53:23 +00:00
pom.xml	[SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs	2021-06-15 14:43:30 -07:00