643cd876e4
### What changes were proposed in this pull request? We support partially push partition filters since SPARK-28169. We can also support partially push down data filters if it mixed in partition filters and data filters. For example: ``` spark.sql( s""" |CREATE TABLE t(i INT, p STRING) |USING parquet |PARTITIONED BY (p)""".stripMargin) spark.range(0, 1000).selectExpr("id as col").createOrReplaceTempView("temp") for (part <- Seq(1, 2, 3, 4)) { sql(s""" |INSERT OVERWRITE TABLE t PARTITION (p='$part') |SELECT col FROM temp""".stripMargin) } spark.sql("SELECT * FROM t WHERE WHERE (p = '1' AND i = 1) OR (p = '2' and i = 2)").explain() ``` We can also push down ```i = 1 or i = 2 ``` ### Why are the changes needed? Extract more data filter to FileSourceScanExec ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Added UT Closes #29406 from AngersZhuuuu/SPARK-32352. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> |
||
---|---|---|
.. | ||
benchmarks | ||
src | ||
v1.2/src | ||
v2.3/src | ||
pom.xml |