spark-instrumented-optimizer

History

Yuming Wang a8b568800e [SPARK-32659][SQL] Fix the data issue when pruning DPP on non-atomic type ### What changes were proposed in this pull request? Use `InSet` expression to fix data issue when pruning DPP on non-atomic type. for example: ```scala spark.range(1000) .select(col("id"), col("id").as("k")) .write .partitionBy("k") .format("parquet") .mode("overwrite") .saveAsTable("df1"); spark.range(100) .select(col("id"), col("id").as("k")) .write .partitionBy("k") .format("parquet") .mode("overwrite") .saveAsTable("df2") spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio=2") spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=false") spark.sql("SELECT df1.id, df2.k FROM df1 JOIN df2 ON struct(df1.k) = struct(df2.k) AND df2.id < 2").show ``` It should return two records, but it returns empty. ### Why are the changes needed? Fix data issue ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Add new unit test. Closes #29475 from wangyum/SPARK-32659. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>		2020-08-26 06:57:43 +00:00
..
benchmarks	[SPARK-30648][SQL] Support filters pushdown in JSON datasource	2020-07-17 00:01:13 +09:00
src	[SPARK-32659][SQL] Fix the data issue when pruning DPP on non-atomic type	2020-08-26 06:57:43 +00:00
v1.2/src	[SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis	2020-08-25 13:47:52 +09:00
v2.3/src	[SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis	2020-08-25 13:47:52 +09:00
pom.xml	[SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector	2020-06-30 10:30:22 -07:00