spark-instrumented-optimizer/sql/core
Yuming Wang a8b568800e [SPARK-32659][SQL] Fix the data issue when pruning DPP on non-atomic type
### What changes were proposed in this pull request?

Use `InSet` expression to fix data issue when pruning DPP on non-atomic type. for example:
   ```scala
    spark.range(1000)
    .select(col("id"), col("id").as("k"))
    .write
    .partitionBy("k")
    .format("parquet")
    .mode("overwrite")
    .saveAsTable("df1");

   spark.range(100)
   .select(col("id"), col("id").as("k"))
   .write
   .partitionBy("k")
   .format("parquet")
   .mode("overwrite")
   .saveAsTable("df2")

   spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio=2")
   spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=false")
   spark.sql("SELECT df1.id, df2.k FROM df1 JOIN df2 ON struct(df1.k) = struct(df2.k) AND df2.id < 2").show
   ```
   It should return two records, but it returns empty.

### Why are the changes needed?

Fix data issue

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Add new unit test.

Closes #29475 from wangyum/SPARK-32659.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-26 06:57:43 +00:00
..
benchmarks [SPARK-30648][SQL] Support filters pushdown in JSON datasource 2020-07-17 00:01:13 +09:00
src [SPARK-32659][SQL] Fix the data issue when pruning DPP on non-atomic type 2020-08-26 06:57:43 +00:00
v1.2/src [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis 2020-08-25 13:47:52 +09:00
v2.3/src [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis 2020-08-25 13:47:52 +09:00
pom.xml [SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector 2020-06-30 10:30:22 -07:00