spark-instrumented-optimizer/sql/core
Marco Gaido 88e0c7bbd5 [SPARK-24341][SQL] Support only IN subqueries with the same number of items per row
## What changes were proposed in this pull request?

Using struct types in subqueries with the `IN` clause can generate invalid plans in `RewritePredicateSubquery`. Indeed, we are not handling clearly the cases when the outer value is a struct or the output of the inner subquery is a struct.

The PR aims to make Spark's behavior the same as the one of the other RDBMS - namely Oracle and Postgres behavior were checked. So we consider valid only queries having the same number of fields in the outer value and in the subquery. This means that:

 - `(a, b) IN (select c, d from ...)` is a valid query;
 - `(a, b) IN (select (c, d) from ...)` throws an AnalysisException, as in the subquery we have only one field of type struct while in the outer value we have 2 fields;
 - `a IN (select (c, d) from ...)` - where `a` is a struct - is a valid query.

## How was this patch tested?

Added UT

Closes #21403 from mgaido91/SPARK-24313.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-08-07 15:43:41 +08:00
..
benchmarks [SPARK-24549][SQL] Support Decimal type push down to the parquet data sources 2018-07-16 15:44:51 +08:00
src [SPARK-24341][SQL] Support only IN subqueries with the same number of items per row 2018-08-07 15:43:41 +08:00
pom.xml [SPARK-25019][BUILD] Fix orc dependency to use the same exclusion rules 2018-08-06 12:00:39 -07:00