spark-instrumented-optimizer/sql/hive
Wenchen Fan 9b262f6a08 [SPARK-22356][SQL] data source table should support overlapped columns between data and partition schema
## What changes were proposed in this pull request?

This is a regression introduced by #14207. Since Spark 2.1, we store the inferred schema when creating the table, to avoid inferring the schema again in the read path. However, there is one special case: overlapped columns between the data and partition schemas. This case breaks the table schema's assumption that there is no overlap between the data and partition schemas and that partition columns come at the end. As a result, in Spark 2.1 the table scan has an incorrect schema that puts partition columns at the end. In Spark 2.2, we added a check in CatalogTable to validate the table schema, and that check fails in this case.

To fix this issue, a simple and safe approach is to fall back to the old behavior when overlapped columns are detected, i.e. store an empty schema in the metastore.
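
The fallback rule can be illustrated with a minimal sketch. This is not the code from the patch: it models a schema as a plain list of column names, compares names by exact match, and the helper names `overlapped_columns` and `schema_to_store` are hypothetical, chosen only for illustration.

```python
# Illustrative model only: a schema is an ordered list of column names.
# Both helper names are hypothetical and do not exist in Spark.

def overlapped_columns(data_schema, partition_schema):
    """Return the columns present in both schemas, in data-schema order."""
    partition_names = set(partition_schema)
    return [name for name in data_schema if name in partition_names]


def schema_to_store(data_schema, partition_schema):
    """Choose the schema to persist in the metastore.

    Without overlap, keep the usual layout: data columns first,
    partition columns at the end. With overlap, persist an empty
    schema so that the schema is inferred again at read time.
    """
    if overlapped_columns(data_schema, partition_schema):
        return []
    return list(data_schema) + list(partition_schema)
```

Under these assumptions, a table whose data files contain column `p` and which is also partitioned by `p` gets an empty stored schema, while a table with no shared columns keeps the stored schema with partition columns last.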

## How was this patch tested?

new regression test

Author: Wenchen Fan <wenchen@databricks.com>

Closes #19579 from cloud-fan/bug2.
2017-10-26 17:39:53 -07:00
..
compatibility/src/test/scala/org/apache/spark/sql/hive/execution [SPARK-21831][TEST] Remove spark.sql.hive.convertMetastoreOrc config in HiveCompatibilitySuite 2017-08-25 19:51:13 -07:00
src [SPARK-22356][SQL] data source table should support overlapped columns between data and partition schema 2017-10-26 17:39:53 -07:00
pom.xml [SPARK-21936][SQL] backward compatibility test framework for HiveExternalCatalog 2017-09-07 23:21:49 -07:00