[SPARK-36269][SQL] Fix only set data columns to Hive column names config

### What changes were proposed in this pull request?

When reading a Hive table, we set the Hive column ID and column name configs (`hive.io.file.readcolumn.ids` and `hive.io.file.readcolumn.names`). Both configs should contain only non-partition columns (data columns), because Spark always [appends partition columns in its own Hive reader](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L240). Currently the column ID config has only non-partition columns, while the column name config has both partition and non-partition columns. This PR makes them consistent by setting only non-partition columns in both. The mismatch does not cause an issue for the public OSS Hive file formats for now, but it does cause an issue for a customized internal Hive file format, where we expect the two configs to describe the same columns.
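The mismatch can be illustrated with a minimal, Spark-free sketch. `Col`, `ReadColumnsSketch`, and `neededDataColumns` are hypothetical names introduced only for this illustration, and a plain `Map` stands in for Catalyst's `AttributeMap`; only the filtering logic mirrors the change in this PR.

```scala
// Hypothetical stand-in for a Catalyst attribute; not Spark's real class.
final case class Col(name: String)

object ReadColumnsSketch {
  // Returns the IDs and names of the requested data columns, in output order.
  // Partition columns are skipped because they have no ordinal in dataCols.
  def neededDataColumns(dataCols: Seq[Col], output: Seq[Col]): (Seq[Int], Seq[String]) = {
    // Assumption: a plain Map replaces AttributeMap for this sketch.
    val ordinals: Map[Col, Int] = dataCols.zipWithIndex.toMap
    val ids = output.flatMap(ordinals.get)
    val names = output.filter(ordinals.contains).map(_.name)
    (ids, names)
  }

  def main(args: Array[String]): Unit = {
    val dataCols = Seq(Col("id"), Col("value"), Col("ts"))
    val partitionCol = Col("dt")
    // The query output mixes data columns with a partition column.
    val output = Seq(Col("value"), partitionCol, Col("id"))

    val (ids, names) = neededDataColumns(dataCols, output)
    // Previous behavior: names were taken from every output column.
    val oldNames = output.map(_.name)

    println(s"ids:       ${ids.mkString(",")}")
    println(s"new names: ${names.mkString(",")}")
    println(s"old names: ${oldNames.mkString(",")}")
  }
}
```

With the filtered names, the two lists always have the same length and refer to the same columns, which is the property a file format that reads both configs would rely on.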

### Why are the changes needed?

To make the two Hive read-column configs consistent with each other, so that both list only the needed data columns.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing Hive tests.

Closes #33489 from c21/hive-col.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e5616e32ee)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Cheng Su 2021-07-26 18:48:06 +08:00 committed by Wenchen Fan
parent ec91818e14
commit f42cc10512


@@ -117,8 +117,9 @@ case class HiveTableScanExec(
 // Specifies needed column IDs for those non-partitioning columns.
 val columnOrdinals = AttributeMap(relation.dataCols.zipWithIndex)
 val neededColumnIDs = output.flatMap(columnOrdinals.get).map(o => o: Integer)
+val neededColumnNames = output.filter(columnOrdinals.contains).map(_.name)
-HiveShim.appendReadColumns(hiveConf, neededColumnIDs, output.map(_.name))
+HiveShim.appendReadColumns(hiveConf, neededColumnIDs, neededColumnNames)
 val deserializer = tableDesc.getDeserializerClass.getConstructor().newInstance()
 deserializer.initialize(hiveConf, tableDesc.getProperties)