[SPARK-32352][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Exclude partition columns from data columns
### What changes were proposed in this pull request?

This PR fixes a bug introduced by #29406. #29406 partially pushes down data filters even when they are mixed with partition filters. In some cases, however, partition columns may also appear among the data columns, so a predicate that references a partition column could be pushed down to the data source.

### Why are the changes needed?

The test "org.apache.spark.sql.hive.orc.HiveOrcHadoopFsRelationSuite.save()/load() - partitioned table - simple queries - partition columns in data" currently fails with the hive-1.2 profile on the master branch.

```
[info] - save()/load() - partitioned table - simple queries - partition columns in data *** FAILED *** (1 second, 457 milliseconds)
[info]   java.util.NoSuchElementException: key not found: p1
[info]   at scala.collection.immutable.Map$Map2.apply(Map.scala:138)
[info]   at org.apache.spark.sql.hive.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:250)
[info]   at org.apache.spark.sql.hive.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:143)
[info]   at org.apache.spark.sql.hive.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:146)
[info]   at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
[info]   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
[info]   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
[info]   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
[info]   at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
[info]   at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
[info]   at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
[info]   at org.apache.spark.sql.hive.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:145)
[info]   at org.apache.spark.sql.hive.orc.OrcFilters$.createFilter(OrcFilters.scala:83)
[info]   at org.apache.spark.sql.hive.orc.OrcFileFormat.buildReader(OrcFileFormat.scala:142)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #29526 from viirya/SPARK-32352-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
parent db74fd0d33
commit 11c6a23c13
@@ -176,9 +176,11 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
         l.resolve(fsRelation.dataSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
 
       // Partition keys are not available in the statistics of the files.
+      // `dataColumns` might have partition columns, we need to filter them out.
+      val dataColumnsWithoutPartitionCols = dataColumns.filterNot(partitionColumns.contains)
       val dataFilters = normalizedFiltersWithoutSubqueries.flatMap { f =>
         if (f.references.intersect(partitionSet).nonEmpty) {
-          extractPredicatesWithinOutputSet(f, AttributeSet(dataColumns))
+          extractPredicatesWithinOutputSet(f, AttributeSet(dataColumnsWithoutPartitionCols))
         } else {
           Some(f)
         }
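The effect of the change can be illustrated with a small standalone model. This is only a sketch and not Spark's implementation: `Pred`, `Eq`, `And`, `Or`, `extractWithin`, and the `excludePartitionCols` flag are hypothetical stand-ins for Catalyst expressions, `extractPredicatesWithinOutputSet`, and the before/after behavior of this patch, and columns are modeled as plain strings.

```scala
object PushdownSketch {
  // Hypothetical, simplified stand-ins for Catalyst expressions (not Spark classes).
  sealed trait Pred { def refs: Set[String] }
  final case class Eq(col: String, value: Int) extends Pred {
    val refs: Set[String] = Set(col)
  }
  final case class And(left: Pred, right: Pred) extends Pred {
    val refs: Set[String] = left.refs ++ right.refs
  }
  final case class Or(left: Pred, right: Pred) extends Pred {
    val refs: Set[String] = left.refs ++ right.refs
  }

  // Assumed analogue of extractPredicatesWithinOutputSet: keep the part of the
  // predicate that references only columns in `output`.
  def extractWithin(p: Pred, output: Set[String]): Option[Pred] = p match {
    case And(l, r) =>
      (extractWithin(l, output), extractWithin(r, output)) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        case (a, b)             => a.orElse(b) // one side of an AND may be dropped
      }
    case Or(l, r) =>
      // Both sides of an OR must be extractable, otherwise nothing is kept.
      for {
        a <- extractWithin(l, output)
        b <- extractWithin(r, output)
      } yield Or(a, b)
    case leaf =>
      if (leaf.refs.subsetOf(output)) Some(leaf) else None
  }

  // Mirrors the shape of the `dataFilters` computation in the diff.
  // excludePartitionCols = false models the old behavior, true the patched one.
  def dataFilters(
      filters: Seq[Pred],
      dataColumns: Seq[String],
      partitionColumns: Seq[String],
      excludePartitionCols: Boolean): Seq[Pred] = {
    val partitionSet = partitionColumns.toSet
    val allowed =
      if (excludePartitionCols) dataColumns.filterNot(partitionColumns.contains).toSet
      else dataColumns.toSet
    filters.flatMap { f =>
      if (f.refs.intersect(partitionSet).nonEmpty) extractWithin(f, allowed)
      else Some(f)
    }
  }
}
```

In this model, with data columns `a`, `b`, `p1` and partition columns `p1`, `p2`, the mixed filter `And(Eq("p1", 1), Eq("a", 2))` is kept whole when partition columns are not excluded, so a predicate on `p1` would reach the data source. With the exclusion, only `Eq("a", 2)` remains.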