spark-instrumented-optimizer

Karen Feng 2e54d68eb9	2021-03-02 17:27:13 +08:00

[SPARK-34547][SQL] Only use metadata columns for resolution as last resort

### What changes were proposed in this pull request?

Today, child expressions may be resolved based on "real" or metadata output attributes. We should prefer the real attribute during resolution if one exists.

### Why are the changes needed?

Today, attempting to resolve an expression when there is a "real" output attribute and a metadata attribute with the same name results in resolution failure. This is likely unexpected, as the user may not know about the metadata attribute.

### Does this PR introduce _any_ user-facing change?

Yes. Previously, the user would see an error message when resolving a column with the same name as a "real" output attribute and a metadata attribute as below:

```
org.apache.spark.sql.AnalysisException: Reference 'index' is ambiguous, could be: testcat.ns1.ns2.tableTwo.index, testcat.ns1.ns2.tableOne.index.; line 1 pos 71
  at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:363)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:107)
```

Now, resolution succeeds and provides the "real" output attribute.

### How was this patch tested?

Added a unit test.

Closes #31654 from karenfeng/fallback-resolve-metadata.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

benchmarks	[SPARK-34192][SQL] Move char padding to write side and remove length check on read side too	2021-01-26 02:08:35 +08:00
src	[SPARK-34547][SQL] Only use metadata columns for resolution as last resort	2021-03-02 17:27:13 +08:00
pom.xml	[SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT	2020-12-04 14:10:42 -08:00