spark-instrumented-optimizer

History

manuzhang c43460cf82 [SPARK-32753][SQL] Only copy tags to node with no tags ### What changes were proposed in this pull request? Only copy tags to node with no tags when transforming plans. ### Why are the changes needed? cloud-fan [made a good point](https://github.com/apache/spark/pull/29593#discussion_r482013121) that it doesn't make sense to append tags to existing nodes when nodes are removed. That will cause such bugs as duplicate rows when deduplicating and repartitioning by the same column with AQE. ``` spark.range(10).union(spark.range(10)).createOrReplaceTempView("v1") val df = spark.sql("select id from v1 group by id distribute by id") println(df.collect().toArray.mkString(",")) println(df.queryExecution.executedPlan) // With AQE [4],[0],[3],[2],[1],[7],[6],[8],[5],[9],[4],[0],[3],[2],[1],[7],[6],[8],[5],[9] AdaptiveSparkPlan(isFinalPlan=true) +- CustomShuffleReader local +- ShuffleQueryStage 0 +- Exchange hashpartitioning(id#183L, 10), true +- (3) HashAggregate(keys=[id#183L], functions=[], output=[id#183L]) +- Union :- (1) Range (0, 10, step=1, splits=2) +- (2) Range (0, 10, step=1, splits=2) // Without AQE [4],[7],[0],[6],[8],[3],[2],[5],[1],[9] (4) HashAggregate(keys=[id#206L], functions=[], output=[id#206L]) +- Exchange hashpartitioning(id#206L, 10), true +- (3) HashAggregate(keys=[id#206L], functions=[], output=[id#206L]) +- Union :- (1) Range (0, 10, step=1, splits=2) +- *(2) Range (0, 10, step=1, splits=2) ``` It's too expensive to detect node removal so we make a compromise only to copy tags to node with no tags. ### Does this PR introduce _any_ user-facing change? Yes. Fix a bug. ### How was this patch tested? Add test. Closes #29593 from manuzhang/spark-32753. Authored-by: manuzhang <owenzhang1990@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>		2020-09-07 16:08:57 +00:00
..
benchmarks	[SPARK-30648][SQL] Support filters pushdown in JSON datasource	2020-07-17 00:01:13 +09:00
src	[SPARK-32753][SQL] Only copy tags to node with no tags	2020-09-07 16:08:57 +00:00
v1.2/src	[SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis	2020-08-25 13:47:52 +09:00
v2.3/src	[SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis	2020-08-25 13:47:52 +09:00
pom.xml	[SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector	2020-06-30 10:30:22 -07:00