spark-instrumented-optimizer

John Zhuge 36ea55e97e [SPARK-24940][SQL] Coalesce and Repartition Hint for SQL Queries		2018-08-04 02:27:15 -04:00

## What changes were proposed in this pull request?

Many Spark SQL users in my company have asked for a way to control the number of output files in Spark SQL. The users prefer not to use function repartition(n) or coalesce(n, shuffle) that require them to write and deploy Scala/Java/Python code. We propose adding the following Hive-style Coalesce and Repartition Hint to Spark SQL:

```
... SELECT /*+ COALESCE(numPartitions) */ ...
... SELECT /*+ REPARTITION(numPartitions) */ ...
```

Multiple such hints are allowed. Multiple nodes are inserted into the logical plan, and the optimizer will pick the leftmost hint.

```
INSERT INTO s SELECT /*+ REPARTITION(100), COALESCE(500), COALESCE(10) */ * FROM t

== Logical Plan ==
'InsertIntoTable 'UnresolvedRelation `s`, false, false
+- 'UnresolvedHint REPARTITION, [100]
   +- 'UnresolvedHint COALESCE, [500]
      +- 'UnresolvedHint COALESCE, [10]
         +- 'Project [*]
            +- 'UnresolvedRelation `t`

== Optimized Logical Plan ==
InsertIntoHadoopFsRelationCommand ...
+- Repartition 100, true
   +- HiveTableRelation ...
```

## How was this patch tested?

All unit tests. Manual tests using explain.

Author: John Zhuge <jzhuge@apache.org>

Closes #21911 from jzhuge/SPARK-24940.
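
The "leftmost hint wins" rule described in the commit message can be illustrated with a small standalone sketch. This is a toy model in plain Python, not Spark's parser or optimizer: the names `parse_hints` and `effective_hint` are hypothetical, and the regular expressions only recognize the two hint names with a single integer argument, written in the Hive-style `/*+ ... */` comment form.

```python
import re
from typing import List, Optional, Tuple

# Assumption: hints live in a Hive-style comment of the form /*+ ... */
HINT_COMMENT = re.compile(r"/\*\+(.*?)\*/", re.DOTALL)
# Assumption: only COALESCE(n) and REPARTITION(n) with one integer argument
HINT_CALL = re.compile(r"(COALESCE|REPARTITION)\s*\(\s*(\d+)\s*\)", re.IGNORECASE)


def parse_hints(sql: str) -> List[Tuple[str, int]]:
    """Return every (hint name, numPartitions) pair in left-to-right order."""
    hints = []
    for body in HINT_COMMENT.findall(sql):
        for name, n in HINT_CALL.findall(body):
            hints.append((name.upper(), int(n)))
    return hints


def effective_hint(sql: str) -> Optional[Tuple[str, int]]:
    """Model the rule from the commit message: the leftmost hint is kept."""
    hints = parse_hints(sql)
    return hints[0] if hints else None


query = "INSERT INTO s SELECT /*+ REPARTITION(100), COALESCE(500), COALESCE(10) */ * FROM t"
print(effective_hint(query))  # ('REPARTITION', 100)
```

The result matches the optimized plan in the commit message, where only `Repartition 100, true` remains. In Spark itself the selection happens on logical plan nodes rather than on the SQL text, so this sketch only mirrors the observable outcome.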
..
java/org/apache/spark/sql	[SPARK-10399][SPARK-23879][HOTFIX] Fix Java lint errors	2018-04-06 10:23:26 -07:00
resources	[SPARK-14134][CORE] Change the package name used for shading classes.	2016-04-06 19:33:51 -07:00
scala/org/apache/spark/sql	[SPARK-24940][SQL] Coalesce and Repartition Hint for SQL Queries	2018-08-04 02:27:15 -04:00