spark-instrumented-optimizer

History

Ioana Delaney 8163911594 [SPARK-17791][SQL] Join reordering using star schema detection	2017-03-20 16:04:58 +08:00

## What changes were proposed in this pull request?

Star schema consists of one or more fact tables referencing a number of dimension tables. In general, queries against star schema are expected to run fast because of the established RI constraints among the tables. This design proposes a join reordering based on natural, generally accepted heuristics for star schema queries:
- Finds the star join with the largest fact table and places it on the driving arm of the left-deep join. This plan avoids large tables on the inner, and thus favors hash joins.
- Applies the most selective dimensions early in the plan to reduce the amount of data flow.

The design document was included in SPARK-17791. Link to the google doc: [StarSchemaDetection](https://docs.google.com/document/d/1UAfwbm_A6wo7goHlVZfYK99pqDMEZUumi7pubJXETEA/edit?usp=sharing)

## How was this patch tested?

A new test suite StarJoinSuite.scala was implemented.

Author: Ioana Delaney <ioanamdelaney@gmail.com>

Closes #15363 from ioana-delaney/starJoinReord2.
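
The two heuristics in the commit message can be illustrated with a small, self-contained sketch. This is not the code from the patch: the `Table` class, the `reorder_star_join` function, the `selectivity` field, and the sample table names and statistics are all hypothetical, and the sketch assumes that the largest table is the fact table and that every table joined directly to it is a dimension. The real rule works on Catalyst plans and table statistics and may apply further checks that are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    row_count: int            # estimated cardinality (hypothetical statistic)
    selectivity: float = 1.0  # fraction of rows kept by local predicates


def reorder_star_join(tables, join_edges):
    """Return a left-deep join order as a list of table names.

    The largest table goes first (the driving arm), followed by the tables
    joined directly to it, most selective first, then everything else.
    """
    if not tables:
        return []

    # Heuristic 1: treat the largest table as the fact table.
    fact = max(tables, key=lambda t: t.row_count)

    # Collect the names of tables that share a join edge with the fact table.
    joined_to_fact = set()
    for left, right in join_edges:
        if left == fact.name:
            joined_to_fact.add(right)
        elif right == fact.name:
            joined_to_fact.add(left)

    # Heuristic 2: apply the most selective dimensions (lowest fraction kept) first.
    dims = sorted(
        (t for t in tables if t.name in joined_to_fact),
        key=lambda t: t.selectivity,
    )

    # Tables not joined to the fact table keep their original relative order.
    rest = [t for t in tables if t is not fact and t.name not in joined_to_fact]

    return [fact.name] + [t.name for t in dims] + [t.name for t in rest]


# Illustrative input only; the names and numbers are made up.
tables = [
    Table("date_dim", 73_000, selectivity=0.01),
    Table("store_sales", 2_880_000),
    Table("item", 18_000, selectivity=0.2),
    Table("store", 12, selectivity=0.5),
]
edges = [
    ("store_sales", "date_dim"),
    ("store_sales", "item"),
    ("store_sales", "store"),
]
order = reorder_star_join(tables, edges)
print(order)
```

With this input the fact table `store_sales` drives the join, and `date_dim`, the dimension whose predicates keep the fewest rows, is joined first so that less data flows into the later joins.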
antlr4/org/apache/spark/sql/catalyst/parser	[SPARK-19850][SQL] Allow the use of aliases in SQL function calls	2017-03-14 12:49:30 +01:00
java/org/apache/spark/sql	[SPARK-19067][SS] Processing-time-based timeout in MapGroupsWithState	2017-03-19 14:07:49 -07:00
scala/org/apache/spark/sql	[SPARK-17791][SQL] Join reordering using star schema detection	2017-03-20 16:04:58 +08:00