spark-instrumented-optimizer/sql/catalyst/src/main
Ioana Delaney 8163911594 [SPARK-17791][SQL] Join reordering using star schema detection
## What changes were proposed in this pull request?

A star schema consists of one or more fact tables referencing a number of dimension tables. In general, queries against a star schema are expected to run fast because of the referential integrity (RI) constraints established among the tables. This design proposes a join reordering based on natural, generally accepted heuristics for star schema queries:
- Finds the star join with the largest fact table and places it on the driving (outer) arm of the left-deep join. This plan keeps large tables off the inner side of the joins, and thus favors hash joins.
- Applies the most selective dimensions early in the plan to reduce the amount of data flowing through the later joins.
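
The two heuristics above can be illustrated with a minimal, self-contained sketch. This is not the code from this patch and does not use Catalyst classes: `Table`, `FkJoin`, `StarJoinSketch`, and the `selectivity` field are hypothetical names introduced only for illustration, and the sketch assumes that row counts, the foreign-key relationships, and per-dimension selectivity estimates are already known.

```scala
// Hypothetical, simplified model for illustration; not Spark's Catalyst classes.
// selectivity: assumed fraction of rows that survive the table's local
// predicates (a lower value means a more selective table).
final case class Table(name: String, rowCount: Long, selectivity: Double = 1.0)

// An assumed foreign-key join: the fact table references the dimension table.
final case class FkJoin(fact: String, dimension: String)

object StarJoinSketch {
  /** Returns table names in left-deep join order: the largest fact table
    * first (driving arm), then its dimensions from most to least selective,
    * then any tables that are not part of that star join, in input order.
    */
  def reorder(tables: Seq[Table], joins: Seq[FkJoin]): Seq[String] = {
    val byName = tables.map(t => t.name -> t).toMap

    // Candidate fact tables: anything on the referencing side of a join.
    val facts = joins.map(_.fact).distinct.flatMap(byName.get)

    if (facts.isEmpty) {
      // No star join detected: keep the original order.
      tables.map(_.name)
    } else {
      // Heuristic 1: the largest fact table drives the plan.
      val fact = facts.maxBy(_.rowCount)

      // Heuristic 2: apply the most selective dimensions first.
      val dims = joins
        .collect { case FkJoin(f, d) if f == fact.name => d }
        .distinct
        .flatMap(byName.get)
        .sortBy(_.selectivity)

      val star = fact +: dims
      val rest = tables.filterNot(star.contains)
      (star ++ rest).map(_.name)
    }
  }
}
```

For example, given a 1,000,000-row `sales` table that references `date_dim` (selectivity 0.5), `item` (selectivity 0.01), and `store` (no filter), `StarJoinSketch.reorder` returns `sales, item, date_dim, store`. A real optimizer rule would have to derive these inputs from table statistics and join predicates rather than take them as given.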

The design document was included in SPARK-17791.

Link to the Google Doc: [StarSchemaDetection](https://docs.google.com/document/d/1UAfwbm_A6wo7goHlVZfYK99pqDMEZUumi7pubJXETEA/edit?usp=sharing)

## How was this patch tested?

A new test suite StarJoinSuite.scala was implemented.

Author: Ioana Delaney <ioanamdelaney@gmail.com>

Closes #15363 from ioana-delaney/starJoinReord2.
2017-03-20 16:04:58 +08:00
antlr4/org/apache/spark/sql/catalyst/parser [SPARK-19850][SQL] Allow the use of aliases in SQL function calls 2017-03-14 12:49:30 +01:00
java/org/apache/spark/sql [SPARK-19067][SS] Processing-time-based timeout in MapGroupsWithState 2017-03-19 14:07:49 -07:00
scala/org/apache/spark/sql [SPARK-17791][SQL] Join reordering using star schema detection 2017-03-20 16:04:58 +08:00