spark-instrumented-optimizer/sql/hive
Josh Rosen 91e9389f39 [SPARK-9729] [SPARK-9363] [SQL] Use sort merge join for left and right outer join
This patch adds a new `SortMergeOuterJoin` operator that performs left and right outer joins using sort merge join. It also refactors `SortMergeJoin` to improve performance and code clarity.
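The following minimal Scala sketch shows what a sort merge outer join does, using in-memory sequences that are already sorted by key. It is not the `SortMergeOuterJoin` operator from this patch. The object and method names are hypothetical, and the sketch assumes non-null keys and inputs sorted in ascending key order. The code streams the left side and uses a cursor to find the run of right rows whose key equals the current left key. Any left row without a match is emitted with `None`.

```scala
// Hypothetical illustration of sort merge outer joins; not Spark's operator.
// Assumes both inputs are sorted ascending by key and keys are non-null.
object SortMergeOuterJoinSketch {
  def leftOuterJoin[K, A, B](left: IndexedSeq[(K, A)], right: IndexedSeq[(K, B)])
                            (implicit ord: Ordering[K]): Seq[(K, A, Option[B])] = {
    val out = Vector.newBuilder[(K, A, Option[B])]
    var r = 0 // cursor into the right input; never moves backwards
    var i = 0
    while (i < left.length) {
      val (lk, lv) = left(i)
      // Skip right rows whose key is smaller than the current left key.
      while (r < right.length && ord.lt(right(r)._1, lk)) r += 1
      // Find the run [r, end) of right rows whose key equals lk.
      var end = r
      while (end < right.length && ord.equiv(right(end)._1, lk)) end += 1
      if (end == r) {
        out += ((lk, lv, None)) // no match: pad the right side
      } else {
        var j = r
        while (j < end) { out += ((lk, lv, Some(right(j)._2))); j += 1 }
      }
      // r is not advanced past the run, so a duplicate left key re-scans it.
      i += 1
    }
    out.result()
  }

  // A right outer join is the mirror image: swap inputs, then reorder fields.
  def rightOuterJoin[K, A, B](left: IndexedSeq[(K, A)], right: IndexedSeq[(K, B)])
                             (implicit ord: Ordering[K]): Seq[(K, Option[A], B)] =
    leftOuterJoin(right, left).map { case (k, b, a) => (k, a, b) }
}
```

As an example, joining `Vector(1 -> "a", 2 -> "b", 4 -> "d")` with `Vector(2 -> 20, 4 -> 40)` keeps key `1` as `(1, "a", None)`. SQL semantics also require that null keys never match. This sketch does not model that.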

Along the way, I also made a few minor cleanups and optimizations:

- Rename the `HashJoin` physical planner rule to `EquiJoinSelection`, since it's also used for non-hash joins.
- Rewrite the comment at the top of `HashJoin` to better explain the precedence for choosing join operators.
- Update `JoinSuite` to use `SqlTestUtils.withConf` for changing SQLConf settings.
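A helper like `withConf` makes a test set configuration values for the duration of one block and then restore them. The following Scala sketch shows that save-and-restore pattern under stated assumptions. `WithConfSketch`, its mutable `conf` map, and this `withConf` signature are hypothetical stand-ins, not Spark's `SQLConf` or the real test utility.

```scala
import scala.collection.mutable

// Hypothetical stand-in for session configuration; not Spark's SQLConf.
object WithConfSketch {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets each key for the duration of `body`, then restores prior state,
  // even if `body` throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Record previous values first (None means the key was unset).
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old // put back the old value
      case (k, None)      => conf.remove(k) // key did not exist before
    }
  }
}
```

Because restoration happens in `finally`, a failing assertion inside the block cannot leak a changed setting into later tests.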

This patch incorporates several ideas from adrian-wang's patch, #5717.

Closes #5717.


Author: Josh Rosen <joshrosen@databricks.com>
Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #7904 from JoshRosen/outer-join-smj and squashes 1 commit.
2015-08-10 22:04:41 -07:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution [SPARK-9141] [SQL] Remove project collapsing from DataFrame API 2015-08-05 09:01:45 -07:00
src [SPARK-9729] [SPARK-9363] [SQL] Use sort merge join for left and right outer join 2015-08-10 22:04:41 -07:00
pom.xml [SPARK-8064] [SQL] Build against Hive 1.2.1 2015-08-03 15:24:42 -07:00