---
layout: global
title: Join Hints
displayTitle: Join Hints
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---
### Description

Join Hints allow users to suggest the join strategy that Spark should use. Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. Support for the `MERGE`, `SHUFFLE_HASH`, and `SHUFFLE_REPLICATE_NL` Join Hints was added in 3.0. When different join strategy hints are specified on both sides of a join, Spark prioritizes hints in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark picks the build side based on the join type and the sizes of the relations. Since a given strategy may not support all join types, Spark is not guaranteed to use the join strategy suggested by the hint.
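The priority order described above can be pictured with a small toy model. The Python sketch below is not Spark's planner code: `HINT_PRIORITY` and `resolve_strategy` are hypothetical names introduced only for illustration, and the model deliberately ignores join-type restrictions and table statistics, both of which can still lead Spark to pick a different strategy.

```python
# Toy model of the documented hint priority; not Spark's implementation.
# HINT_PRIORITY and resolve_strategy are hypothetical names for illustration.
HINT_PRIORITY = ["BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL"]


def resolve_strategy(left_hint=None, right_hint=None):
    """Return the hinted strategy that wins under the documented order.

    Each argument is the canonical hint name on that side of the join,
    or None when that side carries no hint.
    """
    hints = [h for h in (left_hint, right_hint) if h is not None]
    if not hints:
        return None  # no hint given: this model makes no suggestion
    # The hint that appears earliest in HINT_PRIORITY wins.
    return min(hints, key=HINT_PRIORITY.index)


print(resolve_strategy("MERGE", "BROADCAST"))
print(resolve_strategy("SHUFFLE_REPLICATE_NL", "SHUFFLE_HASH"))
```

In this model a `BROADCAST` hint on either side wins over a `MERGE` hint on the other, which mirrors the ordering stated in the description.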

### Join Hint Types

<dl>
  <dt><code><em>BROADCAST</em></code></dt>
  <dd>
    Suggests that Spark use broadcast join. The join side with the hint will be broadcast regardless of <code>spark.sql.autoBroadcastJoinThreshold</code>. If both sides of the join have the broadcast hint, the one with the smaller size (based on stats) will be broadcast. The aliases for <code>BROADCAST</code> are <code>BROADCASTJOIN</code> and <code>MAPJOIN</code>.
  </dd>
</dl>

<dl>
  <dt><code><em>MERGE</em></code></dt>
  <dd>
     Suggests that Spark use shuffle sort merge join. The aliases for <code>MERGE</code> are <code>SHUFFLE_MERGE</code> and <code>MERGEJOIN</code>.
  </dd>
</dl>

<dl>
  <dt><code><em>SHUFFLE_HASH</em></code></dt>
  <dd>
     Suggests that Spark use shuffle hash join. If both sides have the shuffle hash hint, Spark chooses the smaller side (based on stats) as the build side.
  </dd>
</dl>

<dl>
  <dt><code><em>SHUFFLE_REPLICATE_NL</em></code></dt>
  <dd>
    Suggests that Spark use shuffle-and-replicate nested loop join.
  </dd>
</dl>
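
The aliases listed above all map onto four canonical hint names. As a rough illustration, the Python sketch below collects that mapping in one place. The `HINT_ALIASES` table and the `canonical_hint` function are hypothetical names, not part of any Spark API, and upper-casing the input is a choice of this sketch.

```python
# Alias table taken from the hint descriptions above; the names HINT_ALIASES,
# CANONICAL_HINTS and canonical_hint are hypothetical and only illustrate
# the mapping. This is not Spark code.
HINT_ALIASES = {
    "BROADCASTJOIN": "BROADCAST",
    "MAPJOIN": "BROADCAST",
    "SHUFFLE_MERGE": "MERGE",
    "MERGEJOIN": "MERGE",
}
CANONICAL_HINTS = {"BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL"}


def canonical_hint(name):
    """Map a join hint name or alias to its canonical name, or None if unknown."""
    upper = name.strip().upper()  # assumption of this sketch: ignore case
    upper = HINT_ALIASES.get(upper, upper)
    return upper if upper in CANONICAL_HINTS else None


print(canonical_hint("MAPJOIN"))
print(canonical_hint("shuffle_merge"))
```

Under this mapping, `MAPJOIN(t2)` and `BROADCAST(t2)` request the same strategy, as do `SHUFFLE_MERGE(t1)` and `MERGE(t1)`.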

### Examples

{% highlight sql %}

-- Join Hints for broadcast join
SELECT /*+ BROADCAST(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ BROADCASTJOIN(t1) */ * FROM t1 LEFT JOIN t2 ON t1.key = t2.key;
SELECT /*+ MAPJOIN(t2) */ * FROM t1 RIGHT JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle sort merge join
SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle hash join
SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle-and-replicate nested loop join
SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- When different join strategy hints are specified on both sides of a join, Spark
-- prioritizes the BROADCAST hint over the MERGE hint over the SHUFFLE_HASH hint
-- over the SHUFFLE_REPLICATE_NL hint.
-- For the following example, Spark issues the warning:
-- org.apache.spark.sql.catalyst.analysis.HintErrorLogger: Hint (strategy=merge)
-- is overridden by another hint and will not take effect.
SELECT /*+ BROADCAST(t1) */ /*+ MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

{% endhighlight %}

### Related Statements
- [JOIN](sql-ref-syntax-qry-select-join.html)
- [SELECT](sql-ref-syntax-qry-select.html)