spark-instrumented-optimizer

History

Davies Liu 0e70fd61b4 [SPARK-15438][SQL] improve explain of whole stage codegen ## What changes were proposed in this pull request? Currently, the explain of a query with whole-stage codegen looks like this ``` >>> df = sqlCtx.range(1000);df2 = sqlCtx.range(1000);df.join(pyspark.sql.functions.broadcast(df2), 'id').explain() == Physical Plan == WholeStageCodegen : +- Project [id#1L] : +- BroadcastHashJoin [id#1L], [id#4L], Inner, BuildRight, None : :- Range 0, 1, 4, 1000, [id#1L] : +- INPUT +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint])) +- WholeStageCodegen : +- Range 0, 1, 4, 1000, [id#4L] ``` The problem is that the plan looks much different than logical plan, make us hard to understand the plan (especially when the logical plan is not showed together). This PR will change it to: ``` >>> df = sqlCtx.range(1000);df2 = sqlCtx.range(1000);df.join(pyspark.sql.functions.broadcast(df2), 'id').explain() == Physical Plan == Project [id#0L] +- BroadcastHashJoin [id#0L], [id#3L], Inner, BuildRight, None :- Range 0, 1, 4, 1000, [id#0L] +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false])) +- Range 0, 1, 4, 1000, [id#3L] ``` The `*`before the plan means that it's part of whole-stage codegen, it's easy to understand. ## How was this patch tested? Manually ran some queries and check the explain. Author: Davies Liu <davies@databricks.com> Closes #13204 from davies/explain_codegen.	2016-05-20 13:21:53 -07:00
..
src	[SPARK-15438][SQL] improve explain of whole stage codegen	2016-05-20 13:21:53 -07:00
pom.xml	[SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into spark-tags	2016-05-17 09:55:53 +01:00

Davies Liu 0e70fd61b4 [SPARK-15438][SQL] improve explain of whole stage codegen

## What changes were proposed in this pull request?

Currently, the explain of a query with whole-stage codegen looks like this
```
>>> df = sqlCtx.range(1000);df2 = sqlCtx.range(1000);df.join(pyspark.sql.functions.broadcast(df2), 'id').explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [id#1L]
:     +- BroadcastHashJoin [id#1L], [id#4L], Inner, BuildRight, None
:        :- Range 0, 1, 4, 1000, [id#1L]
:        +- INPUT
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint]))
   +- WholeStageCodegen
      :  +- Range 0, 1, 4, 1000, [id#4L]
```

The problem is that the plan looks much different than logical plan, make us hard to understand the plan (especially when the logical plan is not showed together).

This PR will change it to:

```
>>> df = sqlCtx.range(1000);df2 = sqlCtx.range(1000);df.join(pyspark.sql.functions.broadcast(df2), 'id').explain()
== Physical Plan ==
*Project [id#0L]
+- *BroadcastHashJoin [id#0L], [id#3L], Inner, BuildRight, None
   :- *Range 0, 1, 4, 1000, [id#0L]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
      +- *Range 0, 1, 4, 1000, [id#3L]
```

The `*`before the plan means that it's part of whole-stage codegen, it's easy to understand.

## How was this patch tested?

Manually ran some queries and check the explain.

Author: Davies Liu <davies@databricks.com>

Closes #13204 from davies/explain_codegen.

2016-05-20 13:21:53 -07:00

src

[SPARK-15438][SQL] improve explain of whole stage codegen

2016-05-20 13:21:53 -07:00

pom.xml

[SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into spark-tags

2016-05-17 09:55:53 +01:00