spark-instrumented-optimizer/sql/catalyst
Davies Liu 0f72e4f04b [SPARK-16958] [SQL] Reuse subqueries within the same query
## What changes were proposed in this pull request?

There could be multiple subqueries that generate same results, we could re-use the result instead of running it multiple times.

This PR also cleanup up how we run subqueries.

For SQL query
```sql
select id,(select avg(id) from t) from t where id > (select avg(id) from t)
```
The explain is
```
== Physical Plan ==
*Project [id#15L, Subquery subquery29 AS scalarsubquery()#35]
:  +- Subquery subquery29
:     +- *HashAggregate(keys=[], functions=[avg(id#15L)])
:        +- Exchange SinglePartition
:           +- *HashAggregate(keys=[], functions=[partial_avg(id#15L)])
:              +- *Range (0, 1000, splits=4)
+- *Filter (cast(id#15L as double) > Subquery subquery29)
   :  +- Subquery subquery29
   :     +- *HashAggregate(keys=[], functions=[avg(id#15L)])
   :        +- Exchange SinglePartition
   :           +- *HashAggregate(keys=[], functions=[partial_avg(id#15L)])
   :              +- *Range (0, 1000, splits=4)
   +- *Range (0, 1000, splits=4)
```
The visualized plan:

![reuse-subquery](https://cloud.githubusercontent.com/assets/40902/17573229/e578d93c-5f0d-11e6-8a3c-0150d81d3aed.png)

## How was this patch tested?

Existing tests.

Author: Davies Liu <davies@databricks.com>

Closes #14548 from davies/subq.
2016-08-11 09:47:19 -07:00
..
src [SPARK-16958] [SQL] Reuse subqueries within the same query 2016-08-11 09:47:19 -07:00
pom.xml [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition and inherited from the parent 2016-07-19 11:59:46 +01:00