spark-instrumented-optimizer

History

Cheng Lian f2e855fba8 [SPARK-13473][SQL] Simplifies PushPredicateThroughProject	2016-03-22 19:20:56 +08:00
src	[SPARK-13473][SQL] Simplifies PushPredicateThroughProject	2016-03-22 19:20:56 +08:00
pom.xml	[SPARK-11565] Replace deprecated DigestUtils.shaHex call	2016-02-10 09:52:35 +00:00

Cheng Lian f2e855fba8 [SPARK-13473][SQL] Simplifies PushPredicateThroughProject

## What changes were proposed in this pull request?

This is a follow-up to PR #11348.

Since PR #11348, a predicate is never pushed through a project if the project contains any non-deterministic fields. Consequently, a candidate filter condition can never reference a non-deterministic projected field, and the logic that handled that case can be safely removed.
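The guard described above can be sketched on a toy logical plan. This is a minimal, hypothetical model in plain Scala, not the actual `PushPredicateThroughProject` code: `Expr`, `Attr`, `Rand`, `Gt`, `Plan`, `Relation`, `Project`, `Filter` and `PushPredicateSketch` are names invented for illustration, even where they echo Catalyst names. The rule pushes a `Filter` below a `Project` only when every projected field is deterministic; under that guard, substituting projected names into the condition can never introduce a non-deterministic expression, so the condition itself needs no separate check.

```scala
// Hypothetical toy model of a logical plan; these are NOT Catalyst's classes.
sealed trait Expr { def deterministic: Boolean }
final case class Attr(name: String) extends Expr { def deterministic: Boolean = true }
// Stand-in for a non-deterministic expression such as rand(...).
final case class Rand(seed: Long) extends Expr { def deterministic: Boolean = false }
final case class Gt(left: Expr, right: Expr) extends Expr {
  def deterministic: Boolean = left.deterministic && right.deterministic
}

sealed trait Plan
final case class Relation(name: String) extends Plan
// Each projected field is (output name, defining expression).
final case class Project(fields: Seq[(String, Expr)], child: Plan) extends Plan
final case class Filter(condition: Expr, child: Plan) extends Plan

object PushPredicateSketch {
  // Replace references to projected names with the expressions that define them.
  private def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Attr(n)  => aliases.getOrElse(n, e)
    case Gt(l, r) => Gt(substitute(l, aliases), substitute(r, aliases))
    case other    => other
  }

  // Push a Filter below a Project only if every projected field is deterministic.
  // The condition itself may still be non-deterministic (e.g. 'c > rand(42));
  // it is pushed as-is, because substitution cannot pull in a non-deterministic field.
  def apply(plan: Plan): Plan = plan match {
    case Filter(cond, Project(fields, grandChild))
        if fields.forall { case (_, e) => e.deterministic } =>
      Project(fields, Filter(substitute(cond, fields.toMap), grandChild))
    case other => other
  }
}
```

Applied to toy versions of the two plans in the examples that follow, the first filter is moved under the project, while the second plan, whose project defines a non-deterministic `rb`, is returned unchanged.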

More specifically, the following optimization is allowed, because the project contains only deterministic fields:

```scala
// From:
df.select('a, 'b).filter('c > rand(42))
// To:
df.filter('c > rand(42)).select('a, 'b)
```

while the following is not, because the project contains the non-deterministic field `rand('b) as 'rb`:

```scala
// From:
df.select('a, rand('b) as 'rb, 'c).filter('c > 'rb)
// To:
df.filter('c > rand('b)).select('a, rand('b) as 'rb, 'c)
```
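Why the second rewrite is unsafe can be illustrated without Spark. The sketch below is a hypothetical simulation in plain Scala: each call to `Random.nextDouble()` stands in for one evaluation of a non-deterministic expression such as `rand('b)`, and the rows, the constant `c = 0.5`, and the names `Row`, `Out` and `RandPushdownDemo` are assumptions made for the example. In the original plan `rb` is computed once per row and then filtered, so every output row satisfies `c > rb`; in the rewritten plan the filter and the project each draw their own random value, so the reported `rb` no longer matches the value the filter tested.

```scala
import scala.util.Random

// Hypothetical input and output rows; not Spark's Row type.
final case class Row(a: Int, b: Int, c: Double)
final case class Out(a: Int, rb: Double, c: Double)

object RandPushdownDemo {
  // 1000 rows with a fixed c = 0.5 (an assumption for the demo).
  val rows: Seq[Row] = (1 to 1000).map(i => Row(i, i, 0.5))

  // Original plan: project first (one random draw per row), then filter on rb.
  def original(rng: Random): Seq[Out] =
    rows.map(r => Out(r.a, rng.nextDouble(), r.c)).filter(o => o.c > o.rb)

  // Disallowed rewrite: the filter draws one random value and the project
  // draws another, so the two evaluations are independent.
  def rewritten(rng: Random): Seq[Out] =
    rows.filter(r => r.c > rng.nextDouble()).map(r => Out(r.a, rng.nextDouble(), r.c))

  // Count output rows whose projected rb does not satisfy the predicate c > rb.
  def violations(out: Seq[Out]): Int = out.count(o => !(o.c > o.rb))
}
```

For example, `RandPushdownDemo.violations(RandPushdownDemo.original(new Random(42)))` is 0 by construction, whereas the same call on `rewritten(new Random(42))` returns a positive count: the rewritten plan emits rows whose `rb` contradicts the filter that was supposed to select them.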

## How was this patch tested?

Existing test cases should cover this change.

Author: Cheng Lian <lian@databricks.com>

Closes #11864 from liancheng/spark-13473-cleanup.
