spark-instrumented-optimizer/sql/catalyst
davidvrba afc943ff8a [SPARK-28477][SQL] Rewrite CaseWhen with single branch to If
### What changes were proposed in this pull request?
Spark org.apache.spark.sql.functions do not have `if` function so conditions are expressed using `when-otherwise` function. However `If` (which is available in SQL) has more efficient code gen. This pr rewrites `when-otherwise` conditions to `If` if it is possible (`when-otherwise` with single branch)

### Why are the changes needed?
It is an optimization enhancement. Here is a simple performance comparison (tested in local mode (with 4 cores)):
```
val df = spark.range(10000000000L).withColumn("x", rand)
val resultA = df.withColumn("r", when($"x" < 0.5, lit(1)).otherwise(lit(0))).agg(sum($"r"))
val resultB = df.withColumn("r", expr("if(x < 0.5, 1, 0)")).agg(sum($"r"))

resultA.collect() // takes 56s to finish
resultB.collect() // takes 30s to finish
```
### Does this PR introduce any user-facing change?
No

### How was this patch tested?
New test is added.

Closes #26294 from davidvrba/spark-28477_rewriteCaseWhenToIf.

Authored-by: davidvrba <vrba.dave@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-08 21:25:48 +08:00
..
benchmarks [SPARK-29300][TESTS] Compare catalyst and avro module benchmark in JDK8/11 2019-09-30 17:59:43 -07:00
src [SPARK-28477][SQL] Rewrite CaseWhen with single branch to If 2019-11-08 21:25:48 +08:00
pom.xml Revert "Prepare Spark release v3.0.0-preview-rc2" 2019-10-30 17:45:44 -07:00