afc943ff8a
### What changes were proposed in this pull request? Spark org.apache.spark.sql.functions do not have `if` function so conditions are expressed using `when-otherwise` function. However `If` (which is available in SQL) has more efficient code gen. This pr rewrites `when-otherwise` conditions to `If` if it is possible (`when-otherwise` with single branch) ### Why are the changes needed? It is an optimization enhancement. Here is a simple performance comparison (tested in local mode (with 4 cores)): ``` val df = spark.range(10000000000L).withColumn("x", rand) val resultA = df.withColumn("r", when($"x" < 0.5, lit(1)).otherwise(lit(0))).agg(sum($"r")) val resultB = df.withColumn("r", expr("if(x < 0.5, 1, 0)")).agg(sum($"r")) resultA.collect() // takes 56s to finish resultB.collect() // takes 30s to finish ``` ### Does this PR introduce any user-facing change? No ### How was this patch tested? New test is added. Closes #26294 from davidvrba/spark-28477_rewriteCaseWhenToIf. Authored-by: davidvrba <vrba.dave@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> |
||
---|---|---|
.. | ||
benchmarks | ||
src | ||
pom.xml |