[SPARK-36838][SQL] Improve InSet generated code performance

### What changes were proposed in this pull request?
`Set.contains` cannot be used to check whether a NaN value is in the set, so NaN needs a separate check.
With codegen, that check is only necessary when the value set contains NaN; otherwise we just need to check whether the Set contains the value.
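
The idea can be sketched outside of Spark as follows. This is a minimal illustration, not the code in this patch: the object name `NaNInSetDemo`, the sample values, and the helper `inSet` are assumptions made up for the example; only `hset` and `hasNaN` echo names used by `InSet`.

```scala
object NaNInSetDemo {
  // Mirrors the shape of InSet's `hset: Set[Any]`; the values are made up.
  val hset: Set[Any] = Set(1.0, Double.NaN)

  // Computed once up front, analogous to the `hasNaN` flag used in the diff.
  val hasNaN: Boolean = hset.exists {
    case d: Double => d.isNaN
    case f: Float  => f.isNaN
    case _         => false
  }

  // Membership test that pays for the NaN branch only when the set holds NaN.
  def inSet(v: Double): Boolean =
    if (hasNaN) hset.contains(v) || v.isNaN
    else hset.contains(v)
}
```

When `hasNaN` is false, a NaN input can never be a member, so the plain `contains` call already gives the right answer and the extra branch can be skipped.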

### Why are the changes needed?
Improves the performance of the generated code by emitting the NaN check only when the Set contains NaN.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #34097 from AngersZhuuuu/SPARK-36838.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Angerszhuuuu 2021-09-27 12:13:47 +09:00 committed by Hyukjin Kwon
parent 0b1eec133c
commit 00b986384d

@@ -612,26 +612,27 @@ case class InSet(child: Expression, hset: Set[Any]) extends UnaryExpression with
           ""
         }
-      val ret = child.dataType match {
+      val isNaNCode = child.dataType match {
         case DoubleType => Some((v: Any) => s"java.lang.Double.isNaN($v)")
         case FloatType => Some((v: Any) => s"java.lang.Float.isNaN($v)")
         case _ => None
       }
-      ret.map { isNaN =>
+      if (hasNaN && isNaNCode.isDefined) {
         s"""
            |if ($setTerm.contains($c)) {
            |  ${ev.value} = true;
-           |} else if (${isNaN(c)}) {
-           |  ${ev.value} = $hasNaN;
+           |} else if (${isNaNCode.get(c)}) {
+           |  ${ev.value} = true;
            |}
            |$setIsNull
-           |""".stripMargin
-      }.getOrElse(
+         """.stripMargin
+      } else {
         s"""
            |${ev.value} = $setTerm.contains($c);
            |$setIsNull
-         """.stripMargin)
+         """.stripMargin
+      }
     })
   }