spark-instrumented-optimizer

John Ayad 8c2849a695 [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs

### What changes were proposed in this pull request?

Do not cast `NaN` to an `Integer`, `Long`, `Short` or `Byte`. This is because casting `NaN` to those types results in a `0` which erroneously replaces `0`s while only `NaN`s should be replaced.

### Why are the changes needed?

This Scala code snippet:

```
import scala.math;

println(Double.NaN.toLong)
```

returns `0` which is problematic as if you run the following Spark code, `0`s get replaced as well:

```
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  NaN|    0|
+-----+-----+
>>> df.replace(float('nan'), 2).show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    2|
|  0.0|    3|
|  2.0|    2|
+-----+-----+
```

### Does this PR introduce any user-facing change?

Yes, after the PR, running the same above code snippet returns the correct expected results:

```
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  NaN|    0|
+-----+-----+
>>> df.replace(float('nan'), 2).show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  2.0|    0|
+-----+-----+
```

### How was this patch tested?

Added unit tests to verify replacing `NaN` only affects columns of type `Float` and `Double`

Closes #26738 from johnhany97/SPARK-30082.

Lead-authored-by: John Ayad <johnhany97@gmail.com>
Co-authored-by: John Ayad <jayad@palantir.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

2019-12-04 00:04:55 +08:00
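The behaviour described in the commit message above can be modelled without a Spark session. The following is a minimal pure-Python sketch, not Spark's implementation: `jvm_double_to_long` and `replace_model` are hypothetical helpers introduced only for illustration, and the `"double"`/`"long"` type tags are an assumption standing in for a real schema. Python's own `int(float('nan'))` raises an error, so the JVM narrowing rule (NaN becomes `0`) is reproduced by hand.

```python
import math


def jvm_double_to_long(x: float) -> int:
    """Hypothetical model of the JVM double-to-long cast: NaN -> 0, saturate at the bounds."""
    if math.isnan(x):
        return 0
    if x >= 2**63:
        return 2**63 - 1
    if x <= -(2**63):
        return -(2**63)
    return int(x)  # truncates toward zero


def replace_model(rows, col_types, to_replace, value, skip_nan_for_integral):
    """Toy stand-in for a DataFrame replace over rows of (double, long, ...) cells.

    skip_nan_for_integral=False models the pre-fix behaviour (the NaN key is
    cast to the integral column type); True models the fix (integral columns
    are left alone when the key is NaN).
    """
    key_is_nan = math.isnan(to_replace)
    out = []
    for row in rows:
        new_row = []
        for cell, typ in zip(row, col_types):
            if typ == "long":
                if key_is_nan and skip_nan_for_integral:
                    new_row.append(cell)  # a NaN key can never match an integer
                else:
                    key = jvm_double_to_long(to_replace)  # NaN collapses to 0 here
                    new_row.append(int(value) if cell == key else cell)
            else:  # "double" column: NaN matches NaN, otherwise plain equality
                hit = (key_is_nan and math.isnan(cell)) or cell == to_replace
                new_row.append(float(value) if hit else cell)
        out.append(tuple(new_row))
    return out


rows = [(1.0, 0), (0.0, 3), (float("nan"), 0)]
types = ("double", "long")

before_fix = replace_model(rows, types, float("nan"), 2, skip_nan_for_integral=False)
after_fix = replace_model(rows, types, float("nan"), 2, skip_nan_for_integral=True)
```

Under these assumptions, `before_fix` mirrors the first table in the commit message (the `0`s in `value` are overwritten), while `after_fix` mirrors the second (only the `NaN` in `index` changes).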
benchmarks	[SPARK-30026][SQL] Whitespaces can be identified as delimiters in interval string	2019-11-27 01:20:38 +08:00
src	[SPARK-30082][SQL] Do not replace Zeros when replacing NaNs	2019-12-04 00:04:55 +08:00
v1.2/src	[SPARK-29981][BUILD][FOLLOWUP] Change hive.version.short	2019-11-23 12:50:50 -08:00
v2.3/src	[SPARK-29981][BUILD][FOLLOWUP] Change hive.version.short	2019-11-23 12:50:50 -08:00
pom.xml	[SPARK-30015][BUILD] Move hive-storage-api dependency from `hive-2.3` to `sql/core`	2019-11-25 10:54:14 -08:00