spark-instrumented-optimizer/sql/core
John Ayad 8c2849a695 [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs
### What changes were proposed in this pull request?
Do not cast `NaN` to `Integer`, `Long`, `Short`, or `Byte`. Casting `NaN` to any of these types yields `0`, so a `NaN` replacement erroneously replaces `0`s as well, when only `NaN`s should be replaced.
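
As a rough illustration of the cast described above, the following Python sketch models the JVM's double-to-long narrowing conversion. The helper name `jvm_double_to_long` is hypothetical and only mimics the JVM rule (Python's own `int(float('nan'))` raises an error instead); it is not part of Spark.

```python
import math

LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def jvm_double_to_long(x: float) -> int:
    """Hypothetical model of a JVM double -> long cast (not a Spark API)."""
    if math.isnan(x):
        return 0              # NaN silently becomes 0
    if x >= LONG_MAX:
        return LONG_MAX       # too large (including +inf): saturate
    if x <= LONG_MIN:
        return LONG_MIN       # too small (including -inf): saturate
    return int(x)             # otherwise truncate toward zero


print(jvm_double_to_long(float("nan")))  # 0
print(jvm_double_to_long(2.9))           # 2
```

Because the result for `NaN` is indistinguishable from a genuine `0`, any lookup keyed on the cast value cannot tell the two apart.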

### Why are the changes needed?
This Scala code snippet:
```
println(Double.NaN.toLong)  // prints 0
```
prints `0`, which is a problem: in the following PySpark session, replacing `NaN` also replaces the `0`s in the integer `value` column:
```
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  NaN|    0|
+-----+-----+
>>> df.replace(float('nan'), 2).show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    2|
|  0.0|    3|
|  2.0|    2|
+-----+-----+
```

### Does this PR introduce any user-facing change?
Yes. After this PR, running the same code snippet returns the expected results:
```
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  NaN|    0|
+-----+-----+

>>> df.replace(float('nan'), 2).show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  2.0|    0|
+-----+-----+
```
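
The before and after behavior can be modeled in plain Python, without a Spark session. This is a minimal sketch under stated assumptions, not Spark's actual implementation: the `replace` function, its `kind` argument, and the `skip_nan_for_ints` flag are hypothetical names introduced only for illustration.

```python
import math


def jvm_cast_to_int(x):
    # Models the JVM narrowing cast: NaN becomes 0.
    # (Infinities and overflow are not handled in this sketch.)
    return 0 if math.isnan(x) else int(x)


def same(a, b):
    # NaN-aware equality for floating-point columns.
    return a == b or (math.isnan(a) and math.isnan(b))


def replace(column, kind, target, replacement, skip_nan_for_ints):
    if kind == "int":
        if skip_nan_for_ints and math.isnan(target):
            # Fixed behavior: NaN can never match an integer, so do nothing.
            return list(column)
        # Old behavior: the key is cast first, so NaN turns into 0.
        key = jvm_cast_to_int(target)
        return [int(replacement) if v == key else v for v in column]
    return [float(replacement) if same(v, target) else v for v in column]


index = [1.0, 0.0, float("nan")]
value = [0, 3, 0]
nan = float("nan")

print(replace(value, "int", nan, 2, skip_nan_for_ints=False))   # [2, 3, 2]
print(replace(value, "int", nan, 2, skip_nan_for_ints=True))    # [0, 3, 0]
print(replace(index, "float", nan, 2, skip_nan_for_ints=True))  # [1.0, 0.0, 2.0]
```

The first call reproduces the erroneous `value` column from the earlier session, and the last two match the corrected output above.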

### How was this patch tested?

Added unit tests to verify that replacing `NaN` only affects columns of type `Float` and `Double`.

Closes #26738 from johnhany97/SPARK-30082.

Lead-authored-by: John Ayad <johnhany97@gmail.com>
Co-authored-by: John Ayad <jayad@palantir.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-04 00:04:55 +08:00
benchmarks [SPARK-30026][SQL] Whitespaces can be identified as delimiters in interval string 2019-11-27 01:20:38 +08:00
src [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs 2019-12-04 00:04:55 +08:00
v1.2/src [SPARK-29981][BUILD][FOLLOWUP] Change hive.version.short 2019-11-23 12:50:50 -08:00
v2.3/src [SPARK-29981][BUILD][FOLLOWUP] Change hive.version.short 2019-11-23 12:50:50 -08:00
pom.xml [SPARK-30015][BUILD] Move hive-storage-api dependency from hive-2.3 to sql/core 2019-11-25 10:54:14 -08:00