spark-instrumented-optimizer/python/pyspark/sql
hyukjinkwon 4b4ee26010 [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary
## What changes were proposed in this pull request?

This PR proposes to disallow default value None when 'to_replace' is not a dictionary.

It seems weird we set the default value of `value` to `None` and we ended up allowing the case as below:

```python
>>> df.show()
```
```
+----+------+-----+
| age|height| name|
+----+------+-----+
|  10|    80|Alice|
...
```

```python
>>> df.na.replace('Alice').show()
```
```
+----+------+----+
| age|height|name|
+----+------+----+
|  10|    80|null|
...
```

**After**

This PR targets to disallow the case above:

```python
>>> df.na.replace('Alice').show()
```
```
...
TypeError: value is required when to_replace is not a dictionary.
```

while we still allow when `to_replace` is a dictionary:

```python
>>> df.na.replace({'Alice': None}).show()
```
```
+----+------+----+
| age|height|name|
+----+------+----+
|  10|    80|null|
...
```

## How was this patch tested?

Manually tested, tests were added in `python/pyspark/sql/tests.py` and doctests were fixed.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #20499 from HyukjinKwon/SPARK-19454-followup.
2018-02-09 14:21:10 +08:00
..
__init__.py [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark 2017-11-02 15:22:52 +01:00
catalog.py [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark 2018-01-18 14:51:05 +09:00
column.py [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments should validate input types for column 2017-08-24 20:29:03 +09:00
conf.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
context.py [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark 2018-01-18 14:51:05 +09:00
dataframe.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
functions.py [SPARK-23327][SQL] Update the description and tests of three external API or functions 2018-02-06 16:46:43 -08:00
group.py [SPARK-23261][PYSPARK] Rename Pandas UDFs 2018-01-30 21:55:55 +09:00
readwriter.py [SPARK-22818][SQL] csv escape of quote escape 2017-12-29 07:30:06 +08:00
session.py [SPARK-23319][TESTS] Explicitly specify Pandas and PyArrow versions in PySpark tests (to skip or test) 2018-02-07 23:28:10 +09:00
streaming.py [SPARK-23143][SS][PYTHON] Added python API for setting continuous trigger 2018-01-18 12:25:52 -08:00
tests.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
types.py [SPARK-23290][SQL][PYTHON] Use datetime.date for date type when converting Spark DataFrame to Pandas DataFrame. 2018-02-06 14:52:25 +08:00
udf.py [SPARK-23122][PYSPARK][FOLLOWUP] Replace registerTempTable by createOrReplaceTempView 2018-02-07 23:24:16 +09:00
utils.py [SPARK-23319][TESTS] Explicitly specify Pandas and PyArrow versions in PySpark tests (to skip or test) 2018-02-07 23:28:10 +09:00
window.py [SPARK-18690][PYTHON][SQL] Backward compatibility of unbounded frames 2016-12-02 17:39:28 -08:00