spark-instrumented-optimizer/python/pyspark/sql
bravo-zhang 84454d7d33 [SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None
## What changes were proposed in this pull request?

Currently `df.na.replace("*", Map[String, String]("NULL" -> null))` throws an exception.
This PR enables passing null/None as the value in the replacement map of DataFrame.replace().
Note that the replacement map's keys and values must still be of the same type, though the values may mix null/None with values of that type.
This PR enables operations such as the following:
`df.na.replace("*", Map[String, String]("NULL" -> null))` (Scala)
`df.na.replace("*", Map[Any, Any](60 -> null, 70 -> 80))` (Scala)
`df.na.replace('Alice', None)` (Python)
`df.na.replace([10, 20])` (Python; the replacement value defaults to None)
One use case: replacing all empty strings with null/None because they were incorrectly generated, then dropping all null/None data:
`df.na.replace("*", Map("" -> null)).na.drop()` (Scala)
`df.replace(u'', None).dropna()` (Python)
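
To make the new behavior concrete, here is a minimal end-to-end PySpark sketch of that use case; the session setup and sample data are illustrative, not part of the patch:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("replace-none").getOrCreate()

# Illustrative data: empty strings stand in for incorrectly generated values.
df = spark.createDataFrame([(10, "Alice"), (20, ""), (30, "Bob")], ["age", "name"])

# With this patch, None is accepted as a replacement value.
df.na.replace("Alice", None).show()

# Replace empty strings with None, then drop the resulting null rows.
df.replace("", None).dropna().show()

spark.stop()
```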

## How was this patch tested?

Scala unit test.
Python doctest and unit test.
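
A hypothetical sketch of the kind of Python check this adds; the function name and `spark` fixture are assumptions, not the actual test in tests.py:

```python
# Hypothetical test sketch, assuming a SparkSession is available as `spark`;
# the real coverage lives in the Scala suite, tests.py, and the doctests.
def test_replace_empty_string_with_none(spark):
    df = spark.createDataFrame([(10, "Alice"), (20, "")], ["age", "name"])
    remaining = df.replace("", None).dropna().collect()
    assert [row.name for row in remaining] == ["Alice"]
```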

Author: bravo-zhang <mzhang1230@gmail.com>

Closes #18820 from bravo-zhang/spark-14932.
2017-08-09 17:42:21 -07:00
__init__.py [SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes" 2016-08-06 05:02:59 +01:00
catalog.py [SPARK-18777][PYTHON][SQL] Return UDF from udf.register 2017-05-06 22:28:42 -07:00
column.py [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper for eqNullSafe 2017-05-01 09:43:32 -07:00
conf.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
context.py [SPARK-20586][SQL] Add deterministic to ScalaUDF 2017-07-25 17:19:44 -07:00
dataframe.py [SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None 2017-08-09 17:42:21 -07:00
functions.py [SPARK][DOCS] Added note on meaning of position to substring function 2017-08-07 17:16:03 +01:00
group.py [MINOR][PYSPARK][DOC] Fix wrongly formatted examples in PySpark documentation 2016-07-06 10:45:51 -07:00
readwriter.py [SPARK-20431][SS][FOLLOWUP] Specify a schema by using a DDL-formatted string in DataStreamReader 2017-06-24 11:39:41 +08:00
session.py [SPARK-19507][SPARK-21296][PYTHON] Avoid per-record type dispatch in schema verification and improve exception message 2017-07-04 20:45:58 +08:00
streaming.py [SPARK-20431][SS][FOLLOWUP] Specify a schema by using a DDL-formatted string in DataStreamReader 2017-06-24 11:39:41 +08:00
tests.py [SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None 2017-08-09 17:42:21 -07:00
types.py [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpark 2017-07-28 20:59:32 -07:00
utils.py [MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo 2017-01-04 15:07:29 +00:00
window.py [SPARK-18690][PYTHON][SQL] Backward compatibility of unbounded frames 2016-12-02 17:39:28 -08:00