spark-instrumented-optimizer

History

goldmedal a28728a9af [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json support converting MapType to json for PySpark and SparkR		2017-09-15 11:53:10 +09:00

## What changes were proposed in this pull request?

In the previous work, SPARK-21513, we allowed `MapType` and `ArrayType` of `MapType`s to be converted to a JSON string, but only for the Scala API. This follow-up PR makes Spark SQL support the same conversion for PySpark and SparkR. It also fixes some small bugs and comments left over from the previous work.

### For PySpark

```
>>> from pyspark.sql.functions import to_json
>>> data = [(1, {"name": "Alice"})]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(to_json(df.value).alias("json")).collect()
[Row(json=u'{"name":"Alice"}')]
>>> data = [(1, [{"name": "Alice"}, {"name": "Bob"}])]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(to_json(df.value).alias("json")).collect()
[Row(json=u'[{"name":"Alice"},{"name":"Bob"}]')]
```

### For SparkR

```
# Converts a map into a JSON object
df2 <- sql("SELECT map('name', 'Bob') as people")
df2 <- mutate(df2, people_json = to_json(df2$people))
# Converts an array of maps into a JSON array
df2 <- sql("SELECT array(map('name', 'Bob'), map('name', 'Alice')) as people")
df2 <- mutate(df2, people_json = to_json(df2$people))
```

## How was this patch tested?

Added unit test cases.

cc viirya HyukjinKwon

Author: goldmedal <liugs963@gmail.com>

Closes #19223 from goldmedal/SPARK-21513-fp-PySaprkAndSparkR.
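The PySpark output in this commit message is compact JSON, with no spaces after `,` or `:`. When writing expected values for tests of this behavior, a pure-Python helper can reproduce that formatting for a map or an array of maps using the standard `json` module. This is a minimal sketch, not part of PySpark. It does not call Spark, the name `expected_to_json` is hypothetical, and it assumes that Spark's output for simple string-valued maps like these matches `json.dumps` with compact separators.

```python
import json
from typing import Any, Dict, List, Union

JsonMap = Dict[str, Any]


def expected_to_json(value: Union[JsonMap, List[JsonMap]]) -> str:
    """Render a map, or an array of maps, as compact JSON.

    Hypothetical test helper: it only mimics the compact formatting shown in
    the to_json examples (no spaces after ',' or ':') and never calls Spark.
    """
    is_map = isinstance(value, dict)
    is_array_of_maps = isinstance(value, list) and all(
        isinstance(item, dict) for item in value
    )
    if is_map or is_array_of_maps:
        # separators=(",", ":") drops the default spaces that json.dumps adds.
        return json.dumps(value, separators=(",", ":"))
    # Mirror the scope of this change: only maps and arrays of maps.
    raise TypeError("expected a map or an array of maps")


print(expected_to_json({"name": "Alice"}))
print(expected_to_json([{"name": "Alice"}, {"name": "Bob"}]))
```

Under these assumptions, a PySpark test could compare the `json` field of each collected `Row` against the helper's result for the same Python input.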
ml	[SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpark OneVsRest.	2017-09-14 14:09:44 +08:00
mllib	[SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel	2017-05-24 22:55:38 +08:00
sql	[SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json support converting MapType to json for PySpark and SparkR	2017-09-15 11:53:10 +09:00
streaming	[SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behind a profile	2017-09-13 10:10:40 +01:00
__init__.py	[MINOR] Fix some typo of the document	2017-06-19 20:35:58 +01:00
accumulators.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
broadcast.py	[SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry	2017-08-02 07:12:23 +09:00
cloudpickle.py	[SPARK-21070][PYSPARK] Attempt to update cloudpickle again	2017-08-22 11:17:53 +09:00
conf.py	[SPARK-18447][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that` across Python API documentation	2016-11-22 11:40:18 +00:00
context.py	[SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry	2017-08-02 07:12:23 +09:00
daemon.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
find_spark_home.py	[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed	2016-11-16 14:22:15 -08:00
heapq3.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
java_gateway.py	[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed	2016-11-16 14:22:15 -08:00
join.py	[SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo…	2016-03-28 14:51:36 -07:00
profiler.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
rdd.py	[SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator	2017-08-09 14:03:18 -07:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[SPARK-13534][PYSPARK] Using Apache Arrow to increase performance of DataFrame.toPandas	2017-07-10 15:21:03 -07:00
shell.py	[SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell	2017-04-12 10:54:50 -07:00
shuffle.py	[SPARK-10710] Remove ability to disable spilling in core and SQL	2015-09-19 21:40:21 -07:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-13992][CORE][PYSPARK][FOLLOWUP] Update OFF_HEAP semantics for Java api and Python api	2016-04-12 23:06:55 -07:00
taskcontext.py	[SPARK-18576][PYTHON] Add basic TaskContext information to PySpark	2016-12-20 15:51:21 -08:00
tests.py	[SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry	2017-08-02 07:12:23 +09:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
util.py	[SPARK-19505][PYTHON] AttributeError on Exception.message in Python3	2017-04-11 12:18:31 -07:00
version.py	[MINOR] Bump SparkR and PySpark version to 2.3.0.	2017-06-19 11:13:03 +01:00
worker.py	[SPARK-20685] Fix BatchPythonEvaluation bug in case of single UDF w/ repeated arg.	2017-05-10 16:50:57 -07:00