spark-instrumented-optimizer


Michael Armbrust fe33121a53 [SPARK-17699] Support for parsing JSON string columns	2016-09-29 13:01:10 -07:00

Spark SQL has great support for reading text files that contain JSON data. However, in many cases the JSON data is just one column amongst others. This is particularly true when reading from sources such as Kafka. This PR adds a new function `from_json` that converts a string column into a nested `StructType` with a user-specified schema.

Example usage:

```scala
val df = Seq("""{"a": 1}""").toDS()
val schema = new StructType().add("a", IntegerType)
df.select(from_json($"value", schema) as 'json) // => [json: <a: int>]
```

This PR adds support for Java, Scala and Python. I leveraged our existing JSON parsing support by moving it into catalyst (so that we could define expressions using it). I left SQL out for now, because I'm not sure how users would specify a schema.

Author: Michael Armbrust <michael@databricks.com>

Closes #15274 from marmbrus/jsonParser.
docs	[SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment	2016-08-24 20:04:09 +01:00
lib	[SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment	2016-08-24 20:04:09 +01:00
pyspark	[SPARK-17699] Support for parsing JSON string columns	2016-09-29 13:01:10 -07:00
test_support	[SPARK-17585][PYSPARK][CORE] PySpark SparkContext.addFile supports adding files recursively	2016-09-21 01:37:03 -07:00
.gitignore	[SPARK-3946] gitignore in /python includes wrong directory	2014-10-14 14:09:39 -07:00
pylintrc	[SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs	2016-03-07 14:48:02 -08:00
run-tests	[SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system	2015-06-27 20:24:34 -07:00
run-tests.py	[SPARK-13579][BUILD] Stop building the main Spark assembly.	2016-04-04 16:52:22 -07:00