spark-instrumented-optimizer/R/pkg/R
hyukjinkwon 2bfd5accdc [SPARK-21266][R][PYTHON] Support schema a DDL-formatted string in dapply/gapply/from_json
## What changes were proposed in this pull request?

This PR adds support for specifying a schema as a DDL-formatted string in `from_json` (R and Python) and in `dapply` and `gapply` (R). These functions are commonly used, and accepting a DDL string keeps them consistent with the Scala APIs.

Additionally, this PR exposes `structType` in R for DDL-formatted strings, which offers a workaround for other possible corner cases.
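For readers unfamiliar with the format, here is a rough illustration of what a flat DDL-formatted schema string such as `"a STRING, b INT"` encodes. This is only a sketch in plain Python: `parse_flat_ddl` is a hypothetical helper written for this note, not Spark's parser, and it assumes a flat, comma-separated list of `name TYPE` pairs with no nested or parameterized types.

```python
from typing import List, Tuple


def parse_flat_ddl(ddl: str) -> List[Tuple[str, str]]:
    """Split a flat DDL schema string into (field name, type name) pairs.

    Hypothetical helper for illustration only; Spark's real DDL parser
    also handles nested types, quoting, and comments.
    """
    fields = []
    for part in ddl.split(","):
        tokens = part.split()
        if len(tokens) != 2:
            raise ValueError(f"expected 'name TYPE', got {part!r}")
        name, type_name = tokens
        # Normalize the type name so "int" and "INT" compare equal.
        fields.append((name, type_name.upper()))
    return fields


print(parse_flat_ddl("a STRING, b INT"))
# prints [('a', 'STRING'), ('b', 'INT')]
```

Each pair corresponds to one field of the resulting struct schema, which is what the examples below pass as a single string instead of building the schema field by field.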

**Python**

`from_json`

```python
from pyspark.sql.functions import from_json

data = [(1, '''{"a": 1}''')]
df = spark.createDataFrame(data, ("key", "value"))
df.select(from_json(df.value, "a INT").alias("json")).show()
```

**R**

`from_json`

```R
df <- sql("SELECT named_struct('name', 'Bob') as people")
df <- mutate(df, people_json = to_json(df$people))
head(select(df, from_json(df$people_json, "name STRING")))
```

`structType.character`

```R
structType("a STRING, b INT")
```

`dapply`

```R
dapply(createDataFrame(list(list(1.0)), "a"), function(x) {x}, "a DOUBLE")
```

`gapply`

```R
gapply(createDataFrame(list(list(1.0)), "a"), "a", function(key, x) { x }, "a DOUBLE")
```

## How was this patch tested?

Doc tests for `from_json` in Python and unit tests in `test_sparkSQL.R` for R.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #18498 from HyukjinKwon/SPARK-21266.
2017-07-10 10:40:03 -07:00

| File | Last commit | Date |
| --- | --- | --- |
| backend.R | [SPARK-17919] Make timeout to RBackend configurable in SparkR | 2016-10-30 16:17:23 -07:00 |
| broadcast.R | [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs and API docs for non-MLib changes | 2016-06-16 19:39:33 -07:00 |
| catalog.R | [SPARK-20195][SPARKR][SQL] add createTable catalog API and deprecate createExternalTable | 2017-04-06 09:15:13 -07:00 |
| client.R | [SPARK-17919] Make timeout to RBackend configurable in SparkR | 2016-10-30 16:17:23 -07:00 |
| column.R | [SPARKR] Fix bad examples in DataFrame methods and style issues | 2017-05-19 11:18:20 -07:00 |
| context.R | [SPARK-20726][SPARKR] wrapper for SQL broadcast | 2017-05-14 13:22:19 -07:00 |
| DataFrame.R | [SPARK-21266][R][PYTHON] Support schema a DDL-formatted string in dapply/gapply/from_json | 2017-07-10 10:40:03 -07:00 |
| deserialize.R | [SPARK-12922][SPARKR][WIP] Implement gapply() on DataFrame in SparkR | 2016-06-15 21:42:05 -07:00 |
| functions.R | [SPARK-21266][R][PYTHON] Support schema a DDL-formatted string in dapply/gapply/from_json | 2017-07-10 10:40:03 -07:00 |
| generics.R | [SPARK-20889][SPARKR][FOLLOWUP] Clean up grouped doc for column methods | 2017-07-04 21:05:05 -07:00 |
| group.R | [SPARK-21266][R][PYTHON] Support schema a DDL-formatted string in dapply/gapply/from_json | 2017-07-10 10:40:03 -07:00 |
| install.R | [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move | 2017-06-11 03:00:44 -07:00 |
| jobj.R | [SPARK-14995][R] Add since tag in Roxygen documentation for SparkR API methods | 2016-06-20 14:24:41 -07:00 |
| jvm.R | [SPARK-16581][SPARKR] Make JVM backend calling functions public | 2016-08-29 12:55:32 -07:00 |
| mllib_classification.R | [SPARK-20906][SPARKR] Constrained Logistic Regression for SparkR | 2017-06-21 20:42:45 -07:00 |
| mllib_clustering.R | [SPARKR][DOC] update doc for fpgrowth | 2017-04-04 22:32:46 -07:00 |
| mllib_fpm.R | [SPARKR][DOC] update doc for fpgrowth | 2017-04-04 22:32:46 -07:00 |
| mllib_recommendation.R | [SPARK-18862][SPARKR][ML] Split SparkR mllib.R into multiple files | 2017-01-08 01:10:36 -08:00 |
| mllib_regression.R | [SPARK-20917][ML][SPARKR] SparkR supports string encoding consistent with R | 2017-06-21 10:35:16 -07:00 |
| mllib_stat.R | [SPARK-18862][SPARKR][ML] Split SparkR mllib.R into multiple files | 2017-01-08 01:10:36 -08:00 |
| mllib_tree.R | [SPARK-20307][SPARKR] SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer | 2017-07-07 23:51:32 -07:00 |
| mllib_utils.R | [SPARK-15767][ML][SPARKR] Decision Tree wrapper in SparkR | 2017-05-22 10:40:49 -07:00 |
| pairRDD.R | [SPARK-18788][SPARKR] Add API for getNumPartitions | 2017-01-26 21:06:39 -08:00 |
| RDD.R | [SPARK-20020][SPARKR] DataFrame checkpoint API | 2017-03-19 22:34:18 -07:00 |
| schema.R | [SPARK-21266][R][PYTHON] Support schema a DDL-formatted string in dapply/gapply/from_json | 2017-07-10 10:40:03 -07:00 |
| serialize.R | [SPARK-13812][SPARKR] Fix SparkR lint-r test errors. | 2016-03-13 14:30:44 -07:00 |
| sparkR.R | [SPARK-21149][R] Add job description API for R | 2017-06-23 09:59:24 -07:00 |
| SQLContext.R | [SPARK-21224][R] Specify a schema by using a DDL-formatted string when reading in R | 2017-06-28 19:36:00 -07:00 |
| stats.R | [SPARK-20889][SPARKR] Grouped documentation for AGGREGATE column methods | 2017-06-19 19:41:24 -07:00 |
| streaming.R | [SPARK-20541][SPARKR][SS] support awaitTermination without timeout | 2017-04-30 23:23:49 -07:00 |
| types.R | [SPARK-19342][SPARKR] bug fixed in collect method for collecting timestamp column | 2017-02-12 10:42:15 -08:00 |
| utils.R | [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move | 2017-06-11 03:00:44 -07:00 |
| window.R | [SPARKR][MINOR] Fix windowPartitionBy example | 2016-08-31 21:28:53 -07:00 |
| WindowSpec.R | [SPARKR] Fix bad examples in DataFrame methods and style issues | 2017-05-19 11:18:20 -07:00 |