spark-instrumented-optimizer/R/pkg
hyukjinkwon 2bfd5accdc [SPARK-21266][R][PYTHON] Support schema a DDL-formatted string in dapply/gapply/from_json
## What changes were proposed in this pull request?

This PR adds support for specifying the schema as a DDL-formatted string in `from_json` (R and Python) and in `dapply` and `gapply` (R). These functions are commonly used and/or need this to be consistent with the Scala APIs.

Additionally, this PR exposes `structType` in R so that users can work around other possible corner cases.

**Python**

`from_json`

```python
from pyspark.sql.functions import from_json

data = [(1, '''{"a": 1}''')]
df = spark.createDataFrame(data, ("key", "value"))
df.select(from_json(df.value, "a INT").alias("json")).show()
```
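For readers unfamiliar with the format, the `"a INT"` argument above is a DDL-formatted schema string: a comma-separated list of `name TYPE` column definitions. The following is a minimal sketch of how such a flat string breaks down into fields. `parse_simple_ddl` is a hypothetical helper written only for illustration; it is not part of Spark and not how Spark parses these strings, and it ignores nested types, types that contain commas, and quoted names.

```python
def parse_simple_ddl(ddl):
    """Split a flat DDL schema string into (name, type) pairs.

    Hypothetical illustration only, not Spark's parser. Handles flat schemas
    such as "a STRING, b INT"; nested types, types containing commas, and
    quoted column names are out of scope.
    """
    fields = []
    for part in ddl.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma or empty segment
        # The first space separates the column name from its type.
        name, _, type_name = part.partition(" ")
        type_name = type_name.strip()
        if not type_name:
            raise ValueError(f"missing type in field definition: {part!r}")
        fields.append((name, type_name.upper()))
    return fields


print(parse_simple_ddl("a INT"))            # [('a', 'INT')]
print(parse_simple_ddl("a STRING, b INT"))  # [('a', 'STRING'), ('b', 'INT')]
```

The same two strings appear in the `from_json` and `structType` examples in this PR, so the sketch shows only what information the string carries, not how the functions consume it.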

**R**

`from_json`

```R
df <- sql("SELECT named_struct('name', 'Bob') as people")
df <- mutate(df, people_json = to_json(df$people))
head(select(df, from_json(df$people_json, "name STRING")))
```

`structType.character`

```R
structType("a STRING, b INT")
```

`dapply`

```R
dapply(createDataFrame(list(list(1.0)), "a"), function(x) {x}, "a DOUBLE")
```

`gapply`

```R
gapply(createDataFrame(list(list(1.0)), "a"), "a", function(key, x) { x }, "a DOUBLE")
```

## How was this patch tested?

Doc tests for `from_json` in Python and unit tests in `test_sparkSQL.R` for R.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #18498 from HyukjinKwon/SPARK-21266.
2017-07-10 10:40:03 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| inst | [SPARK-21093][R] Terminate R's worker processes in the parent of R's daemon to prevent a leak | 2017-07-08 14:24:37 -07:00 |
| R | [SPARK-21266][R][PYTHON] Support schema a DDL-formatted string in dapply/gapply/from_json | 2017-07-10 10:40:03 -07:00 |
| src-native | [SPARK-6811] Copy SparkR lib in make-distribution.sh | 2015-05-23 00:04:01 -07:00 |
| tests | [SPARK-21266][R][PYTHON] Support schema a DDL-formatted string in dapply/gapply/from_json | 2017-07-10 10:40:03 -07:00 |
| vignettes | [SPARK-20849][DOC][SPARKR] Document R DecisionTree | 2017-05-25 23:00:50 -07:00 |
| .lintr | [SPARK-20278][R] Disable 'multiple_dots_linter' lint rule that is against project's code style | 2017-04-16 11:27:27 -07:00 |
| .Rbuildignore | [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move | 2017-06-11 03:00:44 -07:00 |
| DESCRIPTION | [MINOR] Bump SparkR and PySpark version to 2.3.0. | 2017-06-19 11:13:03 +01:00 |
| NAMESPACE | [SPARK-21266][R][PYTHON] Support schema a DDL-formatted string in dapply/gapply/from_json | 2017-07-10 10:40:03 -07:00 |