Sun Rui 4ae9fe091c [SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR.
## What changes were proposed in this pull request?

dapply() applies an R function on each partition of a DataFrame and returns a new DataFrame.

The function signature is:

	dapply(df, function(localDF) {}, schema = NULL)

R function input: a local data.frame holding the rows of one partition on the local node
R function output: a local data.frame

The schema specifies the row format of the resulting DataFrame and must match the output of the R function.
If the schema is not specified, each partition of the resulting DataFrame is serialized in R into a single byte array. Such a DataFrame can then be processed only by successive calls to dapply().
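As an illustration of the API described above, here is a minimal sketch (assuming a SparkR session has already been started with sparkR.session() on Spark 2.0+; the column names and values are made up for the example):

```r
# Create a small Spark DataFrame from a local data.frame
df <- createDataFrame(data.frame(a = 1:3, b = c(10, 20, 30)))

# Declare the row format of the output; it must match what the
# R function below actually returns.
schema <- structType(structField("a", "integer"),
                     structField("c", "double"))

# dapply() runs the function once per partition. localDF is a plain
# R data.frame with that partition's rows; the function must return
# a data.frame conforming to the schema.
df2 <- dapply(df, function(localDF) {
  data.frame(a = localDF$a, c = localDF$b * 2)
}, schema)

head(collect(df2))
```

If schema were omitted (left as NULL), the result would hold serialized byte arrays instead of typed columns, usable only as input to a further dapply() call.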

## How was this patch tested?
SparkR unit tests.

Author: Sun Rui <rui.sun@intel.com>
Author: Sun Rui <sunrui2016@gmail.com>

Closes #12493 from sun-rui/SPARK-12919.
2016-04-29 16:41:07 -07:00