spark-instrumented-optimizer/R/pkg
Timothy Hunter 769a909d13 [SPARK-7264][ML] Parallel lapply for sparkR
## What changes were proposed in this pull request?

This PR adds a new function in SparkR called `sparkLapply(list, function)`. This function implements a distributed version of `lapply` using Spark as a backend.

TODO:
 - [x] check documentation
 - [ ] check tests

Trivial example in SparkR:

```R
sparkLapply(1:5, function(x) { 2 * x })
```

Output:

```
[[1]]
[1] 2

[[2]]
[1] 4

[[3]]
[1] 6

[[4]]
[1] 8

[[5]]
[1] 10
```

Here is a slightly more complex example to perform distributed training of multiple models. Under the hood, Spark broadcasts the dataset.

```R
library("MASS")
data(menarche)
families <- c("gaussian", "poisson")
train <- function(family){glm(Menarche ~ Age  , family=family, data=menarche)}
results <- sparkLapply(families, train)
```

## How was this patch tested?

This PR was tested in SparkR. I am unfamiliar with R and SparkR, so any feedback on style, testing, etc. will be much appreciated.

cc falaki davies

Author: Timothy Hunter <timhunter@databricks.com>

Closes #12426 from thunterdb/7264.
2016-04-28 22:42:48 -07:00
..
inst [SPARK-7264][ML] Parallel lapply for sparkR 2016-04-28 22:42:48 -07:00
R [SPARK-7264][ML] Parallel lapply for sparkR 2016-04-28 22:42:48 -07:00
src-native [SPARK-6811] Copy SparkR lib in make-distribution.sh 2015-05-23 00:04:01 -07:00
tests [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. 2015-12-07 10:38:17 -08:00
.lintr [SPARK-12327][SPARKR] fix code for lintr warning for commented code 2016-01-03 20:53:35 +05:30
DESCRIPTION [SPARK-13010][ML][SPARKR] Implement a simple wrapper of AFTSurvivalRegression in SparkR 2016-03-24 22:29:34 -07:00
NAMESPACE [SPARK-7264][ML] Parallel lapply for sparkR 2016-04-28 22:42:48 -07:00