# spark-instrumented-optimizer/R/pkg

Latest commit: `5cc503f4fe` by Hossein, 2016-10-12 10:32:38 -07:00

**[SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB**
## What changes were proposed in this pull request?
If the R data structure being parallelized is larger than `INT_MAX` bytes (the 2 GB limit of the existing socket-based transfer), we transfer the data to the JVM through a temporary file instead. The file format mimics the one PySpark uses for pickled records, which lets us simply call `PythonRDD.readRDDFromFile` to create the RDD.
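A minimal sketch of this size-based dispatch is below. The helper names (`splitAndSerialize`, `parallelizeSketch`) are invented here for illustration, and `RRDD.createRDDFromArray` is an assumption about the pre-existing small-data path; only `PythonRDD.readRDDFromFile` is named by this description.

```R
# Illustrative sketch only -- helper names are hypothetical, and the
# small-data JVM entry point (RRDD.createRDDFromArray) is an assumption.

# Split a collection into numSlices pieces and serialize each to a raw vector.
splitAndSerialize <- function(coll, numSlices) {
  parts <- split(coll, cut(seq_along(coll), numSlices, labels = FALSE))
  lapply(parts, serialize, connection = NULL)
}

parallelizeSketch <- function(sc, coll, numSlices) {
  slices <- splitAndSerialize(coll, numSlices)
  totalBytes <- sum(as.numeric(lengths(slices)))  # as.numeric avoids integer overflow
  if (totalBytes < .Machine$integer.max) {
    # Small payload: ship the serialized slices directly over the socket, as before.
    SparkR:::callJStatic("org.apache.spark.api.r.RRDD",
                         "createRDDFromArray", sc, slices)
  } else {
    # Large payload: spill to a temp file with a 4-byte big-endian length
    # prefix per record (the framing PythonRDD.readRDDFromFile expects),
    # then have the JVM read the RDD back from the file.
    fileName <- tempfile()
    con <- file(fileName, "wb")
    for (s in slices) {
      writeBin(length(s), con, endian = "big")  # record length prefix
      writeBin(s, con)                          # record bytes
    }
    close(con)
    tryCatch(
      SparkR:::callJStatic("org.apache.spark.api.python.PythonRDD",
                           "readRDDFromFile", sc, fileName,
                           as.integer(numSlices)),
      finally = file.remove(fileName)
    )
  }
}
```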

I tested this on my MacBook; the following code works with this patch:
```R
sc <- sparkR.init()             # assumes a local SparkR context (API current at the time of this patch)
intMax <- .Machine$integer.max  # 2^31 - 1
largeVec <- 1:intMax            # an integer vector whose serialized form exceeds 2 GB
rdd <- SparkR:::parallelize(sc, largeVec, 2)
```

## How was this patch tested?
* [x] Unit tests

Author: Hossein <hossein@databricks.com>

Closes #15375 from falaki/SPARK-17790.
| Path | Latest commit | Date |
|------|---------------|------|
| inst | [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB | 2016-10-12 10:32:38 -07:00 |
| R | [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB | 2016-10-12 10:32:38 -07:00 |
| src-native | [SPARK-6811] Copy SparkR lib in make-distribution.sh | 2015-05-23 00:04:01 -07:00 |
| tests | [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. | 2015-12-07 10:38:17 -08:00 |
| vignettes | [SPARKR][DOC] minor formatting and output cleanup for R vignettes | 2016-10-04 09:22:26 -07:00 |
| .lintr | [SPARK-12327][SPARKR] fix code for lintr warning for commented code | 2016-01-03 20:53:35 +05:30 |
| .Rbuildignore | [SPARK-16507][SPARKR] Add a CRAN checker, fix Rd aliases | 2016-07-16 17:06:44 -07:00 |
| DESCRIPTION | [SPARK-16581][SPARKR] Make JVM backend calling functions public | 2016-08-29 12:55:32 -07:00 |
| NAMESPACE | [SPARK-17577][SPARKR][CORE] SparkR support add files to Spark job and get by executors | 2016-09-21 20:08:28 -07:00 |