spark-instrumented-optimizer/R/pkg/inst
Hossein 5cc503f4fe [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB
## What changes were proposed in this pull request?
If the R data structure being parallelized is larger than `INT_MAX`, we use files to transfer the data to the JVM. The serialization protocol mimics Python pickling, which allows us to simply call `PythonRDD.readRDDFromFile` to create the RDD.

I tested this on my MacBook. The following code works with this patch:
```R
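# Assumes an active SparkContext `sc` and enough memory to hold the vector.
intMax <- .Machine$integer.max
largeVec <- 1:intMax
# parallelize is internal to SparkR, hence the `:::` accessor.
rdd <- SparkR:::parallelize(sc, largeVec, 2)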
```
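
The file-based transfer described above can be sketched in plain R, without Spark. This is a minimal illustration, not the code from the patch: the helper names `use_file_transfer`, `write_slices_to_file`, and `read_slices_from_file` are hypothetical, the size check relies on `object.size` (which is only an estimate), and the framing (a 4-byte big-endian length followed by the serialized bytes of each slice) is an assumption about what a length-prefixed reader on the JVM side would expect.

```R
# Hypothetical check: fall back to a file when the object is too large to
# be described by a 32-bit length. object.size() is only an estimate.
use_file_transfer <- function(obj, limit = .Machine$integer.max) {
  as.numeric(object.size(obj)) >= limit
}

# Hypothetical writer: one record per slice, framed as
# [4-byte big-endian length][serialized bytes].
write_slices_to_file <- function(slices) {
  path <- tempfile()
  conn <- file(path, "wb")
  on.exit(close(conn))
  for (slice in slices) {
    bytes <- serialize(slice, connection = NULL)
    writeBin(length(bytes), conn, size = 4L, endian = "big")
    writeBin(bytes, conn)
  }
  path
}

# Hypothetical reader, standing in for the JVM side: read a length,
# then that many bytes, until the file is exhausted.
read_slices_from_file <- function(path) {
  conn <- file(path, "rb")
  on.exit(close(conn))
  slices <- list()
  repeat {
    n <- readBin(conn, integer(), n = 1L, size = 4L, endian = "big")
    if (length(n) == 0L) break
    bytes <- readBin(conn, raw(), n = n)
    slices[[length(slices) + 1L]] <- unserialize(bytes)
  }
  slices
}

# Round-trip a small vector split into two slices.
slices <- split(1:10, rep(1:2, each = 5))
path <- write_slices_to_file(slices)
restored <- read_slices_from_file(path)
stopifnot(identical(unname(slices), restored))
invisible(file.remove(path))
```

Length-prefixed records let the reader stream one slice at a time, so no single buffer has to hold the whole data set.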

## How was this patch tested?
* [x] Unit tests

Author: Hossein <hossein@databricks.com>

Closes #15375 from falaki/SPARK-17790.
2016-10-12 10:32:38 -07:00
profile [SPARK-15159][SPARKR] SparkR SparkSession API 2016-06-17 21:36:01 -07:00
tests/testthat [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB 2016-10-12 10:32:38 -07:00
worker [SPARK-16785] R dapply doesn't return array or raw columns 2016-09-06 23:40:37 -07:00