spark-instrumented-optimizer

Hossein 5cc503f4fe [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB	2016-10-12 10:32:38 -07:00

## What changes were proposed in this pull request?

If the R data structure that is being parallelized is larger than `INT_MAX` we use files to transfer data to JVM. The serialization protocol mimics Python pickling. This allows us to simply call `PythonRDD.readRDDFromFile` to create the RDD.

I tested this on my MacBook. Following code works with this patch:

```R
intMax <- .Machine$integer.max
largeVec <- 1:intMax
rdd <- SparkR:::parallelize(sc, largeVec, 2)
```

## How was this patch tested?

* [x] Unit tests

Author: Hossein <hossein@databricks.com>

Closes #15375 from falaki/SPARK-17790.
..
profile	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
tests/testthat	[SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB	2016-10-12 10:32:38 -07:00
worker	[SPARK-16785] R dapply doesn't return array or raw columns	2016-09-06 23:40:37 -07:00
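
The file-based transfer described in the SPARK-17790 commit message can be illustrated with a small, self-contained sketch. It is not the code from the patch. The helper names (`exceeds_limit`, `write_slices_to_file`, `read_slices_from_file`) are hypothetical. The framing used here, a 4-byte big-endian length followed by the raw bytes of each serialized slice, is an assumption about the kind of length-prefixed layout a reader such as `PythonRDD.readRDDFromFile` would consume. The sketch runs in plain R without Spark.

```R
# Hypothetical helpers: a sketch of length-prefixed file transfer, not SparkR's code.

# Decide whether an object is too large to pass in memory (threshold is an assumption).
exceeds_limit <- function(obj, limit = .Machine$integer.max) {
  as.numeric(object.size(obj)) > limit
}

# Write each serialized slice as: 4-byte big-endian length, then the raw bytes.
write_slices_to_file <- function(slices, path = tempfile()) {
  conn <- file(path, "wb")
  on.exit(close(conn))
  for (slice in slices) {
    writeBin(as.integer(length(slice)), conn, size = 4L, endian = "big")
    writeBin(slice, conn)
  }
  path
}

# Read the records back: length prefix first, then that many raw bytes, until EOF.
read_slices_from_file <- function(path) {
  conn <- file(path, "rb")
  on.exit(close(conn))
  slices <- list()
  repeat {
    n <- readBin(conn, what = "integer", n = 1L, size = 4L, endian = "big")
    if (length(n) == 0L) break  # end of file reached
    slices[[length(slices) + 1L]] <- readBin(conn, what = "raw", n = n)
  }
  slices
}

# Split a small vector into two slices and serialize each one to a raw vector.
values <- 1:10
slices <- lapply(split(values, rep(1:2, each = 5)), serialize, connection = NULL)

# Round-trip through a temporary file and rebuild the original vector.
path <- write_slices_to_file(slices)
restored <- unlist(lapply(read_slices_from_file(path), unserialize))
file.remove(path)
```

In a real session the file name would be handed to the JVM instead of being read back in R. The round trip here only shows that the framing is lossless and that each slice can be recovered independently.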