# spark-instrumented-optimizer/R/pkg

Latest commit: `5cc503f4fe` by Hossein, 2016-10-12 10:32:38 -07:00

**[SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB**
## What changes were proposed in this pull request?
If the R data structure being parallelized is larger than `INT_MAX` bytes (the 2 GB limit of the existing socket-based transfer), we transfer the data to the JVM through a temporary file instead. The file format mimics the one PySpark uses for pickled records, which lets us simply call `PythonRDD.readRDDFromFile` to create the RDD.
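A minimal sketch of this size-based dispatch is below. The helper names (`splitAndSerialize`, `parallelizeSketch`) are invented here for illustration, and `RRDD.createRDDFromArray` is an assumption about the pre-existing small-data path; only `PythonRDD.readRDDFromFile` is named by this description.

```R
# Illustrative sketch only -- helper names are hypothetical, and the
# small-data JVM entry point (RRDD.createRDDFromArray) is an assumption.

# Split a collection into numSlices pieces and serialize each to a raw vector.
splitAndSerialize <- function(coll, numSlices) {
  parts <- split(coll, cut(seq_along(coll), numSlices, labels = FALSE))
  lapply(parts, serialize, connection = NULL)
}

parallelizeSketch <- function(sc, coll, numSlices) {
  slices <- splitAndSerialize(coll, numSlices)
  totalBytes <- sum(as.numeric(lengths(slices)))  # as.numeric avoids integer overflow
  if (totalBytes < .Machine$integer.max) {
    # Small payload: ship the serialized slices directly over the socket, as before.
    SparkR:::callJStatic("org.apache.spark.api.r.RRDD",
                         "createRDDFromArray", sc, slices)
  } else {
    # Large payload: spill to a temp file with a 4-byte big-endian length
    # prefix per record (the framing PythonRDD.readRDDFromFile expects),
    # then have the JVM read the RDD back from the file.
    fileName <- tempfile()
    con <- file(fileName, "wb")
    for (s in slices) {
      writeBin(length(s), con, endian = "big")  # record length prefix
      writeBin(s, con)                          # record bytes
    }
    close(con)
    tryCatch(
      SparkR:::callJStatic("org.apache.spark.api.python.PythonRDD",
                           "readRDDFromFile", sc, fileName,
                           as.integer(numSlices)),
      finally = file.remove(fileName)
    )
  }
}
```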

I tested this on my MacBook; the following code works with this patch:
```R
sc <- sparkR.init()             # assumes a local SparkR context (API current at the time of this patch)
intMax <- .Machine$integer.max  # 2^31 - 1
largeVec <- 1:intMax            # an integer vector whose serialized form exceeds 2 GB
rdd <- SparkR:::parallelize(sc, largeVec, 2)
```

## How was this patch tested?
* [x] Unit tests

Author: Hossein <hossein@databricks.com>

Closes #15375 from falaki/SPARK-17790.
| Path | Latest commit | Date |
|------|---------------|------|
| inst | [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB | 2016-10-12 10:32:38 -07:00 |
| R | [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB | 2016-10-12 10:32:38 -07:00 |
| src-native | [SPARK-6811] Copy SparkR lib in make-distribution.sh | 2015-05-23 00:04:01 -07:00 |
| tests | [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. | 2015-12-07 10:38:17 -08:00 |
| vignettes | [SPARKR][DOC] minor formatting and output cleanup for R vignettes | 2016-10-04 09:22:26 -07:00 |
| .lintr | [SPARK-12327][SPARKR] fix code for lintr warning for commented code | 2016-01-03 20:53:35 +05:30 |
| .Rbuildignore | [SPARK-16507][SPARKR] Add a CRAN checker, fix Rd aliases | 2016-07-16 17:06:44 -07:00 |
| DESCRIPTION | [SPARK-16581][SPARKR] Make JVM backend calling functions public | 2016-08-29 12:55:32 -07:00 |
| NAMESPACE | [SPARK-17577][SPARKR][CORE] SparkR support add files to Spark job and get by executors | 2016-09-21 20:08:28 -07:00 |