d7d9fa0b87
Use `dropFactors` column-wise instead of nested loop when `createDataFrame` from a `data.frame` At this moment SparkR createDataFrame is using nested loop to convert factors to character when called on a local data.frame. It works but is incredibly slow especially with data.table (~ 2 orders of magnitude compared to PySpark / Pandas version on a DateFrame of size 1M rows x 2 columns). A simple improvement is to apply `dropFactor `column-wise and then reshape output list. It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277). Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9099 from zero323/SPARK-11086. |
||
---|---|---|
.. | ||
jarTest.R | ||
packageInAJarTest.R | ||
test_binary_function.R | ||
test_binaryFile.R | ||
test_broadcast.R | ||
test_client.R | ||
test_context.R | ||
test_includeJAR.R | ||
test_includePackage.R | ||
test_mllib.R | ||
test_parallelize_collect.R | ||
test_rdd.R | ||
test_Serde.R | ||
test_shuffle.R | ||
test_sparkSQL.R | ||
test_take.R | ||
test_textFile.R | ||
test_utils.R |