spark-instrumented-optimizer


Shivaram Venkataraman 0a901dd3a1 [SPARK-7231] [SPARKR] Changes to make SparkR DataFrame dplyr friendly.	2015-05-08 18:29:57 -07:00

Changes include:
1. Rename sortDF to arrange
2. Add new aliases `group_by`, `sample_frac`, and `summarize`
3. Add more user-friendly column addition (mutate) and rename
4. Support mean as an alias for avg in Scala, and also support n_distinct and n as in dplyr

Using these changes we can pretty much run the examples described in http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html with the same syntax.

The only thing missing in SparkR is auto-resolving column names when used in an expression, i.e. something like `select(flights, delay)` works in dplyr, but right now we need `select(flights, flights$delay)` or `select(flights, "delay")`. This is a complicated change, and I'll file a new issue for it.

cc sun-rui rxin

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6005 from shivaram/sparkr-df-api and squashes the following commits:

5e0716a [Shivaram Venkataraman] Fix some roxygen bugs
1254953 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into sparkr-df-api
0521149 [Shivaram Venkataraman] Changes to make SparkR DataFrame dplyr friendly.
test_binary_function.R	[SPARK-6991] [SPARKR] Adds support for zipPartitions.	2015-04-27 15:04:37 -07:00
test_binaryFile.R	[SPARK-6850] [SparkR] use one partition when we need to compare the whole result	2015-04-10 15:35:45 -07:00
test_broadcast.R	[SPARK-7230] [SPARKR] Make RDD private in SparkR.	2015-05-05 14:40:33 -07:00
test_context.R	[SPARK-5654] Integrate SparkR	2015-04-08 22:45:40 -07:00
test_includePackage.R	[SPARK-5654] Integrate SparkR	2015-04-08 22:45:40 -07:00
test_parallelize_collect.R	[SPARK-5654] Integrate SparkR	2015-04-08 22:45:40 -07:00
test_rdd.R	[SPARK-6856] [R] Make RDD information more useful in SparkR	2015-04-27 13:38:25 -07:00
test_shuffle.R	[SPARK-6807] [SparkR] Merge recent SparkR-pkg changes	2015-04-17 13:42:19 -07:00
test_sparkSQL.R	[SPARK-7231] [SPARKR] Changes to make SparkR DataFrame dplyr friendly.	2015-05-08 18:29:57 -07:00
test_take.R	[SPARK-5654] Integrate SparkR	2015-04-08 22:45:40 -07:00
test_textFile.R	[SPARK-6850] [SparkR] use one partition when we need to compare the whole result	2015-04-10 15:35:45 -07:00
test_utils.R	[SPARK-7230] [SPARKR] Make RDD private in SparkR.	2015-05-05 14:40:33 -07:00