Commit graph

12 commits

Author SHA1 Message Date
Felix Cheung 671bc08ed5 [SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column
## What changes were proposed in this pull request?

Add coalesce on DataFrame for down partitioning without shuffle and coalesce on Column

## How was this patch tested?

manual, unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16739 from felixcheung/rcoalesce.
2017-02-15 10:45:37 -08:00
Felix Cheung 90817a6cd0 [SPARK-18788][SPARKR] Add API for getNumPartitions
## What changes were proposed in this pull request?

With doc to say this would convert DF into RDD

## How was this patch tested?

unit tests, manual tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16668 from felixcheung/rgetnumpartitions.
2017-01-26 21:06:39 -08:00
Felix Cheung b0e8eb6d3e [SPARK-18335][SPARKR] createDataFrame to support numPartitions parameter
## What changes were proposed in this pull request?

To allow specifying number of partitions when the DataFrame is created

## How was this patch tested?

manual, unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16512 from felixcheung/rnumpart.
2017-01-13 10:08:14 -08:00
Felix Cheung c34b546d67 [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check
## What changes were proposed in this pull request?

Rename RDD functions for now to avoid CRAN check warnings.
Some RDD functions are sharing generics with DataFrame functions (hence the problem) so after the renames we need to add new generics, for now.

## How was this patch tested?

unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #14626 from felixcheung/rrddfunctions.
2016-08-16 11:19:18 -07:00
Felix Cheung d27fe9ba67 [SPARK-16027][SPARKR] Fix R tests SparkSession init/stop
## What changes were proposed in this pull request?

Fix R SparkSession init/stop, and warnings of reusing existing Spark Context

## How was this patch tested?

unit tests

shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #14177 from felixcheung/rsessiontest.
2016-07-17 19:02:21 -07:00
Felix Cheung 8c198e246d [SPARK-15159][SPARKR] SparkR SparkSession API
## What changes were proposed in this pull request?

This PR introduces the new SparkSession API for SparkR.
`sparkR.session.getOrCreate()` and `sparkR.session.stop()`

"getOrCreate" is a bit unusual in R but it's important to name this clearly.

SparkR implementation should
- SparkSession is the main entrypoint (vs SparkContext; due to limited functionality supported with SparkContext in SparkR)
- SparkSession replaces SQLContext and HiveContext (both a wrapper around SparkSession, and because of API changes, supporting all 3 would be a lot more work)
- Changes to SparkSession is mostly transparent to users due to SPARK-10903
- Full backward compatibility is expected - users should be able to initialize everything just in Spark 1.6.1 (`sparkR.init()`), but with deprecation warning
- Mostly cosmetic changes to parameter list - users should be able to move to `sparkR.session.getOrCreate()` easily
- An advanced syntax with named parameters (aka varargs aka "...") is supported; that should be closer to the Builder syntax that is in Scala/Python (which unfortunately does not work in R because it will look like this: `enableHiveSupport(config(config(master(appName(builder(), "foo"), "local"), "first", "value"), "next, "value"))`
- Updating config on an existing SparkSession is supported, the behavior is the same as Python, in which config is applied to both SparkContext and SparkSession
- Some SparkSession changes are not matched in SparkR, mostly because it would be breaking API change: `catalog` object, `createOrReplaceTempView`
- Other SQLContext workarounds are replicated in SparkR, eg. `tables`, `tableNames`
- `sparkR` shell is updated to use the SparkSession entrypoint (`sqlContext` is removed, just like with Scale/Python)
- All tests are updated to use the SparkSession entrypoint
- A bug in `read.jdbc` is fixed

TODO
- [x] Add more tests
- [ ] Separate PR - update all roxygen2 doc coding example
- [ ] Separate PR - update SparkR programming guide

## How was this patch tested?

unit tests, manual tests

shivaram sun-rui rxin

Author: Felix Cheung <felixcheung_m@hotmail.com>
Author: felixcheung <felixcheung_m@hotmail.com>

Closes #13635 from felixcheung/rsparksession.
2016-06-17 21:36:01 -07:00
Sun Rui d3638d7bff [SPARK-12792] [SPARKR] Refactor RRDD to support R UDF.
## What changes were proposed in this pull request?

Refactor RRDD by separating the common logic interacting with the R worker to a new class RRunner, which can be used to evaluate R UDFs.

Now RRDD relies on RRuner for RDD computation and RRDD could be reomved if we want to remove RDD API in SparkR later.

## How was this patch tested?
dev/lint-r
SparkR unit tests

Author: Sun Rui <rui.sun@intel.com>

Closes #12024 from sun-rui/SPARK-12792_new.
2016-03-28 21:51:02 -07:00
Davies Liu e5a1b301fb Revert "[SPARK-12792] [SPARKR] Refactor RRDD to support R UDF."
This reverts commit 40984f6706.
2016-03-28 10:21:02 -07:00
Sun Rui 40984f6706 [SPARK-12792] [SPARKR] Refactor RRDD to support R UDF.
Refactor RRDD by separating the common logic interacting with the R worker to a new class RRunner, which can be used to evaluate R UDFs.

Now RRDD relies on RRuner for RDD computation and RRDD could be reomved if we want to remove RDD API in SparkR later.

Author: Sun Rui <rui.sun@intel.com>

Closes #10947 from sun-rui/SPARK-12792.
2016-03-28 10:14:28 -07:00
Sun Rui c7e68c3968 [SPARK-13812][SPARKR] Fix SparkR lint-r test errors.
## What changes were proposed in this pull request?

This PR fixes all newly captured SparkR lint-r errors after the lintr package is updated from github.

## How was this patch tested?

dev/lint-r
SparkR unit tests

Author: Sun Rui <rui.sun@intel.com>

Closes #11652 from sun-rui/SPARK-13812.
2016-03-13 14:30:44 -07:00
felixcheung c3d505602d [SPARK-12327][SPARKR] fix code for lintr warning for commented code
shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10408 from felixcheung/rcodecomment.
2016-01-03 20:53:35 +05:30
Sun Rui 39d677c8f1 [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases.
This PR:
1. Suppress all known warnings.
2. Cleanup test cases and fix some errors in test cases.
3. Fix errors in HiveContext related test cases. These test cases are actually not run previously due to a bug of creating TestHiveContext.
4. Support 'testthat' package version 0.11.0 which prefers that test cases be under 'tests/testthat'
5. Make sure the default Hadoop file system is local when running test cases.
6. Turn on warnings into errors.

Author: Sun Rui <rui.sun@intel.com>

Closes #10030 from sun-rui/SPARK-12034.
2015-12-07 10:38:17 -08:00
Renamed from R/pkg/inst/tests/test_rdd.R (Browse further)