spark-instrumented-optimizer/R/CRAN_RELEASE.md
Felix Cheung c3d3a9d0e8 [SPARK-18590][SPARKR] build R source package when making distribution
## What changes were proposed in this pull request?

This PR has two key changes. First, we now build a source package (aka bundle package) for SparkR, which could be released on CRAN. Second, the official Spark binary distributions should include SparkR installed from this source package instead (it has the help/vignettes rds files needed for those to work when the SparkR package is loaded in R, whereas the earlier approach with devtools does not).

But, because of various differences in how R performs different tasks, this PR is a fair bit more complicated. More details below.

This PR also includes a few minor fixes.

### more details

These are the additional steps in make-distribution; please see [here](https://github.com/apache/spark/blob/master/R/CRAN_RELEASE.md) for what goes into a CRAN release, which is now run during make-distribution.sh.
1. package needs to be installed because the first code block in vignettes is `library(SparkR)` without lib path
2. `R CMD build` will build vignettes (this process runs Spark/SparkR code and captures outputs into pdf documentation)
3. `R CMD check` on the source package will install the package and build vignettes again (this time from the source package) - this is a key step required to release an R package on CRAN
 (will skip tests here but tests will need to pass for the CRAN release process to succeed - ideally, during release signoff we should install from the R source package and run tests)
4. `R CMD INSTALL` on the source package (this is the only way to generate doc/vignettes rds files correctly, not in step # 1)
 (the output of this step is what we package into Spark dist and sparkr.zip)

Alternatively, `R CMD build` should already be installing the package in a temp directory, so it might just be a matter of finding this location and setting it as the lib.loc parameter; another approach is to try calling `R CMD INSTALL --build pkg` instead.
In any case, despite installing the package multiple times, this is relatively fast.
Building vignettes takes a while though.
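
The four steps above can be sketched as a dry run. This is only an illustration, not the actual `make-distribution.sh` logic: the function name `sparkr_dist_steps`, the `lib` library directory and the version argument are assumptions, and the function prints the commands rather than running them.

```sh
# Dry-run sketch: prints the R commands for each step instead of running them.
# The function name, the "lib" directory and the version argument are
# illustrative assumptions, not taken from make-distribution.sh.
sparkr_dist_steps() {
  version="$1"
  tarball="SparkR_${version}.tar.gz"
  # 1. Install from the source tree so vignettes can call library(SparkR).
  echo "R CMD INSTALL --library=lib pkg"
  # 2. Build the source package (this also builds the vignettes).
  echo "R CMD build pkg"
  # 3. Check the source package, skipping tests.
  echo "R CMD check --no-tests ${tarball}"
  # 4. Install again from the source package to get the doc/vignettes rds files.
  echo "R CMD INSTALL --library=lib ${tarball}"
}
```

Calling `sparkr_dist_steps 2.1.0` prints the four commands in order, which makes it easy to see that the package is installed twice: once from the source tree and once from the built tarball.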

## How was this patch tested?

Manually, CI.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16014 from felixcheung/rdist.
2016-12-08 11:29:31 -08:00


# SparkR CRAN Release
To release SparkR as a package to CRAN, we would use the `devtools` package. Please work with the
`dev@spark.apache.org` community and R package maintainer on this.
### Release
First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.
Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips some vignette and PDF checks. It is therefore preferable to run `R CMD check` manually on the built source package before uploading a release. Also note that for the CRAN checks on PDF vignettes to succeed, the `qpdf` tool must be installed (e.g. `yum -q -y install qpdf`).
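
Before running the full check by hand, it can help to confirm that `qpdf` is on the `PATH`. The following is a minimal sketch; `require_tool` is a hypothetical helper name and is not part of `check-cran.sh`.

```sh
# Return 0 if the named tool is on PATH, 1 otherwise (with a message on stderr).
# "require_tool" is a hypothetical helper name used only in this sketch.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing required tool: $1" >&2
  return 1
}

# Example (not run here):
#   require_tool qpdf && R CMD check SparkR_2.1.0.tar.gz
```

With this guard, the full `R CMD check` (without `--no-manual --no-vignettes`) only starts when the PDF tooling is present.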
To upload a release, we would need to update the `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script along with comments on the status of every `WARNING` (there should not be any) or `NOTE`. As part of `check-cran.sh` and the release process, the vignettes are built, so make sure `SPARK_HOME` is set and the Spark jars are accessible.
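
A small pre-flight check for the `SPARK_HOME` requirement might look like the sketch below. The helper name `check_spark_home` is hypothetical, and the jar location `$SPARK_HOME/jars` is an assumption based on the binary distribution layout; a development build may keep its jars elsewhere.

```sh
# Succeed only if SPARK_HOME is set and at least one jar is found under it.
# Assumption: jars live in "$SPARK_HOME/jars" (binary distribution layout);
# a development build may keep them elsewhere.
check_spark_home() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  for jar in "$SPARK_HOME"/jars/*.jar; do
    # The glob stays unexpanded when nothing matches, so test for a real file.
    if [ -f "$jar" ]; then
      return 0
    fi
  done
  echo "no jars found under $SPARK_HOME/jars" >&2
  return 1
}
```

Running `check_spark_home` before building the vignettes gives a clear error message up front when the environment is not ready.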
Once everything is in place, run in R under the `SPARK_HOME/R` directory:
```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
```
For more information please refer to http://r-pkgs.had.co.nz/release.html#release-check
### Testing: build package manually
To build the package manually, for example to inspect the content of the resulting `.tar.gz` file, we would also use the `devtools` package.
The source package is what gets released to CRAN. CRAN then builds platform-specific binary packages from the source package.
#### Build source package
To build the source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:
```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
```
(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)
Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.
For example, this should be the content of the source package:
```sh
DESCRIPTION R inst tests
NAMESPACE build man vignettes
inst/doc/
sparkr-vignettes.html
sparkr-vignettes.Rmd
sparkr-vignettes.Rman
build/
vignette.rds
man/
*.Rd files...
vignettes/
sparkr-vignettes.Rmd
```
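
To compare a built archive against the listing above, it can be inspected with `tar`. This is a minimal sketch: `check_source_pkg` is a hypothetical helper, it checks only a handful of the entries, and it assumes the archive's top-level directory is named `SparkR`.

```sh
# Verify that a source tarball contains a few of the entries listed above.
# "check_source_pkg" is a hypothetical helper; the "SparkR/" prefix assumes
# the archive's top-level directory is the package name.
check_source_pkg() {
  tarball="$1"
  listing=$(tar -tzf "$tarball") || return 1
  for entry in DESCRIPTION NAMESPACE build/vignette.rds \
               inst/doc/sparkr-vignettes.html vignettes/sparkr-vignettes.Rmd; do
    # -x matches the whole line, -F treats the path as a fixed string.
    if ! printf '%s\n' "$listing" | grep -qxF "SparkR/$entry"; then
      echo "missing from $tarball: $entry" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_source_pkg SparkR_2.1.0.tar.gz` reports the first missing entry on stderr and returns a non-zero status if the vignette files were not built into the package.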
#### Test source package
To install, run this:
```sh
R CMD INSTALL SparkR_2.1.0.tar.gz
```
Replace "2.1.0" with the version of SparkR.
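
Rather than typing the version by hand, it can be read from the `Version:` field of `pkg/DESCRIPTION`. The sketch below assumes the `SparkR_<version>.tar.gz` naming shown in the command above; `sparkr_version` is a hypothetical helper name.

```sh
# Print the value of the "Version:" field of a DESCRIPTION file.
# "sparkr_version" is a hypothetical helper name for this sketch.
sparkr_version() {
  sed -n 's/^Version:[[:space:]]*//p' "$1"
}

# Example (not run here):
#   R CMD INSTALL "SparkR_$(sparkr_version pkg/DESCRIPTION).tar.gz"
```

This also doubles as a quick way to confirm that the `Version:` field was updated before a release.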
This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:
```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```
#### Build binary package
To build a binary package locally, run in R under the `SPARK_HOME/R` directory:
```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
```
For example, this should be the content of the binary package:
```sh
DESCRIPTION Meta R html tests
INDEX NAMESPACE help profile worker
```