[SPARK-18264][SPARKR] build vignettes with package, update vignettes for CRAN release build and add info on release

## What changes were proposed in this pull request?

Changes to DESCRIPTION to build vignettes.
Changes the vignette metadata to generate the recommended format, which is less than 10% of the previous size. Unfortunately it does not look as nice (before on the left, after on the right).

![image](https://cloud.githubusercontent.com/assets/8969467/20040492/b75883e6-a40d-11e6-9534-25cdd5d59a8b.png)

![image](https://cloud.githubusercontent.com/assets/8969467/20040490/a40f4d42-a40d-11e6-8c91-af00ddcbdad9.png)

Also adds information on how to build and release the package to CRAN later.

## How was this patch tested?

Manually, and with unit tests.

shivaram

We need this for branch-2.1

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #15790 from felixcheung/rpkgvignettes.

R/CRAN_RELEASE.md (new file)

@@ -0,0 +1,91 @@
# SparkR CRAN Release
To release SparkR as a package to CRAN, we would use the `devtools` package. Please work with the
`dev@spark.apache.org` community and the R package maintainer on this.
### Release
First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.
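For example, a minimal sketch to inspect those fields from R, run from the `SPARK_HOME/R` directory:
```R
# Read the Version and Date fields from the package DESCRIPTION file
desc <- read.dcf("pkg/DESCRIPTION", fields = c("Version", "Date"))
print(desc)
```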
Note that while `check-cran.sh` runs `R CMD check`, it does so with `--no-manual --no-vignettes`, which skips a few vignette and PDF checks; therefore, it is preferable to run `R CMD check` on the manually built source package before uploading a release.
To upload a release, we would need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script, along with comments on the status of any `WARNING` (there should be none) or `NOTE` items. As part of `check-cran.sh` and the release process, the vignettes are built; make sure `SPARK_HOME` is set and the Spark jars are accessible.
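A quick sanity check before building can catch a missing environment early; this is just a sketch, and in a dev build the jars live under `assembly/target` rather than `jars`:
```R
# Verify that SPARK_HOME is set and the Spark jars are accessible
spark_home <- Sys.getenv("SPARK_HOME")
stopifnot(nzchar(spark_home))
# In a release build the jars are under $SPARK_HOME/jars
stopifnot(dir.exists(file.path(spark_home, "jars")))
```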
Once everything is in place, run in R under the `SPARK_HOME/R` directory:
```R
paths <- .libPaths()
.libPaths(c("lib", paths))
Sys.setenv(SPARK_HOME = tools::file_path_as_absolute(".."))
devtools::release()
.libPaths(paths)
```
For more information, please refer to http://r-pkgs.had.co.nz/release.html#release-check
### Testing: build package manually
To build the package manually, for example to inspect the resulting `.tar.gz` file contents, we would also use the `devtools` package.
The source package is what gets released to CRAN; CRAN then builds platform-specific binary packages from it.
#### Build source package
To build source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:
```R
paths <- .libPaths()
.libPaths(c("lib", paths))
Sys.setenv(SPARK_HOME = tools::file_path_as_absolute(".."))
devtools::build("pkg")
.libPaths(paths)
```
(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)
Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.
For example, this should be the content of the source package:
```sh
DESCRIPTION
NAMESPACE
R/
build/
  vignette.rds
inst/
  doc/
    sparkr-vignettes.html
    sparkr-vignettes.Rmd
    sparkr-vignettes.Rman
man/
  *.Rd files...
tests/
vignettes/
  sparkr-vignettes.Rmd
```
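To verify the contents without unpacking the tarball, one option is the following sketch (adjust the file name to the version actually built):
```R
# List the files inside the built source package
untar("SparkR_2.1.0.tar.gz", list = TRUE)
```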
#### Test source package
To install, run this:
```sh
R CMD INSTALL SparkR_2.1.0.tar.gz
```
With "2.1.0" replaced with the version of SparkR.
This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:
```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```
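As a side note, the same install plus a quick verification can also be done from within R; a sketch, assuming the tarball is in the current working directory:
```R
# Install the source package into the default library (equivalent to R CMD INSTALL)
install.packages("SparkR_2.1.0.tar.gz", repos = NULL, type = "source")
# Confirm the vignette is registered with the installed package
vignette(package = "SparkR")
```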
#### Build binary package
To build binary package locally, run in R under the `SPARK_HOME/R` directory:
```R
paths <- .libPaths()
.libPaths(c("lib", paths))
Sys.setenv(SPARK_HOME = tools::file_path_as_absolute(".."))
devtools::build("pkg", binary = TRUE)
.libPaths(paths)
```
For example, this should be the content of the binary package:
```sh
DESCRIPTION
INDEX
Meta/
NAMESPACE
R/
help/
html/
profile/
tests/
worker/
```

R/README.md

@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R
Libraries of SparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system-wide installation of R. However, this can be changed to any user-installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed, before running the install-dev.sh script.
Example:
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and Rscript
export R_HOME=/home/username/R
```

@@ -46,7 +46,7 @@

```R
Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
-sc <- sparkR.init(master="local")
+sparkR.session()
```
#### Making changes to SparkR
@@ -54,11 +54,11 @@ sc <- sparkR.init(master="local")
The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.
#### Generating documentation
The SparkR documentation (Rd files and HTML files) is not a part of the source repository. To generate it, you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also `R/DOCUMENTATION.md`.
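As a rough sketch of what the script does for the Rd files, assuming `devtools` is installed and you are in the `SPARK_HOME/R` directory:
```R
# Generate Rd files from the roxygen2 comments in the package sources
devtools::document(pkg = "./pkg", roclets = c("rd"))
```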
### Examples, Unit tests
SparkR comes with several sample programs in the `examples/src/main/r` directory.

R/check-cran.sh

@@ -36,11 +36,27 @@ if [ ! -z "$R_HOME" ]
fi
echo "USING R_HOME = $R_HOME"
-# Build the latest docs
+# Build the latest docs, but not the vignettes, which are built with the package next
$FWDIR/create-docs.sh
-# Build a zip file containing the source package
-"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
+# Build source package with vignettes
+SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
+. "${SPARK_HOME}"/bin/load-spark-env.sh
+if [ -f "${SPARK_HOME}/RELEASE" ]; then
+  SPARK_JARS_DIR="${SPARK_HOME}/jars"
+else
+  SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
+fi
+if [ -d "$SPARK_JARS_DIR" ]; then
+  # Build a zip file containing the source package with vignettes
+  SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
+  find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
+else
+  echo "Error Spark JARs not found in $SPARK_HOME"
+  exit 1
+fi
# Run check as-cran.
VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
@@ -54,11 +70,16 @@ fi
if [ -n "$NO_MANUAL" ]
then
-  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
fi
echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"
"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
then
"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
else
# This will run tests and/or build vignettes, and require SPARK_HOME
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
fi
popd > /dev/null

R/create-docs.sh

@@ -20,7 +20,7 @@
# Script to create API docs and vignettes for SparkR
# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.
# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
@@ -52,21 +52,4 @@
popd
-# Find Spark jars.
-if [ -f "${SPARK_HOME}/RELEASE" ]; then
-  SPARK_JARS_DIR="${SPARK_HOME}/jars"
-else
-  SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
-fi
-# Only create vignettes if Spark JARs exist
-if [ -d "$SPARK_JARS_DIR" ]; then
-  # render creates SparkR vignettes
-  Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'
-  find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
-else
-  echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
-fi
popd

R/pkg/DESCRIPTION

@@ -1,8 +1,8 @@
Package: SparkR
Type: Package
Title: R Frontend for Apache Spark
-Version: 2.0.0
-Date: 2016-08-27
+Version: 2.1.0
+Date: 2016-11-06
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
                    email = "shivaram@cs.berkeley.edu"),
             person("Xiangrui", "Meng", role = "aut",
@@ -18,7 +18,9 @@ Depends:
Suggests:
    testthat,
    e1071,
-    survival
+    survival,
+    knitr,
+    rmarkdown
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
Collate:
@@ -48,3 +50,4 @@ Collate:
    'utils.R'
    'window.R'
RoxygenNote: 5.0.1
+VignetteBuilder: knitr
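Since `knitr` and `rmarkdown` are listed under `Suggests`, they are not installed automatically with SparkR; a sketch of installing everything needed to build the vignettes and run the tests locally:
```R
# Suggested packages used by the tests and the vignette build
install.packages(c("testthat", "e1071", "survival", "knitr", "rmarkdown"))
```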

R/pkg/vignettes/sparkr-vignettes.Rmd

@@ -1,12 +1,13 @@
---
title: "SparkR - Practical Guide"
output:
-  html_document:
-    theme: united
+  rmarkdown::html_vignette:
    toc: true
    toc_depth: 4
-    toc_float: true
-    highlight: textmate
+vignette: >
+  %\VignetteIndexEntry{SparkR - Practical Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
---
## Overview