Commit graph

28 commits

Author SHA1 Message Date
Felix Cheung b73defdd79 [SPARKR][DOCS] fix broken url in doc
## What changes were proposed in this pull request?

Fix broken url, also,

sparkR.session.stop doc page should have it in the header, instead of saying "sparkR.stop"
![image](https://cloud.githubusercontent.com/assets/8969467/17080129/26d41308-50d9-11e6-8967-79d6c920313f.png)

Data type section is in the middle of a list of gapply/gapplyCollect subsections:
![image](https://cloud.githubusercontent.com/assets/8969467/17080122/f992d00a-50d8-11e6-8f2c-fd5786213920.png)

## How was this patch tested?

manual test

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #14329 from felixcheung/rdoclinkfix.
2016-07-25 11:25:41 -07:00
Felix Cheung 75f0efe74d [SPARKR][DOCS] minor code sample update in R programming guide
## What changes were proposed in this pull request?

Fix code style from ad hoc review of RC4 doc

## How was this patch tested?

manual

shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #14250 from felixcheung/rdocs2rc4.
2016-07-18 16:01:57 -07:00
Narine Kokhlikyan 4167304836 [SPARK-16112][SPARKR] Programming guide for gapply/gapplyCollect
## What changes were proposed in this pull request?

Updates programming guide for spark.gapply/spark.gapplyCollect.

Similar to other examples I used `faithful` dataset to demonstrate gapply's functionality.
Please, let me know if you prefer another example.

## How was this patch tested?
Existing test cases in R

Author: Narine Kokhlikyan <narine@slice.com>

Closes #14090 from NarineK/gapplyProgGuide.
2016-07-16 16:56:16 -07:00
Felix Cheung fb2e8eeb0b [SPARKR][DOCS][MINOR] R programming guide to include csv data source example
## What changes were proposed in this pull request?

Minor documentation update for code example, code style, and missed reference to "sparkR.init"

## How was this patch tested?

manual

shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #14178 from felixcheung/rcsvprogrammingguide.
2016-07-13 15:09:23 -07:00
Yanbo Liang 2ad031be67 [SPARKR][DOC] SparkR ML user guides update for 2.0
## What changes were proposed in this pull request?
* Update SparkR ML section to make them consistent with SparkR API docs.
* Since #13972 adds labelling support for the ```include_example``` Jekyll plugin, so that we can split the single ```ml.R``` example file into multiple line blocks with different labels, and include them in different algorithms/models in the generated HTML page.

## How was this patch tested?
Only docs update, manually check the generated docs.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #14011 from yanboliang/r-user-guide-update.
2016-07-11 14:31:11 -07:00
Felix Cheung b5a997667f [SPARK-16088][SPARKR] update setJobGroup, cancelJobGroup, clearJobGroup
## What changes were proposed in this pull request?

Updated setJobGroup, cancelJobGroup, clearJobGroup to not require sc/SparkContext as parameter.
Also updated roxygen2 doc and R programming guide on deprecations.

## How was this patch tested?

unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13838 from felixcheung/rjobgroup.
2016-06-23 09:45:01 -07:00
Kai Jiang 43b04b7ecb [SPARK-15672][R][DOC] R programming guide update
## What changes were proposed in this pull request?
Guide for
- UDFs with dapply, dapplyCollect
- spark.lapply for running parallel R functions

## How was this patch tested?
build locally
<img width="654" alt="screen shot 2016-06-14 at 03 12 56" src="https://cloud.githubusercontent.com/assets/3419881/16039344/12a3b6a0-31de-11e6-8d77-fe23308075c0.png">

Author: Kai Jiang <jiangkai@gmail.com>

Closes #13660 from vectorijk/spark-15672-R-guide-update.
2016-06-22 12:50:36 -07:00
Felix Cheung 58f6e27dd7 [SPARK-15863][SQL][DOC][SPARKR] sql programming guide updates to include sparkSession in R
## What changes were proposed in this pull request?

Update doc as per discussion in PR #13592

## How was this patch tested?

manual

shivaram liancheng

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13799 from felixcheung/rsqlprogrammingguide.
2016-06-21 13:56:37 +08:00
Felix Cheung 359c2e827d [SPARK-15159][SPARKR] SparkSession roxygen2 doc, programming guide, example updates
## What changes were proposed in this pull request?

roxygen2 doc, programming guide, example updates

## How was this patch tested?

manual checks
shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13751 from felixcheung/rsparksessiondoc.
2016-06-20 13:46:24 -07:00
GayathriMurali af2a4b0826 [SPARK-15129][R][DOC] R API changes in ML
## What changes were proposed in this pull request?

Make user guide changes to SparkR documentation for all changes that happened in 2.0 to Machine Learning APIs

Author: GayathriMurali <gayathri.m@intel.com>

Closes #13285 from GayathriMurali/SPARK-15129.
2016-06-17 21:10:29 -07:00
felixcheung c82883239e [SPARK-10903] followup - update API doc for SqlContext
## What changes were proposed in this pull request?

Follow up on the earlier PR - in here we are fixing up roxygen2 doc examples.
Also add to the programming guide migration section.

## How was this patch tested?

SparkR tests

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #13340 from felixcheung/sqlcontextdoc.
2016-05-26 21:42:36 -07:00
Krishna Kalyan 9082b7968a [SPARK-12071][DOC] Document the behaviour of NA in R
## What changes were proposed in this pull request?

Under Upgrading From SparkR 1.5.x to 1.6.x section added the information, SparkSQL converts `NA` in R to `null`.

## How was this patch tested?

Document update, no tests.

Author: Krishna Kalyan <krishnakalyan3@gmail.com>

Closes #13268 from krishnakalyan3/spark-12071-1.
2016-05-24 22:21:52 -07:00
Yanbo Liang ee3b171562 [MINOR] [SPARKR] Update data-manipulation.R to use native csv reader
## What changes were proposed in this pull request?
* Since Spark has supported native csv reader, it does not necessary to use the third party ```spark-csv``` in ```examples/src/main/r/data-manipulation.R```. Meanwhile, remove all ```spark-csv``` usage in SparkR.
* Running R applications through ```sparkR``` is not supported as of Spark 2.0, so we change to use ```./bin/spark-submit``` to run the example.

## How was this patch tested?
Offline test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13005 from yanboliang/r-df-examples.
2016-05-09 09:58:36 -07:00
Dongjoon Hyun 6ab4d9e0c7 [SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date
## What changes were proposed in this pull request?

This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules.

- Remove the wrong usage of `map`. We need to use `lapply` in `sparkR` if needed. However, `lapply` is private so far. The corrected example will be added later.
- Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency
- Fix datatypes in `sparkr.md`.
- Update a data result in `sparkr.md`.
- Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet
- Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet
- Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
- Other minor syntax fixes and a typo.

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #12649 from dongjoon-hyun/SPARK-14883.
2016-04-24 22:10:27 -07:00
felixcheung 1b7eab74e6 [SPARK-12148][SPARKR] fix doc after renaming DataFrame to SparkDataFrame
## What changes were proposed in this pull request?

Fixed inadvertent roxygen2 doc changes, added class name change to programming guide
Follow up of #12621

## How was this patch tested?

manually checked

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #12647 from felixcheung/rdataframe.
2016-04-23 18:20:31 -07:00
felixcheung 488bbb216c [SPARK-12232][SPARKR] New R API for read.table to avoid name conflict
shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table`

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10406 from felixcheung/readtable.
2016-01-19 18:31:03 -08:00
felixcheung 8896ec9f02 [SPARKR][DOC] minor doc update for version in migration guide
checked that the change is in Spark 1.6.0.
shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10574 from felixcheung/rwritemodedoc.
2016-01-05 08:39:58 +05:30
Jeff Zhang 2eb5af5f0d [SPARK-12318][SPARKR] Save mode in SparkR should be error by default
shivaram  Please help review.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10290 from zjffdu/SPARK-12318.
2015-12-16 10:32:32 -08:00
felixcheung 43c575cb17 [SPARK-12116][SPARKR][DOCS] document how to workaround function name conflicts with dplyr
shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10119 from felixcheung/rdocdplyrmasked.
2015-12-03 09:22:21 -08:00
felixcheung 1a93323c5b [SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR
Added tests for function that are reported as masked, to make sure the base:: or stats:: function can be called.

For those we can't call, added them to SparkR programming guide.

It would seem to me `table, sample, subset, filter, cov` not working are not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like as they are defined in base or stats they are missing the S3 generic, eg.
```
> methods("transform")
[1] transform,ANY-method       transform.data.frame
[3] transform,DataFrame-method transform.default
see '?methods' for accessing help and source code
> methods("subset")
[1] subset.data.frame       subset,DataFrame-method subset.default
[4] subset.matrix
see '?methods' for accessing help and source code
Warning message:
In .S3methods(generic.function, class, parent.frame()) :
  function 'subset' appears not to be S3 generic; found functions that look like S3 methods
```
Any idea?

More information on masking:
http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm
http://www.sfu.ca/~sweldon/howTo/guide4.pdf

This is what the output doc looks like (minus css):
![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png)

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9785 from felixcheung/rmasked.
2015-11-18 23:32:49 -08:00
Yanbo Liang e222d75849 [SPARK-11684][R][ML][DOC] Update SparkR glm API doc, user guide and example codes
This PR includes:
* Update SparkR:::glm, SparkR:::summary API docs.
* Update SparkR machine learning user guide and example codes to show:
  * supporting feature interaction in R formula.
  * summary for gaussian GLM model.
  * coefficients for binomial GLM model.

mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9727 from yanboliang/spark-11684.
2015-11-18 13:30:29 -08:00
felixcheung a9676cc710 [SPARK-11407][SPARKR] Add doc for running from RStudio
![image](https://cloud.githubusercontent.com/assets/8969467/10871746/612ba44a-80a4-11e5-99a0-40b9931dee52.png)
(This is without css, but you get the idea)
shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9401 from felixcheung/rstudioprogrammingguide.
2015-11-03 11:53:10 -08:00
felixcheung bb5a2af034 [SPARK-11340][SPARKR] Support setting driver properties when starting Spark from R programmatically or from RStudio
Mapping spark.driver.memory from sparkEnvir to spark-submit commandline arguments.

shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf?
sun-rui

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9290 from felixcheung/rdrivermem.
2015-10-30 13:51:32 -07:00
Eric Liang 74a293f453 [SPARK-9713] [ML] Document SparkR MLlib glm() integration in Spark 1.5
This documents the use of R model formulae in the SparkR guide. Also fixes some bugs in the R api doc.

mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #8085 from ericl/docs.
2015-08-11 21:26:03 -07:00
Shivaram Venkataraman 374c8a8a4a [SPARK-8900] [SPARKR] Fix sparkPackages in init documentation
cc pwendell

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #7293 from shivaram/sparkr-packages-doc and squashes the following commits:

c91471d [Shivaram Venkataraman] Fix sparkPackages in init documentation
2015-07-08 12:39:32 -07:00
Sun Rui bf02e37716 [SPARK-8894] [SPARKR] [DOC] Example code errors in SparkR documentation.
Author: Sun Rui <rui.sun@intel.com>

Closes #7287 from sun-rui/SPARK-8894 and squashes the following commits:

da63898 [Sun Rui] [SPARK-8894][SPARKR][DOC] Example code errors in SparkR documentation.
2015-07-08 09:48:16 -07:00
Holden Karau 43e66192f4 [SPARK-8506] Add pakages to R context created through init.
Author: Holden Karau <holden@pigscanfly.ca>

Closes #6928 from holdenk/SPARK-8506-sparkr-does-not-provide-an-easy-way-to-depend-on-spark-packages-when-performing-init-from-inside-of-r and squashes the following commits:

b60dd63 [Holden Karau] Add an example with the spark-csv package
fa8bc92 [Holden Karau] typo: sparm -> spark
865a90c [Holden Karau] strip spaces for comparision
c7a4471 [Holden Karau] Add some documentation
c1a9233 [Holden Karau] refactor for testing
c818556 [Holden Karau] Add pakages to R
2015-06-24 11:55:20 -07:00
Shivaram Venkataraman 5f48e5c33b [SPARK-6806] [SPARKR] [DOCS] Add a new SparkR programming guide
This PR adds a new SparkR programming guide at the top-level. This will be useful for R users as our APIs don't directly match the Scala/Python APIs and as we need to explain SparkR without using RDDs as examples etc.

cc rxin davies pwendell

cc cafreeman -- Would be great if you could also take a look at this !

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6490 from shivaram/sparkr-guide and squashes the following commits:

d5ff360 [Shivaram Venkataraman] Add a section on HiveContext, HQL queries
408dce5 [Shivaram Venkataraman] Fix link
dbb86e3 [Shivaram Venkataraman] Fix minor typo
9aff5e0 [Shivaram Venkataraman] Address comments, use dplyr-like syntax in example
d09703c [Shivaram Venkataraman] Fix default argument in read.df
ea816a1 [Shivaram Venkataraman] Add a new SparkR programming guide Also update write.df, read.df to handle defaults better
2015-05-29 14:11:58 -07:00