200 commits

Author SHA1 Message Date
Andrew Or 20ddf5fddf [SPARK-14014][SQL] Integrate session catalog (attempt #2)
## What changes were proposed in this pull request?

This reopens #11836, which was merged but promptly reverted because it introduced flaky Hive tests.

## How was this patch tested?

See `CatalogTestCases`, `SessionCatalogSuite` and `HiveContextSuite`.

Author: Andrew Or <andrew@databricks.com>

Closes #11938 from andrewor14/session-catalog-again.
2016-03-24 22:59:35 -07:00
Yanbo Liang 13cbb2de70 [SPARK-13010][ML][SPARKR] Implement a simple wrapper of AFTSurvivalRegression in SparkR
## What changes were proposed in this pull request?
This PR continues the work in #11447. It implements a wrapper for ```AFTSurvivalRegression```, named ```survreg```, in SparkR.

## How was this patch tested?
Test against output from R package survival's survreg.

cc mengxr felixcheung

Close #11447

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #11932 from yanboliang/spark-13010-new.
2016-03-24 22:29:34 -07:00
Andrew Or c44d140cae Revert "[SPARK-14014][SQL] Replace existing catalog with SessionCatalog"
This reverts commit 5dfc01976b.
2016-03-23 22:21:15 -07:00
Andrew Or 5dfc01976b [SPARK-14014][SQL] Replace existing catalog with SessionCatalog
## What changes were proposed in this pull request?

`SessionCatalog`, introduced in #11750, is a catalog that keeps track of temporary functions and tables, and delegates metastore operations to `ExternalCatalog`. This functionality overlaps a lot with the existing `analysis.Catalog`.
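
The division of labor described above can be sketched roughly as follows. This is a minimal Python illustration, not Spark's Scala code: the class names `InMemoryExternalCatalog` and `SessionCatalogSketch`, all method names, and the rule that temporary tables shadow metastore tables are assumptions made for the example.

```python
class InMemoryExternalCatalog:
    """Hypothetical stand-in for the metastore side (the ExternalCatalog role)."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, definition):
        self._tables[name] = definition

    def get_table(self, name):
        return self._tables[name]


class SessionCatalogSketch:
    """Tracks session-local temporary tables and delegates the rest."""

    def __init__(self, external_catalog):
        self._external = external_catalog
        self._temp_tables = {}

    def create_temp_table(self, name, definition):
        # Temporary tables never reach the external catalog.
        self._temp_tables[name] = definition

    def create_table(self, name, definition):
        # Persistent tables are delegated to the external catalog.
        self._external.create_table(name, definition)

    def lookup_table(self, name):
        # Assumption for this sketch: temporary tables shadow metastore tables.
        if name in self._temp_tables:
            return self._temp_tables[name]
        return self._external.get_table(name)
```

With this shape, callers talk only to the session-level object, and the metastore can be swapped out without touching them.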

As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be dead code. There are still things that need to be done after this patch, namely:
- SPARK-14013: Properly implement temporary functions in `SessionCatalog`
- SPARK-13879: Decide which DDL/DML commands to support natively in Spark
- SPARK-?????: Implement the ones we do want to support through `SessionCatalog`.
- SPARK-?????: Merge SQL/HiveContext

## How was this patch tested?

This is largely a refactoring task so there are no new tests introduced. The particularly relevant tests are `SessionCatalogSuite` and `ExternalCatalogSuite`.

Author: Andrew Or <andrew@databricks.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #11836 from andrewor14/use-session-catalog.
2016-03-23 13:34:22 -07:00
Xusen Yin d6dc12ef01 [SPARK-13449] Naive Bayes wrapper in SparkR
## What changes were proposed in this pull request?

This PR continues the work in #11486 from yinxusen with some code refactoring. In R package e1071, `naiveBayes` supports both categorical (Bernoulli) and continuous features (Gaussian), while in MLlib we support Bernoulli and multinomial. This PR implements the common subset: Bernoulli.

I moved the implementation out from SparkRWrappers to NaiveBayesWrapper to make it easier to read. Argument names, default values, and summary now match e1071's naiveBayes.

I removed the preprocessing step that omits NA values because we don't know which columns to process.

## How was this patch tested?

Test against output from R package e1071's naiveBayes.

cc: yanboliang yinxusen

Closes #11486

Author: Xusen Yin <yinxusen@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #11890 from mengxr/SPARK-13449.
2016-03-22 14:16:51 -07:00
Dongjoon Hyun 2082a49569 [MINOR][DOCS] Use spark-submit instead of sparkR to submit R script.
## What changes were proposed in this pull request?

Since `sparkR` is not used for submitting R scripts as of Spark 2.0, users see the following error message if they follow the instructions in `R/README.md`. This PR updates `R/README.md`.
```bash
$ ./bin/sparkR examples/src/main/r/dataframe.R
Running R applications through 'sparkR' is not supported as of Spark 2.0.
Use ./bin/spark-submit <R file>
```

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11842 from dongjoon-hyun/update_r_readme.
2016-03-19 13:23:34 +00:00
Sun Rui c7e68c3968 [SPARK-13812][SPARKR] Fix SparkR lint-r test errors.
## What changes were proposed in this pull request?

This PR fixes all newly captured SparkR lint-r errors after the lintr package was updated from GitHub.

## How was this patch tested?

dev/lint-r
SparkR unit tests

Author: Sun Rui <rui.sun@intel.com>

Closes #11652 from sun-rui/SPARK-13812.
2016-03-13 14:30:44 -07:00
Yanbo Liang 4d535d1f1c [SPARK-13389][SPARKR] SparkR support first/last with ignore NAs
## What changes were proposed in this pull request?

Add SparkR support for `first`/`last` with an option to ignore NAs.

cc sun-rui felixcheung shivaram

## How was this patch tested?

unit tests

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #11267 from yanboliang/spark-13389.
2016-03-10 17:31:19 -08:00
Oscar D. Lara Yejas 416e71af4d [SPARK-13327][SPARKR] Added parameter validations for colnames<-
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net>
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com>

Closes #11220 from olarayej/SPARK-13312-3.
2016-03-10 17:10:23 -08:00
Yanbo Liang 50e60e36f7 [SPARK-13504] [SPARKR] Add approxQuantile for SparkR
## What changes were proposed in this pull request?
Add ```approxQuantile``` for SparkR.
## How was this patch tested?
unit tests

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #11383 from yanboliang/spark-13504 and squashes the following commits:

4f17adb [Yanbo Liang] Add approxQuantile for SparkR
2016-02-25 21:23:41 -08:00
Liang-Chi Hsieh 8930181833 [SPARK-13472] [SPARKR] Fix unstable Kmeans test in R
JIRA: https://issues.apache.org/jira/browse/SPARK-13472

## What changes were proposed in this pull request?

One Kmeans test in R is unstable and sometimes fails. We should fix it.

## How was this patch tested?

Unit test is modified in this PR.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #11345 from viirya/fix-kmeans-r-test and squashes the following commits:

f959f61 [Liang-Chi Hsieh] Sort resulted clusters.
2016-02-24 07:05:20 -08:00
Xusen Yin 8d29001dec [SPARK-13011] K-means wrapper in SparkR
https://issues.apache.org/jira/browse/SPARK-13011

Author: Xusen Yin <yinxusen@gmail.com>

Closes #11124 from yinxusen/SPARK-13011.
2016-02-23 15:42:58 -08:00
Dongjoon Hyun 024482bf51 [MINOR][DOCS] Fix all typos in markdown files of doc and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
2016-02-22 09:52:07 +00:00
Cheng Lian d9efe63ecd [SPARK-12799] Simplify various string output for expressions
This PR introduces several major changes:

1. Replacing `Expression.prettyString` with `Expression.sql`

   The `prettyString` method is mostly an internal, developer-facing facility for debugging purposes, and shouldn't be exposed to users.

1. Using a SQL-like representation as the column name for selected fields that are not named expressions (back-ticks and double quotes should be removed)

   Before, we were using `prettyString` as column names when possible, and sometimes the result column names can be weird.  Here are several examples:

   Expression         | `prettyString` | `sql`      | Note
   ------------------ | -------------- | ---------- | ---------------
   `a && b`           | `a && b`       | `a AND b`  |
   `a.getField("f")`  | `a[f]`         | `a.f`      | `a` is a struct

1. Adding trait `NonSQLExpression` extending from `Expression` for expressions that don't have a SQL representation (e.g. Scala UDF/UDAF and Java/Scala object expressions used for encoders)

   `NonSQLExpression.sql` may return an arbitrary user-facing string representation of the expression.
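
The `sql` column of the table above can be illustrated with a toy expression tree. This is a Python sketch under stated assumptions, not Catalyst code: the classes `Attribute`, `And` and `GetStructField` are simplified stand-ins defined here, and only the two rows shown in the table are covered.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """A plain column reference such as `a`."""
    name: str

    def sql(self) -> str:
        return self.name


@dataclass
class And:
    """Logical conjunction, written `a && b` in the DSL."""
    left: object
    right: object

    def sql(self) -> str:
        return f"{self.left.sql()} AND {self.right.sql()}"


@dataclass
class GetStructField:
    """Field access on a struct-typed child expression."""
    child: object
    field: str

    def sql(self) -> str:
        return f"{self.child.sql()}.{self.field}"


print(And(Attribute("a"), Attribute("b")).sql())   # a AND b
print(GetStructField(Attribute("a"), "f").sql())   # a.f
```

Each node renders itself from its children, so the resulting string reads as SQL regardless of how the expression was built.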

Author: Cheng Lian <lian@databricks.com>

Closes #10757 from liancheng/spark-12799.simplify-expression-string-methods.
2016-02-21 22:53:15 +08:00
Sean Owen fb7e21797e [SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold
Clarify that reduce functions need to be commutative, and fold functions do not

See https://github.com/apache/spark/pull/11091
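
Why commutativity matters for `reduce` can be shown with a small simulation. This is a plain-Python sketch, not Spark code: `partitioned_reduce` is a hypothetical helper, and it assumes that per-partition results may be merged in whatever order they happen to arrive.

```python
from functools import reduce
from itertools import permutations


def partitioned_reduce(partitions, op, arrival_order):
    """Reduce each partition locally, then merge the partial results
    in the order given by arrival_order (a tuple of partition indices)."""
    partials = [reduce(op, part) for part in partitions]
    return reduce(op, [partials[i] for i in arrival_order])


partitions = [["a", "b"], ["c", "d"], ["e"]]
orders = list(permutations(range(len(partitions))))


def concat(x, y):
    # Associative but not commutative.
    return x + y


def largest(x, y):
    # Associative and commutative.
    return max(x, y)


concat_results = {partitioned_reduce(partitions, concat, o) for o in orders}
max_results = {partitioned_reduce(partitions, largest, o) for o in orders}

print(len(concat_results))  # 6: the result depends on the merge order
print(max_results)          # {'e'}: one result for every merge order
```

String concatenation is associative, yet every merge order yields a different string, while the commutative `largest` gives the same answer each time.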

Author: Sean Owen <sowen@cloudera.com>

Closes #11217 from srowen/SPARK-13339.
2016-02-19 10:26:38 +00:00
Sasaki Toru c2f21d8898 [SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template
`spark-env.sh.template` contains multi-byte characters; this PR removes them.

Author: Sasaki Toru <sasakitoa@nttdata.co.jp>

Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.
2016-02-11 09:30:36 +00:00
Yanbo Liang e7f9199e70 [SPARK-12903][SPARKR] Add covar_samp and covar_pop for SparkR
Add ```covar_samp``` and ```covar_pop``` for SparkR.
Should we also provide a ```cov``` alias for ```covar_samp```? There is already a ```cov``` implementation in stats.R which masks ```stats::cov```, so adding the alias may introduce a breaking API change.

cc sun-rui felixcheung shivaram

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10829 from yanboliang/spark-12903.
2016-01-26 19:29:47 -08:00
Narine Kokhlikyan 8a88e12128 [SPARK-12629][SPARKR] Fixes for DataFrame saveAsTable method
I've tried to solve some of the issues mentioned in: https://issues.apache.org/jira/browse/SPARK-12629
Please let me know what you think.
Thanks!

Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>

Closes #10580 from NarineK/sparkrSavaAsRable.
2016-01-22 10:35:02 -08:00
Sun Rui 1b2a918e59 [SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR.
Author: Sun Rui <rui.sun@intel.com>

Closes #10201 from sun-rui/SPARK-12204.
2016-01-20 21:08:15 -08:00
Shubhanshu Mishra d7415991a1 [SPARK-12910] Fixes : R version for installing sparkR
Testing code:
```
$ ./install-dev.sh
USING R_HOME = /usr/bin
ERROR: this R is version 2.15.1, package 'SparkR' requires R >= 3.0
```

Using the new argument:
```
$ ./install-dev.sh /content/username/SOFTWARE/R-3.2.3
USING R_HOME = /content/username/SOFTWARE/R-3.2.3/bin
* installing *source* package ‘SparkR’ ...
** R
** inst
** preparing package for lazy loading
Creating a new generic function for ‘colnames’ in package ‘SparkR’
Creating a new generic function for ‘colnames<-’ in package ‘SparkR’
Creating a new generic function for ‘cov’ in package ‘SparkR’
Creating a new generic function for ‘na.omit’ in package ‘SparkR’
Creating a new generic function for ‘filter’ in package ‘SparkR’
Creating a new generic function for ‘intersect’ in package ‘SparkR’
Creating a new generic function for ‘sample’ in package ‘SparkR’
Creating a new generic function for ‘transform’ in package ‘SparkR’
Creating a new generic function for ‘subset’ in package ‘SparkR’
Creating a new generic function for ‘summary’ in package ‘SparkR’
Creating a new generic function for ‘lag’ in package ‘SparkR’
Creating a new generic function for ‘rank’ in package ‘SparkR’
Creating a new generic function for ‘sd’ in package ‘SparkR’
Creating a new generic function for ‘var’ in package ‘SparkR’
Creating a new generic function for ‘predict’ in package ‘SparkR’
Creating a new generic function for ‘rbind’ in package ‘SparkR’
Creating a generic function for ‘lapply’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘Filter’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘alias’ from package ‘stats’ in package ‘SparkR’
Creating a generic function for ‘substr’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘%in%’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘mean’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘unique’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘nrow’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘ncol’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘head’ from package ‘utils’ in package ‘SparkR’
Creating a generic function for ‘factorial’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘atan2’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘ifelse’ from package ‘base’ in package ‘SparkR’
** help
No man pages found in package  ‘SparkR’
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (SparkR)

```

Author: Shubhanshu Mishra <smishra8@illinois.edu>

Closes #10836 from napsternxg/master.
2016-01-20 18:06:06 -08:00
Herman van Hovell 1017327930 [SPARK-12848][SQL] Change parsed decimal literal datatype from Double to Decimal
The current parser turns a decimal literal, for example ```12.1```, into a Double. The problem with this approach is that we convert an exact literal into a non-exact ```Double```. This PR changes this behavior: a decimal literal is now converted into an exact ```BigDecimal```.

The behavior for scientific decimals, for example ```12.1e01```, is unchanged. This will be converted into a Double.

This PR replaces the ```BigDecimal``` literal by a ```Double``` literal, because the ```BigDecimal``` is the default now. You can use the double literal by appending a 'D' to the value, for instance: ```3.141527D```
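
The difference between an exact decimal and a binary double can be seen outside Spark as well. The following sketch uses Python's standard `decimal` module as a stand-in for `BigDecimal` and Python's `float` as a stand-in for `Double`; it does not exercise the Spark SQL parser itself.

```python
from decimal import Decimal

as_double = 12.1              # binary floating point, analogous to a Double
as_decimal = Decimal("12.1")  # exact decimal, analogous to a BigDecimal

# Decimal(float) exposes the exact binary value stored in the double,
# which is close to, but not equal to, 12.1.
print(Decimal(as_double) == as_decimal)                   # False

# Rounding error shows up in ordinary arithmetic on doubles...
print(0.1 + 0.2 == 0.3)                                   # False

# ...but not on exact decimals.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```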

cc davies rxin

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #10796 from hvanhovell/SPARK-12848.
2016-01-20 15:13:01 -08:00
felixcheung 488bbb216c [SPARK-12232][SPARKR] New R API for read.table to avoid name conflict
shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table`

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10406 from felixcheung/readtable.
2016-01-19 18:31:03 -08:00
Sun Rui 3ac648289c [SPARK-12337][SPARKR] Implement dropDuplicates() method of DataFrame in SparkR.
Author: Sun Rui <rui.sun@intel.com>

Closes #10309 from sun-rui/SPARK-12337.
2016-01-19 16:37:18 -08:00
felixcheung 37fefa66cb [SPARK-12168][SPARKR] Add automated tests for conflicted function in R
Currently this is reported when loading the SparkR package in R (probably would add is.nan)
```
Loading required package: methods

Attaching package: ‘SparkR’

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var

The following objects are masked from ‘package:base’:

    colnames, colnames<-, intersect, rank, rbind, sample, subset,
    summary, table, transform
```

Adding this test adds an automated way to track changes to masked methods.
Also, the second part of this test checks for those functions that would not be accessible without a namespace/package prefix.

Incidentally, this might point to how we would fix those inaccessible functions in base or stats.
Looking for feedback for adding this test.

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10171 from felixcheung/rmaskedtest.
2016-01-19 16:33:48 -08:00
felixcheung 92502703f4 [SPARK-12862][SPARKR] Jenkins does not run R tests
Slight correction: I'm leaving sparkR as-is (ie. R file not supported) and fixed only run-tests.sh as shivaram described.

I also assume we are going to cover all doc changes in https://issues.apache.org/jira/browse/SPARK-12846 instead of here.

rxin shivaram zjffdu

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10792 from felixcheung/sparkRcmd.
2016-01-17 09:29:08 -08:00
Oscar D. Lara Yejas ba4a641902 [SPARK-11031][SPARKR] Method str() on a DataFrame
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com>
Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net>

Closes #9613 from olarayej/SPARK-11031.
2016-01-15 07:37:54 -08:00
Wenchen Fan 962e9bcf94 [SPARK-12756][SQL] use hash expression in Exchange
This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is the same between shuffled data and a bucketed data source. This lets us shuffle only one side when joining a bucketed table with a normal one.

This PR also fixes the tests that are broken by the new hash behaviour in shuffle.
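
The benefit of sharing one hash function can be sketched in a few lines of Python. Everything here is illustrative: `partition_id` and `partition_rows` are hypothetical helpers, and CRC32 is only a stand-in for whatever hash expression Spark actually uses.

```python
import zlib


def partition_id(key: str, num_partitions: int) -> int:
    """Hypothetical shared hash: CRC32 modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_rows(rows, num_partitions):
    """Group (key, value) rows into a list of partitions by partition id."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[partition_id(key, num_partitions)].append((key, value))
    return parts


NUM_PARTITIONS = 4

# The bucketed side was laid out once, at write time.
bucketed = partition_rows([("a", 1), ("b", 2), ("c", 3)], NUM_PARTITIONS)

# Only the other side is shuffled at query time, using the same hash.
shuffled = partition_rows([("a", "x"), ("c", "y"), ("d", "z")], NUM_PARTITIONS)

# Matching keys are guaranteed to sit in the same partition index,
# so the join can proceed partition by partition.
joined = []
for left, right in zip(bucketed, shuffled):
    lookup = dict(left)
    joined += [(k, lookup[k], v) for k, v in right if k in lookup]

print(sorted(joined))  # [('a', 1, 'x'), ('c', 3, 'y')]
```

If the two sides used different hash functions, equal keys could land in different partition indices and the bucketed side would have to be reshuffled too.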

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.
2016-01-13 22:43:28 -08:00
Yanbo Liang 3d77cffec0 [SPARK-12645][SPARKR] SparkR support hash function
Add ```hash``` function for SparkR ```DataFrame```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10597 from yanboliang/spark-12645.
2016-01-09 12:29:51 +05:30
Yanbo Liang d1fea41363 [SPARK-12393][SPARKR] Add read.text and write.text for SparkR
Add ```read.text``` and ```write.text``` for SparkR.
cc sun-rui felixcheung shivaram

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10348 from yanboliang/spark-12393.
2016-01-06 12:05:41 +05:30
felixcheung cc4d5229c9 [SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API
rxin davies shivaram
Took save mode from my PR #10480, and moved everything to writer methods. This is related to PR #10559

- [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into some more tonight. (fixed)

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10584 from felixcheung/rremovedeprecated.
2016-01-04 22:32:07 -08:00
felixcheung c3d505602d [SPARK-12327][SPARKR] fix code for lintr warning for commented code
shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10408 from felixcheung/rcodecomment.
2016-01-03 20:53:35 +05:30
Hossein f6ecf14333 [SPARK-11199][SPARKR] Improve R context management story and add getOrCreate
* Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context.
* Adds a simple test

[SPARK-11199] #comment link with JIRA

Author: Hossein <hossein@databricks.com>

Closes #9185 from falaki/SPARK-11199.
2015-12-29 11:44:20 -08:00
Forest Fang d80cc90b55 [SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value
`ifelse`, `when`, `otherwise` are unable to take `Column`-typed S4 objects as values.

For example:
```r
ifelse(lit(1) == lit(1), lit(2), lit(3))
ifelse(df$mpg > 0, df$mpg, 0)
```
will both fail with
```r
attempt to replicate an object of type 'environment'
```

The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid any attempt to vectorize (i.e. calling `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenario in which these functions would need to be vectorized in SparkR.

For reference, added test cases which trigger failures:
```r
. Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
error in evaluating the argument 'x' in selecting a method for function 'collect':
  error in evaluating the argument 'col' in selecting a method for function 'select':
  attempt to replicate an object of type 'environment'
Calls: when -> when -> ifelse -> ifelse

1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
6: condition(object)
7: compare(actual, expected, ...)
8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
Error: Test failures
Execution halted
```

Author: Forest Fang <forest.fang@outlook.com>

Closes #10481 from saurfang/spark-12526.
2015-12-29 12:45:24 +05:30
Reynold Xin f496031bd2 Bump master version to 2.0.0-SNAPSHOT.
Author: Reynold Xin <rxin@databricks.com>

Closes #10387 from rxin/version-bump.
2015-12-19 15:13:05 -08:00
Yanbo Liang 22f6cd86fc [SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecated ```saveAsParquetFile```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10281 from yanboliang/spark-12310.
2015-12-16 10:34:30 -08:00
Jeff Zhang 2eb5af5f0d [SPARK-12318][SPARKR] Save mode in SparkR should be error by default
shivaram  Please help review.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10290 from zjffdu/SPARK-12318.
2015-12-16 10:32:32 -08:00
Shivaram Venkataraman fb3778de68 [SPARK-12327] Disable commented code lintr temporarily
cc yhuai felixcheung shaneknapp

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #10300 from shivaram/comment-lintr-disable.
2015-12-14 16:13:55 -08:00
gatorsmile 1e3526c2d3 [SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases
The existing sample functions are missing the `seed` parameter, although the corresponding function interface in `generics` has it. Thus, although a caller can pass `seed`, the value is not used.

This can cause SparkR unit tests to fail. For example, I hit it in another PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10160 from gatorsmile/sampleR.
2015-12-11 20:55:16 -08:00
Yanbo Liang 0fb9825556 [SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files
* ```jsonFile``` should support multiple input files, such as:
```R
jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
jsonFile(sqlContext, "path1,path2")
```
* Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed in Spark 2.0. So we mark ```jsonFile``` deprecated and use ```read.json``` on the SparkR side.
* Replace all ```jsonFile``` with ```read.json``` in test_sparkSQL.R, but still keep the jsonFile test case.
* If this PR is accepted, we should also make almost the same change for ```parquetFile```.

cc felixcheung sun-rui shivaram

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10145 from yanboliang/spark-12146.
2015-12-11 11:47:35 -08:00
Yanbo Liang d9d354ed40 [SPARK-12234][SPARKR] Fix `subset` function error when only set `select` argument
Fix ```subset``` function error when only set ```select``` argument. Please refer to the [JIRA](https://issues.apache.org/jira/browse/SPARK-12234) about the error and how to reproduce it.

cc sun-rui felixcheung shivaram

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10217 from yanboliang/spark-12234.
2015-12-10 10:18:58 -08:00
Yanbo Liang eeb58722ad [SPARK-12198][SPARKR] SparkR support read.parquet and deprecate parquetFile
SparkR supports ```read.parquet``` and deprecates ```parquetFile```. This change is similar to #10145 for ```jsonFile```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10191 from yanboliang/spark-12198.
2015-12-10 09:44:53 -08:00
Sun Rui 39d677c8f1 [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases.
This PR:
1. Suppress all known warnings.
2. Cleanup test cases and fix some errors in test cases.
3. Fix errors in HiveContext-related test cases. These test cases were not actually run previously due to a bug in creating TestHiveContext.
4. Support 'testthat' package version 0.11.0 which prefers that test cases be under 'tests/testthat'
5. Make sure the default Hadoop file system is local when running test cases.
6. Turn warnings into errors.

Author: Sun Rui <rui.sun@intel.com>

Closes #10030 from sun-rui/SPARK-12034.
2015-12-07 10:38:17 -08:00
Yanbo Liang b6e8e63a0d [SPARK-12044][SPARKR] Fix usage of isnan, isNaN
1. Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have three related functions: ```isNaN, isNull, isNotNull```.
2. Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` on the SparkR side, because ```DataFrame.isNaN``` has been deprecated and will be removed in Spark 2.0.
<del>3. Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should have two related functions: ```isnan, isnull```.</del>

cc shivaram sun-rui felixcheung

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10037 from yanboliang/spark-12044.
2015-12-05 22:51:05 -08:00
Yanbo Liang 6979edf4e1 [SPARK-12115][SPARKR] Change numPartitions() to getNumPartitions() to be consistent with Scala/Python
Change ```numPartitions()``` to ```getNumPartitions()``` to be consistent with Scala/Python.
<del>Note: If we cannot make the 1.6 release, this will be a breaking change for 1.7 that we also need to explain in the release notes.</del>

cc sun-rui felixcheung shivaram

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10123 from yanboliang/spark-12115.
2015-12-05 16:39:01 -08:00
felixcheung 895b6c4747 [SPARK-11715][SPARKR] Add R support corr for Column Aggregation
Need to match existing method signature

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9680 from felixcheung/rcorr.
2015-12-05 16:00:12 -08:00
Sun Rui c8d0e160da [SPARK-11774][SPARKR] Implement struct(), encode(), decode() functions in SparkR.
Author: Sun Rui <rui.sun@intel.com>

Closes #9804 from sun-rui/SPARK-11774.
2015-12-05 15:49:51 -08:00
Sun Rui 5011f264fb [SPARK-12104][SPARKR] collect() does not handle multiple columns with same name.
Author: Sun Rui <rui.sun@intel.com>

Closes #10118 from sun-rui/SPARK-12104.
2015-12-03 21:11:10 -08:00
felixcheung 2213441e5e [SPARK-12019][SPARKR] Support character vector for sparkR.init(), check param and fix doc
and add tests.
Spark submit expects comma-separated list

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10034 from felixcheung/sparkrinitdoc.
2015-12-03 13:25:20 -08:00
Sun Rui cc7a1bc937 [SPARK-11781][SPARKR] SparkR has problem in inferring type of raw type.
Author: Sun Rui <rui.sun@intel.com>

Closes #9769 from sun-rui/SPARK-11781.
2015-11-29 11:08:26 -08:00
felixcheung c793d2d9a1 [SPARK-9319][SPARKR] Add support for setting column names, types
Add support for colnames, colnames<-, coltypes<-
Also added tests for names, names<-, which had no tests previously.

I merged with PR 8984 (coltypes). Clicked the wrong thing, screwed up the PR. Recreated it here. Was #9218

shivaram sun-rui

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9654 from felixcheung/colnamescoltypes.
2015-11-28 21:16:21 -08:00