## What changes were proposed in this pull request?
In order to support running SQL directly on files, we added some code in ResolveRelations to catch the exception thrown by catalog.lookupRelation and ignore it. This unfortunately masks all the exceptions. This patch changes the logic to simply test the table's existence.
## How was this patch tested?
I manually hacked some bugs into Spark and made sure the exceptions were being propagated up.
Author: Reynold Xin <rxin@databricks.com>
Closes#12634 from rxin/SPARK-14869.
## What changes were proposed in this pull request?
Changed class name defined in R from "DataFrame" to "SparkDataFrame". A popular package, S4Vector already defines "DataFrame" - this change is to avoid conflict.
Aside from class name and API/roxygen2 references, SparkR APIs like `createDataFrame`, `as.DataFrame` are not changed (S4Vector does not define a "as.DataFrame").
Since in R, one would rarely reference type/class, this change should have minimal/almost-no impact to a SparkR user in terms of back compat.
## How was this patch tested?
SparkR tests, manually loading S4Vector then SparkR package
Author: felixcheung <felixcheung_m@hotmail.com>
Closes#12621 from felixcheung/rdataframe.
## What changes were proposed in this pull request?
This issue aims to expose Scala `bround` function in Python/R API.
`bround` function is implemented in SPARK-14614 by extending current `round` function.
We used the following semantics from Hive.
```java
public static double bround(double input, int scale) {
if (Double.isNaN(input) || Double.isInfinite(input)) {
return input;
}
return BigDecimal.valueOf(input).setScale(scale, RoundingMode.HALF_EVEN).doubleValue();
}
```
After this PR, `pyspark` and `sparkR` also support `bround` function.
**PySpark**
```python
>>> from pyspark.sql.functions import bround
>>> sqlContext.createDataFrame([(2.5,)], ['a']).select(bround('a', 0).alias('r')).collect()
[Row(r=2.0)]
```
**SparkR**
```r
> df = createDataFrame(sqlContext, data.frame(x = c(2.5, 3.5)))
> head(collect(select(df, bround(df$x, 0))))
bround(x, 0)
1 2
2 4
```
## How was this patch tested?
Pass the Jenkins tests (including new testcases).
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes#12509 from dongjoon-hyun/SPARK-14639.
## What changes were proposed in this pull request?
Change the signature of as.data.frame() to be consistent with that in the R base package to meet R user's convention.
## How was this patch tested?
dev/lint-r
SparkR unit tests
Author: Sun Rui <rui.sun@intel.com>
Closes#11811 from sun-rui/SPARK-13905.
#### What changes were proposed in this pull request?
This PR is to address the comment: https://github.com/apache/spark/pull/12146#discussion-diff-59092238. It removes the function `isViewSupported` from `SessionCatalog`. After the removal, we still can capture the user errors if users try to drop a table using `DROP VIEW`.
#### How was this patch tested?
Modified the existing test cases
Author: gatorsmile <gatorsmile@gmail.com>
Closes#12284 from gatorsmile/followupDropTable.
## What changes were proposed in this pull request?
The `window` function was added to Dataset with [this PR](https://github.com/apache/spark/pull/12008).
This PR adds the R API for this function.
With this PR, SQL, Java, and Scala will share the same APIs as in users can use:
- `window(timeColumn, windowDuration)`
- `window(timeColumn, windowDuration, slideDuration)`
- `window(timeColumn, windowDuration, slideDuration, startTime)`
In Python and R, users can access all APIs above, but in addition they can do
- In R:
`window(timeColumn, windowDuration, startTime=...)`
that is, they can provide the startTime without providing the `slideDuration`. In this case, we will generate tumbling windows.
## How was this patch tested?
Unit tests + manual tests
Author: Burak Yavuz <brkyvz@gmail.com>
Closes#12141 from brkyvz/R-windows.
## What changes were proposed in this pull request?
This reopens#11836, which was merged but promptly reverted because it introduced flaky Hive tests.
## How was this patch tested?
See `CatalogTestCases`, `SessionCatalogSuite` and `HiveContextSuite`.
Author: Andrew Or <andrew@databricks.com>
Closes#11938 from andrewor14/session-catalog-again.
## What changes were proposed in this pull request?
`SessionCatalog`, introduced in #11750, is a catalog that keeps track of temporary functions and tables, and delegates metastore operations to `ExternalCatalog`. This functionality overlaps a lot with the existing `analysis.Catalog`.
As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be dead code. There are still things that need to be done after this patch, namely:
- SPARK-14013: Properly implement temporary functions in `SessionCatalog`
- SPARK-13879: Decide which DDL/DML commands to support natively in Spark
- SPARK-?????: Implement the ones we do want to support through `SessionCatalog`.
- SPARK-?????: Merge SQL/HiveContext
## How was this patch tested?
This is largely a refactoring task so there are no new tests introduced. The particularly relevant tests are `SessionCatalogSuite` and `ExternalCatalogSuite`.
Author: Andrew Or <andrew@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Closes#11836 from andrewor14/use-session-catalog.
## What changes were proposed in this pull request?
This PR fixes all newly captured SparkR lint-r errors after the lintr package is updated from github.
## How was this patch tested?
dev/lint-r
SparkR unit tests
Author: Sun Rui <rui.sun@intel.com>
Closes#11652 from sun-rui/SPARK-13812.
## What changes were proposed in this pull request?
SparkR support first/last with ignore NAs
cc sun-rui felixcheung shivaram
## How was the this patch tested?
unit tests
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#11267 from yanboliang/spark-13389.
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net>
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com>
Closes#11220 from olarayej/SPARK-13312-3.
## What changes were proposed in this pull request?
Add ```approxQuantile``` for SparkR.
## How was this patch tested?
unit tests
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#11383 from yanboliang/spark-13504 and squashes the following commits:
4f17adb [Yanbo Liang] Add approxQuantile for SparkR
This PR introduces several major changes:
1. Replacing `Expression.prettyString` with `Expression.sql`
The `prettyString` method is mostly an internal, developer faced facility for debugging purposes, and shouldn't be exposed to users.
1. Using SQL-like representation as column names for selected fields that are not named expression (back-ticks and double quotes should be removed)
Before, we were using `prettyString` as column names when possible, and sometimes the result column names can be weird. Here are several examples:
Expression | `prettyString` | `sql` | Note
------------------ | -------------- | ---------- | ---------------
`a && b` | `a && b` | `a AND b` |
`a.getField("f")` | `a[f]` | `a.f` | `a` is a struct
1. Adding trait `NonSQLExpression` extending from `Expression` for expressions that don't have a SQL representation (e.g. Scala UDF/UDAF and Java/Scala object expressions used for encoders)
`NonSQLExpression.sql` may return an arbitrary user facing string representation of the expression.
Author: Cheng Lian <lian@databricks.com>
Closes#10757 from liancheng/spark-12799.simplify-expression-string-methods.
Add ```covar_samp``` and ```covar_pop``` for SparkR.
Should we also provide ```cov``` alias for ```covar_samp```? There is ```cov``` implementation at stats.R which masks ```stats::cov``` already, but may bring to breaking API change.
cc sun-rui felixcheung shivaram
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#10829 from yanboliang/spark-12903.
I've tried to solve some of the issues mentioned in: https://issues.apache.org/jira/browse/SPARK-12629
Please, let me know what do you think.
Thanks!
Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
Closes#10580 from NarineK/sparkrSavaAsRable.
The current parser turns a decimal literal, for example ```12.1```, into a Double. The problem with this approach is that we convert an exact literal into a non-exact ```Double```. The PR changes this behavior, a Decimal literal is now converted into an extact ```BigDecimal```.
The behavior for scientific decimals, for example ```12.1e01```, is unchanged. This will be converted into a Double.
This PR replaces the ```BigDecimal``` literal by a ```Double``` literal, because the ```BigDecimal``` is the default now. You can use the double literal by appending a 'D' to the value, for instance: ```3.141527D```
cc davies rxin
Author: Herman van Hovell <hvanhovell@questtec.nl>
Closes#10796 from hvanhovell/SPARK-12848.
shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table`
Author: felixcheung <felixcheung_m@hotmail.com>
Closes#10406 from felixcheung/readtable.
Slight correction: I'm leaving sparkR as-is (ie. R file not supported) and fixed only run-tests.sh as shivaram described.
I also assume we are going to cover all doc changes in https://issues.apache.org/jira/browse/SPARK-12846 instead of here.
rxin shivaram zjffdu
Author: felixcheung <felixcheung_m@hotmail.com>
Closes#10792 from felixcheung/sparkRcmd.
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com>
Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net>
Closes#9613 from olarayej/SPARK-11031.
This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is same between shuffle and bucketed data source, which enables us to only shuffle one side when join a bucketed table and a normal one.
This PR also fixes the tests that are broken by the new hash behaviour in shuffle.
Author: Wenchen Fan <wenchen@databricks.com>
Closes#10703 from cloud-fan/use-hash-expr-in-shuffle.
Add ```read.text``` and ```write.text``` for SparkR.
cc sun-rui felixcheung shivaram
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#10348 from yanboliang/spark-12393.
rxin davies shivaram
Took save mode from my PR #10480, and move everything to writer methods. This is related to PR #10559
- [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into some more tonight. (fixed)
Author: felixcheung <felixcheung_m@hotmail.com>
Closes#10584 from felixcheung/rremovedeprecated.
* Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context.
* Adds a simple test
[SPARK-11199] #comment link with JIRA
Author: Hossein <hossein@databricks.com>
Closes#9185 from falaki/SPARK-11199.
`ifelse`, `when`, `otherwise` is unable to take `Column` typed S4 object as values.
For example:
```r
ifelse(lit(1) == lit(1), lit(2), lit(3))
ifelse(df$mpg > 0, df$mpg, 0)
```
will both fail with
```r
attempt to replicate an object of type 'environment'
```
The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempt to vectorize(i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency because `ifelse` in base R is vectorized but I cannot foresee any scenarios these functions will want to be vectorized in SparkR.
For reference, added test cases which trigger failures:
```r
. Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
error in evaluating the argument 'x' in selecting a method for function 'collect':
error in evaluating the argument 'col' in selecting a method for function 'select':
attempt to replicate an object of type 'environment'
Calls: when -> when -> ifelse -> ifelse
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
6: condition(object)
7: compare(actual, expected, ...)
8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
Error: Test failures
Execution halted
```
Author: Forest Fang <forest.fang@outlook.com>
Closes#10481 from saurfang/spark-12526.
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecated ```saveAsParquetFile```.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#10281 from yanboliang/spark-12310.
The existing sample functions miss the parameter `seed`, however, the corresponding function interface in `generics` has such a parameter. Thus, although the function caller can call the function with the 'seed', we are not using the value.
This could cause SparkR unit tests failed. For example, I hit it in another PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
Author: gatorsmile <gatorsmile@gmail.com>
Closes#10160 from gatorsmile/sampleR.
* ```jsonFile``` should support multiple input files, such as:
```R
jsonFile(sqlContext, c(“path1”, “path2”)) # character vector as arguments
jsonFile(sqlContext, “path1,path2”)
```
* Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed at Spark 2.0. So we mark ```jsonFile``` deprecated and use ```read.json``` at SparkR side.
* Replace all ```jsonFile``` with ```read.json``` at test_sparkSQL.R, but still keep jsonFile test case.
* If this PR is accepted, we should also make almost the same change for ```parquetFile```.
cc felixcheung sun-rui shivaram
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#10145 from yanboliang/spark-12146.
Fix ```subset``` function error when only set ```select``` argument. Please refer to the [JIRA](https://issues.apache.org/jira/browse/SPARK-12234) about the error and how to reproduce it.
cc sun-rui felixcheung shivaram
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#10217 from yanboliang/spark-12234.
SparkR support ```read.parquet``` and deprecate ```parquetFile```. This change is similar with #10145 for ```jsonFile```.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#10191 from yanboliang/spark-12198.
This PR:
1. Suppress all known warnings.
2. Cleanup test cases and fix some errors in test cases.
3. Fix errors in HiveContext related test cases. These test cases are actually not run previously due to a bug of creating TestHiveContext.
4. Support 'testthat' package version 0.11.0 which prefers that test cases be under 'tests/testthat'
5. Make sure the default Hadoop file system is local when running test cases.
6. Turn on warnings into errors.
Author: Sun Rui <rui.sun@intel.com>
Closes#10030 from sun-rui/SPARK-12034.
2015-12-07 10:38:17 -08:00
Renamed from R/pkg/inst/tests/test_sparkSQL.R (Browse further)