505 commits

Author SHA1 Message Date
Xiao Li 2051428173 [SPARK-20980][SQL] Rename wholeFile to multiLine for both CSV and JSON
### What changes were proposed in this pull request?
The current option name `wholeFile` is misleading for CSV users. It does not mean one record per file; a single file can contain multiple records. We should therefore rename it. The proposed name is `multiLine`.
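
To illustrate why "one file" and "one record" are different things, here is a minimal plain-Python sketch (it uses the standard `csv` module, not Spark's reader, and the sample data is made up). A single input holds three records, one of which spans two physical lines:

```python
import csv
import io

# One "file" with a header and three records; the second record has a
# quoted field that contains a line break. The data is invented for
# illustration only.
data = 'id,comment\n1,"single line"\n2,"spans\ntwo lines"\n3,"last"\n'

# Splitting on line breaks counts physical lines, not records.
physical_lines = data.splitlines()

# A parser that understands quoting reads across the embedded line break.
records = list(csv.reader(io.StringIO(data)))

print(len(physical_lines))  # number of physical lines
print(len(records) - 1)     # number of records, excluding the header
```

The file still contains several records, so a name that describes multi-line records fits better than one that suggests a whole file per record.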

### How was this patch tested?
N/A

Author: Xiao Li <gatorsmile@gmail.com>

Closes #18202 from gatorsmile/renameCVSOption.
2017-06-15 13:18:19 +08:00
Felix Cheung 9f4ff95524 [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move
## What changes were proposed in this pull request?

clean up after big test move

## How was this patch tested?

unit tests, jenkins

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #18267 from felixcheung/rtestset2.
2017-06-11 03:00:44 -07:00
Felix Cheung dc4c351837 [SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN
## What changes were proposed in this pull request?

Move all existing tests to a non-installed directory so that they are never run when the SparkR package is installed.

For a follow-up PR:
- remove all skip_on_cran() calls in tests
- clean up test timer
- improve or change basic tests that do run on CRAN (if anyone has suggestion)

It looks like `R CMD build pkg` will still put pkg\tests (i.e. the full tests) into the source package, but `R CMD INSTALL` on such a source package does not install these tests (and so `R CMD check` does not run them).

## How was this patch tested?

- [x] unit tests, Jenkins
- [x] AppVeyor
- [x] make a source package, install it, `R CMD check` it - verify the full tests are not installed or run

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #18264 from felixcheung/rtestset.
2017-06-11 00:00:33 -07:00
Reynold Xin b78e3849b2 [SPARK-21042][SQL] Document Dataset.union is resolution by position
## What changes were proposed in this pull request?
Document that Dataset.union resolves columns by position, not by name, since this has been a confusing point for many users.
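
The difference can be shown with a small plain-Python sketch. This is not Spark code; `union_by_position` and `union_by_name` are hypothetical helpers over (columns, rows) pairs, written only to show why position-based resolution surprises people when the column order differs:

```python
def union_by_position(left_cols, left_rows, right_cols, right_rows):
    """Append rows as-is; the right-hand column names are ignored."""
    if len(left_cols) != len(right_cols):
        raise ValueError("both sides need the same number of columns")
    return left_cols, left_rows + right_rows


def union_by_name(left_cols, left_rows, right_cols, right_rows):
    """Reorder the right-hand rows to match the left-hand column names."""
    if set(left_cols) != set(right_cols):
        raise ValueError("both sides need the same column names")
    order = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in order) for row in right_rows]
    return left_cols, left_rows + reordered


left = (["name", "city"], [("Ann", "Oslo")])
right = (["city", "name"], [("Rome", "Bob")])

# By position: "Rome" silently lands in the "name" column.
print(union_by_position(*left, *right))
# By name: values follow their column names.
print(union_by_name(*left, *right))
```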

## How was this patch tested?
N/A - doc only change.

Author: Reynold Xin <rxin@databricks.com>

Closes #18256 from rxin/SPARK-21042.
2017-06-09 18:29:33 -07:00
Felix Cheung 382fefd187 [SPARK-20877][SPARKR][WIP] add timestamps to test runs
## What changes were proposed in this pull request?

to investigate how long they run

## How was this patch tested?

Jenkins, AppVeyor

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #18104 from felixcheung/rtimetest.
2017-05-30 22:33:29 -07:00
Zheng RuiFeng a97c497045 [SPARK-20849][DOC][SPARKR] Document R DecisionTree
## What changes were proposed in this pull request?
1. Add an example for SparkR `decisionTree`.
2. Document it in the user guide.

## How was this patch tested?
local submit

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #18067 from zhengruifeng/dt_example.
2017-05-25 23:00:50 -07:00
Yanbo Liang ad09e4ca04 [MINOR][SPARKR][ML] Joint coefficients with intercept for SparkR linear SVM summary.
## What changes were proposed in this pull request?
Report the coefficients jointly with the intercept in the SparkR linear SVM summary.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #18035 from yanboliang/svm-r.
2017-05-23 16:16:14 +08:00
Shivaram Venkataraman d06610f992 [SPARK-20727] Skip tests that use Hadoop utils on CRAN Windows
## What changes were proposed in this pull request?

This change skips tests that use the Hadoop libraries while running
on CRAN check with Windows as the operating system. This is to handle
cases where the Hadoop winutils binaries are missing on the target
system. The skipped tests consist of
1. Tests that save, load a model in MLlib
2. Tests that save, load CSV, JSON and Parquet files in SQL
3. Hive tests

## How was this patch tested?

Tested by running on a local Windows VM with HADOOP_HOME unset. Also tested with https://win-builder.r-project.org

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #17966 from shivaram/sparkr-windows-cran.
2017-05-22 23:04:22 -07:00
Zheng RuiFeng 4be3375835 [SPARK-15767][ML][SPARKR] Decision Tree wrapper in SparkR
## What changes were proposed in this pull request?
support decision tree in R

## How was this patch tested?
added tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #17981 from zhengruifeng/dt_r.
2017-05-22 10:40:49 -07:00
Wayne Zhang 7f203a248f [SPARKR] Fix bad examples in DataFrame methods and style issues
## What changes were proposed in this pull request?
Some examples in the DataFrame methods are syntactically wrong, even though they are pseudo code. Fix these and some style issues.

Author: Wayne Zhang <actuaryzhang@uber.com>

Closes #18003 from actuaryzhang/sparkRDoc3.
2017-05-19 11:18:20 -07:00
zero323 2d90c04f23 [SPARKR][DOCS][MINOR] Use consistent names in rollup and cube examples
## What changes were proposed in this pull request?

Rename `carsDF` to `df` in SparkR `rollup` and `cube` examples.

## How was this patch tested?

Manual tests.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17988 from zero323/cube-docs.
2017-05-19 11:04:38 -07:00
zero323 5a799fd8c3 [SPARK-20726][SPARKR] wrapper for SQL broadcast
## What changes were proposed in this pull request?

- Adds R wrapper for `o.a.s.sql.functions.broadcast`.
- Renames `broadcast` to `broadcast_`.

## How was this patch tested?

Unit tests, check `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17965 from zero323/SPARK-20726.
2017-05-14 13:22:19 -07:00
zero323 aa3df15904 [DOCS][SPARKR] Use verbose names for family annotations in functions.R
## What changes were proposed in this pull request?

- Change current short annotations (same as Scala `group`) to verbose names (same as Scala `groupname`).

    Before:

    ![image](https://cloud.githubusercontent.com/assets/1554276/26033909/9a98b596-38b4-11e7-961e-15fd9ea7440d.png)

    After:
    ![image](https://cloud.githubusercontent.com/assets/1554276/26033903/727a9944-38b4-11e7-8873-b09c553f4ec3.png)

- Add missing `family` annotations.

## How was this patch tested?

`check-cran.R` (skipping tests), manual inspection.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17976 from zero323/SPARKR-FUNCTIONS-DOCSTRINGS.
2017-05-14 11:43:28 -07:00
hyukjinkwon 720708ccdd [SPARK-20639][SQL] Add single argument support for to_timestamp in SQL with documentation improvement
## What changes were proposed in this pull request?

This PR proposes three things as below:

- Use casting rules to a timestamp in `to_timestamp` by default (it was `yyyy-MM-dd HH:mm:ss`).

- Support a single argument for `to_timestamp`, as in the APIs for other languages.

  For example, the one below works

  ```
  import org.apache.spark.sql.functions._
  Seq("2016-12-31 00:12:00.00").toDF("a").select(to_timestamp(col("a"))).show()
  ```

  prints

  ```
  +----------------------------------------+
  |to_timestamp(`a`, 'yyyy-MM-dd HH:mm:ss')|
  +----------------------------------------+
  |                     2016-12-31 00:12:00|
  +----------------------------------------+
  ```

  whereas this does not work in SQL.

  **Before**

  ```
  spark-sql> SELECT to_timestamp('2016-12-31 00:12:00');
  Error in query: Invalid number of arguments for function to_timestamp; line 1 pos 7
  ```

  **After**

  ```
  spark-sql> SELECT to_timestamp('2016-12-31 00:12:00');
  2016-12-31 00:12:00
  ```

- Related document improvement for SQL function descriptions and other API descriptions accordingly.

  **Before**

  ```
  spark-sql> DESCRIBE FUNCTION extended to_date;
  ...
  Usage: to_date(date_str, fmt) - Parses the `left` expression with the `fmt` expression. Returns null with invalid input.
  Extended Usage:
      Examples:
        > SELECT to_date('2016-12-31', 'yyyy-MM-dd');
         2016-12-31
  ```

  ```
  spark-sql> DESCRIBE FUNCTION extended to_timestamp;
  ...
  Usage: to_timestamp(timestamp, fmt) - Parses the `left` expression with the `format` expression to a timestamp. Returns null with invalid input.
  Extended Usage:
      Examples:
        > SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
         2016-12-31 00:00:00.0
  ```

  **After**

  ```
  spark-sql> DESCRIBE FUNCTION extended to_date;
  ...
  Usage:
      to_date(date_str[, fmt]) - Parses the `date_str` expression with the `fmt` expression to
        a date. Returns null with invalid input. By default, it follows casting rules to a date if
        the `fmt` is omitted.

  Extended Usage:
      Examples:
        > SELECT to_date('2009-07-30 04:17:52');
         2009-07-30
        > SELECT to_date('2016-12-31', 'yyyy-MM-dd');
         2016-12-31
  ```

  ```
  spark-sql> DESCRIBE FUNCTION extended to_timestamp;
  ...
   Usage:
      to_timestamp(timestamp[, fmt]) - Parses the `timestamp` expression with the `fmt` expression to
        a timestamp. Returns null with invalid input. By default, it follows casting rules to
        a timestamp if the `fmt` is omitted.

  Extended Usage:
      Examples:
        > SELECT to_timestamp('2016-12-31 00:12:00');
         2016-12-31 00:12:00
        > SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
         2016-12-31 00:00:00
  ```
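
The "optional format, null on invalid input" behaviour described above can be sketched in plain Python. This is only an analogy under stated assumptions: the function below is a hypothetical stand-in, it takes Python `strptime` directives rather than Spark's `yyyy-MM-dd` style patterns, and ISO parsing stands in for Spark's casting rules, which are not identical:

```python
from datetime import datetime
from typing import Optional


def to_timestamp(text: str, fmt: Optional[str] = None) -> Optional[datetime]:
    """Parse text to a timestamp; return None (SQL NULL) on invalid input."""
    try:
        if fmt is None:
            # Assumption: ISO-8601 parsing approximates the default cast.
            return datetime.fromisoformat(text)
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


print(to_timestamp("2016-12-31 00:12:00"))     # format omitted
print(to_timestamp("2016-12-31", "%Y-%m-%d"))  # explicit format
print(to_timestamp("not a timestamp"))         # invalid input
```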

## How was this patch tested?

Added tests in `datetime.sql`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17901 from HyukjinKwon/to_timestamp_arg.
2017-05-12 16:42:58 +08:00
Felix Cheung 888b84abe8 [SPARK-20704][SPARKR] change CRAN test to run single thread
## What changes were proposed in this pull request?

- [x] need to test by running R CMD check --as-cran
- [x] sanity check vignettes

## How was this patch tested?

Jenkins

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17945 from felixcheung/rchangesforpackage.
2017-05-11 23:10:04 -07:00
Felix Cheung b952b44af4 [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames() test fails
## What changes were proposed in this pull request?

Change the test to check relative counts, as this test does for the catalog APIs: https://github.com/apache/spark/blame/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L3355

## How was this patch tested?

unit tests; this needs to be combined with another commit containing the SQL change to check

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17905 from felixcheung/rtabletests.
2017-05-08 22:49:40 -07:00
Hossein 2abfee18b6 [SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails
## What changes were proposed in this pull request?
Cleaning existing temp tables before running tableNames tests

## How was this patch tested?
SparkR Unit tests

Author: Hossein <hossein@databricks.com>

Closes #17903 from falaki/SPARK-20661.
2017-05-08 14:48:11 -07:00
Wayne Zhang 2fdaeb52bb [SPARKR][DOC] fix typo in vignettes
## What changes were proposed in this pull request?
Fix typo in vignettes

Author: Wayne Zhang <actuaryzhang@uber.com>

Closes #17884 from actuaryzhang/typo.
2017-05-07 23:16:30 -07:00
Felix Cheung c24bdaab5a [SPARK-20626][SPARKR] address date test warning with timezone on windows
## What changes were proposed in this pull request?

set timezone on windows

## How was this patch tested?

unit test, AppVeyor

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17892 from felixcheung/rtimestamptest.
2017-05-07 23:10:18 -07:00
zero323 1f73d3589a [SPARK-20550][SPARKR] R wrapper for Dataset.alias
## What changes were proposed in this pull request?

- Add SparkR wrapper for `Dataset.alias`.
- Adjust roxygen annotations for `functions.alias` (including example usage).

## How was this patch tested?

Unit tests, `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17825 from zero323/SPARK-20550.
2017-05-07 16:24:42 -07:00
Felix Cheung 7087e01194 [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppVeyor
## What changes were proposed in this pull request?

add environment

## How was this patch tested?

wait for appveyor run

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17878 from felixcheung/appveyorrcran.
2017-05-07 13:10:10 -07:00
Felix Cheung 57b64703e6 [SPARK-20571][SPARKR][SS] Flaky Structured Streaming tests
## What changes were proposed in this pull request?

Make the tests more reliable by having them wait until the data has been processed.
Increasing the timeout value might help, but ultimately the flakiness caused by processing delays on Jenkins is hard to account for. This isn't an actual supported public API.

## How was this patch tested?
unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17857 from felixcheung/rsstestrelia.
2017-05-04 01:54:59 -07:00
zero323 f21897fc15 [SPARK-20544][SPARKR] R wrapper for input_file_name
## What changes were proposed in this pull request?

Adds wrapper for `o.a.s.sql.functions.input_file_name`

## How was this patch tested?

Existing unit tests, additional unit tests, `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17818 from zero323/SPARK-20544.
2017-05-04 01:51:37 -07:00
zero323 9c36aa2791 [SPARK-20585][SPARKR] R generic hint support
## What changes were proposed in this pull request?

Adds support for generic hints on `SparkDataFrame`

## How was this patch tested?

Unit tests, `check-cran.sh`

Author: zero323 <zero323@users.noreply.github.com>

Closes #17851 from zero323/SPARK-20585.
2017-05-04 01:41:36 -07:00
Felix Cheung b8302ccd02 [SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example
## What changes were proposed in this pull request?

Add
- R vignettes
- R programming guide
- SS programming guide
- R example

Also disable spark.als in vignettes for now since it's failing (SPARK-20402)

## How was this patch tested?

manually

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17814 from felixcheung/rdocss.
2017-05-04 00:27:10 -07:00
Felix Cheung fc472bddd1 [SPARK-20543][SPARKR] skip tests when running on CRAN
## What changes were proposed in this pull request?

General rule on skip or not:
skip if
- RDD tests
- tests could run long or complicated (streaming, hivecontext)
- tests on error conditions
- tests won't likely change/break

## How was this patch tested?

unit tests, `R CMD check --as-cran`, `R CMD check`

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17817 from felixcheung/rskiptest.
2017-05-03 21:40:18 -07:00
Felix Cheung 13f47dc503 [SPARK-20490][SPARKR][DOC] add family tag for not function
## What changes were proposed in this pull request?

doc only

## How was this patch tested?

manual

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17828 from felixcheung/rnotfamily.
2017-05-02 09:37:01 -07:00
zero323 90d77e971f [SPARK-20532][SPARKR] Implement grouping and grouping_id
## What changes were proposed in this pull request?

Adds R wrappers for:

- `o.a.s.sql.functions.grouping` as `o.a.s.sql.functions.is_grouping` (to avoid shading `base::grouping`)
- `o.a.s.sql.functions.grouping_id`
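
As a rough illustration of what these two functions report, here is a plain-Python sketch (not the SparkR wrappers). It assumes the usual SQL convention: `grouping` is 1 when a column has been aggregated away in the current grouping set and 0 otherwise, and `grouping_id` packs those bits into one integer with the first column as the most significant bit:

```python
def grouping(column, grouping_set):
    """Return 1 if the column is aggregated away in this grouping set."""
    return 0 if column in grouping_set else 1


def grouping_id(columns, grouping_set):
    """Pack the grouping bits of all columns into a single integer."""
    result = 0
    for column in columns:
        result = (result << 1) | grouping(column, grouping_set)
    return result


columns = ["a", "b"]
for grouping_set in [{"a", "b"}, {"a"}, {"b"}, set()]:
    print(sorted(grouping_set), grouping_id(columns, grouping_set))
```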

## How was this patch tested?

Existing unit tests, additional unit tests. `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17807 from zero323/SPARK-20532.
2017-05-01 21:39:17 -07:00
Felix Cheung a355b667a3 [SPARK-20541][SPARKR][SS] support awaitTermination without timeout
## What changes were proposed in this pull request?

Add a variant without the timeout parameter; this is needed to submit a job that runs until stopped.
Needed for 2.2.

## How was this patch tested?

manually, unit test

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17815 from felixcheung/rssawaitinfinite.
2017-04-30 23:23:49 -07:00
zero323 80e9cf1b59 [SPARK-20490][SPARKR] Add R wrappers for eqNullSafe and ! / not
## What changes were proposed in this pull request?

- Add null-safe equality operator `%<=>%` (same as `o.a.s.sql.Column.eqNullSafe`, `o.a.s.sql.Column.<=>`)
- Add boolean negation operator `!` and function `not`.
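
For readers unfamiliar with null-safe equality, the semantics can be sketched in plain Python, with `None` standing in for SQL NULL. The helper names below are illustrative only and are not part of SparkR:

```python
def sql_equal(a, b):
    """Ordinary SQL equality: any NULL operand makes the result NULL."""
    if a is None or b is None:
        return None
    return a == b


def eq_null_safe(a, b):
    """Null-safe equality: always True or False, never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def sql_not(value):
    """Boolean negation under three-valued logic: NOT NULL is NULL."""
    return None if value is None else not value


for pair in [(1, 1), (1, 2), (1, None), (None, None)]:
    print(pair, sql_equal(*pair), eq_null_safe(*pair))
```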

## How was this patch tested?

Existing unit tests, additional unit tests, `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17783 from zero323/SPARK-20490.
2017-04-30 22:07:12 -07:00
zero323 ae3df4e98f [SPARK-20535][SPARKR] R wrappers for explode_outer and posexplode_outer
## What changes were proposed in this pull request?

Add R wrappers for

- `o.a.s.sql.functions.explode_outer`
- `o.a.s.sql.functions.posexplode_outer`
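
The difference between the plain and `_outer` variants can be sketched in plain Python (not Spark code; rows are hypothetical `(key, array)` pairs and `None` stands in for NULL). The plain variant drops rows whose array is null or empty, while the outer variants keep them and emit nulls:

```python
def explode(rows):
    """One output row per array element; null or empty arrays vanish."""
    return [(key, value) for key, values in rows for value in (values or [])]


def explode_outer(rows):
    """Like explode, but null or empty arrays yield one row with a null."""
    out = []
    for key, values in rows:
        if not values:
            out.append((key, None))
        else:
            out.extend((key, value) for value in values)
    return out


def posexplode_outer(rows):
    """Like explode_outer, but also emits each element's position."""
    out = []
    for key, values in rows:
        if not values:
            out.append((key, None, None))
        else:
            out.extend((key, pos, value) for pos, value in enumerate(values))
    return out


rows = [("x", [1, 2]), ("y", []), ("z", None)]
print(explode(rows))
print(explode_outer(rows))
print(posexplode_outer(rows))
```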

## How was this patch tested?

Additional unit tests, manual testing.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17809 from zero323/SPARK-20535.
2017-04-30 12:33:03 -07:00
hyukjinkwon 70f1bcd7bc [SPARK-20493][R] De-duplicate parse logics for DDL-like type strings in R
## What changes were proposed in this pull request?

We currently use `SQLUtils.getSQLDataType` to parse the type string in `structField`. It looks like we can replace it with `CatalystSqlParser.parseDataType`.

Both accept similar DDL-like type definitions, as below:

```scala
scala> Seq(Tuple1(Tuple1("a"))).toDF.show()
```
```
+---+
| _1|
+---+
|[a]|
+---+
```

```scala
scala> Seq(Tuple1(Tuple1("a"))).toDF.select($"_1".cast("struct<_1:string>")).show()
```
```
+---+
| _1|
+---+
|[a]|
+---+
```

Such type strings look identical to R’s, as below:

```R
> write.df(sql("SELECT named_struct('_1', 'a') as struct"), "/tmp/aa", "parquet")
> collect(read.df("/tmp/aa", "parquet", structType(structField("struct", "struct<_1:string>"))))
  struct
1      a
```

R’s check is stricter because we validate the types via regular expressions on the R side beforehand.

The actual logic differs slightly, but since we check the type strings on the R side first, replacing it should (I think) introduce no behaviour changes. To make sure of this, dedicated tests were added in SPARK-20105. (It looks like `structField` is the only place that calls this method.)

## How was this patch tested?

Existing tests - https://github.com/apache/spark/blob/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L143-L194 should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17785 from HyukjinKwon/SPARK-20493.
2017-04-29 11:02:17 -07:00
zero323 b58cf77c4d [DOCS][MINOR] Add missing since to SparkR repeat_string note.
## What changes were proposed in this pull request?

Replace

    note repeat_string 2.3.0

with

    note repeat_string since 2.3.0

## How was this patch tested?

`create-docs.sh`

Author: zero323 <zero323@users.noreply.github.com>

Closes #17779 from zero323/REPEAT-NOTE.
2017-04-27 00:29:43 -07:00
Takeshi Yamamuro b4724db19a [SPARK-20425][SQL] Support a vertical display mode for Dataset.show
## What changes were proposed in this pull request?
This PR adds a new display mode for `Dataset.show` that prints output rows vertically (one line per column value). In the current master, readability is poor when printing a Dataset with many columns, for example:

```
scala> val df = spark.range(100).selectExpr((0 until 100).map(i => s"rand() AS c$i"): _*)
scala> df.show(3, 0)
+------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+------------------+------------------+-------------------+------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+--------------------+-------------------+------------------+-------------------+--------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+--------------------+--------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+--------------------+-------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+-------------------+------------------+-------------------+------------------+------------------+-----------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+------------------+-------------------+-------------------+------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+
|c0                |c1                |c2                |c3                 |c4                |c5                |c6                 |c7                |c8                |c9                |c10               |c11                |c12               |c13               |c14               |c15                |c16                |c17                |c18               |c19               |c20                |c21               |c22                |c23               |c24                |c25                |c26                |c27                 |c28                |c29               |c30                |c31                 |c32               |c33               |c34                |c35                |c36                |c37               |c38               |c39                |c40               |c41               |c42                |c43                |c44                |c45               |c46                 |c47                 |c48                |c49                |c50                |c51                |c52                |c53                |c54                 |c55                |c56                |c57                |c58                |c59               |c60               |c61                |c62                |c63               |c64                |c65               |c66               |c67              |c68                |c69                |c70               |c71                |c72               |c73                |c74                |c75                |c76               |c77                |c78               |c79                |c80                |c81                |c82                |c83                |c84                |c85                |c86                |c87               |c88                |c89                |c90               |c91               |c92               |c93                |c94               |c95                |c96               |c97                |c98                |c99                |
+------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+------------------+------------------+-------------------+------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+--------------------+-------------------+------------------+-------------------+--------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+--------------------+--------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+--------------------+-------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+-------------------+------------------+-------------------+------------------+------------------+-----------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+------------------+-------------------+-------------------+------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+
|0.6306087152476858|0.9174349686288383|0.5511324165035159|0.3320844128641819 |0.7738486877101489|0.2154915886962553|0.4754997600674299 |0.922780639280355 |0.7136894772661909|0.2277580838165979|0.5926874459847249|0.40311408392226633|0.467830264333843 |0.8330466896984213|0.1893258482389527|0.6320849515511165 |0.7530911056912044 |0.06700254871955424|0.370528597355559 |0.2755437445193154|0.23704391110980128|0.8067400174905822|0.13597793616251852|0.1708888820162453|0.01672725007605702|0.983118121881555  |0.25040195628629924|0.060537253723083384|0.20000530582637488|0.3400572407133511|0.9375689433322597 |0.057039316954370256|0.8053269714347623|0.5247817572228813|0.28419308820527944|0.9798908885194533 |0.31805988175678146|0.7034448027077574|0.5400575751346084|0.25336322371116216|0.9361634546853429|0.6118681368289798|0.6295081549153907 |0.13417468943957422|0.41617137072255794|0.7267230869252035|0.023792726137561115|0.5776157058356362  |0.04884204913195467|0.26728716103441275|0.646680370807925  |0.9782712690657244 |0.16434031314818154|0.20985522381321275|0.24739842475440077 |0.26335189682977334|0.19604841662422068|0.10742950487300651|0.20283136488091502|0.3100312319723688|0.886959006630645 |0.25157102269776244|0.34428775168410786|0.3500506818575777|0.3781142441912052 |0.8560316444386715|0.4737104888956839|0.735903101602148|0.02236617130529006|0.8769074095835873 |0.2001426662503153|0.5534032319238532 |0.7289496620397098|0.41955191309992157|0.9337700133660436 |0.34059094378451005|0.6419144759403556|0.08167496930341167|0.9947099478497635|0.48010888605366586|0.22314796858167918|0.17786598882331306|0.7351521162297135 |0.5422057170020095 |0.9521927872726792 |0.7459825486368227 |0.40907708791990627|0.8903819313311575|0.7251413746923618 |0.2977174938745204 |0.9515209660203555|0.9375968604766713|0.5087851740042524|0.4255237544908751 |0.8023768698664653|0.48003189618006703|0.1775841829745185|0.09050775629268382|0.6743909291138167 |0.2498415755876865 |
|0.6866473844170801|0.4774360641212433|0.631696201340726 |0.33979113021468343|0.5663049010847052|0.7280190472258865|0.41370958502324806|0.9977433873622218|0.7671957338989901|0.2788708556233931|0.3355106391656496|0.88478952319287   |0.0333974166999893|0.6061744715862606|0.9617779139652359|0.22484954822341863|0.12770906021550898|0.5577789629508672 |0.2877649024640704|0.5566577406549361|0.9334933255278052 |0.9166720585157266|0.9689249324600591 |0.6367502457478598|0.7993572745928459 |0.23213222324218108|0.11928284054154137|0.6173493362456599  |0.0505122058694798 |0.9050228629552983|0.17112767911121707|0.47395598348370005 |0.5820498657823081|0.6241124650645072|0.18587258258036776|0.14987593554122225|0.3079446253653946 |0.9414228822867968|0.8362276265462365|0.9155655305576353 |0.5121559807153562|0.8963362656525707|0.22765970274318037|0.8177039187132797 |0.8190326635933787 |0.5256005177032199|0.8167598457269669  |0.030936807130934496|0.6733006585281015 |0.4208049626816347 |0.24603085738518538|0.22719198954208153|0.1622280557565281 |0.22217325159218038|0.014684419513742553|0.08987111517447499|0.2157764759142622 |0.8223414104088321 |0.4868624404491777 |0.4016191733088167|0.6169281906889263|0.15603611040433385|0.18289285085714913|0.9538408988218972|0.15037154865295121|0.5364516961987454|0.8077254873163031|0.712600478545675|0.7277477241003857 |0.19822912960348305|0.8305051199208777|0.18631911396566114|0.8909532487898342|0.3470409226992506 |0.35306974180587636|0.9107058868891469 |0.3321327206004986|0.48952332459050607|0.3630403307479373|0.5400046826340376 |0.5387377194310529 |0.42860539421837585|0.23214101630985995|0.21438968839794847|0.15370603160082352|0.04355605642700022|0.6096006707067466 |0.6933354157094292|0.06302172470859002|0.03174631856164001|0.664243581650643 |0.7833239547446621|0.696884598352864 |0.34626385933237736|0.9263495598791336|0.404818892816584  |0.2085585394755507|0.6150004897990109 |0.05391193524302473|0.28188484028329097|
+------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+------------------+------------------+-------------------+------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+--------------------+-------------------+------------------+-------------------+--------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+--------------------+--------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+--------------------+-------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+-------------------+------------------+-------------------+------------------+------------------+-----------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+------------------+-------------------+-------------------+------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+
only showing top 2 rows
```

`psql`, the CLI for PostgreSQL, supports a vertical display mode for this case:
http://stackoverflow.com/questions/9604723/alternate-output-format-for-psql

```
-RECORD 0-------------------
 c0  | 0.6306087152476858
 c1  | 0.9174349686288383
 c2  | 0.5511324165035159
...
 c98 | 0.05391193524302473
 c99 | 0.28188484028329097
-RECORD 1-------------------
 c0  | 0.6866473844170801
 c1  | 0.4774360641212433
 c2  | 0.631696201340726
...
 c98 | 0.05391193524302473
 c99 | 0.28188484028329097
only showing top 2 rows
```
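
The vertical layout can be approximated with a few lines of plain Python. This is a sketch of the idea only, not the `Dataset.show` implementation; `format_vertical` and its `rule_width` parameter are hypothetical, and the sample values are made up:

```python
def format_vertical(columns, rows, rule_width=28):
    """Render each row as a record header followed by one line per column."""
    name_width = max(len(name) for name in columns)
    lines = []
    for index, row in enumerate(rows):
        # Record separator, padded with dashes to a fixed width.
        lines.append(f"-RECORD {index}".ljust(rule_width, "-"))
        for name, value in zip(columns, row):
            lines.append(f" {name.ljust(name_width)} | {value}")
    return lines


columns = ["c0", "c1", "c98"]
rows = [(0.63, 0.91, 0.05), (0.68, 0.47, 0.28)]
print("\n".join(format_vertical(columns, rows)))
```

Because every value sits on its own line, the output width depends on the longest column name and value rather than on the number of columns.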

## How was this patch tested?
Added tests in `DataFrameSuite`.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #17733 from maropu/SPARK-20425.
2017-04-26 22:18:01 -07:00
Yanbo Liang dbb06c689c [MINOR][ML] Fix some PySpark & SparkR flaky tests
## What changes were proposed in this pull request?
Some PySpark & SparkR tests run with a tiny dataset and a tiny `maxIter`, which means they have not converged. I don’t think checking intermediate results during iteration makes sense, and these intermediate results may be fragile and unstable, so we should switch to checking the converged result. We hit this issue at #17746 when we upgraded breeze to 0.13.1.
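
A toy example of why the converged result is the safer thing to assert on: the sketch below runs gradient descent on a simple quadratic in plain Python (it is not MLlib or breeze code, and the objective and step sizes are made up). Two optimizer configurations disagree after two iterations but agree once they have converged:

```python
def minimize(learning_rate, max_iter, start=0.0):
    """Gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3."""
    x = start
    for _ in range(max_iter):
        gradient = 2.0 * (x - 3.0)
        x -= learning_rate * gradient
    return x


# Intermediate iterates depend on optimizer details (here, the step size).
print(minimize(0.1, max_iter=2), minimize(0.2, max_iter=2))

# Converged results agree, so they are stable values to test against.
print(minimize(0.1, max_iter=200), minimize(0.2, max_iter=200))
```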

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #17757 from yanboliang/flaky-test.
2017-04-26 21:34:18 +08:00
zero323 df58a95a33 [SPARK-20437][R] R wrappers for rollup and cube
## What changes were proposed in this pull request?

- Add `rollup` and `cube` methods and corresponding generics.
- Add short description to the vignette.
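
For readers unfamiliar with the two operations, the following pure-Python sketch lists the grouping sets that `rollup` and `cube` aggregate over, following standard SQL semantics. It does not call Spark; the helper names `rollup_sets` and `cube_sets` are made up for this example.

```python
from itertools import combinations


def rollup_sets(columns):
    """Grouping sets for ROLLUP: every prefix of the column list, longest first."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """Grouping sets for CUBE: every subset of the column list, largest first."""
    return [
        subset
        for n in range(len(columns), -1, -1)
        for subset in combinations(columns, n)
    ]


print(rollup_sets(["a", "b"]))  # [('a', 'b'), ('a',), ()]
print(cube_sets(["a", "b"]))    # [('a', 'b'), ('a',), ('b',), ()]
```

The empty tuple is the grand total. `cube` adds the `('b',)` subtotal that `rollup` omits, so with n columns `rollup` yields n + 1 grouping sets and `cube` yields 2^n.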

## How was this patch tested?

- Existing unit tests.
- Additional unit tests covering new features.
- `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17728 from zero323/SPARK-20437.
2017-04-25 22:00:45 -07:00
Yanbo Liang 67eef47acf [SPARK-20449][ML] Upgrade breeze version to 0.13.1
## What changes were proposed in this pull request?
Upgrade breeze version to 0.13.1, which fixes some critical bugs in L-BFGS-B.

## How was this patch tested?
Existing unit tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #17746 from yanboliang/spark-20449.
2017-04-25 17:10:41 +00:00
zero323 8a272ddc9d [SPARK-20438][R] SparkR wrappers for split and repeat
## What changes were proposed in this pull request?

Add wrappers for `o.a.s.sql.functions`:

- `split` as `split_string`
- `repeat` as `repeat_string`

## How was this patch tested?

Existing tests, additional unit tests, `check-cran.sh`

Author: zero323 <zero323@users.noreply.github.com>

Closes #17729 from zero323/SPARK-20438.
2017-04-24 10:56:57 -07:00
zero323 fd648bff63 [SPARK-20371][R] Add wrappers for collect_list and collect_set
## What changes were proposed in this pull request?

Adds wrappers for `collect_list` and `collect_set`.
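
As a rough sketch of how the two aggregates differ, the pure-Python example below groups values by key, either keeping duplicates (like `collect_list`) or dropping them (like `collect_set`). The function `collect` and its `distinct` flag are hypothetical and not part of any Spark API. Note that Spark makes no ordering guarantee for the collected elements, whereas this sketch simply keeps input order.

```python
from collections import defaultdict


def collect(pairs, distinct=False):
    """Group values by key; drop repeated values per key when distinct=True.

    Hypothetical helper for illustration; not a Spark function.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        if distinct and value in groups[key]:
            continue  # set-like behaviour: skip values already collected
        groups[key].append(value)
    return dict(groups)


pairs = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]
print(collect(pairs))                 # {'a': [1, 1, 2], 'b': [3]}
print(collect(pairs, distinct=True))  # {'a': [1, 2], 'b': [3]}
```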

## How was this patch tested?

Unit tests, `check-cran.sh`

Author: zero323 <zero323@users.noreply.github.com>

Closes #17672 from zero323/SPARK-20371.
2017-04-21 12:06:21 -07:00
zero323 46c5749768 [SPARK-20375][R] R wrappers for array and map
## What changes were proposed in this pull request?

Adds wrappers for `o.a.s.sql.functions.array` and `o.a.s.sql.functions.map`

## How was this patch tested?

Unit tests, `check-cran.sh`

Author: zero323 <zero323@users.noreply.github.com>

Closes #17674 from zero323/SPARK-20375.
2017-04-19 21:19:46 -07:00
Shixiong Zhu 4fea7848c4 [SPARK-20397][SPARKR][SS] Fix flaky test: test_streaming.R.Terminated by error
## What changes were proposed in this pull request?

Checking a source parameter is asynchronous. When the query is created, it's not guaranteed that the source has been created. This PR just increases the timeout of awaitTermination to ensure the parsing error is thrown.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #17687 from zsxwing/SPARK-20397.
2017-04-19 13:10:44 -07:00
zero323 702d85af2d [SPARK-20208][R][DOCS] Document R fpGrowth support
## What changes were proposed in this pull request?

Document fpGrowth in:

- vignettes
- programming guide
- code example

## How was this patch tested?

Manual tests.

Author: zero323 <zero323@users.noreply.github.com>

Closes #17557 from zero323/SPARK-20208.
2017-04-18 19:59:18 -07:00
hyukjinkwon 24f09b39c7 [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json.array in from_json function in R
## What changes were proposed in this pull request?

`as.json.array` was the name originally suggested in the PR for SPARK-19828, but we could not use it because the lint check emits an error for multiple dots in variable names.

After SPARK-20278, we are now able to use `multiple.dots.in.names`. The `asJsonArray` argument of the `from_json` function can still be changed because 2.2 has not been released yet.

So, this PR proposes to rename `asJsonArray` to `as.json.array`.

## How was this patch tested?

Jenkins tests, local tests with `./R/run-tests.sh` and manual `./dev/lint-r`. Existing tests should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17653 from HyukjinKwon/SPARK-19828-followup.
2017-04-17 09:04:24 -07:00
hyukjinkwon 86d251c585 [SPARK-20278][R] Disable 'multiple_dots_linter' lint rule that is against project's code style
## What changes were proposed in this pull request?

Currently, multi-dot separated variable names in R are not allowed. For example,

```diff
 setMethod("from_json", signature(x = "Column", schema = "structType"),
-          function(x, schema, asJsonArray = FALSE, ...) {
+          function(x, schema, as.json.array = FALSE, ...) {
             if (asJsonArray) {
               jschema <- callJStatic("org.apache.spark.sql.types.DataTypes",
                                      "createArrayType",
```

produces an error as below:

```
R/functions.R:2462:31: style: Words within variable and function names should be separated by '_' rather than '.'.
          function(x, schema, as.json.array = FALSE, ...) {
                              ^~~~~~~~~~~~~
```

This seems to go against https://google.github.io/styleguide/Rguide.xml#identifiers which says

> The preferred form for variable names is all lower case letters and words separated with dots

This appears to be because lintr https://github.com/jimhester/lintr by default follows http://r-pkgs.had.co.nz/style.html, as written in its README.md. A few rules there do not follow Google's guide, described as "a few tweaks".

Per [SPARK-6813](https://issues.apache.org/jira/browse/SPARK-6813), we follow Google's R Style Guide with a few exceptions https://google.github.io/styleguide/Rguide.xml. This has also been merged into Spark's website - https://github.com/apache/spark-website/pull/43

Also, it looks like we have no restriction on function names. This rule also appears to affect function names, as written in the README.md.

> `multiple_dots_linter`: check that function and variable names are separated by _ rather than ..

## How was this patch tested?

Manually tested `./dev/lint-r` with the manual change below in `R/functions.R`:

```diff
 setMethod("from_json", signature(x = "Column", schema = "structType"),
-          function(x, schema, asJsonArray = FALSE, ...) {
+          function(x, schema, as.json.array = FALSE, ...) {
             if (asJsonArray) {
               jschema <- callJStatic("org.apache.spark.sql.types.DataTypes",
                                      "createArrayType",
```

**Before**

```
R/functions.R:2462:31: style: Words within variable and function names should be separated by '_' rather than '.'.
          function(x, schema, as.json.array = FALSE, ...) {
                              ^~~~~~~~~~~~~
```

**After**

```
lintr checks passed.
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17590 from HyukjinKwon/disable-dot-in-name.
2017-04-16 11:27:27 -07:00
Brendan Dwyer 044f7ecbfd [SPARK-20298][SPARKR][MINOR] fixed spelling mistake "charactor"
## What changes were proposed in this pull request?

Fixed spelling of "charactor"

## How was this patch tested?

Spelling change only

Author: Brendan Dwyer <brendan.dwyer@ibm.com>

Closes #17611 from bdwyer2/SPARK-20298.
2017-04-12 09:24:41 +01:00
Felix Cheung 8feb799af0 [SPARK-20197][SPARKR] CRAN check fail with package installation
## What changes were proposed in this pull request?

Test failed because SPARK_HOME is not set before Spark is installed.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17516 from felixcheung/rdircheckincran.
2017-04-07 11:17:49 -07:00
Felix Cheung 5a693b4138 [SPARK-20195][SPARKR][SQL] add createTable catalog API and deprecate createExternalTable
## What changes were proposed in this pull request?

Following up on #17483, add createTable (which is new in 2.2.0) and deprecate createExternalTable, plus a number of minor fixes

## How was this patch tested?

manual, unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17511 from felixcheung/rceatetable.
2017-04-06 09:15:13 -07:00
Felix Cheung bccc330193 [SPARK-20196][PYTHON][SQL] update doc for catalog functions for all languages, add pyspark refreshByPath API
## What changes were proposed in this pull request?

Update doc to remove external for createTable, add refreshByPath in Python

## How was this patch tested?

manual

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17512 from felixcheung/catalogdoc.
2017-04-06 09:09:43 -07:00
Felix Cheung c1b8b66750 [SPARKR][DOC] update doc for fpgrowth
## What changes were proposed in this pull request?

minor update

zero323

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17526 from felixcheung/rfpgrowthfollowup.
2017-04-04 22:32:46 -07:00
hyukjinkwon 0e2ee82044 [MINOR][R] Reorder Collate fields in DESCRIPTION file
## What changes were proposed in this pull request?

It seems the CRAN check script corrects `R/pkg/DESCRIPTION` and follows the order in the `Collate` field.

This PR proposes to fix the position of `catalog.R` so that running this script does not produce a small diff in this file every time.

## How was this patch tested?

Manually via `./R/check-cran.sh`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17528 from HyukjinKwon/minor-reorder-description.
2017-04-04 11:42:14 -07:00