## What changes were proposed in this pull request?
R 3.6.0 is released 2019-04-26. This PR targets to change R version from 3.5.1 to 3.6.0 in AppVeyor.
This PR sets `R_REMOTES_NO_ERRORS_FROM_WARNINGS` to `true` to avoid the warnings below:
```
Error in strptime(xx, f, tz = tz) :
(converted from warning) unable to identify current timezone 'C':
please set environment variable 'TZ'
Error in i.p(...) :
(converted from warning) installation of package 'praise' had non-zero exit status
Calls: <Anonymous> ... with_rprofile_user -> with_envvar -> force -> force -> i.p
Execution halted
```
## How was this patch tested?
AppVeyor
Closes#24716 from HyukjinKwon/SPARK-27848.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Use https URL for CRAN repo (and for a Scala download in a Dockerfile)
## How was this patch tested?
Existing tests.
Closes#24664 from srowen/SPARK-27794.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR propose to remove `-Phive-thriftserver` profile which seems not affecting the SparkR tests in AppVeyor.
Originally wanted to check if there's a meaningful build time decrease but seems not. It will have but seems not meaningfully decreased.
## How was this patch tested?
AppVeyor tests:
```
[00:40:49] Attaching package: 'SparkR'
[00:40:49]
[00:40:49] The following objects are masked from 'package:testthat':
[00:40:49]
[00:40:49] describe, not
[00:40:49]
[00:40:49] The following objects are masked from 'package:stats':
[00:40:49]
[00:40:49] cov, filter, lag, na.omit, predict, sd, var, window
[00:40:49]
[00:40:49] The following objects are masked from 'package:base':
[00:40:49]
[00:40:49] as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
[00:40:49] rank, rbind, sample, startsWith, subset, summary, transform, union
[00:40:49]
[00:40:49] Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:41:43] basic tests for CRAN: .............
[00:41:43]
[00:41:43] DONE ===========================================================================
[00:41:43] binary functions: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:42:05] ...........
[00:42:05] functions on binary files: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:42:10] ....
[00:42:10] broadcast variables: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:42:12] ..
[00:42:12] functions in client.R: .....
[00:42:30] test functions in sparkR.R: ..............................................
[00:42:30] include R packages: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:42:31]
[00:42:31] JVM API: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:42:31] ..
[00:42:31] MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:48:48] ......................................................................
[00:48:48] MLlib clustering algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:50:12] .....................................................................
[00:50:12] MLlib frequent pattern mining: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:50:18] .....
[00:50:18] MLlib recommendation algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:50:27] ........
[00:50:27] MLlib regression algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:56:00] ................................................................................................................................
[00:56:00] MLlib statistics algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:56:04] ........
[00:56:04] MLlib tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:58:20] ..............................................................................................
[00:58:20] parallelize() and collect(): Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[00:58:20] .............................
[00:58:20] basic RDD functions: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[01:03:35] ............................................................................................................................................................................................................................................................................................................................................................................................................................................
[01:03:35] SerDe functionality: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[01:03:39] ...............................
[01:03:39] partitionBy, groupByKey, reduceByKey etc.: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[01:04:20] ....................
[01:04:20] functions in sparkR.R: ....
[01:04:20] SparkSQL functions: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[01:04:50] ........................................................................................................................................-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
[01:04:50] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
[01:04:50] -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
[01:04:50] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
[01:04:51] -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
[01:04:51] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
[01:06:13] ............................................................................................................................................................................................................................................................................................................................................................-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
[01:06:13] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
[01:06:14] .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
[01:06:14] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
[01:06:14] ....-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
[01:06:14] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
[01:12:30] ...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
[01:12:30] Structured Streaming: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[01:14:27] ..........................................
[01:14:27] tests RDD function take(): Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[01:14:28] ................
[01:14:28] the textFile() function: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[01:14:44] .............
[01:14:44] functions in utils.R: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
[01:14:46] ............................................
[01:14:46] Windows-specific tests: .
[01:14:46]
[01:14:46] DONE ===========================================================================
[01:15:29] Build success
```
Author: hyukjinkwon <gurwls223@apache.org>
Closes#21894 from HyukjinKwon/wip-build.
## What changes were proposed in this pull request?
`testthat` 2.0.0 is released and AppVeyor now started to use it instead of 1.0.2. And then, we started to have R tests failed in AppVeyor. See - https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1967-master
```
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'run_tests' not found
Calls: ::: -> get
```
This seems because we rely on internal `testthat:::run_tests` here:
https://github.com/r-lib/testthat/blob/v1.0.2/R/test-package.R#L62-L75dc4c351837/R/pkg/tests/run-all.R (L49-L52)
However, seems it was removed out from 2.0.0. I tried few other exposed APIs like `test_dir` but I failed to make a good compatible fix.
Seems we better fix the `testthat` version first to make the build passed.
## How was this patch tested?
Manually tested and AppVeyor tests.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes#20003 from HyukjinKwon/SPARK-22817.
## What changes were proposed in this pull request?
Fixing the way how `SPARK_HOME` is resolved on Windows. While the previous version was working with the built release download, the set of directories changed slightly for the PySpark `pip` or `conda` install. This has been reflected in Linux files in `bin` but not for Windows `cmd` files.
First fix improves the way how the `jars` directory is found, as this was stoping Windows version of `pip/conda` install from working; JARs were not found by on Session/Context setup.
Second fix is adding `find-spark-home.cmd` script, which uses `find_spark_home.py` script, as the Linux version, to resolve `SPARK_HOME`. It is based on `find-spark-home` bash script, though, some operations are done in different order due to the `cmd` script language limitations. If environment variable is set, the Python script `find_spark_home.py` will not be run. The process can fail if Python is not installed, but it will mostly use this way if PySpark is installed via `pip/conda`, thus, there is some Python in the system.
## How was this patch tested?
Tested on local installation.
Author: Jakub Nowacki <j.s.nowacki@gmail.com>
Closes#19370 from jsnowacki/fix_spark_cmds.
## What changes were proposed in this pull request?
more file regex
## How was this patch tested?
Jenkins, AppVeyor
Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes#19177 from felixcheung/rmoduletotest.
## What changes were proposed in this pull request?
This PR proposes three things as below:
**Install packages per documentation** - this does not affect the tests itself (but CRAN which we are not doing via AppVeyor) up to my knowledge.
This adds `knitr` and `rmarkdown` per 45824fb608/R/WINDOWS.md (unit-tests) (please see 45824fb608)
**Improve logs/shorten logs** - actually, long logs can be a problem on AppVeyor (e.g., see https://github.com/apache/spark/pull/17873)
`R -e ...` repeats printing R information for each invocation as below:
```
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
```
It looks reducing the call might be slightly better and print out the versions together looks more readable.
Before:
```
# R information ...
> packageVersion('testthat')
[1] '1.0.2'
>
>
# R information ...
> packageVersion('e1071')
[1] '1.6.8'
>
>
... 3 more times
```
After:
```
# R information ...
> packageVersion('knitr'); packageVersion('rmarkdown'); packageVersion('testthat'); packageVersion('e1071'); packageVersion('survival')
[1] ‘1.16’
[1] ‘1.6’
[1] ‘1.0.2’
[1] ‘1.6.8’
[1] ‘2.41.3’
```
**Add`appveyor.yml`/`dev/appveyor-install-dependencies.ps1` for triggering the test**
Changing this file might break the test, e.g., https://github.com/apache/spark/pull/16927
## How was this patch tested?
Before (please see https://ci.appveyor.com/project/HyukjinKwon/spark/build/169-master)
After (please see the AppVeyor build in this PR):
Author: hyukjinkwon <gurwls223@gmail.com>
Closes#18336 from HyukjinKwon/minor-add-knitr-and-rmarkdown.
## What changes were proposed in this pull request?
add environment
## How was this patch tested?
wait for appveyor run
Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes#17878 from felixcheung/appveyorrcran.
## What changes were proposed in this pull request?
Currently, there are flooding logs in AppVeyor (in the console). This has been fine because we can download all the logs. However, (given my observations so far), logs are truncated when there are too many. It has been grown recently and it started to get truncated. For example, see https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1209-master
Even after the log is downloaded, it looks truncated as below:
```
[00:44:21] 17/05/04 18:56:18 INFO TaskSetManager: Finished task 197.0 in stage 601.0 (TID 9211) in 0 ms on localhost (executor driver) (194/200)
[00:44:21] 17/05/04 18:56:18 INFO Executor: Running task 199.0 in stage 601.0 (TID 9213)
[00:44:21] 17/05/04 18:56:18 INFO Executor: Finished task 198.0 in stage 601.0 (TID 9212). 2473 bytes result sent to driver
...
```
Probably, it looks better to use the same log4j configuration that we are using for SparkR tests in Jenkins(please see fc472bddd1/R/run-tests.sh (L26) and fc472bddd1/R/log4j.properties)
```
# Set everything to be logged to the file target/unit-tests.log
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.append=true
log4j.appender.file.file=R/target/unit-tests.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n
# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
org.eclipse.jetty.LEVEL=WARN
```
## How was this patch tested?
Manually tested with spark-test account
- https://ci.appveyor.com/project/spark-test/spark/build/672-r-log4j (there is an example for flaky test here)
- https://ci.appveyor.com/project/spark-test/spark/build/673-r-log4j (I re-ran the build).
Author: hyukjinkwon <gurwls223@gmail.com>
Closes#17873 from HyukjinKwon/appveyor-reduce-logs.
## What changes were proposed in this pull request?
We are currently detecting the changes in `R/` directory only and then trigger AppVeyor tests.
It seems we need to tests when there are Scala codes dedicated for R in `core/src/main/scala/org/apache/spark/api/r/`, `sql/core/src/main/scala/org/apache/spark/sql/api/r/` and `mllib/src/main/scala/org/apache/spark/ml/r/` too.
This will enables the tests, for example, for SPARK-20088.
## How was this patch tested?
Tests with manually created PRs.
- Changes in `sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala` https://github.com/spark-test/spark/pull/13
- Changes in `core/src/main/scala/org/apache/spark/api/r/SerDe.scala` https://github.com/spark-test/spark/pull/12
- Changes in `README.md` https://github.com/spark-test/spark/pull/14
Author: hyukjinkwon <gurwls223@gmail.com>
Closes#17427 from HyukjinKwon/SPARK-20092.
## What changes were proposed in this pull request?
- Remove support for Hadoop 2.5 and earlier
- Remove reflection and code constructs only needed to support multiple versions at once
- Update docs to reflect newer versions
- Remove older versions' builds and profiles.
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes#16810 from srowen/SPARK-19464.