ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
HyukjinKwon	90b6cda9af	[SPARK-25944][R][BUILD] AppVeyor change to latest R version (3.6.0) ## What changes were proposed in this pull request? R 3.6.0 was released on 2019-04-26. This PR changes the R version from 3.5.1 to 3.6.0 in AppVeyor. This PR also sets `R_REMOTES_NO_ERRORS_FROM_WARNINGS` to `true` to avoid the errors below, which are converted from warnings: ``` Error in strptime(xx, f, tz = tz) : (converted from warning) unable to identify current timezone 'C': please set environment variable 'TZ' Error in i.p(...) : (converted from warning) installation of package 'praise' had non-zero exit status Calls: <Anonymous> ... with_rprofile_user -> with_envvar -> force -> force -> i.p Execution halted ``` ## How was this patch tested? AppVeyor Closes #24716 from HyukjinKwon/SPARK-27848. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-05-28 14:42:03 +09:00
Sean Owen	6c5827c723	[SPARK-27794][R][DOCS] Use https URL for CRAN repo ## What changes were proposed in this pull request? Use https URL for CRAN repo (and for a Scala download in a Dockerfile) ## How was this patch tested? Existing tests. Closes #24664 from srowen/SPARK-27794. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-22 14:28:21 -07:00
hyukjinkwon	3210121fed	[MINOR][BUILD] Remove -Phive-thriftserver profile within appveyor.yml ## What changes were proposed in this pull request? This PR proposes to remove the `-Phive-thriftserver` profile, which does not seem to affect the SparkR tests in AppVeyor. I originally wanted to check whether this would meaningfully decrease the build time, but it does not seem to. There may be some decrease, but it is not meaningful. ## How was this patch tested? AppVeyor tests: ``` [00:40:49] Attaching package: 'SparkR' [00:40:49] [00:40:49] The following objects are masked from 'package:testthat': [00:40:49] [00:40:49] describe, not [00:40:49] [00:40:49] The following objects are masked from 'package:stats': [00:40:49] [00:40:49] cov, filter, lag, na.omit, predict, sd, var, window [00:40:49] [00:40:49] The following objects are masked from 'package:base': [00:40:49] [00:40:49] as.data.frame, colnames, colnames<-, drop, endsWith, intersect, [00:40:49] rank, rbind, sample, startsWith, subset, summary, transform, union [00:40:49] [00:40:49] Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:41:43] basic tests for CRAN: ............. [00:41:43] [00:41:43] DONE =========================================================================== [00:41:43] binary functions: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:42:05] ........... [00:42:05] functions on binary files: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:42:10] .... [00:42:10] broadcast variables: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:42:12] .. [00:42:12] functions in client.R: ..... [00:42:30] test functions in sparkR.R: .............................................. [00:42:30] include R packages: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:42:31] [00:42:31] JVM API: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:42:31] .. 
[00:42:31] MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:48:48] ...................................................................... [00:48:48] MLlib clustering algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:50:12] ..................................................................... [00:50:12] MLlib frequent pattern mining: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:50:18] ..... [00:50:18] MLlib recommendation algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:50:27] ........ [00:50:27] MLlib regression algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:56:00] ................................................................................................................................ [00:56:00] MLlib statistics algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:56:04] ........ [00:56:04] MLlib tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:58:20] .............................................................................................. [00:58:20] parallelize() and collect(): Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [00:58:20] ............................. [00:58:20] basic RDD functions: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [01:03:35] ............................................................................................................................................................................................................................................................................................................................................................................................................................................ 
[01:03:35] SerDe functionality: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [01:03:39] ............................... [01:03:39] partitionBy, groupByKey, reduceByKey etc.: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [01:04:20] .................... [01:04:20] functions in sparkR.R: .... [01:04:20] SparkSQL functions: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [01:04:50] ........................................................................................................................................-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group [01:04:50] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... [01:04:50] -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group [01:04:50] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... [01:04:51] -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group [01:04:51] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... [01:06:13] ............................................................................................................................................................................................................................................................................................................................................................-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group [01:06:13] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... [01:06:14] .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group [01:06:14] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... [01:06:14] ....-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group [01:06:14] Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... 
[01:12:30] ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... [01:12:30] Structured Streaming: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [01:14:27] .......................................... [01:14:27] tests RDD function take(): Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [01:14:28] ................ [01:14:28] the textFile() function: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [01:14:44] ............. [01:14:44] functions in utils.R: Spark package found in SPARK_HOME: C:\projects\spark\bin\.. [01:14:46] ............................................ [01:14:46] Windows-specific tests: . [01:14:46] [01:14:46] DONE =========================================================================== [01:15:29] Build success ``` Author: hyukjinkwon <gurwls223@apache.org> Closes #21894 from HyukjinKwon/wip-build.	2018-07-30 10:01:18 +08:00
hyukjinkwon	c2aeddf9ea	[SPARK-22817][R] Use fixed testthat version for SparkR tests in AppVeyor ## What changes were proposed in this pull request? `testthat` 2.0.0 has been released and AppVeyor has started to use it instead of 1.0.2. Since then, the R tests have been failing in AppVeyor. See - https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1967-master ``` Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object 'run_tests' not found Calls: ::: -> get ``` This seems to be because we rely on the internal `testthat:::run_tests` here: https://github.com/r-lib/testthat/blob/v1.0.2/R/test-package.R#L62-L75 `dc4c351837/R/pkg/tests/run-all.R (L49-L52)` However, it seems to have been removed in 2.0.0. I tried a few other exposed APIs like `test_dir` but failed to make a good compatible fix. It seems better to pin the `testthat` version first so that the build passes. ## How was this patch tested? Manually tested and AppVeyor tests. Author: hyukjinkwon <gurwls223@gmail.com> Closes #20003 from HyukjinKwon/SPARK-22817.	2017-12-17 14:40:41 +09:00
Jakub Nowacki	b4edafa99b	[SPARK-22495] Fix setup of SPARK_HOME variable on Windows ## What changes were proposed in this pull request? This fixes the way `SPARK_HOME` is resolved on Windows. While the previous version worked with the built release download, the set of directories changed slightly for the PySpark `pip` or `conda` install. This has been reflected in the Linux files in `bin` but not in the Windows `cmd` files. The first fix improves the way the `jars` directory is found, as this was stopping the Windows version of the `pip/conda` install from working; JARs were not found on Session/Context setup. The second fix adds the `find-spark-home.cmd` script, which uses the `find_spark_home.py` script, as the Linux version does, to resolve `SPARK_HOME`. It is based on the `find-spark-home` bash script, though some operations are done in a different order due to the limitations of the `cmd` script language. If the environment variable is set, the Python script `find_spark_home.py` will not be run. The process can fail if Python is not installed, but this path is mostly taken when PySpark is installed via `pip/conda`, in which case some Python is present on the system. ## How was this patch tested? Tested on local installation. Author: Jakub Nowacki <j.s.nowacki@gmail.com> Closes #19370 from jsnowacki/fix_spark_cmds.	2017-11-23 12:47:38 +09:00
Felix Cheung	828fab0356	[BUILD][TEST][SPARKR] add sparksubmitsuite to appveyor tests ## What changes were proposed in this pull request? more file regex ## How was this patch tested? Jenkins, AppVeyor Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #19177 from felixcheung/rmoduletotest.	2017-09-11 09:32:25 +09:00
hyukjinkwon	75a6d05853	[MINOR][R] Add knitr and rmarkdown packages/improve output for version info in AppVeyor tests ## What changes were proposed in this pull request? This PR proposes three things as below: Install packages per documentation - to my knowledge, this does not affect the tests themselves (only the CRAN check, which we do not run via AppVeyor). This adds `knitr` and `rmarkdown` per `45824fb608/R/WINDOWS.md (unit-tests)` (please see `45824fb608`) Improve logs/shorten logs - long logs can actually be a problem on AppVeyor (e.g., see https://github.com/apache/spark/pull/17873) `R -e ...` repeats printing R information for each invocation as below: ``` R version 3.3.1 (2016-06-21) -- "Bug in Your Hair" Copyright (C) 2016 The R Foundation for Statistical Computing Platform: i386-w64-mingw32/i386 (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. ``` Reducing the number of calls seems slightly better, and printing the versions together looks more readable. Before: ``` # R information ... > packageVersion('testthat') [1] '1.0.2' > > # R information ... > packageVersion('e1071') [1] '1.6.8' > > ... 3 more times ``` After: ``` # R information ... 
> packageVersion('knitr'); packageVersion('rmarkdown'); packageVersion('testthat'); packageVersion('e1071'); packageVersion('survival') [1] ‘1.16’ [1] ‘1.6’ [1] ‘1.0.2’ [1] ‘1.6.8’ [1] ‘2.41.3’ ``` Add `appveyor.yml`/`dev/appveyor-install-dependencies.ps1` for triggering the test Changing this file might break the test, e.g., https://github.com/apache/spark/pull/16927 ## How was this patch tested? Before (please see https://ci.appveyor.com/project/HyukjinKwon/spark/build/169-master) After (please see the AppVeyor build in this PR): Author: hyukjinkwon <gurwls223@gmail.com> Closes #18336 from HyukjinKwon/minor-add-knitr-and-rmarkdown.	2017-06-18 08:43:47 +01:00
Felix Cheung	7087e01194	[SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppVeyor ## What changes were proposed in this pull request? add environment ## How was this patch tested? wait for appveyor run Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #17878 from felixcheung/appveyorrcran.	2017-05-07 13:10:10 -07:00
hyukjinkwon	b433acae74	[SPARK-20614][PROJECT INFRA] Use the same log4j configuration with Jenkins in AppVeyor ## What changes were proposed in this pull request? Currently, logs flood the AppVeyor console. This has been fine because we can download all the logs. However, given my observations so far, logs are truncated when there are too many. The log volume has grown recently and the logs have started to get truncated. For example, see https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1209-master Even after the log is downloaded, it looks truncated as below: ``` [00:44:21] 17/05/04 18:56:18 INFO TaskSetManager: Finished task 197.0 in stage 601.0 (TID 9211) in 0 ms on localhost (executor driver) (194/200) [00:44:21] 17/05/04 18:56:18 INFO Executor: Running task 199.0 in stage 601.0 (TID 9213) [00:44:21] 17/05/04 18:56:18 INFO Executor: Finished task 198.0 in stage 601.0 (TID 9212). 2473 bytes result sent to driver ... ``` It is probably better to use the same log4j configuration that we are using for SparkR tests in Jenkins (please see `fc472bddd1/R/run-tests.sh (L26)` and `fc472bddd1/R/log4j.properties`) ``` # Set everything to be logged to the file target/unit-tests.log log4j.rootCategory=INFO, file log4j.appender.file=org.apache.log4j.FileAppender log4j.appender.file.append=true log4j.appender.file.file=R/target/unit-tests.log log4j.appender.file.layout=org.apache.log4j.PatternLayout log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n # Ignore messages below warning level from Jetty, because it's a bit verbose log4j.logger.org.eclipse.jetty=WARN org.eclipse.jetty.LEVEL=WARN ``` ## How was this patch tested? Manually tested with spark-test account - https://ci.appveyor.com/project/spark-test/spark/build/672-r-log4j (there is an example for flaky test here) - https://ci.appveyor.com/project/spark-test/spark/build/673-r-log4j (I re-ran the build). 
Author: hyukjinkwon <gurwls223@gmail.com> Closes #17873 from HyukjinKwon/appveyor-reduce-logs.	2017-05-05 21:26:55 -07:00
hyukjinkwon	2422c86f2c	[SPARK-20092][R][PROJECT INFRA] Add the detection for Scala codes dedicated for R in AppVeyor tests ## What changes were proposed in this pull request? We currently detect changes in the `R/` directory only and then trigger the AppVeyor tests. It seems we also need to run the tests when there are changes to the Scala code dedicated to R in `core/src/main/scala/org/apache/spark/api/r/`, `sql/core/src/main/scala/org/apache/spark/sql/api/r/` and `mllib/src/main/scala/org/apache/spark/ml/r/`. This will enable the tests, for example, for SPARK-20088. ## How was this patch tested? Tests with manually created PRs. - Changes in `sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala` https://github.com/spark-test/spark/pull/13 - Changes in `core/src/main/scala/org/apache/spark/api/r/SerDe.scala` https://github.com/spark-test/spark/pull/12 - Changes in `README.md` https://github.com/spark-test/spark/pull/14 Author: hyukjinkwon <gurwls223@gmail.com> Closes #17427 from HyukjinKwon/SPARK-20092.	2017-03-25 23:29:02 -07:00
Yuming Wang	9b8eca65dc	[SPARK-19660][CORE][SQL] Replace the configuration property names that are deprecated in the version of Hadoop 2.6 ## What changes were proposed in this pull request? Replace all the deprecated Hadoop configuration property names according to [DeprecatedProperties](https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html), except: https://github.com/apache/spark/blob/v2.1.0/python/pyspark/sql/tests.py#L1533 https://github.com/apache/spark/blob/v2.1.0/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala#L987 https://github.com/apache/spark/blob/v2.1.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala#L45 https://github.com/apache/spark/blob/v2.1.0/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L614 ## How was this patch tested? Existing tests Author: Yuming Wang <wgyumg@gmail.com> Closes #16990 from wangyum/HadoopDeprecatedProperties.	2017-02-28 10:13:42 +00:00
Sean Owen	e8d3fca450	[SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier ## What changes were proposed in this pull request? - Remove support for Hadoop 2.5 and earlier - Remove reflection and code constructs only needed to support multiple versions at once - Update docs to reflect newer versions - Remove older versions' builds and profiles. ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #16810 from srowen/SPARK-19464.	2017-02-08 12:20:07 +00:00
hyukjinkwon	78d5d4dd5c	[SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate building and testing on Windows (currently SparkR only) ## What changes were proposed in this pull request? This PR adds build automation on Windows with the [AppVeyor](https://www.appveyor.com/) CI tool. Currently, this only runs the tests for SparkR, as we have been having some issues with testing Windows-specific PRs (e.g. https://github.com/apache/spark/pull/14743 and https://github.com/apache/spark/pull/13165) and a hard time verifying them. One concern is that this build depends on [steveloughran/winutils](https://github.com/steveloughran/winutils), maintained by a Hadoop PMC member, for the pre-built Hadoop bin package. ## How was this patch tested? Manually, https://ci.appveyor.com/project/HyukjinKwon/spark/build/88-SPARK-17200-build-profile This takes roughly 40 mins. Some tests are already failing, as found in https://github.com/apache/spark/pull/14743#issuecomment-241405287. Author: hyukjinkwon <gurwls223@gmail.com> Closes #14859 from HyukjinKwon/SPARK-17200-build.	2016-09-08 08:26:59 -07:00

13 commits