ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Dongjoon Hyun	42f8f79ff0	[SPARK-29936][R] Fix SparkR lint errors and add lint-r GitHub Action ### What changes were proposed in this pull request? This PR fixes SparkR lint errors and adds a `lint-r` GitHub Action to protect the branch. ### Why are the changes needed? It turns out that we currently don't run it. It was recovered yesterday. However, since then, our Jenkins linter jobs (`master`/`branch-2.4`) have been broken on `lint-r` tasks. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the GitHub Action on this PR in addition to Jenkins R and AppVeyor R. Closes #26564 from dongjoon-hyun/SPARK-29936. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-17 21:01:01 -08:00
Dongjoon Hyun	e1fc38b3e4	[SPARK-29932][R][TESTS] lint-r should do non-zero exit in case of errors ### What changes were proposed in this pull request? This PR aims to make `lint-r` exit with a non-zero code in case of errors. Please note that `lint-r` works correctly when everything is installed correctly. ### Why are the changes needed? There are two cases which hide errors from Jenkins/AppVeyor/GitHubAction. 1. `lint-r` exits with zero if there is no R installation. ```bash $ dev/lint-r dev/lint-r: line 25: type: Rscript: not found ERROR: You should install R $ echo $? 0 ``` 2. `lint-r` exits with zero if we didn't run `R/install-dev.sh`. ```bash $ dev/lint-r Error: You should install SparkR in a local directory with `R/install-dev.sh`. In addition: Warning message: In library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE) : no library trees found in 'lib.loc' Execution halted lintr checks passed. // <=== Please note here $ echo $? 0 ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually checked the above two cases. Closes #26561 from dongjoon-hyun/SPARK-29932. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-17 10:09:46 -08:00
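The fix comes down to reporting every failure as a non-zero exit code. The Python sketch below shows only that idea. The real `dev/lint-r` is a shell script, and `run_lint`, its parameters, and the `dev/lint-r.R` path in the usage note are assumptions made for illustration.

```python
import shutil
import subprocess
import sys


def run_lint(lint_cmd, required_tool):
    """Run lint_cmd and return a non-zero code on any failure (hypothetical helper)."""
    if shutil.which(required_tool) is None:
        # Case 1: the tool is missing, which must not look like success.
        print(f"ERROR: You should install {required_tool}", file=sys.stderr)
        return 1
    # Case 2: pass the linter's own status through instead of printing "passed".
    result = subprocess.run(lint_cmd)
    return result.returncode
```

For example, `sys.exit(run_lint(["Rscript", "dev/lint-r.R"], "Rscript"))` fails immediately when `Rscript` is absent. Otherwise it exits with whatever status the R process returned, so a halted execution is no longer reported as a pass.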
Dongjoon Hyun	d0470d6394	[MINOR][TESTS] Ignore GitHub Action and AppVeyor file changes in testing ### What changes were proposed in this pull request? This PR aims to ignore `GitHub Action` and `AppVeyor` file changes. When we touch these files, the Jenkins job should not trigger full testing. ### Why are the changes needed? Currently, these files are categorized as `root`, trigger full testing, and end up wasting Jenkins resources. - https://github.com/apache/spark/pull/26555 ``` [info] Using build tool sbt with Hadoop profile hadoop2.7 under environment amplab_jenkins From https://github.com/apache/spark * [new branch] master -> master [info] Found the following changed modules: sparkr, root [info] Setup the following environment variables for tests: ``` ### Does this PR introduce any user-facing change? No. (Jenkins testing only). ### How was this patch tested? Manually. ``` $ dev/run-tests.py -h -v ... Trying: [x.name for x in determine_modules_for_files([".github/workflows/master.yml", "appveyor.xml"])] Expecting: [] ... ``` Closes #26556 from dongjoon-hyun/SPARK-IGNORE-APPVEYOR. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-16 09:26:01 -08:00
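Below is a minimal sketch of the filtering idea. It assumes CI-only files can be recognized by a path prefix or an exact filename. It is not Spark's `determine_modules_for_files`, and `filter_changed_files` and the ignore lists are hypothetical. `appveyor.yml` appears next to the doctest's `appveyor.xml` only as a guess at the actual filename.

```python
# Hypothetical ignore lists: CI-only files that should not select any test module.
IGNORED_PREFIXES = (".github/",)
IGNORED_FILES = {"appveyor.xml", "appveyor.yml"}  # doctest name, plus assumed real name


def is_ci_only(path):
    """True if a changed path only affects CI configuration."""
    return path.startswith(IGNORED_PREFIXES) or path in IGNORED_FILES


def filter_changed_files(paths):
    """Drop CI-only paths before mapping the remaining ones to test modules."""
    return [p for p in paths if not is_ci_only(p)]
```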
HyukjinKwon	d1ac25ba33	[SPARK-28752][BUILD][DOCS] Documentation build to support Python 3 ### What changes were proposed in this pull request? This PR proposes to switch from `pygments.rb`, which only supports Python 2 and seems to have been inactive for the last few years (https://github.com/tmm1/pygments.rb), to Rouge, which is a pure Ruby code highlighter that is compatible with Pygments. I thought it would be pretty difficult to change but thankfully Rouge does a great job as the alternative. ### Why are the changes needed? We're moving to Python 3 and dropping Python 2 completely. ### Does this PR introduce any user-facing change? Possibly a slightly different syntax-highlighting style, but there should be no notable change. ### How was this patch tested? Manually tested the build and checked the documentation. Closes #26521 from HyukjinKwon/SPARK-28752. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-15 13:44:20 +09:00
Bryan Cutler	65a189c7a1	[SPARK-29376][SQL][PYTHON] Upgrade Apache Arrow to version 0.15.1 ### What changes were proposed in this pull request? Upgrade Apache Arrow to version 0.15.1. This includes Java artifacts and increases the minimum required version of PyArrow also. Version 0.12.0 to 0.15.1 includes the following selected fixes/improvements relevant to Spark users: * ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes * ARROW-6874 - [Python] Memory leak in Table.to_pandas() when conversion to object dtype * ARROW-5579 - [Java] shade flatbuffer dependency * ARROW-5843 - [Java] Improve the readability and performance of BitVectorHelper#getNullCount * ARROW-5881 - [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits * ARROW-5893 - [C++] Remove arrow::Column class from C++ library * ARROW-5970 - [Java] Provide pointer to Arrow buffer * ARROW-6070 - [Java] Avoid creating new schema before IPC sending * ARROW-6279 - [Python] Add Table.slice method or allow slices in \_\_getitem\_\_ * ARROW-6313 - [Format] Tracking for ensuring flatbuffer serialized values are aligned in stream/files. * ARROW-6557 - [Python] Always return pandas.Series from Array/ChunkedArray.to_pandas, propagate field names to Series from RecordBatch, Table * ARROW-2015 - [Java] Use Java Time and Date APIs instead of JodaTime * ARROW-1261 - [Java] Add container type for Map logical type * ARROW-1207 - [C++] Implement Map logical type Changelog can be seen at https://arrow.apache.org/release/0.15.0.html ### Why are the changes needed? Upgrade to get bug fixes, improvements, and maintain compatibility with future versions of PyArrow. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests, manually tested with Python 3.7, 3.8 Closes #26133 from BryanCutler/arrow-upgrade-015-SPARK-29376. Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-15 13:27:30 +09:00
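This upgrade raises the minimum PyArrow version. A version gate of that kind could look like the sketch below. The `0.15.1` value and the helper names are assumptions for illustration, and PySpark's own check may compare versions differently.

```python
# Assumption: the minimum PyArrow version required after this upgrade.
MINIMUM_PYARROW_VERSION = "0.15.1"


def parse_version(text):
    """Parse 'X.Y.Z' (ignoring any '-suffix') into a tuple of ints."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def check_pyarrow_version(installed, minimum=MINIMUM_PYARROW_VERSION):
    """Raise ImportError if the installed PyArrow is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"PyArrow >= {minimum} must be installed; however, your version was {installed}."
        )
```

Comparing integer tuples avoids the string-comparison trap where `"0.9.0" > "0.15.1"`.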
shane knapp	04e99c1e1b	[SPARK-29672][PYSPARK] update spark testing framework to use python3 ### What changes were proposed in this pull request? remove python2.7 tests and test infra for 3.0+ ### Why are the changes needed? because python2.7 is finally going the way of the dodo. ### Does this PR introduce any user-facing change? newp. ### How was this patch tested? the build system will test this Closes #26330 from shaneknapp/remove-py27-tests. Lead-authored-by: shane knapp <incomplete@gmail.com> Co-authored-by: shane <incomplete@gmail.com> Signed-off-by: shane knapp <incomplete@gmail.com>	2019-11-14 10:18:55 -08:00
Liang-Chi Hsieh	ef1abf2e2c	[SPARK-29747][BUILD] Bump joda-time version to 2.10.5 ### What changes were proposed in this pull request? This upgrades joda-time from 2.9 to 2.10.5. ### Why are the changes needed? Joda 2.9 was released almost 4 years ago, and there have been bug fixes and tz database updates since then. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #26389 from viirya/upgrade-joda. Authored-by: Liang-Chi Hsieh <liangchi@uber.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-05 10:08:19 +09:00
Sean Owen	19b8c71436	[SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ ### What changes were proposed in this pull request? Update the version of dropwizard metrics that Spark uses for metrics to 4.1.x, from 3.2.x. ### Why are the changes needed? This helps JDK 9+ support, per for example https://github.com/dropwizard/metrics/pull/1236 ### Does this PR introduce any user-facing change? No, although downstream users with custom metrics may be affected. ### How was this patch tested? Existing tests. Closes #26332 from srowen/SPARK-29674. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-03 15:13:06 -08:00
Dongjoon Hyun	4bcfe5033c	[SPARK-29731][INFRA] Use public JIRA REST API to read-only access ### What changes were proposed in this pull request? This PR replaces `jira_client` API calls for read-only access with public Apache JIRA REST API invocations. ### Why are the changes needed? This will reduce the number of authenticated API invocations. I hope this will reduce the chance of CAPTCHA challenges from the Apache JIRA site. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manual. ``` $ echo 26375 > .github-jira-max $ dev/github_jira_sync.py Read largest PR number previously seen: 26375 Retrieved 100 JIRA PR's from Github 1 PR's remain after excluding visted ones Checking issue SPARK-29731 Writing largest PR number seen: 26376 Build PR dictionary SPARK-29731 26376 Set 26376 with labels "PROJECT INFRA" ``` Closes #26376 from dongjoon-hyun/SPARK-29731. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-03 11:17:53 -08:00
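Here is a minimal sketch of an anonymous, read-only lookup against Apache JIRA's public REST API, using only the standard library. The `/rest/api/2/issue/{key}` path follows JIRA's REST API v2 convention. `issue_url` and `fetch_issue` are hypothetical helpers, not the functions in `dev/github_jira_sync.py`.

```python
import json
import urllib.request

# Public, unauthenticated REST endpoint (JIRA REST API v2 path convention).
JIRA_API_BASE = "https://issues.apache.org/jira/rest/api/2"


def issue_url(issue_key):
    """URL for reading a single issue, e.g. SPARK-29731."""
    return f"{JIRA_API_BASE}/issue/{issue_key}"


def fetch_issue(issue_key, timeout=10):
    """Read-only fetch without credentials; returns the decoded JSON document."""
    with urllib.request.urlopen(issue_url(issue_key), timeout=timeout) as resp:
        return json.load(resp)
```

Because no credentials are sent, these reads do not count as authenticated invocations. The authenticated client is then needed only for writes such as setting labels.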
Dongjoon Hyun	1ac6bd9f79	[SPARK-29729][BUILD] Upgrade ASM to 7.2 ### What changes were proposed in this pull request? This PR aims to upgrade ASM to 7.2. - https://issues.apache.org/jira/browse/XBEAN-322 (Upgrade to ASM 7.2) - https://asm.ow2.io/versions.html ### Why are the changes needed? This will bring the following patches. - 317875: Infinite loop when parsing invalid method descriptor - 317873: Add support for RET instruction in AdviceAdapter - 317872: Throw an exception if visitFrame used incorrectly - add support for Java 14 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing UTs. Closes #26373 from dongjoon-hyun/SPARK-29729. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-03 10:42:38 -08:00
Xingbo Jiang	155a67d00c	[SPARK-29666][BUILD] Fix the publish release failure under dry-run mode ### What changes were proposed in this pull request? `release-build.sh` fails to publish the release under dry-run mode with the following error message: ``` /opt/spark-rm/release-build.sh: line 429: pushd: spark-repo-g4MBm/org/apache/spark: No such file or directory ``` We need to run the `mvn clean install` command at least once to create the `$tmp_repo` path, but currently those steps are all skipped under dry-run mode. This PR fixes the issue. ### How was this patch tested? Tested locally. Closes #26329 from jiangxb1987/dryrun. Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-30 14:57:51 -07:00
Xingbo Jiang	fd6cfb1be3	[SPARK-29646][BUILD] Allow pyspark version name format `${versionNumber}-preview` in release script ### What changes were proposed in this pull request? Update `release-build.sh`, to allow pyspark version name format `${versionNumber}-preview`, otherwise the release script won't generate pyspark release tarballs. ### How was this patch tested? Tested locally. Closes #26306 from jiangxb1987/buildPython. Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-30 14:51:50 -07:00
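The release script itself is Bash. The Python sketch below only illustrates the kind of version-name check involved. It assumes accepted names look like `3.0.0` or `3.0.0-preview`, optionally with a trailing number such as `-preview2`. The pattern and `is_release_version` are hypothetical.

```python
import re

# Hypothetical pattern: X.Y.Z, optionally followed by "-preview" or "-previewN".
RELEASE_VERSION = re.compile(r"\d+\.\d+\.\d+(?:-preview\d*)?")


def is_release_version(version):
    """True if `version` looks like a plain or preview release number."""
    return RELEASE_VERSION.fullmatch(version) is not None
```

`fullmatch` is used instead of `^...$` because `$` also matches just before a trailing newline.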
Dongjoon Hyun	ba9d1610b6	[SPARK-29617][BUILD] Upgrade to ORC 1.5.7 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache ORC 1.5.7. ### Why are the changes needed? This will bring the latest bug fixes. The following is the full release note. - https://issues.apache.org/jira/projects/ORC/versions/12345702 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26276 from dongjoon-hyun/SPARK-29617. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-27 21:11:17 -07:00
Dongjoon Hyun	2baf7a1d8f	[SPARK-29608][BUILD] Add `hadoop-3.2` profile to release build ### What changes were proposed in this pull request? This PR aims to add the `hadoop-3.2` profile to pre-built binary package releases. ### Why are the changes needed? Since Apache Spark 3.0.0, we provide a Hadoop 3.2 pre-built binary. ### Does this PR introduce any user-facing change? No. (Although the artifacts are available, this change is for release managers). ### How was this patch tested? Manual. Please note that `DRY_RUN=0` disables these combinations. ``` $ dev/create-release/release-build.sh package ... Packages to build: without-hadoop hadoop3.2 hadoop2.7 make_binary_release without-hadoop -Pscala-2.12 -Phadoop-provided 2.12 make_binary_release hadoop3.2 -Pscala-2.12 -Phadoop-3.2 -Phive -Phive-thriftserver 2.12 make_binary_release hadoop2.7 -Pscala-2.12 -Phadoop-2.7 -Phive -Phive-thriftserver withpip,withr 2.12 ``` Closes #26260 from dongjoon-hyun/SPARK-29608. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-25 13:57:26 -07:00
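The package list in the log above can be read as a small table from package name to Maven profiles. The sketch below encodes it as data. The flags are copied from the `make_binary_release` lines, while `BINARY_PACKAGES`, `EXTRAS`, and `build_flags` are hypothetical names.

```python
# Profiles copied from the make_binary_release lines above; names are hypothetical.
SCALA_PROFILE = "-Pscala-2.12"
BINARY_PACKAGES = {
    "without-hadoop": ["-Phadoop-provided"],
    "hadoop3.2": ["-Phadoop-3.2", "-Phive", "-Phive-thriftserver"],
    "hadoop2.7": ["-Phadoop-2.7", "-Phive", "-Phive-thriftserver"],
}
# In the log above, only the hadoop2.7 package also builds the pip and R artifacts.
EXTRAS = {"hadoop2.7": "withpip,withr"}


def build_flags(package):
    """Maven profile flags for one binary package."""
    return [SCALA_PROFILE] + BINARY_PACKAGES[package]
```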
Luca Canali	5867707835	[SPARK-29557][BUILD] Update dropwizard/codahale metrics library to 3.2.6 ### What changes were proposed in this pull request? This proposes to update the dropwizard/codahale metrics library version used by Spark to `3.2.6`, which is the last version supporting Ganglia. ### Why are the changes needed? Spark is currently using Dropwizard metrics version 3.1.5, a version that is no longer actively developed or maintained, according to the project's GitHub repo README. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests + manual tests on a YARN cluster. Closes #26212 from LucaCanali/updateDropwizardVersion. Authored-by: Luca Canali <luca.canali@cern.ch> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-23 10:45:11 -07:00
Xianyang Liu	0a7095156b	[SPARK-29499][CORE][PYSPARK] Add mapPartitionsWithIndex for RDDBarrier ### What changes were proposed in this pull request? Add mapPartitionsWithIndex for RDDBarrier. ### Why are the changes needed? There is only one method in `RDDBarrier`. We often use the partition index as a label for the current partition. Currently we need to get the index from `TaskContext` inside `mapPartitions`, which is not convenient. ### Does this PR introduce any user-facing change? No ### How was this patch tested? New UT. Closes #26148 from ConeyLiu/barrier-index. Authored-by: Xianyang Liu <xianyang.liu@intel.com> Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>	2019-10-23 13:46:09 +02:00
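In PySpark, the new method takes a function of `(index, iterator)`, the same shape as `RDD.mapPartitionsWithIndex`. The sketch below defines such a function and runs it on a plain iterator, so no cluster is needed. `tag_with_partition` is a hypothetical name, and the Spark call appears only in a comment. Before this change, the index had to be read inside `mapPartitions` via `TaskContext.get().partitionId()`.

```python
def tag_with_partition(index, iterator):
    """Pair every element with its partition index (the 'label' use case)."""
    for value in iterator:
        yield (index, value)


# With PySpark 3.0+, this would be passed as (not executed here):
#   rdd.barrier().mapPartitionsWithIndex(tag_with_partition).collect()
```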
igor.calabria	78bdcfade1	[SPARK-27812][K8S] Bump K8S client version to 4.6.1 ### What changes were proposed in this pull request? Updated the Kubernetes client. ### Why are the changes needed? https://issues.apache.org/jira/browse/SPARK-27812 https://issues.apache.org/jira/browse/SPARK-27927 We need this fix https://github.com/fabric8io/kubernetes-client/pull/1768 that was released in version 4.6 of the client. The root cause of the problem is better explained in https://github.com/apache/spark/pull/25785 ### Does this PR introduce any user-facing change? Nope, it should be transparent to users ### How was this patch tested? This patch was tested manually using a simple pyspark job ```python from pyspark.sql import SparkSession if __name__ == '__main__': spark = SparkSession.builder.getOrCreate() ``` The expected behaviour of this "job" is that both the Python and JVM processes exit automatically after the main runs. This is the case for spark versions <= 2.4. On version 2.4.3, the JVM process hangs because there's a non-daemon thread running ``` "OkHttp WebSocket https://10.96.0.1/..." #121 prio=5 os_prio=0 tid=0x00007fb27c005800 nid=0x24b waiting on condition [0x00007fb300847000] "OkHttp WebSocket https://10.96.0.1/..." #117 prio=5 os_prio=0 tid=0x00007fb28c004000 nid=0x247 waiting on condition [0x00007fb300e4b000] ``` This is caused by a bug in the `kubernetes-client` library, which is fixed in the version that we are upgrading to. When the mentioned job is run with this patch applied, the behaviour from spark <= 2.4.3 is restored and both processes terminate successfully. Closes #26093 from igorcalabria/k8s-client-update. Authored-by: igor.calabria <igor.calabria@ubee.in> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-17 12:23:24 -07:00
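The hang follows from a rule the JVM and CPython share: a process does not exit while a non-daemon thread is still running. The Python sketch below detects such threads, as an analogy to the `OkHttp WebSocket` threads in the dump above. `non_daemon_threads`, `demo`, and the thread name are hypothetical.

```python
import threading


def non_daemon_threads():
    """Live threads, other than the main thread, that would block interpreter exit."""
    main = threading.main_thread()
    return [t for t in threading.enumerate() if t is not main and not t.daemon]


def demo(daemon):
    """Start a worker that blocks (like a WebSocket reader), inspect, then stop it."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, name="websocket-like-worker", daemon=daemon)
    worker.start()
    try:
        return [t.name for t in non_daemon_threads()]
    finally:
        stop.set()  # let the worker return so this demo itself cannot hang
        worker.join()
```

Marking such background threads as daemons, or shutting them down explicitly, lets the process exit after `main` returns.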
Fokko Driesprong	8eb8f7478c	[SPARK-29483][BUILD] Bump Jackson to 2.10.0 ### What changes were proposed in this pull request? Release blog: https://medium.com/cowtowncoder/jackson-2-10-features-cd880674d8a2 Fixes the following CVEs: https://www.cvedetails.com/cve/CVE-2019-16942/ https://www.cvedetails.com/cve/CVE-2019-16943/ Looking back, there were 3 major goals for this minor release: - Resolve the growing problem of “endless CVE patches”, a stream of fixes for reported CVEs related to “Polymorphic Deserialization” problem (described in “On Jackson CVEs… ”) that resulted in security tools forcing Jackson upgrades. 2.10 now includes “Safe Default Typing” that is hoped to resolve this problem. - Evolve 2.x API towards 3.0, based on changes that were done in master, within limits of 2.x API backwards-compatibility requirements. - Add JDK support for versions beyond Java 8: specifically add “module-info.class” for JDK9+, defining proper module definitions for Jackson components Full changelog: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10 Improved Scala 2.13 support: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10#scala ### Why are the changes needed? Patches CVEs reported by the vulnerability scanner. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Ran `mvn clean install -DskipTests` locally. Closes #26131 from Fokko/SPARK-29483. Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-16 15:38:54 -07:00
Jeff Evans	95de93b24e	[SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read Updating univocity-parsers version to 2.8.3, which adds support for multiple character delimiters Moving univocity-parsers version to spark-parent pom dependencyManagement section Adding new utility method to build multi-char delimiter string, which delegates to existing one Adding tests for multiple character delimited CSV ### What changes were proposed in this pull request? Adds support for parsing CSV data using multiple-character delimiters. Existing logic for converting the input delimiter string to characters was kept and invoked in a loop. Project dependencies were updated to remove redundant declaration of `univocity-parsers` version, and also to change that version to the latest. ### Why are the changes needed? It is quite common for people to have delimited data, where the delimiter is not a single character, but rather a sequence of characters. Currently, it is difficult to handle such data in Spark (typically needs pre-processing). ### Does this PR introduce any user-facing change? Yes. Specifying the "delimiter" option for the DataFrame read, and providing more than one character, will no longer result in an exception. Instead, it will be converted as before and passed to the underlying library (Univocity), which has accepted multiple character delimiters since 2.8.0. ### How was this patch tested? The `CSVSuite` tests were confirmed passing (including new methods), and `sbt` tests for `sql` were executed. Closes #26027 from jeff303/SPARK-24540. Authored-by: Jeff Evans <jeffrey.wayne.evans@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-15 15:44:51 -05:00
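A multi-character delimiter needs more than `str.split` once quoted fields are involved. The sketch below is a simplified splitter that respects double-quoted fields. It is not Univocity's algorithm: it ignores escaped quotes and multi-line records. On the Spark side, the option is passed as before, e.g. `.option("delimiter", "||")`.

```python
def split_fields(line, delimiter, quote='"'):
    """Split one CSV line on a (possibly multi-character) delimiter.

    Simplified: a delimiter inside a quoted field is kept as data, but
    escaped quotes and embedded newlines are not handled.
    """
    fields, current, in_quotes, i = [], [], False, 0
    while i < len(line):
        if line[i] == quote:
            in_quotes = not in_quotes  # toggle quoting; the quote itself is dropped
            i += 1
        elif not in_quotes and line.startswith(delimiter, i):
            fields.append("".join(current))  # end of a field
            current = []
            i += len(delimiter)  # skip the whole multi-character delimiter
        else:
            current.append(line[i])
            i += 1
    fields.append("".join(current))
    return fields
```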
Dongjoon Hyun	39d53d3e74	[SPARK-29470][BUILD] Update plugins to latest versions ### What changes were proposed in this pull request? This PR updates plugins to latest versions. ### Why are the changes needed? This brings bug fixes like the following. - https://issues.apache.org/jira/projects/MCOMPILER/versions/12343484 (maven-compiler-plugin) - https://issues.apache.org/jira/projects/MJAVADOC/versions/12345060 (maven-javadoc-plugin) - https://issues.apache.org/jira/projects/MCHECKSTYLE/versions/12342397 (maven-checkstyle-plugin) - https://checkstyle.sourceforge.io/releasenotes.html#Release_8.25 (checkstyle) - https://checkstyle.sourceforge.io/releasenotes.html#Release_8.24 (checkstyle) ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins building and testing with the existing code. Closes #26117 from dongjoon-hyun/SPARK-29470. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-15 11:55:52 -07:00
angerszhu	ef81525a1a	[SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2 ### What changes were proposed in this pull request? The current dev/deps/spark-deps-hadoop-3.2 has some wrong deps, caused by `dev/test-dependencies.sh` when building assembly dependencies. Add the Maven compile parameter `-am` so that it builds with all deps and produces the right result. Also update NOTICE-binary & NOTICE-binary for the updated result. ### Why are the changes needed? Update dev/deps/spark-hadoop-3.2 ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #25984 from AngersZhuuuu/SPARK=29308. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-13 12:53:12 -05:00
Fokko Driesprong	b5b1b69f79	[SPARK-29445][CORE] Bump netty-all from 4.1.39.Final to 4.1.42.Final ### What changes were proposed in this pull request? Minor version bump of Netty to patch reported CVE. Patches: https://www.cvedetails.com/cve/CVE-2019-16869/ ### Why are the changes needed? ### Does this PR introduce any user-facing change? No ### How was this patch tested? Compiled locally using `mvn clean install -DskipTests` Closes #26099 from Fokko/SPARK-29445. Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-12 09:43:16 -05:00
Peter Toth	3a7126cea8	[SPARK-29410][BUILD] Update commons-beanutils to 1.9.4 ### What changes were proposed in this pull request? This PR updates commons-beanutils to 1.9.4. ### Why are the changes needed? CVE fixed in 1.9.4: http://commons.apache.org/proper/commons-beanutils/javadocs/v1.9.4/RELEASE-NOTES.txt ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #26069 from peter-toth/SPARK-29410-update-commons-beanutils-to-1.9.4. Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-12 09:24:06 -05:00
Dongjoon Hyun	9a84fae216	[SPARK-29332][BUILD] Update zstd-jni to 1.4.3-1 ### What changes were proposed in this pull request? This PR aims to update zstd-jni library to 1.4.3-1. ### Why are the changes needed? This will bring the latest bug fixes in zstd itself. This is independent from another on-going Spark fix. - https://github.com/facebook/zstd/releases/tag/v1.4.3 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26002 from dongjoon-hyun/SPARK-29332. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-02 11:37:02 -07:00
Sean Owen	2ec3265ae7	[MINOR][BUILD] Decode output of commands during merge script as UTF-8 consistently ### What changes were proposed in this pull request? In the PR merge script, decode the raw output of subprocess commands like `git` using UTF-8 encoding, consistently. ### Why are the changes needed? The merge script occasionally fails if run with Python 2 and the output of a command like `git` contains non-ASCII characters. I think this most usually happens when a user name, for example, contains Chinese characters. This is because the output is decoded according to `sys.getdefaultencoding()`, which is ASCII in Python 2. It's UTF-8 in Python 3, by default. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? The change caused a merge that failed before to succeed. Closes #25991 from srowen/MergePRUTF8. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-10-02 11:28:55 +09:00
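Here is a minimal sketch of the fix's idea: decode subprocess output explicitly as UTF-8 instead of relying on the interpreter's default encoding. `run_cmd` is a hypothetical name, and the merge script's actual helper may differ.

```python
import subprocess


def run_cmd(cmd):
    """Run `cmd` and decode its raw stdout bytes as UTF-8.

    Decoding explicitly avoids depending on sys.getdefaultencoding(),
    which is ASCII on Python 2.
    """
    return subprocess.check_output(cmd).decode("utf-8")
```

For example, `run_cmd(["git", "log", "-1", "--pretty=format:%an"])` returns the author name intact even if it contains Chinese characters.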
gengjiaan	1018390542	[SPARK-29252][BUILD] Upgrade zookeeper to 3.4.14 and fix vulnerabilities ### What changes were proposed in this pull request? The current code uses org.apache.zookeeper:zookeeper:jar:3.4.6, which has a known security vulnerability. Details are available at https://www.tenable.com/cve/CVE-2019-0201, which recommends upgrading `zookeeper` to 3.4.14 or later. ### Why are the changes needed? This PR fixes the security vulnerability. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #25933 from beliefer/upgrade-zookeeper. Authored-by: gengjiaan <gengjiaan@360.cn> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-09-30 08:16:32 -05:00
Sean Owen	28b8383a6c	[SPARK-29289][BUILD] Update scalatest, scalacheck, scopt, clapper, scala-parser-combinators for 2.13 ### What changes were proposed in this pull request? Update scalatest, scalacheck, scopt, clapper, scala-parser-combinators to latest maintenance release that is also cross-published for Scala 2.13. ### Why are the changes needed? To build in the future for Scala 2.13 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #25967 from srowen/SPARK-29289. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-09-30 08:13:57 -05:00
gengjiaan	eef3abbb90	[SPARK-29226][BUILD] Upgrade jackson-databind to 2.9.10 and fix vulnerabilities ### What changes were proposed in this pull request? The current code uses com.fasterxml.jackson.core:jackson-databind:jar:2.9.9.3, which has security vulnerabilities. Details are available at https://www.tenable.com/cve/CVE-2019-16335 and https://www.tenable.com/cve/CVE-2019-14540, which recommend upgrading `jackson-databind` to 2.9.10 or later. This PR also upgrades jackson to 2.9.10. ### Why are the changes needed? This PR fixes the security vulnerabilities. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #25912 from beliefer/upgrade-jackson. Authored-by: gengjiaan <gengjiaan@360.cn> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-24 22:05:13 -07:00
HyukjinKwon	a838dbd2f9	[SPARK-27463][PYTHON][FOLLOW-UP] Run the tests of Cogrouped pandas UDF ### What changes were proposed in this pull request? This is a followup for https://github.com/apache/spark/pull/24981 It seems we mistakenly didn't add `test_pandas_udf_cogrouped_map` to `modules.py`, so we don't have official test results against that PR. ``` ... Starting test(python3.6): pyspark.sql.tests.test_pandas_udf ... Starting test(python3.6): pyspark.sql.tests.test_pandas_udf_grouped_agg ... Starting test(python3.6): pyspark.sql.tests.test_pandas_udf_grouped_map ... Starting test(python3.6): pyspark.sql.tests.test_pandas_udf_scalar ... Starting test(python3.6): pyspark.sql.tests.test_pandas_udf_window Finished test(python3.6): pyspark.sql.tests.test_pandas_udf (21s) ... Finished test(python3.6): pyspark.sql.tests.test_pandas_udf_grouped_map (49s) ... Finished test(python3.6): pyspark.sql.tests.test_pandas_udf_window (58s) ... Finished test(python3.6): pyspark.sql.tests.test_pandas_udf_scalar (82s) ... Finished test(python3.6): pyspark.sql.tests.test_pandas_udf_grouped_agg (105s) ... ``` If tests fail, we should revert that PR. ### Why are the changes needed? Relevant tests should be run. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Jenkins tests. Closes #25890 from HyukjinKwon/SPARK-28840. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-09-22 21:39:30 +09:00
Sean Owen	a9ae262cf2	[SPARK-28772][BUILD][MLLIB] Update breeze to 1.0 ### What changes were proposed in this pull request? Update breeze dependency to 1.0. ### Why are the changes needed? Breeze 1.0 supports Scala 2.13 and has a few bug fixes. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests. Closes #25874 from srowen/SPARK-28772. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-20 20:31:26 -07:00
Dongjoon Hyun	3bf43fb60d	[SPARK-29159][BUILD] Increase ReservedCodeCacheSize to 1G ### What changes were proposed in this pull request? This PR aims to increase the JVM CodeCacheSize from 0.5G to 1G. ### Why are the changes needed? After upgrading to `Scala 2.12.10`, the following is observed during building. ``` 2019-09-18T20:49:23.5030586Z OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled. 2019-09-18T20:49:23.5032920Z OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize= 2019-09-18T20:49:23.5034959Z CodeCache: size=524288Kb used=521399Kb max_used=521423Kb free=2888Kb 2019-09-18T20:49:23.5035472Z bounds [0x00007fa62c000000, 0x00007fa64c000000, 0x00007fa64c000000] 2019-09-18T20:49:23.5035781Z total_blobs=156549 nmethods=155863 adapters=592 2019-09-18T20:49:23.5036090Z compilation: disabled (not enough contiguous free space left) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually check the Jenkins or GitHub Action build log (which should not have the above). Closes #25836 from dongjoon-hyun/SPARK-CODE-CACHE-1G. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-19 00:24:15 -07:00
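The warning shows the code cache was nearly full at the old limit. Below is a small parser for that log line. It assumes the exact `CodeCache: size=...Kb used=...Kb max_used=...Kb free=...Kb` format shown above, and `parse_codecache` is hypothetical. For reference, 524288 KB is 512 MB (the old 0.5G), and the new `-XX:ReservedCodeCacheSize=1g` corresponds to 1048576 KB.

```python
import re

# Assumes the exact field order of the HotSpot warning shown above.
CODECACHE_RE = re.compile(
    r"CodeCache: size=(\d+)Kb used=(\d+)Kb max_used=(\d+)Kb free=(\d+)Kb"
)


def parse_codecache(line):
    """Extract CodeCache sizes (in KB) from a log line, or None if absent."""
    match = CODECACHE_RE.search(line)
    if match is None:
        return None
    size, used, max_used, free = (int(g) for g in match.groups())
    return {"size_kb": size, "used_kb": used, "max_used_kb": max_used, "free_kb": free}
```

Applied to the logged line, `used_kb / size_kb` is above 0.99, which matches the "not enough contiguous free space left" message.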
Yuming Wang	8c3f27ceb4	[SPARK-28683][BUILD] Upgrade Scala to 2.12.10 ## What changes were proposed in this pull request? This PR upgrades Scala to 2.12.10. Release notes: - Fix regression in large string interpolations with non-String typed splices - Revert "Generate shallower ASTs in pattern translation" - Fix regression in classpath when JARs have 'a.b' entries beside 'a/b' - Faster compiler: 5–10% faster since 2.12.8 - Improved compatibility with JDK 11, 12, and 13 - Experimental support for build pipelining and outline type checking More details: https://github.com/scala/scala/releases/tag/v2.12.10 https://github.com/scala/scala/releases/tag/v2.12.9 ## How was this patch tested? Existing tests Closes #25404 from wangyum/SPARK-28683. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-18 13:30:36 -07:00
Owen O'Malley	dfb0a8bb04	[SPARK-28208][BUILD][SQL] Upgrade to ORC 1.5.6 including closing the ORC readers ## What changes were proposed in this pull request? It upgrades ORC from 1.5.5 to 1.5.6 and closes the ORC readers when they aren't used to create RecordReaders. ## How was this patch tested? The changed unit tests were run. Closes #25006 from omalley/spark-28208. Lead-authored-by: Owen O'Malley <omalley@apache.org> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-18 09:32:43 -07:00
Kazuaki Ishizaki	8d1b5ba766	[SPARK-28906][BUILD] Fix incorrect information in bin/spark-submit --version ### What changes were proposed in this pull request? This PR allows `bin/spark-submit --version` to show the correct information while the previous versions, which were created by `dev/create-release/do-release-docker.sh`, show incorrect information. There are two root causes to show incorrect information: 1. Did not pass `USER` environment variable to the docker container 1. Did not keep `.git` directory in the work directory ### Why are the changes needed? The information is missing while the previous versions show the correct information. ### Does this PR introduce any user-facing change? Yes, the following is the console output in branch-2.3 ``` $ bin/spark-submit --version Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.3.4 /_/ Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_212 Branch HEAD Compiled by user ishizaki on 2019-09-02T02:18:10Z Revision `8c6f8150f3` Url https://gitbox.apache.org/repos/asf/spark.git Type --help for more information. ``` Without this PR, the console output is as follows ``` $ spark-submit --version Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.3.4 /_/ Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_212 Branch Compiled by user on 2019-08-26T08:29:39Z Revision Url Type --help for more information. ``` ### How was this patch tested? After building the package, I manually executed `bin/spark-submit --version` Closes #25655 from kiszk/SPARK-28906. Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-09-11 08:12:44 -05:00
Sean Owen	6378d4bc06	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 ### What changes were proposed in this pull request? - Remove SQLContext.createExternalTable and Catalog.createExternalTable, deprecated in favor of createTable since 2.2.0, plus tests of deprecated methods - Remove HiveContext, deprecated in 2.0.0, in favor of `SparkSession.builder.enableHiveSupport` - Remove deprecated KinesisUtils.createStream methods, plus tests of deprecated methods, deprecate in 2.2.0 - Remove deprecated MLlib (not Spark ML) linear method support, mostly utility constructors and 'train' methods, and associated docs. This includes methods in LinearRegression, LogisticRegression, Lasso, RidgeRegression. These have been deprecated since 2.0.0 - Remove deprecated Pyspark MLlib linear method support, including LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD - Remove 'runs' argument in KMeans.train() method, which has been a no-op since 2.0.0 - Remove deprecated ChiSqSelector isSorted protected method - Remove deprecated 'yarn-cluster' and 'yarn-client' master argument in favor of 'yarn' and deploy mode 'cluster', etc Notes: - I was not able to remove deprecated DataFrameReader.json(RDD) in favor of DataFrameReader.json(Dataset); the former was deprecated in 2.2.0, but, it is still needed to support Pyspark's .json() method, which can't use a Dataset. - Looks like SQLContext.createExternalTable was not actually deprecated in Pyspark, but, almost certainly was meant to be? Catalog.createExternalTable was. - I afterwards noted that the toDegrees, toRadians functions were almost removed fully in SPARK-25908, but Felix suggested keeping just the R version as they hadn't been technically deprecated. I'd like to revisit that. Do we really want the inconsistency? I'm not against reverting it again, but then that implies leaving SQLContext.createExternalTable just in Pyspark too, which seems weird. - I kept LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD, RidgeRegressionWithSGD in Pyspark, though deprecated, as it is hard to remove them (still used by StreamingLogisticRegressionWithSGD?) and they are not fully removed in Scala. Maybe should not have been deprecated. ### Why are the changes needed? Deprecated items are easiest to remove in a major release, so we should do so as much as possible for Spark 3. This does not target items deprecated 'recently' as of Spark 2.3, which is still 18 months old. ### Does this PR introduce any user-facing change? Yes, in that deprecated items are removed from some public APIs. ### How was this patch tested? Existing tests. Closes #25684 from srowen/SPARK-28980. Lead-authored-by: Sean Owen <sean.owen@databricks.com> Co-authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-09-09 10:19:40 -05:00
Nicholas Marion	6fb5ef108e	[SPARK-29011][BUILD] Update netty-all from 4.1.30-Final to 4.1.39-Final ### What changes were proposed in this pull request? Upgrade netty-all to latest in the 4.1.x line which is 4.1.39-Final. ### Why are the changes needed? Currency of dependencies. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing unit-tests against master branch. Closes #25712 from n-marion/master. Authored-by: Nicholas Marion <nmarion@us.ibm.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-06 17:48:53 -07:00
Thomas Graves	4c8f114783	[SPARK-27489][WEBUI] UI updates to show executor resource information ### What changes were proposed in this pull request? We are adding other resource type support to the executors and Spark. We should show the resource information for each executor on the UI Executors page. This also adds a toggle button to show the resources column. It is off by default. ![executorui1](https://user-images.githubusercontent.com/4563792/63891432-c815b580-c9aa-11e9-9f41-62975649efbc.png) ![Screenshot from 2019-08-28 14-56-26](https://user-images.githubusercontent.com/4563792/63891516-fd220800-c9aa-11e9-9fe4-89fcdca37306.png) ### Why are the changes needed? to show user what resources the executors have. Like Gpus, fpgas, etc ### Does this PR introduce any user-facing change? Yes introduces UI and rest api changes to show the resources ### How was this patch tested? Unit tests and manual UI tests on yarn and standalone modes. Closes #25613 from tgravescs/SPARK-27489-gpu-ui-latest. Authored-by: Thomas Graves <tgraves@nvidia.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2019-09-04 09:45:44 +08:00
Xiao Li	2856398de9	[SPARK-28961][HOT-FIX][BUILD] Upgrade Maven from 3.6.1 to 3.6.2 ### What changes were proposed in this pull request? This PR is to upgrade the maven dependence from 3.6.1 to 3.6.2. ### Why are the changes needed? All the builds are broken because 3.6.1 is not available. http://ftp.wayne.edu/apache//maven/maven-3/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/485/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10536/ ![image](https://user-images.githubusercontent.com/11567269/64196667-36d69100-ce39-11e9-8f93-40eb333d595d.png) ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #25665 from gatorsmile/upgradeMVN. Authored-by: Xiao Li <gatorsmile@gmail.com> Signed-off-by: Xiao Li <gatorsmile@gmail.com>	2019-09-03 11:06:57 -07:00
Andy Grove	35d4edffa2	[SPARK-28921][BUILD][K8S] Upgrade kubernetes client to 4.4.2 ### What changes were proposed in this pull request? Upgrade kubernetes client from 4.1.2 to 4.4.2 ### Why are the changes needed? To fix compatibility issue with EKS since Amazon rolled out some security patches over the past week; 1.15.3, 1.14.6, 1.13.10, 1.12.10, and 1.11.10. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins and manually test on EKS. Closes #25640 from andygrove/SPARK-28921. Authored-by: Andy Grove <andygrove73@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-02 16:50:58 -07:00
Dongjoon Hyun	560df0ea8e	[SPARK-28951][INFRA] Add release announce template ### What changes were proposed in this pull request? This PR adds a release announce template. ### Why are the changes needed? - We want to use a formal template including HTTPS in the future release. - The future release managers don't need to search mailing list to find this form. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? N/A. Closes #25656 from dongjoon-hyun/SPARK-28951. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-02 14:55:05 -07:00
shane knapp	84d4f94596	[SPARK-28701][INFRA][FOLLOWUP] Fix the key error when looking in os.environ ### What changes were proposed in this pull request? i broke run-tests.py for non-PRB builds in this PR: https://github.com/apache/spark/pull/25423 ### Why are the changes needed? to fix what i broke ### Does this PR introduce any user-facing change? no ### How was this patch tested? the build system will test this Closes #25585 from shaneknapp/fix-run-tests. Authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: shane knapp <incomplete@gmail.com>	2019-08-26 12:40:31 -07:00
shane knapp	13fd32c9a9	[SPARK-28701][TEST-HADOOP3.2][TEST-JAVA11][K8S] adding java11 support for pull request builds ## What changes were proposed in this pull request? we need to add the ability to test PRBs against java11. see comments here: https://github.com/apache/spark/pull/25405 ## How was this patch tested? the build system will test this. Closes #25423 from shaneknapp/spark-prb-java11. Authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-08-27 00:48:01 +09:00
Dongjoon Hyun	6214b6a541	[SPARK-28868][INFRA] Specify Jekyll version to 3.8.6 in release docker image ### What changes were proposed in this pull request? This PR aims to specify Jekyll Version explicitly in our release docker image. ### Why are the changes needed? Recently, Jekyll 4.0 is released and it dropped Ruby 2.3 support. This breaks our release docker image build. ``` Building native extensions. This could take a while... ERROR: Error installing jekyll: jekyll-sass-converter requires Ruby version >= 2.4.0. ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? The following should succeed. ``` $ docker build -t spark-rm:test --build-arg UID=501 dev/create-release/spark-rm ... Successfully tagged spark-rm:test ``` Closes #25578 from dongjoon-hyun/SPARK-28868. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-25 15:38:41 -07:00
Dongjoon Hyun	1fd7f290ab	[SPARK-28857][INFRA] Clean up the comments of PR template during merging ### What changes were proposed in this pull request? This PR aims to clean up the commit logs by removing the comments of our PR template. ### Why are the changes needed? Apache Spark PR template has comments. Sometime we forget to clean up them because GitHub hides them nicely. It would be great if we clean up this. Otherwise, this makes the commit logs too verbose. (There are a few commits already.) ### Does this PR introduce any user-facing change? No. (only for committers) ### How was this patch tested? Manually with Python2/Python3. Closes #25564 from dongjoon-hyun/SPARK-28857. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-08-23 18:08:10 +09:00
Sean Owen	9ea37b09cf	[SPARK-17875][CORE][BUILD] Remove dependency on Netty 3 ### What changes were proposed in this pull request? Spark uses Netty 4 directly, but also includes Netty 3 only because transitive dependencies do. The dependencies (Hadoop HDFS, Zookeeper, Avro) don't seem to need this dependency as used in Spark. I think we can forcibly remove it to slim down the dependencies. Previous attempts were blocked by its usage in Flume, but that dependency has gone away. https://github.com/apache/spark/pull/15436 ### Why are the changes needed? Mostly to reduce the transitive dependency size and complexity a little bit and avoid triggering spurious security alerts on Netty 3.x usage. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #25544 from srowen/SPARK-17875. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-21 21:27:56 -07:00
Sean Owen	c9b49f3978	[SPARK-28737][CORE] Update Jersey to 2.29 ## What changes were proposed in this pull request? Update Jersey to 2.27+, ideally 2.29, for possible JDK 11 fixes. ## How was this patch tested? Existing tests. Closes #25455 from srowen/SPARK-28737. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-16 15:08:04 -07:00
Dongjoon Hyun	43101c7328	[SPARK-28758][BUILD][SQL] Upgrade Janino to 3.0.15 ### What changes were proposed in this pull request? This PR aims to upgrade `Janino` from `3.0.13` to `3.0.15` in order to bring the bug fixes. Please note that `3.1.0` is a major refactoring instead of bug fixes. We had better use `3.0.15` and wait for the stabler 3.1.x. ### Why are the changes needed? This brings the following bug fixes. 3.0.15 (2019-07-28) - Fix overloaded single static method import 3.0.14 (2019-07-05) - Conflict in sbt-assembly - Overloaded static on-demand imported methods cause a CompileException: Ambiguous static method import - Handle overloaded static on-demand imports - Major refactoring of the Java 8 and Java 9 retrofit mechanism - Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static Initializers" - Local variables in instance initializers don't work - Provide an option to keep generated code files - Added compile error handler and warning handler to ICompiler ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #25474 from dongjoon-hyun/SPARK-28758. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-16 11:33:02 -07:00
Fokko Driesprong	babdba0f9e	[SPARK-28728][BUILD] Bump Jackson Databind to 2.9.9.3 ## What changes were proposed in this pull request? Update Jackson databind to the latest version for some latest changes. ## How was this patch tested? Pass the Jenkins. Closes #25451 from Fokko/fd-bump-jackson-databind. Lead-authored-by: Fokko Driesprong <fokko@apache.org> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-16 03:40:41 -07:00
Dongjoon Hyun	f1d6b19de5	[SPARK-28720][BUILD][R] Update AppVeyor R version to 3.6.1 ## What changes were proposed in this pull request? R version 3.6.1 (Action of the Toes) was released on 2019-07-05. This PR aims to upgrade R installation for AppVeyor CI environment. ## How was this patch tested? Pass the AppVeyor CI. Closes #25441 from dongjoon-hyun/SPARK-28720. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2019-08-13 22:56:53 +00:00
WeichenXu	f21bc1874a	[SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 ## What changes were proposed in this pull request? I made an audit and update all dev scripts to support python3. (except `merge_spark_pr.py` which already updated) ## How was this patch tested? Manual. Closes #25289 from WeichenXu123/dev_py3. Authored-by: WeichenXu <weichen.xu@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-08-09 18:55:48 +09:00


763 commits