ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Wenchen Fan	8b119f1663	[SPARK-32640][SQL] Downgrade Janino to fix a correctness bug ### What changes were proposed in this pull request? This PR reverts https://github.com/apache/spark/pull/27860 to downgrade Janino, as the new version has a bug. ### Why are the changes needed? The symptom involves NaN comparison. Consider the code below: ``` if (double_value <= 0.0) { ... } else { ... } ``` If `double_value` is NaN, `NaN <= 0.0` is false and we should go to the else branch. However, current Spark goes to the if branch, which causes correctness issues like SPARK-32640. One way to fix it is: ``` boolean cond = double_value <= 0.0; if (cond) { ... } else { ... } ``` I'm not familiar with Janino, so I don't know what's going on there. ### Does this PR introduce _any_ user-facing change? Yes, it fixes correctness bugs. ### How was this patch tested? A new test. Closes #29495 from cloud-fan/revert. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-08-20 13:26:39 -07:00
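The expected semantics can be checked with plain Java compiled by `javac`. The sketch below is a hypothetical illustration (the class and method names are ours, not Spark's or Janino's) and does not reproduce the Janino bug; it only shows what correct generated code must do. Every ordered comparison involving NaN is false, so `!(x <= 0.0)` is not equivalent to `x > 0.0`, and both branch forms quoted above must take the else branch for NaN.

```java
// Hypothetical illustration of IEEE 754 NaN comparison semantics in Java.
// Every ordered comparison (<, <=, >, >=) involving NaN evaluates to false.
public class NaNComparison {

    // Mirrors the first pattern in the commit message: branch directly on the comparison.
    static String branchDirect(double value) {
        if (value <= 0.0) {
            return "if";
        } else {
            return "else";
        }
    }

    // Mirrors the workaround in the commit message: hoist the comparison into a boolean first.
    static String branchViaBoolean(double value) {
        boolean cond = value <= 0.0;
        if (cond) {
            return "if";
        } else {
            return "else";
        }
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        System.out.println(nan <= 0.0);            // false
        System.out.println(nan > 0.0);             // also false: not the negation of <=
        System.out.println(branchDirect(nan));     // else
        System.out.println(branchViaBoolean(nan)); // else
    }
}
```

Running `main` prints `false`, `false`, `else`, `else`. Generated code that instead takes the if branch for NaN exhibits the kind of bug this revert addresses.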
Kousuke Saruta	9a79bbc8b6	[SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer to the proper version ### What changes were proposed in this pull request? This PR fixes the links to metrics.dropwizard.io in monitoring.md so that they refer to the proper version of the library. ### Why are the changes needed? monitoring.md contains links to metrics.dropwizard.io, but the link targets refer to version 3.1.0, while we use 4.1.1. Now that users can create their own metrics using the Dropwizard library, it's better to fix the links to refer to the proper version. ### Does this PR introduce _any_ user-facing change? Yes. The modified links refer to version 4.1.1. ### How was this patch tested? Built the docs and visited all the modified links. Closes #29426 from sarutak/fix-dropwizard-url. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-08-16 12:07:37 -05:00
Dongjoon Hyun	eb74d55fb5	[SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0 ### What changes were proposed in this pull request? This PR aims to update the Kafka client library to 2.6.0 for Apache Spark 3.1.0. ### Why are the changes needed? This brings client-side bug fixes like KAFKA-10134 and KAFKA-10223. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the existing tests. Closes #29386 from dongjoon-hyun/SPARK-32568. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-08 10:31:36 +09:00
yangjie01	0693d8bbf2	[SPARK-32490][BUILD] Upgrade netty-all to 4.1.51.Final ### What changes were proposed in this pull request? This PR aims to bring the bug fixes from the latest netty version. ### Why are the changes needed? - 4.1.48.Final: [https://github.com/netty/netty/milestone/223?closed=1](https://github.com/netty/netty/milestone/223?closed=1)(14 patches or issues) - 4.1.49.Final: [https://github.com/netty/netty/milestone/224?closed=1](https://github.com/netty/netty/milestone/224?closed=1)(48 patches or issues) - 4.1.50.Final: [https://github.com/netty/netty/milestone/225?closed=1](https://github.com/netty/netty/milestone/225?closed=1)(38 patches or issues) - 4.1.51.Final: [https://github.com/netty/netty/milestone/226?closed=1](https://github.com/netty/netty/milestone/226?closed=1)(53 patches or issues) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #29299 from LuciferYang/upgrade-netty-version. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-08-02 16:46:11 -07:00
Holden Karau	50911df08e	[SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules ### What changes were proposed in this pull request? Upgrade the Codehaus Maven build helper plugin to allow people to specify a time during the build, to avoid snapshot artifacts with different version strings. ### Why are the changes needed? During snapshot builds, Maven may assign different versions to different artifacts based on the time each individual sub-module starts building. The timestamp is used as part of the version string when running `mvn deploy` on a snapshot build, so different sub-modules end up with different version strings. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Built manually while specifying the current time, and ensured the time is consistent across the sub-components. Open question: Ideally I'd like to backport this as well, since it's sort of a bug fix, and while it does change a dependency version, it's not one that is propagated. I'd like to hear folks' thoughts about this. Closes #29274 from holdenk/SPARK-32397-snapshot-artifact-timestamp-differences. Authored-by: Holden Karau <hkarau@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2020-07-29 21:39:14 +00:00
Dongjoon Hyun	13c64c2980	[SPARK-32448][K8S][TESTS] Use single version for exec-maven-plugin/scalatest-maven-plugin ### What changes were proposed in this pull request? Two different versions are used for each of the same artifacts, `exec-maven-plugin` and `scalatest-maven-plugin`. This PR aims to use a single version for each of `exec-maven-plugin` and `scalatest-maven-plugin`. In addition, this PR removes `scala-maven-plugin.version` from the `K8s` integration suite because it's unused. ### Why are the changes needed? This prevents the mistake of upgrading one place and forgetting the others. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins K8S IT. Closes #29248 from dongjoon-hyun/SPARK-32448. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-26 19:25:41 -07:00
Dongjoon Hyun	83ffef7ffb	[SPARK-32441][BUILD][CORE] Update json4s to 3.7.0-M5 for Scala 2.13 ### What changes were proposed in this pull request? This PR aims to upgrade `json4s` from 3.6.6 to 3.7.0-M5 for Scala 2.13 support in Apache Spark 3.1.0, scheduled for December. We will upgrade to the latest `json4s` around November. ### Why are the changes needed? `json4s` has supported Scala 2.13 since v3.7.0-M4. - https://github.com/json4s/json4s/issues/660 - `b013af8e75` The old `json4s` causes many UT failures with `NoSuchMethodException`. ``` Cause: java.lang.NoSuchMethodException: scala.collection.immutable.Seq$.apply(scala.collection.Seq) at java.lang.Class.getMethod(Class.java:1786) ``` The following is one example. ``` $ dev/change-scala-version.sh 2.13 $ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.executor.CoarseGrainedExecutorBackendSuite ... Tests: succeeded 4, failed 9, canceled 0, ignored 0, pending 0 * 9 TESTS FAILED * ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? 1. Scala 2.12: Pass the Jenkins or GitHub Action with the existing tests. 2. Scala 2.13: At minimum, run the following manually. ``` $ dev/change-scala-version.sh 2.13 $ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.executor.CoarseGrainedExecutorBackendSuite ... Tests: succeeded 13, failed 0, canceled 0, ignored 0, pending 0 All tests passed. ``` Closes #29239 from dongjoon-hyun/SPARK-32441. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-25 20:34:31 -07:00
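The failure above is raised by `java.lang.Class.getMethod`, which resolves a method by its exact erased parameter types and does not match a method whose parameter is merely a subtype or supertype of the requested one. A minimal Java sketch of that behavior follows. `Factory` and `apply` are hypothetical stand-ins, and the link to json4s is an assumption read from the trace: a reflective lookup of `apply(scala.collection.Seq)` that no longer matches the erased signature under Scala 2.13.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class ReflectionLookup {

    // Hypothetical stand-in for a companion object whose factory method
    // declares a specific collection type as its parameter.
    public static class Factory {
        public static List<Object> apply(List<Object> items) {
            return new ArrayList<>(items);
        }
    }

    // Returns true if a public method with exactly these parameter types exists.
    static boolean hasMethod(Class<?> cls, String name, Class<?>... params) {
        try {
            cls.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Exact erased parameter type: found.
        System.out.println(hasMethod(Factory.class, "apply", List.class));       // true
        // A supertype of the declared parameter: getMethod does not widen, so not found.
        System.out.println(hasMethod(Factory.class, "apply", Collection.class)); // false
    }
}
```

This is why a library compiled against one Scala collections layout can fail at runtime against another even when the source-level call would still type-check.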
Dongjoon Hyun	f642234d85	[SPARK-32437][CORE] Improve MapStatus deserialization speed with RoaringBitmap 0.9.0 ### What changes were proposed in this pull request? This PR aims to speed up `MapStatus` deserialization by 5~18% with the latest RoaringBitmap `0.9.0` and its new APIs. Note that we focus on `deserialization` time because `serialization` occurs once while `deserialization` occurs many times. ### Why are the changes needed? The current version is too old. We should upgrade it to get the performance improvements and bug fixes. Although `MapStatusesSerDeserBenchmark` is synthetic, the benchmark result is updated with this patch. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins or GitHub Action. Closes #29233 from dongjoon-hyun/SPARK-ROAR. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-25 08:07:28 -07:00
Sean Owen	be2eca22e9	[SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+ ### What changes were proposed in this pull request? Updates to scalatest 3.2.0. Though it looks large, 99% of the changes just point to the new locations of scalatest classes. ### Why are the changes needed? 3.2.0+ has a fix that is required for Scala 2.13.3+ compatibility. ### Does this PR introduce _any_ user-facing change? No, it only affects tests. ### How was this patch tested? Existing tests. Closes #29196 from srowen/SPARK-32398. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-23 16:20:17 -07:00
yangjie01	5e0cb3ee16	[SPARK-32305][BUILD] Make `mvn clean` remove `metastore_db` and `spark-warehouse` ### What changes were proposed in this pull request? Add additional configuration to `maven-clean-plugin` so that the `metastore_db` and `spark-warehouse` directories are cleaned up when executing the `mvn clean` command. ### Why are the changes needed? Spark now supports two versions of built-in Hive, and some test-generated metadata, such as `metastore_db`, is written outside the target directory, so it is not cleaned up automatically when we run `mvn clean`. So if we run `mvn clean test -pl sql/hive -am -Phadoop-2.7 -Phive -Phive-1.2`, the `metastore_db` directory is created and its metadata remains after the tests complete. We then have to clean up the `metastore_db` directory manually for `mvn clean test -pl sql/hive -am -Phadoop-2.7 -Phive` (which uses the Hive 2.3 profile) to succeed, because the residual metastore data is not compatible. `spark-warehouse` can also cause test failures in some residual-data scenarios, because the test cases assume that the metadata does not exist. This PR removes the need to clean up the `metastore_db` and `spark-warehouse` directories manually. ### How was this patch tested? Manually executed `mvn clean test -pl sql/hive -am -Phadoop-2.7 -Phive -Phive-1.2`, then executed `mvn clean test -pl sql/hive -am -Phadoop-2.7 -Phive`; both commands should succeed. Closes #29103 from LuciferYang/add-clean-directory. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-14 12:40:47 -07:00
Sean Owen	3ad4863673	[SPARK-29292][SPARK-30010][CORE] Let core compile for Scala 2.13 ### What changes were proposed in this pull request? The purpose of this PR is to partly resolve SPARK-29292 and fully resolve SPARK-30010, which should allow Spark to compile against Scala 2.13 in Spark Core and up through GraphX (not SQL, Streaming, etc). Note that we are not trying to determine here whether this makes Spark work on 2.13 yet, just compile, as a prerequisite for assessing test outcomes. However, of course, we need to ensure that the change does not break 2.12. The changes are, in the main, adding .toSeq and .toMap calls where mutable collections / maps are returned as Seq / Map, which are immutable by default in Scala 2.13. The theory is that these calls should be no-ops for Scala 2.12 (they return the collection itself) and are required for 2.13. There are a few non-trivial changes highlighted below. In particular, to get Core to compile, we need to resolve SPARK-30010, which removes a deprecated SparkConf method. ### Why are the changes needed? Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1. ### Does this PR introduce _any_ user-facing change? Yes, removal of the deprecated SparkConf.setAll overload, which isn't legal in Scala 2.13 anymore. ### How was this patch tested? Existing tests. (2.13 was not _tested_; this is about getting it to compile without breaking 2.12) Closes #28971 from srowen/SPARK-29292.1. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-11 14:34:02 -07:00
William Hyun	10a65ee9b4	[SPARK-32150][BUILD] Upgrade to ZStd 1.4.5-4 ### What changes were proposed in this pull request? This PR aims to upgrade to ZStd 1.4.5-4. ### Why are the changes needed? ZStd 1.4.5-4 fixes the following. - `3d16e51525` - `3d51bdcb82` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #28969 from williamhyun/zstd2. Authored-by: William Hyun <williamhyun3@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-12 00:45:48 +09:00
Dongjoon Hyun	9c134b57bf	[SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default ### What changes were proposed in this pull request? Following the dev mailing list discussion, this PR aims to switch the default Apache Hadoop dependency from 2.7.4 to 3.2.0 for Apache Spark 3.1.0, scheduled for December 2020. Default Hadoop dependency by distribution channel: Apache Spark Website: 3.2.0; Apache Download Site: 3.2.0; Apache Snapshot: 3.2.0; Maven Central: 3.2.0; PyPI: 2.7.4 (we will switch later); CRAN: 2.7.4 (we will switch later); Homebrew: 3.2.0 (already). In the Apache Spark 3.0.0 release, we focused on other features. This PR targets [Apache Spark 3.1.0 scheduled for December 2020](https://spark.apache.org/versioning-policy.html). ### Why are the changes needed? Apache Hadoop 3.2 has many fixes and new cloud-friendly features. Reference - 2017-08-04: https://hadoop.apache.org/release/2.7.4.html - 2019-01-16: https://hadoop.apache.org/release/3.2.0.html ### Does this PR introduce _any_ user-facing change? Since the default Hadoop dependency changes, users will get better support in cloud environments. ### How was this patch tested? Pass the Jenkins. Closes #28897 from dongjoon-hyun/SPARK-32058. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-26 19:43:29 -07:00
Gabor Somogyi	2dbfae8775	[SPARK-32049][SQL][TESTS] Upgrade Oracle JDBC Driver 8 ### What changes were proposed in this pull request? `OracleIntegrationSuite` is not using the latest Oracle JDBC driver. In this PR I've upgraded the driver to the latest version, which supports JDK8, JDK9, and JDK11. ### Why are the changes needed? The JDBC driver is old. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unit tests. Existing integration tests (especially `OracleIntegrationSuite`) Closes #28893 from gaborgsomogyi/SPARK-32049. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-23 03:58:40 -07:00
William Hyun	2bcbe3dd9a	[SPARK-32045][BUILD] Upgrade to Apache Commons Lang 3.10 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache Commons Lang 3.10. ### Why are the changes needed? This will bring the latest bug fixes like [LANG-1453](https://issues.apache.org/jira/browse/LANG-1453). https://commons.apache.org/proper/commons-lang/release-notes/RELEASE-NOTES-3.10.txt ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #28889 from williamhyun/commons-lang-3.10. Authored-by: William Hyun <williamhyun3@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-23 16:59:55 +09:00
Gabor Somogyi	eeb81200e2	[SPARK-31337][SQL] Support MS SQL Kerberos login in JDBC connector ### What changes were proposed in this pull request? When loading DataFrames from a JDBC data source with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to the lack of a Kerberos ticket or the ability to generate one. This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environments where exposing simple authentication access is not an option due to IT policy. In this PR I've added MS SQL support. What this PR contains: * Added `MSSQLConnectionProvider` * Added `MSSQLConnectionProviderSuite` * Changed the MS SQL JDBC driver to the latest version (test scope only) * Changed the `MsSqlServerIntegrationSuite` docker image to the latest version * Added a version comment to `MariaDBConnectionProvider` to improve traceability ### Why are the changes needed? Missing JDBC Kerberos support. ### Does this PR introduce _any_ user-facing change? Yes, users can now connect to MS SQL using Kerberos. ### How was this patch tested? * Additional + existing unit tests * Existing integration tests * Tested manually on a cluster Closes #28635 from gaborgsomogyi/SPARK-31337. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@apache.org>	2020-06-16 18:22:12 -07:00
Kousuke Saruta	88a4e55fae	[SPARK-31765][WEBUI][TEST-MAVEN] Upgrade HtmlUnit >= 2.37.0 ### What changes were proposed in this pull request? This PR upgrades HtmlUnit. Selenium and Jetty are also upgraded because of the dependency. ### Why are the changes needed? Recently, a security issue affecting HtmlUnit was reported. https://nvd.nist.gov/vuln/detail/CVE-2020-5529 According to the report, malicious users can run arbitrary code. HtmlUnit is only used for tests, so the impact might not be large, but it's better to upgrade it just in case. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test cases. Closes #28585 from sarutak/upgrade-htmlunit. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-06-11 18:27:53 -05:00
HyukjinKwon	baafd4386c	Revert "[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0" This reverts commit `e5c3463910`.	2020-06-03 14:15:30 +09:00
William Hyun	367d94a30d	[SPARK-31876][BUILD] Upgrade to Zstd 1.4.5 ### What changes were proposed in this pull request? This PR aims to upgrade to Zstd 1.4.5. ### Why are the changes needed? Zstd 1.4.5 improves performance. https://github.com/facebook/zstd/releases/tag/v1.4.5 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed the Jenkins. Closes #28682 from williamhyun/zstd. Authored-by: William Hyun <williamhyun3@gmail.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2020-06-02 21:58:33 +00:00
Kousuke Saruta	e5c3463910	[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0 ### What changes were proposed in this pull request? This PR upgrades HtmlUnit. Selenium and Jetty are also upgraded because of the dependency. ### Why are the changes needed? Recently, a security issue affecting HtmlUnit was reported. https://nvd.nist.gov/vuln/detail/CVE-2020-5529 According to the report, malicious users can run arbitrary code. HtmlUnit is only used for tests, so the impact might not be large, but it's better to upgrade it just in case. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test cases. Closes #28585 from sarutak/upgrade-htmlunit. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-06-02 08:29:07 -05:00
Kousuke Saruta	d3eba5bc8c	[SPARK-31756][WEBUI] Add real headless browser support for UI test ### What changes were proposed in this pull request? This PR mainly adds two things. 1. Real headless browser support for UI tests 2. A test suite using headless Chrome as one instance of those browsers. Also, for environments where Chrome and ChromeDriver are not installed, a `ChromeUITest` tag is added to filter out the test suite. By default, test suites tagged with `ChromeUITest` are disabled. ### Why are the changes needed? In the current master, there are two problems with UI tests. 1. Many tests, especially JavaScript-related ones, are done manually. Appearance is best confirmed by eye, but logic should ideally be covered by test cases. 2. Compared to real web browsers, HtmlUnit doesn't seem to support JavaScript well enough. I previously added a JavaScript-related test for SPARK-31534 using HtmlUnit, which is a simple library-based headless browser for testing. The test I added works, but some JavaScript-related errors are shown in unit-tests.log. ``` ======= EXCEPTION START ======== Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException] com.gargoylesoftware.htmlunit.ScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". 
(http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:904) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:835) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:807) at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426) at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157) at java.lang.Thread.run(Thread.java:748) Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". 
(http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1009) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413) at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:828) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:889) ... 10 more JavaScriptException value = Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". == CALLING JAVASCRIPT == function () { throw e; } ======= EXCEPTION END ======== ``` I tried to upgrade HtmlUnit to 2.40.0, but, worse, the test stopped working, even though it works without errors on real browsers like Chrome, Safari and Firefox. ``` [info] UISeleniumSuite: [info] - SPARK-31534: text for tooltip should be escaped * FAILED * (17 seconds, 745 milliseconds) [info] The code passed to eventually never returned normally. Attempted 2 times over 12.910785232 seconds. Last failure message: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (http://192.168.1.209:62132/static/vis-timeline-graph2d.min.js#52(Function)#1) ``` To resolve those problems, it's better to support real headless browsers for UI tests. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I tested with the following patterns. Both Chrome and ChromeDriver must be installed for testing. 1. 
sbt / with default excluded tags (ChromeUISeleniumSuite is expected to be skipped and SQLQueryTestSuite is expected to succeed) `build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite org.apache.spark.sql.SQLQueryTestSuite"` 2. sbt / overwrite default excluded tags with an empty string (both suites are expected to succeed) `build/sbt -Dtest.default.exclude.tags= -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite org.apache.spark.sql.SQLQueryTestSuite"` 3. sbt / set `test.exclude.tags` to `org.apache.spark.tags.ExtendedSQLTest` (both suites are expected to be skipped) `build/sbt -Dtest.exclude.tags=org.apache.spark.tags.ExtendedSQLTest -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite org.apache.spark.sql.SQLQueryTestSuite"` 4. Maven / with default excluded tags (ChromeUISeleniumSuite is expected to be skipped and SQLQueryTestSuite is expected to succeed) `build/mvn -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite,org.apache.spark.sql.SQLQueryTestSuite test` 5. Maven / overwrite default excluded tags with an empty string (both suites are expected to succeed) `build/mvn -Dtest.default.exclude.tags= -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite,org.apache.spark.sql.SQLQueryTestSuite test` 6. Maven / set `test.exclude.tags` to `org.apache.spark.tags.ExtendedSQLTest` (both suites are expected to be skipped) `build/mvn -Dtest.exclude.tags=org.apache.spark.tags.ExtendedSQLTest -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite,org.apache.spark.sql.SQLQueryTestSuite test` Closes #28627 from sarutak/real-headless-browser-support-take2. 
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-29 10:41:29 -07:00
Dongjoon Hyun	625abca9db	[SPARK-31858][BUILD] Upgrade commons-io to 2.5 in Hadoop 3.2 profile ### What changes were proposed in this pull request? This PR aims to upgrade `commons-io` from 2.4 to 2.5 for Apache Spark 3.1. ### Why are the changes needed? Hadoop has used `commons-io` 2.5 since Hadoop 3.1. - https://issues.apache.org/jira/browse/HADOOP-15261 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with the Hadoop-3.2 profile. The Maven dependency is verified automatically via `test-dependencies.sh`. The SBT dependency can be verified manually as follows. ``` build/sbt -Phadoop-3.2 "core/dependencyTree" | grep commons-io:commons-io | head -n1 [info] | | +-commons-io:commons-io:2.5 ``` Closes #28665 from dongjoon-hyun/SPARK-31858. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-29 07:46:53 -07:00
Jungtaek Lim (HeartSaVioR)	fe1d1e24bc	[SPARK-31214][BUILD] Upgrade Janino to 3.1.2 ### What changes were proposed in this pull request? This PR proposes to upgrade Janino to 3.1.2, which was released recently. Most of the changes are refactoring, and many commits have no commit message. Below are the (commit title, commit) pairs after 3.0.15 that seem to deal with bugs or specific improvements (rather than refactoring). * Issue #119: Guarantee executing popOperand() in popUninitializedVariableOperand() via moving popOperand() out of "assert" * Issue #116: replace operand to final target type if boxing conversion and widening reference conversion happen together * Merged pull request `#114` "Grow the code for relocatables, and do fixup, and relocate". * `367c58e73e` * issue `#107`: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package * `f7d99596d4` * Throw an NYI CompileException when a static interface method is invoked. * `efd3884983` * Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions) * `32fdb5f5f1` * Issue `#104`: ClassLoaderIClassLoader 's ClassNotFoundException handle mechanism enhancement * `6e8a97d609` You can see the changelog at: http://janino-compiler.github.io/janino/changelog.html ### Why are the changes needed? We received reports of user queries failing because Janino throws an error when compiling the generated code. The issue is here: https://github.com/janino-compiler/janino/issues/113 It contains the generated code, the symptom (error), and an analysis of the bug, so please refer to the link for more details. Janino 3.1.1 contains the PR https://github.com/janino-compiler/janino/pull/114, which enables Janino to compile the user's query properly. I've also fixed a couple more bugs, because 3.1.1 made Spark UTs fail; hence we need to upgrade to 3.1.2. 
Furthermore, from my testing, https://github.com/janino-compiler/janino/issues/90 (which Josh Rosen filed before) also seems to be resolved in 3.1.2. Janino appears to be maintained by one person, and there are no version branches or releases/tags, so we can't expect the Janino maintainer to release a new bugfix version; hence we have to try the new minor version. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #27860 from HeartSaVioR/SPARK-31101. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-29 07:42:57 -07:00
Gabor Somogyi	8f1d77488c	[SPARK-31821][BUILD] Remove mssql-jdbc dependencies from Hadoop 3.2 profile ### What changes were proposed in this pull request? There is an unnecessary dependency on `mssql-jdbc`. In this PR I've removed it. ### Why are the changes needed? Unnecessary dependency. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with the following configuration. - [x] Pass the dependency test. - [x] SBT with Hadoop-3.2 (https://github.com/apache/spark/pull/28640#issuecomment-634192512) - [ ] Maven with Hadoop-3.2 Closes #28640 from gaborgsomogyi/SPARK-31821. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-26 18:21:22 -07:00
Kousuke Saruta	8441e936fc	Revert "[SPARK-31756][WEBUI] Add real headless browser support for UI test" This reverts commit `d95570864a`. Closes #28624 from sarutak/revert-real-headless-browser-support. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>	2020-05-24 09:13:38 +09:00
Kousuke Saruta	d95570864a	[SPARK-31756][WEBUI] Add real headless browser support for UI test ### What changes were proposed in this pull request? This PR mainly adds two things. 1. Real headless browser support for UI tests 2. A test suite using headless Chrome as one instance of those browsers. Also, for environments where Chrome and ChromeDriver are not installed, a `ChromeUITest` tag is added to filter out the test suite. ### Why are the changes needed? In the current master, there are two problems with UI tests. 1. Many tests, especially JavaScript-related ones, are done manually. Appearance is best confirmed by eye, but logic should ideally be covered by test cases. 2. Compared to real web browsers, HtmlUnit doesn't seem to support JavaScript well enough. I previously added a JavaScript-related test for SPARK-31534 using HtmlUnit, which is a simple library-based headless browser for testing. The test I added works, but some JavaScript-related errors are shown in unit-tests.log. ``` ======= EXCEPTION START ======== Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException] com.gargoylesoftware.htmlunit.ScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". 
(http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:904) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:835) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:807) at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426) at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157) at java.lang.Thread.run(Thread.java:748) Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". 
(http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1009) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413) at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:828) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:889) ... 10 more JavaScriptException value = Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null\|function)". == CALLING JAVASCRIPT == function () { throw e; } ======= EXCEPTION END ======== ``` I tried upgrading HtmlUnit to 2.40.0. Worse, the test then stopped working, even though it runs without error on real browsers like Chrome, Safari and Firefox. ``` [info] UISeleniumSuite: [info] - SPARK-31534: text for tooltip should be escaped * FAILED * (17 seconds, 745 milliseconds) [info] The code passed to eventually never returned normally. Attempted 2 times over 12.910785232 seconds. Last failure message: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (http://192.168.1.209:62132/static/vis-timeline-graph2d.min.js#52(Function)#1) ``` To resolve these problems, it's better to support real headless browsers for UI tests. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I tested with the following patterns. Both Chrome and Chrome driver must be installed to run the test. 1. 
sbt / with chromedriver / include tag (expect to succeed) `build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"` 2. sbt / with chromedriver / exclude tag (expect to be ignored) `build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite -l org.apache.spark.tags.ChromeUITest"` 3. sbt / without chromedriver / include tag (expect to fail) `build/sbt "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"` 4. sbt / without chromedriver / exclude tag (expect to be skipped) `build/sbt -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"` 5. Maven / with chromedriver / include tag (expect to succeed) `build/mvn -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test` 6. Maven / with chromedriver / exclude tag (expect to be skipped) `build/mvn -Dtest.exclude.tags="org.apache.spark.tags.ChromeUITest" -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test` 7. Maven / without chromedriver / include tag (expect to fail) `build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test` 8. Maven / without chromedriver / exclude tag (expect to be skipped) `build/mvn -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test` Closes #28578 from sarutak/real-headless-browser-support. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-05-22 08:24:31 -05:00
Gengliang Wang	db5e5fce68	Revert "[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0" This reverts commit `92877c4ef2`. Closes #28602 from gengliangwang/revertSPARK-31765. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-05-21 16:00:58 -07:00
Kousuke Saruta	92877c4ef2	[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0 ### What changes were proposed in this pull request? This PR upgrades HtmlUnit. Selenium and Jetty are also upgraded because of dependencies. ### Why are the changes needed? Recently, a security issue affecting HtmlUnit was reported. https://nvd.nist.gov/vuln/detail/CVE-2020-5529 According to the report, malicious users can run arbitrary code. HtmlUnit is used for tests, so the impact might not be large, but it's better to upgrade it just in case. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing testcases. Closes #28585 from sarutak/upgrade-htmlunit. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-05-21 11:43:25 -07:00
angerszhu	0d9faf602e	[SPARK-31655][BUILD] Upgrade snappy-java to 1.1.7.5 ### What changes were proposed in this pull request? snappy-java has released v1.1.7.5; this PR upgrades to the latest version. Fixed in v1.1.7.4 - Caching internal buffers for SnappyFramed streams #234 - Fixed the native lib for ppc64le to work with glibc 2.17 (previously it depended on 2.22) Fixed in v1.1.7.5 - Fixes java.lang.NoClassDefFoundError: org/xerial/snappy/pool/DefaultPoolFactory in 1.1.7.4 https://github.com/xerial/snappy-java/compare/1.1.7.3...1.1.7.5 v1.1.7.5 release note: `edc4ec28bd` ### Why are the changes needed? Fix bugs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No need Closes #28472 from AngersZhuuuu/spark-31655. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-07 12:01:43 -07:00
Steve Loughran	86c4e43525	[SPARK-31644][BUILD] Make Spark's guava version configurable from the command line ### What changes were proposed in this pull request? This adds the Maven property guava.version, which can be used to control the guava version for a build. It does not change the current version. ### Why are the changes needed? All future Hadoop releases are going to be built with a later guava version, including Hadoop 3.1.4. This means that to run the Spark tests with that release, you need to update Spark's guava version. This patch lets whoever builds Spark do this locally. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ran the hadoop-cloud module tests with the 3.1.4 RC0 ``` mvn -T 1 -Phadoop-3.2 -Dhadoop.version=3.1.4 -Psnapshots-and-staging -Phadoop-cloud,yarn,kinesis-asl test --pl hadoop-cloud ``` observed the linkage problem ``` java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) ``` made the version configurable, retested with ``` -Phadoop-3.2 -Dhadoop.version=3.1.4 -Psnapshots-and-staging -Dguava.version=27.0-jre ``` all good. Closes #28455 from steveloughran/SPARK-31644-guava-version. Authored-by: Steve Loughran <stevel@cloudera.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-05 12:17:24 -07:00
Dongjoon Hyun	e7995c2ddc	[SPARK-31633][BUILD] Upgrade SLF4J from 1.7.16 to 1.7.30 ### What changes were proposed in this pull request? This PR aims to upgrade SLF4J from 1.7.16 to 1.7.30. ### Why are the changes needed? SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run under Java 9. Also, this will bring all latest bug fixes. - http://www.slf4j.org/news.html > When running under Java 9, log4j version 1.2.x is unable to correctly parse the "java.version" system property. Assuming an inccorect Java version, it proceeded to disable its MDC functionality. The slf4j-log4j12 module shipping in this release fixes the issue by tweaking MDC internals by reflection, allowing log4j to run under Java 9. See also SLF4J-393. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #28446 from dongjoon-hyun/SPARK-31633. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-04 08:14:12 -07:00
Dongjoon Hyun	79eaaaf6da	[SPARK-31580][BUILD] Upgrade Apache ORC to 1.5.10 ### What changes were proposed in this pull request? This PR aims to upgrade Apache ORC to 1.5.10. ### Why are the changes needed? Apache ORC 1.5.10 is a maintenance release with the following patches. - [ORC-621](https://issues.apache.org/jira/browse/ORC-621) Need reader fix for ORC-569 - [ORC-616](https://issues.apache.org/jira/browse/ORC-616) In Patched Base encoding, the value of headerThirdByte goes beyond the range of byte - [ORC-613](https://issues.apache.org/jira/browse/ORC-613) OrcMapredRecordReader mis-reuse struct object when actual children schema differs - [ORC-610](https://issues.apache.org/jira/browse/ORC-610) Updated Copyright year in the NOTICE file The following is the release note. - https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320&version=12346912 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing ORC tests and a newly added test case. - The first commit was already tested in the `hive-2.3` profile with both the native ORC implementation and the Hive 2.3 ORC implementation. (https://github.com/apache/spark/pull/28373#issuecomment-620265114) - The latest run disables the test case in the `hive-1.2` profile, which doesn't use Apache ORC. - `hive-1.2`: https://github.com/apache/spark/pull/28373#issuecomment-620325906 Closes #28373 from dongjoon-hyun/SPARK-ORC-1.5.10. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-27 18:56:30 -07:00
Gabor Somogyi	c619990c1d	[SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector ### What changes were proposed in this pull request? When loading DataFrames from a JDBC data source with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection because they lack a Kerberos ticket or the ability to generate one. This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environments where exposing simple authentication access is not an option due to IT policy. In this PR I've added DB2 support (other supported databases will come in later PRs). What this PR contains: * Added `DB2ConnectionProvider` * Added `DB2ConnectionProviderSuite` * Added `DB2KrbIntegrationSuite` docker integration test * Changed the DB2 JDBC driver to the latest version (test scope only) * Changed the test table data type to a type supported by all the databases * Removed double connection creation on the test side * Increased the connection timeout in docker tests because the DB2 docker container takes quite some time to start ### Why are the changes needed? Missing JDBC Kerberos support. ### Does this PR introduce any user-facing change? Yes, users are now able to connect to DB2 using Kerberos. ### How was this patch tested? * Additional + existing unit tests * Additional + existing integration tests * Tested on a cluster manually Closes #28215 from gaborgsomogyi/SPARK-31272. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@apache.org>	2020-04-22 17:10:30 -07:00
Yuming Wang	b11e42663b	[SPARK-31381][SPARK-29245][SQL] Upgrade built-in Hive 2.3.6 to 2.3.7 ### What changes were proposed in this pull request? Hive 2.3.7 fixed these issues: - HIVE-21508: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer - HIVE-21980: Parsing time can be high in case of deeply nested subqueries - HIVE-22249: Support Parquet through HCatalog ### Why are the changes needed? Fix a ClassCastException (CCE) when creating HiveMetaStoreClient in a JDK11 environment: [SPARK-29245](https://issues.apache.org/jira/browse/SPARK-29245). ### Does this PR introduce any user-facing change? No. ### How was this patch tested? - [x] Test Jenkins with Hadoop 2.7 (https://github.com/apache/spark/pull/28148#issuecomment-616757840) - [x] Test Jenkins with Hadoop 3.2 on JDK11 (https://github.com/apache/spark/pull/28148#issuecomment-616294353) - [x] Manual test with a remote Hive metastore. Hive side: ``` export JAVA_HOME=/usr/lib/jdk1.8.0_221 export PATH=$JAVA_HOME/bin:$PATH cd /usr/lib/hive-2.3.6 # Start Hive metastore with Hive 2.3.6 bin/schematool -dbType derby -initSchema --verbose bin/hive --service metastore ``` Spark side: ``` export JAVA_HOME=/usr/lib/jdk-11.0.3 export PATH=$JAVA_HOME/bin:$PATH build/sbt clean package -Phive -Phadoop-3.2 -Phive-thriftserver export SPARK_PREPEND_CLASSES=true bin/spark-sql --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083 ``` Closes #28148 from wangyum/SPARK-31381. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-20 13:38:24 -07:00
Dongjoon Hyun	c6e39dffd6	[SPARK-31464][BUILD][SS] Upgrade Kafka to 2.5.0 ### What changes were proposed in this pull request? This PR aims to upgrade the Kafka library to 2.5.0 for Apache Spark 3.1.0. ### Why are the changes needed? The Apache Kafka 2.5.0 client has improvements and bug fixes like [KAFKA-9241](https://issues.apache.org/jira/browse/KAFKA-9241) - https://downloads.apache.org/kafka/2.5.0/RELEASE_NOTES.html ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. - [x] SBT https://github.com/apache/spark/pull/28235#issuecomment-615936382 - [x] Maven https://github.com/apache/spark/pull/28235#issuecomment-616138840 (All Scala/Java/Python/R UT tests passed. It timed out during R installation testing, which is already covered by SBT.) Closes #28235 from dongjoon-hyun/SPARK-KAFKA-2.5. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-19 10:51:09 -07:00
Gabor Somogyi	1354d2d0de	[SPARK-31021][SQL] Support MariaDB Kerberos login in JDBC connector ### What changes were proposed in this pull request? When loading DataFrames from a JDBC data source with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection because they lack a Kerberos ticket or the ability to generate one. This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environments where exposing simple authentication access is not an option due to IT policy. In this PR I've added MariaDB support (other supported databases will come in later PRs). What this PR contains: * Introduced `SecureConnectionProvider` and added basic security functionality * Added `MariaDBConnectionProvider` * Added `MariaDBConnectionProviderSuite` * Added `MariaDBKrbIntegrationSuite` docker integration test * Added some missing code documentation ### Why are the changes needed? Missing JDBC Kerberos support. ### Does this PR introduce any user-facing change? Yes, users are now able to connect to MariaDB using Kerberos. ### How was this patch tested? * Additional + existing unit tests * Additional + existing integration tests * Tested on a cluster manually Closes #28019 from gaborgsomogyi/SPARK-31021. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@apache.org>	2020-04-09 09:20:02 -07:00
Jungtaek Lim (HeartSaVioR)	f55f6b569b	[SPARK-31101][BUILD] Upgrade Janino to 3.0.16 ### What changes were proposed in this pull request? This PR (SPARK-31101) proposes to upgrade Janino to 3.0.16, which was released recently. * Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate". Please see the commit log. - https://github.com/janino-compiler/janino/commits/3.0.16 You can see the changelog at http://janino-compiler.github.io/janino/changelog.html, though the release note for Janino 3.0.16 is actually incorrect. ### Why are the changes needed? We received reports of user queries failing because Janino throws an error when compiling the generated code. The issue is here: janino-compiler/janino#113. It contains the generated code, the symptom (error), and an analysis of the bug, so please refer to the link for more details. Janino 3.0.16 contains the PR janino-compiler/janino#114, which should enable Janino to compile the user's query properly. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #27932 from HeartSaVioR/SPARK-31101-janino-3.0.16. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-21 19:10:23 -07:00
Gabor Somogyi	b0d2956a35	[SPARK-31135][BUILD][TESTS] Upgrade docker-client version to 8.14.1 ### What changes were proposed in this pull request? Upgrade the `docker-client` version. ### Why are the changes needed? The `docker-client` version that Spark uses is very old. Snippet from the project page: ``` Spotify no longer uses recent versions of this project internally. The version of docker-client we're using is whatever helios has in its pom.xml. => 8.14.1 ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? ``` build/mvn install -DskipTests build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.DB2IntegrationSuite test build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.MsSqlServerIntegrationSuite test build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test ``` Closes #27892 from gaborgsomogyi/docker-client. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-15 23:55:04 -07:00
Dongjoon Hyun	614323d326	[SPARK-31126][SS] Upgrade Kafka to 2.4.1 ### What changes were proposed in this pull request? This PR (SPARK-31126) aims to upgrade the Kafka library to bring in client-side bug fixes like KAFKA-8933 ### Why are the changes needed? The following is the full release note. - https://downloads.apache.org/kafka/2.4.1/RELEASE_NOTES.html ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #27881 from dongjoon-hyun/SPARK-KAFKA-2.4.1. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-11 19:26:15 -07:00
Dongjoon Hyun	93def95b08	[SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final ### What changes were proposed in this pull request? This PR aims to bring the bug fixes from the latest netty-all. ### Why are the changes needed? - 4.1.47.Final: https://github.com/netty/netty/milestone/222?closed=1 (15 patches or issues) - 4.1.46.Final: https://github.com/netty/netty/milestone/221?closed=1 (80 patches or issues) - 4.1.45.Final: https://github.com/netty/netty/milestone/220?closed=1 (23 patches or issues) - 4.1.44.Final: https://github.com/netty/netty/milestone/218?closed=1 (113 patches or issues) - 4.1.43.Final: https://github.com/netty/netty/milestone/217?closed=1 (63 patches or issues) ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #27869 from dongjoon-hyun/SPARK-31095. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-10 17:50:34 -07:00
HyukjinKwon	5b3277f4fc	[SPARK-30994][BUILD][FOLLOW-UP] Change scope of xml-apis to include it and add xerces in SBT as dependency override ### What changes were proposed in this pull request? This PR proposes to: 1. Explicitly include xml-apis. xml-apis is already a dependency of xerces 2.12.0 (https://repo1.maven.org/maven2/xerces/xercesImpl/2.12.0/xercesImpl-2.12.0.pom). However, we're excluding it by setting its `scope` to `test`. This seems to cause `spark-shell`, built with Maven, to fail. It seems xml-apis was previously not reached for some reason, but after the upgrade it is required. Therefore, this PR proposes to include it. 2. Pin the `xerces` version in SBT as well. This dependency seems to be resolved differently than in Maven. Note that Hadoop 3 does not appear to require this, as it replaced xerces as of [HDFS-12221](https://issues.apache.org/jira/browse/HDFS-12221). ### Why are the changes needed? To make `spark-shell` work from a Maven build, and to use the same xerces version in SBT. ### Does this PR introduce any user-facing change? No, it's master only. ### How was this patch tested? 1. 
```bash ./build/mvn -DskipTests -Psparkr -Phive clean package ./bin/spark-shell ``` Before: ``` Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.xerces.parsers.AbstractDOMParser.startDocument(Unknown Source) at org.apache.xerces.xinclude.XIncludeHandler.startDocument(Unknown Source) at org.apache.xerces.impl.dtd.XMLDTDValidator.startDocument(Unknown Source) at org.apache.xerces.impl.XMLDocumentScannerImpl.startEntity(Unknown Source) at org.apache.xerces.impl.XMLVersionDetector.startDocumentParsing(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2482) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2470) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2541) at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2494) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2407) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115) at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: org.w3c.dom.ElementTraversal at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 42 more ``` After: ``` ... Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_202) Type in expressions to have them evaluated. Type :help for more information. scala> ``` 2. 
``` ./build/sbt dependencyTree -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver -Phive ./build/sbt dependencyTree -Phadoop-3.2 -Phive-2.3 -Phive-thriftserver -Phive ``` Closes #27808 from HyukjinKwon/SPARK-30994. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-06 09:39:02 +09:00
Sean Owen	97d9a22b04	[SPARK-30994][CORE] Update xerces to 2.12.0 ### What changes were proposed in this pull request? Manage up the version of Xerces that Hadoop uses (and potentially user apps) to 2.12.0 to match https://issues.apache.org/jira/browse/HADOOP-16530 ### Why are the changes needed? Picks up bug and security fixes: https://www.xml.com/news/2018-05-apache-xerces-j-2120/ ### Does this PR introduce any user-facing change? Should be no behavior changes. ### How was this patch tested? Existing tests. Closes #27746 from srowen/SPARK-30994. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-03 09:27:18 -06:00
Dongjoon Hyun	3995728c3c	[SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655 ### What changes were proposed in this pull request? This PR aims to upgrade `aws-java-sdk-sts` to `1.11.655`. ### Why are the changes needed? [SPARK-29677](https://github.com/apache/spark/pull/26333) upgraded the AWS Kinesis Client to 1.12.0 for Apache Spark 2.4.5 and 3.0.0. Since AWS Kinesis Client 1.12.0 uses AWS SDK 1.11.655, `aws-java-sdk-sts` should be consistent with the Kinesis client dependency. - https://github.com/awslabs/amazon-kinesis-client/releases/tag/v1.12.0 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #27720 from dongjoon-hyun/SPARK-30968. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-27 17:05:56 -08:00
gatorsmile	28b8713036	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT ### What changes were proposed in this pull request? This patch is to bump the master branch version to 3.1.0-SNAPSHOT. ### Why are the changes needed? N/A ### Does this PR introduce any user-facing change? N/A ### How was this patch tested? N/A Closes #27698 from gatorsmile/updateVersion. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-25 19:44:31 -08:00
Josh Rosen	f152d2a0a8	[SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central ### What changes were proposed in this pull request? This PR is a followup to #27307: per https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926, the Google Cloud Storage mirror of Maven Central has updated its URLs: the new paths are updated more frequently. The new paths are listed on https://storage-download.googleapis.com/maven-central/index.html This patch updates our build files to use these new URLs. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing build + tests. Closes #27688 from JoshRosen/update-gcs-mirror-url. Authored-by: Josh Rosen <joshrosen@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-02-25 17:04:13 +09:00
Yin Huai	ea626b6acf	[SPARK-30783] Exclude hive-service-rpc ### What changes were proposed in this pull request? Exclude hive-service-rpc from the build. ### Why are the changes needed? hive-service-rpc 2.3.6 and Spark SQL's thrift server module have duplicate classes. Leaving hive-service-rpc 2.3.6 on the classpath means that Spark can pick up classes defined in Hive instead of its thrift server module. This can cause hard-to-debug runtime errors (due to class loading order) and compilation errors for applications that depend on Spark. If you compare hive-service-rpc 2.3.6's jar (https://search.maven.org/remotecontent?filepath=org/apache/hive/hive-service-rpc/2.3.6/hive-service-rpc-2.3.6.jar) and the Spark thrift server's jar (e.g. https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hive-thriftserver_2.12/3.0.0-SNAPSHOT/spark-hive-thriftserver_2.12-3.0.0-20200207.021914-364.jar), you will see that all of the classes provided by hive-service-rpc-2.3.6.jar are covered by the Spark thrift server's jar. https://issues.apache.org/jira/browse/SPARK-30783 has the output of `jar tf` for both jars. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #27533 from yhuai/SPARK-30783. Authored-by: Yin Huai <yhuai@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-02-12 00:12:45 +08:00
Dongjoon Hyun	41bdb7ad39	[SPARK-30718][BUILD] Exclude jdk.tools dependency from hadoop-yarn-api ### What changes were proposed in this pull request? This PR removes the `jdk.tools:jdk.tools` transitive dependency from `hadoop-yarn-api`. - This is only used in `hadoop-annotation` project in some `*Doclet.java`. ### Why are the changes needed? Although this is not used in Apache Spark, this can cause a resolve failure in JDK11 environment. <img width="530" alt="jdk tools" src="https://user-images.githubusercontent.com/9700541/73697745-2f3f4080-4694-11ea-95a7-228638e31cf7.png"> ### Does this PR introduce any user-facing change? No. This is a dev-only change. From developers, this will remove the `Cannot resolve` error in IDE environment. ### How was this patch tested? - Pass the Jenkins in JDK8 - Manually, import the project with JDK11. Closes #27445 from dongjoon-hyun/SPARK-30718. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-03 19:57:16 -08:00
jiaan geng	35380958b8	[SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29 ### What changes were proposed in this pull request? I found that checkstyle has a new release: https://checkstyle.org/releasenotes.html#Release_8.29 This PR bumps checkstyle from 8.25 to 8.29. ### Why are the changes needed? I bumped checkstyle from 8.25 to 8.29 on my fork branch and tested the build; it works. 8.29 added some new features: - New Check: AvoidNoArgumentSuperConstructorCall. - New Check: NoEnumTrailingComma. - ENUM_DEF token support in RightCurlyCheck. - FallThrough module does not support the spelling "fall-through" by default. 8.29 fixes some bugs: - Java 8 Grammar: annotations on varargs parameters. - Sonar violation: Disable XML external entity (XXE) processing. - Disable instantiation of modules with private ctor. - Sonar violation: "ThreadLocal" variables should be cleaned up when no longer used. - Indentation incorrect level for chained method with bracket on new line. - InvalidJavadocPosition: false positive when comment is between javadoc and package. ### Does this PR introduce any user-facing change? No ### How was this patch tested? No UT Closes #27426 from beliefer/bump-checkstyle. Authored-by: jiaan geng <jiaan.geng@jiaandeMacBook-Air.local> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-31 21:14:11 -08:00
Dongjoon Hyun	2fd15a26fb	[SPARK-30695][BUILD] Upgrade Apache ORC to 1.5.9 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache ORC 1.5.9. - For `hive-2.3` profile, we need to upgrade `hive-storage-api` from `2.6.0` to `2.7.1`. - For `hive-1.2` profile, ORC library with classifier `nohive` already shaded it. So, there is no change. ### Why are the changes needed? This will bring the latest bug fixes. The following is the full release note. - https://issues.apache.org/jira/projects/ORC/versions/12346546 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Here is the summary. 1. `Hive 1.2 + Hadoop 2.7` passed. ([here](https://github.com/apache/spark/pull/27421#issuecomment-580924552)) 2. `Hive 2.3 + Hadoop 2.7` passed. ([here](https://github.com/apache/spark/pull/27421#issuecomment-580973391)) Closes #27421 from dongjoon-hyun/SPARK-ORC-1.5.9. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-31 17:41:27 -08:00
Dongjoon Hyun	862959747e	[SPARK-30639][BUILD] Upgrade Jersey to 2.30 ### What changes were proposed in this pull request? For better JDK11 support, this PR aims to upgrade Jersey and javassist to `2.30` and `3.25.0-GA` respectively. ### Why are the changes needed? Jersey: This will bring the following `Jersey` updates. - https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.30.html - https://github.com/eclipse-ee4j/jersey/issues/4245 (Java 11 java.desktop module dependency) javassist: This is a transitive dependency, upgraded from 3.20.0-CR2 to 3.25.0-GA. - `javassist` officially supports JDK11 as of 3.24.0-GA ([release note](https://github.com/jboss-javassist/javassist/blob/master/Readme.html#L308)). ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with both JDK8 and JDK11. Closes #27357 from dongjoon-hyun/SPARK-30639. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-25 15:41:55 -08:00
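The javassist bump depends on a version floor (JDK11 support starting at 3.24.0-GA). As a hedged sketch, assuming GNU coreutils `sort -V` is available, a build script could check such a floor as follows; `version_at_least` is a hypothetical helper, not part of Spark's build.

```bash
# Hypothetical helper: succeed when version $1 is at least version $2,
# relying on GNU `sort -V` (version sort) to order the two strings.
version_at_least() {
  local have="$1" need="$2"
  # If $need sorts first (or the two are equal), then $have >= $need.
  [ "$(printf '%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]
}
```

For example, `version_at_least 3.25.0-GA 3.24.0-GA` succeeds while `version_at_least 3.20.0-CR2 3.24.0-GA` fails. Suffixes such as `-CR2` and `-GA` are ordered by `sort -V`'s own rules, which may not match Maven's version semantics.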
cody koeninger	843224ebd4	[SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature ### What changes were proposed in this pull request? Update the scalafmt plugin to 1.0.3 and use its new onlyChangedFiles feature rather than --diff. ### Why are the changes needed? Older versions of the plugin either didn't work with Scala 2.13, or got rid of the --diff argument and didn't allow for formatting only changed files. ### Does this PR introduce any user-facing change? The `dev/scalafmt` script no longer passes through arbitrary args; instead, it uses the arg to select the Scala version. The issue here is that the plugin name literally contains the Scala version, and there doesn't appear to be a shorter way to refer to it. If srowen or someone else with better maven-fu has an idea I'm all ears. ### How was this patch tested? Manually, e.g. edited a file and ran dev/scalafmt or dev/scalafmt 2.13 Closes #27279 from koeninger/SPARK-30570. Authored-by: cody koeninger <cody@koeninger.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-23 12:44:43 -08:00
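Because the plugin's artifact id embeds the Scala version (e.g. `mvn-scalafmt_2.12`), a wrapper that takes the Scala version as its only argument has to derive the plugin name from it. A minimal sketch, assuming 2.12 is the default and only 2.12 and 2.13 are accepted; `scalafmt_artifact_id` is hypothetical and is not the actual contents of `dev/scalafmt`.

```bash
# Hypothetical helper: map an optional Scala binary version argument
# (default 2.12) to the scalafmt Maven plugin's artifact id.
scalafmt_artifact_id() {
  local scala_version="${1:-2.12}"
  case "$scala_version" in
    2.12|2.13) ;;
    *) echo "unsupported Scala version: $scala_version" >&2; return 1 ;;
  esac
  # The Scala version is part of the artifact id itself.
  printf 'mvn-scalafmt_%s\n' "$scala_version"
}
```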
HyukjinKwon	cd9ccdc0ac	[SPARK-30601][BUILD] Add a Google Maven Central as a primary repository ### What changes were proposed in this pull request? This PR proposes to address four things. Three issues and fixes were a bit mixed so this PR sorts it out. See also http://apache-spark-developers-list.1001551.n3.nabble.com/Adding-Maven-Central-mirror-from-Google-to-the-build-td28728.html for the discussion in the mailing list. 1. Add the Google Maven Central mirror (GCS) as a primary repository. This will not only help make development more stable but also make the Github Actions build (where downloading jars is always required) stable. The Jenkins PR builder wouldn't be affected much as it uses the pre-downloaded jars under `.m2`. - Google Maven Central seems stable for heavy workload but not synced very quickly (e.g., new release is missing) - Maven Central (default) seems less stable but synced quickly. We already added this GCS mirror as a default additional remote repository at SPARK-29175. So I don't see an issue with adding it as a repo. `abf759a91e/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (L2111-L2118)` 2. Currently, we have the hard-coded repository in [`sbt-pom-reader`](https://github.com/JoshRosen/sbt-pom-reader/blob/v1.0.0-spark/src/main/scala/com/typesafe/sbt/pom/MavenPomResolver.scala#L32) and this seems to overwrite Maven's existing resolver with the same ID `central` with `http://` when the pom file is initially ported into the SBT instance. This uses `http://`, which Maven Central lately disallowed (see https://github.com/apache/spark/pull/27242). My speculation is that we just need to be able to load the plugin and let it convert the POM to an SBT instance with another fallback repo. After that, it _seems_ to use `central` with `https` properly. See also https://github.com/apache/spark/pull/27307#issuecomment-576720395. 
I double checked that we use `https` properly from the SBT build as well: ``` [debug] downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom ... [debug] public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom [debug] public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom.sha1 ``` This was fixed by adding the same repo (https://github.com/apache/spark/pull/27281), `central_without_mirror`, which is a bit awkward. Instead, this PR adds GCS as a main repo, and community Maven central as a fallback repo. So, presumably the community Maven central repo is used when the plugin is loaded as a fallback. 3. While I am here, I fix another issue. The Github Action at https://github.com/apache/spark/pull/27279 is failing. The reason seems to be that scalafmt 1.0.3 is in Maven Central but not in GCS. ``` org.apache.maven.plugin.PluginResolutionException: Plugin org.antipathy:mvn-scalafmt_2.12:1.0.3 or one of its dependencies could not be resolved: Could not find artifact org.antipathy:mvn-scalafmt_2.12:jar:1.0.3 in google-maven-central (https://maven-central.storage-download.googleapis.com/repos/central/data/) at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve (DefaultPluginDependenciesResolver.java:131) ``` `mvn-scalafmt` exists in Maven central: ```bash $ curl https://repo.maven.apache.org/maven2/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom ``` ```xml <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <modelVersion>4.0.0</modelVersion> ... 
``` whereas it is not in the GCS mirror: ```bash $ curl https://maven-central.storage-download.googleapis.com/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom ``` ```xml <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: maven-central/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom</Details></Error> ``` This PR simply makes both repos accessible by adding them to `pluginRepositories`. 4. Remove the workarounds in Github Actions to switch mirrors because now we have the same repos in the same order (Google Maven Central first, and Maven Central second). ### Why are the changes needed? To make the build and Github Action more stable. ### Does this PR introduce any user-facing change? No, dev-only change. ### How was this patch tested? I roughly checked locally and with PRs against my fork (https://github.com/HyukjinKwon/spark/pull/2 and https://github.com/HyukjinKwon/spark/pull/3). Closes #27307 from HyukjinKwon/SPARK-30572. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-01-23 16:00:21 +09:00
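The two `curl` checks above differ only in the repository base URL; the artifact path follows Maven's standard layout (dots in the groupId become directories). A minimal sketch that builds such a POM URL from a `groupId:artifactId:version` coordinate; `pom_url` is a hypothetical helper, and it assumes a three-part coordinate without type or classifier.

```bash
# Hypothetical helper: build the POM URL for "groupId:artifactId:version"
# under a given repository base URL, using the standard Maven layout.
pom_url() {
  local base="$1" group artifact version
  # Split the coordinate on colons.
  IFS=: read -r group artifact version <<< "$2"
  # Dots in the groupId become directory separators; strip any trailing
  # slash from the base so the result has no double slash.
  printf '%s/%s/%s/%s/%s-%s.pom\n' "${base%/}" \
    "$(printf '%s' "$group" | tr '.' '/')" \
    "$artifact" "$version" "$artifact" "$version"
}
```

Usage: `curl -sfI "$(pom_url https://repo.maven.apache.org/maven2 org.antipathy:mvn-scalafmt_2.12:1.0.3)"` checks whether the POM exists in that repository.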
HyukjinKwon	e170422f74	Revert "[SPARK-30534][INFRA] Use mvn in `dev/scalastyle`" This reverts commit `384899944b`.	2020-01-21 18:23:03 +09:00
Takeshi Yamamuro	775fae4640	[SPARK-30486][BUILD] Bump lz4-java version to 1.7.1 ### What changes were proposed in this pull request? This pr intends to upgrade lz4-java from 1.7.0 to 1.7.1. ### Why are the changes needed? This release includes a bug fix for older macOS. You can see the link below for the changes; https://github.com/lz4/lz4-java/blob/master/CHANGES.md#171 ### Does this PR introduce any user-facing change? ### How was this patch tested? Existing tests. Closes #27271 from maropu/SPARK-30486. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-19 19:05:30 -08:00
Sean Owen	a2081ae4e1	[SPARK-29290][CORE] Update to chill 0.9.5 ### What changes were proposed in this pull request? Update Twitter Chill to 0.9.5. ### Why are the changes needed? Primarily, Scala 2.13 support for later. Other changes from 0.9.3 are apparently just minor fixes and improvements: https://github.com/twitter/chill/releases ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #27227 from srowen/SPARK-29290. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-19 18:39:38 -08:00
Dongjoon Hyun	c992716a33	[SPARK-30572][BUILD] Add a fallback Maven repository ### What changes were proposed in this pull request? This PR aims to add a fallback Maven repository when a mirror to `central` fails. ### Why are the changes needed? We use `Google Maven Central` in GitHub Action as a mirror of `central`. However, `Google Maven Central` sometimes doesn't have newly published artifacts and there is no guarantee about when the newly published artifacts become available. By duplicating `Maven Central` with a new ID, we can add a fallback Maven repository which is not mirrored by `Google Maven Central`. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually tested with the new `Twitter` chill artifacts by switching `chill.version` from `0.9.3` to `0.9.5`. ``` $ rm -rf ~/.m2/repository/com/twitter/chill* $ mvn compile | grep chill Downloading from google-maven-central: https://maven-central.storage-download.googleapis.com/repos/central/data/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom Downloading from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom Downloaded from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom (2.8 kB at 11 kB/s) ``` Closes #27281 from dongjoon-hyun/SPARK-30572. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-19 17:42:34 -08:00
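The log output above shows an ordered search: the mirror is tried first, and the duplicated Maven Central entry serves as the fallback. A minimal sketch of that ordering, assuming a `curl` HEAD request as the existence check; `resolve_pom_url`, `probe_with_curl` and the `PROBE` override are hypothetical and only approximate what Maven does internally.

```bash
# Hypothetical helper: try each repository base URL in order and print the
# first POM URL that the probe accepts (primary mirror, then fallback).
resolve_pom_url() {
  local group="$1" artifact="$2" version="$3"
  shift 3
  local group_path base url
  group_path="$(printf '%s' "$group" | tr '.' '/')"
  for base in "$@"; do
    url="${base%/}/${group_path}/${artifact}/${version}/${artifact}-${version}.pom"
    # PROBE can name another command, e.g. for offline testing.
    if "${PROBE:-probe_with_curl}" "$url"; then
      printf '%s\n' "$url"
      return 0
    fi
  done
  return 1
}

# Default probe: silent HTTP HEAD request that fails on 4xx/5xx responses.
probe_with_curl() {
  curl -sfI "$1" > /dev/null
}
```

Usage: `resolve_pom_url com.twitter chill_2.12 0.9.5 https://maven-central.storage-download.googleapis.com/repos/central/data https://repo.maven.apache.org/maven2` prints the Maven Central URL when the artifact has not yet reached the mirror.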
Dongjoon Hyun	384899944b	[SPARK-30534][INFRA] Use mvn in `dev/scalastyle` ### What changes were proposed in this pull request? This PR aims to use `mvn` instead of `sbt` in `dev/scalastyle` to recover GitHub Action. ### Why are the changes needed? As of now, Apache Spark sbt build is broken by the Maven Central repository policy. https://stackoverflow.com/questions/59764749/requests-to-http-repo1-maven-org-maven2-return-a-501-https-required-status-an > Effective January 15, 2020, The Central Maven Repository no longer supports insecure > communication over plain HTTP and requires that all requests to the repository are > encrypted over HTTPS. We can reproduce this locally by the following. ``` $ rm -rf ~/.m2/repository/org/apache/apache/18/ $ build/sbt clean ``` And, in GitHub Action, `lint-scala` is the only one which is using `sbt`. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? First of all, GitHub Action should be recovered. Also, manually, do the following. Without Scalastyle violation ``` $ dev/scalastyle OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=384m; support was removed in 8.0 Using `mvn` from path: /usr/local/bin/mvn Scalastyle checks passed. ``` With Scalastyle violation ``` $ dev/scalastyle OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=384m; support was removed in 8.0 Using `mvn` from path: /usr/local/bin/mvn Scalastyle checks failed at following occurrences: error file=/Users/dongjoon/PRS/SPARK-HTTP-501/core/src/main/scala/org/apache/spark/SparkConf.scala message=There should be no empty line separating imports in the same group. line=22 column=0 error file=/Users/dongjoon/PRS/SPARK-HTTP-501/core/src/test/scala/org/apache/spark/resource/ResourceProfileSuite.scala message=There should be no empty line separating imports in the same group. line=22 column=0 ``` Closes #27242 from dongjoon-hyun/SPARK-30534. 
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-16 16:00:58 -08:00
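Since Maven Central now rejects plain-HTTP requests (the linked question reports a 501 "HTTPS Required" status), it can help to spot build files that still reference Central over `http://`. A minimal sketch using `grep`; `find_insecure_central_urls` is a hypothetical helper and only checks the two Central hostnames that appear in this log.

```bash
# Hypothetical helper: print (with line numbers) every line of a build file
# that still points at Maven Central over plain HTTP. Exits non-zero when
# no such line exists, following grep's convention.
find_insecure_central_urls() {
  grep -nE 'http://(repo1\.maven\.org|repo\.maven\.apache\.org)' "$1"
}
```

Usage: `find_insecure_central_urls pom.xml` lists offending lines; an `https://` URL does not match.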
Jungtaek Lim (HeartSaVioR)	8384ff4c9d	[SPARK-28144][SPARK-29294][SS] Upgrade Kafka to 2.4.0 ### What changes were proposed in this pull request? This patch upgrades the version of Kafka to 2.4, which supports Scala 2.13. There're some incompatible changes in Kafka 2.4 which the patch addresses as well: * `ZkUtils` is removed -> Replaced with `KafkaZkClient` * Majority of methods are removed in `AdminUtils` -> Replaced with `AdminZkClient` * Method signature of `Scheduler.schedule` is changed (return type) -> leverage `DeterministicScheduler` to avoid implementing `ScheduledFuture` ### Why are the changes needed? * Kafka 2.4 supports Scala 2.13 ### Does this PR introduce any user-facing change? No, as Kafka API is known to be compatible across versions. ### How was this patch tested? Existing UTs Closes #26960 from HeartSaVioR/SPARK-29294. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-21 14:01:25 -08:00
Sean Owen	fac6b9bde8	Revert [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies This reverts commit `709387d660`. See https://issues.apache.org/jira/browse/SPARK-27300?focusedCommentId=16990048&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16990048 and previous mailing list discussions. ### What changes were proposed in this pull request? Revert the addition of skeleton graph API modules for Spark 3.0. ### Why are the changes needed? It does not appear that content will be added to the module for Spark 3, so I propose avoiding committing to the modules, which are no-ops now, in the upcoming major 3.0 release. ### Does this PR introduce any user-facing change? No, the modules were not released. ### How was this patch tested? Existing tests, but mostly N/A. Closes #26928 from srowen/Revert27300. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-17 09:06:23 -08:00
Yuming Wang	696288f623	[INFRA] Reverts commit `56dcd79` and `c216ef1` ### What changes were proposed in this pull request? 1. Revert "Preparing development version 3.0.1-SNAPSHOT": `56dcd79` 2. Revert "Preparing Spark release v3.0.0-preview2-rc2": `c216ef1` ### Why are the changes needed? Shouldn't change master. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? manual test: https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master Closes #26915 from wangyum/revert-master. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Yuming Wang <wgyumg@gmail.com>	2019-12-16 19:57:44 -07:00
Yuming Wang	56dcd79992	Preparing development version 3.0.1-SNAPSHOT	2019-12-17 01:57:27 +00:00
Yuming Wang	c216ef1d03	Preparing Spark release v3.0.0-preview2-rc2	2019-12-17 01:57:21 +00:00
Dongjoon Hyun	b709091b4f	[SPARK-30228][BUILD] Update zstd-jni to 1.4.4-3 ### What changes were proposed in this pull request? This PR aims to update zstd-jni library to 1.4.4-3. ### Why are the changes needed? This will bring the latest bug fixes in zstd itself and some performance improvement. - https://github.com/facebook/zstd/releases/tag/v1.4.4 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26856 from dongjoon-hyun/SPARK-ZSTD-144. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-12 14:16:32 +09:00
Takeshi Yamamuro	be867e8a9e	[SPARK-30196][BUILD] Bump lz4-java version to 1.7.0 ### What changes were proposed in this pull request? This pr intends to upgrade lz4-java from 1.6.0 to 1.7.0. ### Why are the changes needed? This release includes JoshRosen's fix for a performance bug (https://github.com/lz4/lz4-java/pull/143) and some improvements (e.g., LZ4 binary update). You can see the link below for the changes: https://github.com/lz4/lz4-java/blob/master/CHANGES.md#170 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #26823 from maropu/LZ4_1_7_0. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-10 12:22:03 +09:00
Dongjoon Hyun	afc4fa02bd	[SPARK-30156][BUILD] Upgrade Jersey from 2.29 to 2.29.1 ### What changes were proposed in this pull request? This PR aims to upgrade `Jersey` from 2.29 to 2.29.1. ### Why are the changes needed? This will bring several bug fixes and important dependency upgrades. - https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.29.1.html ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26785 from dongjoon-hyun/SPARK-30156. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-06 18:49:43 -08:00
Dongjoon Hyun	1e0037b5e9	[SPARK-30157][BUILD][TEST-HADOOP3.2][TEST-JAVA11] Upgrade Apache HttpCore from 4.4.10 to 4.4.12 ### What changes were proposed in this pull request? This PR aims to upgrade `Apache HttpCore` from 4.4.10 to 4.4.12. ### Why are the changes needed? `Apache HttpCore v4.4.11` is the first official release for JDK11. > This is a maintenance release that corrects a number of defects in non-blocking SSL session code that caused compatibility issues with TLSv1.3 protocol implementation shipped with Java 11. For the full release note, please see the following. - https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26786 from dongjoon-hyun/SPARK-30157. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-07 10:59:10 +09:00
Dongjoon Hyun	1595e46a4e	[SPARK-30142][TEST-MAVEN][BUILD] Upgrade Maven to 3.6.3 ### What changes were proposed in this pull request? This PR aims to upgrade Maven from 3.6.2 to 3.6.3. ### Why are the changes needed? This will bring bug fixes like the following. - MNG-6759 Maven fails to use <repositories> section from dependency when resolving transitive dependencies in some cases - MNG-6760 ExclusionArtifactFilter result invalid when wildcard exclusion is followed by other exclusions The following is the full release note. - https://maven.apache.org/docs/3.6.3/release-notes.html ### Does this PR introduce any user-facing change? No. (This is a dev-environment change.) ### How was this patch tested? Pass the Jenkins with both SBT and Maven. Closes #26770 from dongjoon-hyun/SPARK-30142. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-06 23:41:59 +09:00
Dongjoon Hyun	f3abee377d	[SPARK-30051][BUILD] Clean up hadoop-3.2 dependency ### What changes were proposed in this pull request? This PR aims to cut the `org.eclipse.jetty:jetty-webapp` and `org.eclipse.jetty:jetty-xml` transitive dependencies from `hadoop-common`. ### Why are the changes needed? This will simplify our dependency management by removing unused dependencies. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the GitHub Action with all combinations and the Jenkins UT with (Hadoop-3.2). Closes #26742 from dongjoon-hyun/SPARK-30051. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-03 14:33:36 -08:00
huangtianhua	e842033acc	[SPARK-27721][BUILD] Switch to use right leveldbjni according to the platforms This change adds a profile that switches to the right leveldbjni package according to the platform: aarch64 uses org.openlabtesting.leveldbjni:leveldbjni-all.1.8, and other platforms use the old one, org.fusesource.leveldbjni:leveldbjni-all.1.8. Because some Hadoop dependency packages also depend on org.fusesource.leveldbjni:leveldbjni-all (Hadoop merged a similar change on trunk; for details see https://issues.apache.org/jira/browse/HADOOP-16614), the org.fusesource.leveldbjni dependency is excluded from those Hadoop packages. With this, Spark can build and test on the aarch64 platform successfully. Closes #26636 from huangtianhua/add-aarch64-leveldbjni. Authored-by: huangtianhua <huangtianhua@huawei.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-12-02 09:04:00 -06:00
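In the build itself this switch is a Maven profile, but the selection rule is easy to state as a script. A sketch of the same decision, assuming the architecture name comes from `uname -m`; `leveldbjni_group_id` is hypothetical and not part of Spark.

```bash
# Hypothetical helper mirroring the profile's intent: choose the leveldbjni
# groupId for a machine architecture (defaults to the output of `uname -m`).
leveldbjni_group_id() {
  case "${1:-$(uname -m)}" in
    aarch64) printf '%s\n' "org.openlabtesting.leveldbjni" ;;
    *)       printf '%s\n' "org.fusesource.leveldbjni" ;;
  esac
}
```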
HyukjinKwon	f32ca4b279	[SPARK-30076][BUILD][TESTS] Upgrade Mockito to 3.1.0 ### What changes were proposed in this pull request? We used 2.28.2 of Mockito as of https://github.com/apache/spark/pull/25139 because 3.0.0 might be unstable. Now 3.1.0 is released. See release notes - https://github.com/mockito/mockito/blob/v3.1.0/doc/release-notes/official.md ### Why are the changes needed? To bring the fixes made in the dependency. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Jenkins will test. Closes #26707 from HyukjinKwon/upgrade-Mockito. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-11-30 12:23:11 -06:00
Sean Owen	e23c135e56	[SPARK-29293][BUILD] Move scalafmt to Scala 2.12 profile; bump to 0.12 ### What changes were proposed in this pull request? Move scalafmt to Scala 2.12 profile; bump to 0.12. ### Why are the changes needed? To facilitate a future Scala 2.13 build. ### Does this PR introduce any user-facing change? None. ### How was this patch tested? This isn't covered by tests, it's a convenience for contributors. Closes #26655 from srowen/SPARK-29293. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-26 09:59:19 -08:00
Dongjoon Hyun	c2d513f8e9	[SPARK-30035][BUILD] Upgrade to Apache Commons Lang 3.9 ### What changes were proposed in this pull request? This PR aims to upgrade to `Apache Commons Lang 3.9`. ### Why are the changes needed? `Apache Commons Lang 3.9` is the first official release to support JDK9+. The following is the full release note. - https://commons.apache.org/proper/commons-lang/release-notes/RELEASE-NOTES-3.9.txt ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26672 from dongjoon-hyun/SPARK-30035. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-26 21:31:02 +09:00
Dongjoon Hyun	53e19f3678	[SPARK-30032][BUILD] Upgrade to ORC 1.5.8 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache ORC 1.5.8. ### Why are the changes needed? This will bring the latest bug fixes. The following is the full release note. - https://issues.apache.org/jira/projects/ORC/versions/12346462 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26669 from dongjoon-hyun/SPARK-ORC-1.5.8. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-25 20:08:11 -08:00
Dongjoon Hyun	2a28c73d81	[SPARK-30031][BUILD][SQL] Remove `hive-2.3` profile from `sql/hive` module ### What changes were proposed in this pull request? This PR aims to remove `hive-2.3` profile from `sql/hive` module. ### Why are the changes needed? Currently, we need `-Phive-1.2` or `-Phive-2.3` additionally to build `hive` or `hive-thriftserver` module. Without specifying it, the build fails like the following. This PR will recover it. ``` $ build/mvn -DskipTests compile --pl sql/hive ... [ERROR] [Error] /Users/dongjoon/APACHE/spark-merge/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala:32: object serde is not a member of package org.apache.hadoop.hive ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? 1. Pass GitHub Action dependency check with no manifest change. 2. Pass GitHub Action build for all combinations. 3. Pass the Jenkins UT. Closes #26668 from dongjoon-hyun/SPARK-30031. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-25 15:17:27 -08:00
Dongjoon Hyun	1466863cee	[SPARK-30015][BUILD] Move hive-storage-api dependency from `hive-2.3` to `sql/core` ### What changes were proposed in this pull request? This PR aims to relocate the following internal dependencies to compile `sql/core` without the `-Phive-2.3` profile. 1. Move the `hive-storage-api` to `sql/core`, which actually uses `hive-storage-api`. BEFORE (sql/core compilation) ``` $ ./build/mvn -DskipTests --pl sql/core --am compile ... [ERROR] [Error] /Users/dongjoon/APACHE/spark/sql/core/v2.3/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala:21: object hive is not a member of package org.apache.hadoop ... [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ ``` AFTER (sql/core compilation) ``` $ ./build/mvn -DskipTests --pl sql/core --am compile ... [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 02:04 min [INFO] Finished at: 2019-11-25T00:20:11-08:00 [INFO] ------------------------------------------------------------------------ ``` 2. For (1), add `commons-lang:commons-lang` test dependency to `spark-core` module to manage the dependency explicitly. Without this, `core` module fails to build the test classes. ``` $ ./build/mvn -DskipTests --pl core --am package -Phadoop-3.2 ... [INFO] --- scala-maven-plugin:4.3.0:testCompile (scala-test-compile-first) spark-core_2.12 --- [INFO] Using incremental compilation using Mixed compile order [INFO] Compiler bridge file: /Users/dongjoon/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.3.1-bin_2.12.10__52.0-1.3.1_20191012T045515.jar [INFO] Compiling 271 Scala sources and 26 Java sources to /spark/core/target/scala-2.12/test-classes ... 
[ERROR] [Error] /spark/core/src/test/scala/org/apache/spark/util/PropertiesCloneBenchmark.scala:23: object lang is not a member of package org.apache.commons [ERROR] [Error] /spark/core/src/test/scala/org/apache/spark/util/PropertiesCloneBenchmark.scala:49: not found: value SerializationUtils [ERROR] two errors found ``` BEFORE (commons-lang:commons-lang) The following is the previous `core` module's `commons-lang:commons-lang` dependency. 1. branch-2.4 ``` $ mvn dependency:tree -Dincludes=commons-lang:commons-lang [INFO] --- maven-dependency-plugin:3.0.2:tree (default-cli) spark-core_2.11 --- [INFO] org.apache.spark:spark-core_2.11:jar:2.4.5-SNAPSHOT [INFO] \- org.spark-project.hive:hive-exec:jar:1.2.1.spark2:provided [INFO] \- commons-lang:commons-lang:jar:2.6:compile ``` 2. v3.0.0-preview (-Phadoop-3.2) ``` $ mvn dependency:tree -Dincludes=commons-lang:commons-lang -Phadoop-3.2 [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) spark-core_2.12 --- [INFO] org.apache.spark:spark-core_2.12:jar:3.0.0-preview [INFO] \- org.apache.hive:hive-storage-api:jar:2.6.0:compile [INFO] \- commons-lang:commons-lang:jar:2.6:compile ``` 3. 
v3.0.0-preview (default) ``` $ mvn dependency:tree -Dincludes=commons-lang:commons-lang [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) spark-core_2.12 --- [INFO] org.apache.spark:spark-core_2.12:jar:3.0.0-preview [INFO] \- org.apache.hadoop:hadoop-client:jar:2.7.4:compile [INFO] \- org.apache.hadoop:hadoop-common:jar:2.7.4:compile [INFO] \- commons-lang:commons-lang:jar:2.6:compile ``` AFTER (commons-lang:commons-lang) ``` $ mvn dependency:tree -Dincludes=commons-lang:commons-lang [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) spark-core_2.12 --- [INFO] org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT [INFO] \- commons-lang:commons-lang:jar:2.6:test ``` Since we wanted to verify that this PR doesn't change `hive-1.2` profile, we merged [SPARK-30005 Update `test-dependencies.sh` to check `hive-1.2/2.3` profile](`a1706e2fa7`) before this PR. ### Why are the changes needed? - Apache Spark 2.4's `sql/core` is using `Apache ORC (nohive)` jars including shaded `hive-storage-api` to access ORC data sources. - Apache Spark 3.0's `sql/core` is using `Apache Hive` jars directly. Previously, `-Phadoop-3.2` hid this `hive-storage-api` dependency. Now, we are using `-Phive-2.3` instead. As I mentioned [previously](https://github.com/apache/spark/pull/26619#issuecomment-556926064), this PR is required to compile `sql/core` module without `-Phive-2.3`. - For `sql/hive` and `sql/hive-thriftserver`, it's natural that we need `-Phive-1.2` or `-Phive-2.3`. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This will pass the Jenkins (with the dependency check and unit tests). We need to check manually with `./build/mvn -DskipTests --pl sql/core --am compile`. This closes #26657. Closes #26658 from dongjoon-hyun/SPARK-30015. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-25 10:54:14 -08:00
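The BEFORE/AFTER comparison above comes down to the scope printed at the end of the `commons-lang:commons-lang` line (`compile` before, `test` after). A minimal sketch that extracts that scope from `mvn dependency:tree` output on stdin; `dependency_scopes` is hypothetical and assumes the usual `group:artifact:type[:classifier]:version:scope` line format, whose last field is the scope.

```bash
# Hypothetical helper: read `mvn dependency:tree` output on stdin and print
# the scope of every line mentioning the given "group:artifact" coordinate.
dependency_scopes() {
  local coord="$1"
  # Keep matching lines, take the last whitespace-separated token
  # (the full coordinate), and print its final colon-separated field.
  grep -F -- "${coord}:" | awk '{ n = split($NF, f, ":"); print f[n] }'
}
```

Usage: `mvn dependency:tree -Dincludes=commons-lang:commons-lang | dependency_scopes commons-lang:commons-lang` should print `test` for the AFTER tree shown above.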
Sean Owen	13896e4eae	[SPARK-30013][SQL] For scala 2.13, omit parens in various BigDecimal value() methods ### What changes were proposed in this pull request? Omit parens on calls like BigDecimal.longValue() ### Why are the changes needed? For some reason, this won't compile in Scala 2.13. The calls are otherwise equivalent in 2.12. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #26653 from srowen/SPARK-30013. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-24 18:23:34 -08:00
Dongjoon Hyun	6625b69027	[SPARK-29981][BUILD][FOLLOWUP] Change hive.version.short ### What changes were proposed in this pull request? This is a follow-up according to liancheng's advice. - https://github.com/apache/spark/pull/26619#discussion_r349326090 ### Why are the changes needed? Previously, we chose the full version to be careful. As of today, the `Apache Hive 2.3` branch seems to have become stable. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the compile combination on GitHub Action. 1. hadoop-2.7/hive-1.2/JDK8 2. hadoop-2.7/hive-2.3/JDK8 3. hadoop-3.2/hive-2.3/JDK8 4. hadoop-3.2/hive-2.3/JDK11 Also, pass the Jenkins with `hadoop-2.7` and `hadoop-3.2` for (1) and (4). (2) and (3) are not ready in Jenkins. Closes #26645 from dongjoon-hyun/SPARK-RENAME-HIVE-DIRECTORY. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-23 12:50:50 -08:00
Dongjoon Hyun	c98e5eb339	[SPARK-29981][BUILD] Add hive-1.2/2.3 profiles ### What changes were proposed in this pull request? This PR aims to do the following. - Add two profiles, `hive-1.2` and `hive-2.3` (default) - Validate that we at least keep the existing combinations (Hadoop-2.7 + Hive 1.2 / Hadoop-3.2 + Hive 2.3). For now, we assume that `hive-1.2` is explicitly used with `hadoop-2.7` and `hive-2.3` with `hadoop-3.2`. The following are beyond the scope of this PR. - SPARK-29988 Adjust Jenkins jobs for `hive-1.2/2.3` combination - SPARK-29989 Update release-script for `hive-1.2/2.3` combination - SPARK-29991 Support `hive-1.2/2.3` in PR Builder ### Why are the changes needed? This will help to switch our dependencies to update the exposed dependencies. ### Does this PR introduce any user-facing change? This is a dev-only change: the build profile combinations are changed. - `-Phadoop-2.7` => `-Phadoop-2.7 -Phive-1.2` - `-Phadoop-3.2` => `-Phadoop-3.2 -Phive-2.3` ### How was this patch tested? Pass the Jenkins with the dependency check and tests to make sure we don't change anything for now. - [Jenkins (-Phadoop-2.7 -Phive-1.2)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull) - [Jenkins (-Phadoop-3.2 -Phive-2.3)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull) Also, from now on, GitHub Action validates the following combinations. ![gha](https://user-images.githubusercontent.com/9700541/69355365-822d5e00-0c36-11ea-93f7-e00e5459e1d0.png) Closes #26619 from dongjoon-hyun/SPARK-29981. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-23 10:02:22 -08:00
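The profile migration listed under the user-facing change is a fixed mapping from the old Hadoop flag to a Hadoop+Hive pair. A sketch of that mapping, assuming only the two combinations named in this commit; `with_hive_profile` is a hypothetical convenience, not a Spark script.

```bash
# Hypothetical helper: expand a legacy Hadoop profile flag into the
# Hadoop+Hive profile pair assumed by SPARK-29981.
with_hive_profile() {
  case "$1" in
    -Phadoop-2.7) printf '%s\n' "-Phadoop-2.7 -Phive-1.2" ;;
    -Phadoop-3.2) printf '%s\n' "-Phadoop-3.2 -Phive-2.3" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}
```

Usage: `./build/mvn $(with_hive_profile -Phadoop-3.2) -DskipTests compile`; the substitution is left unquoted on purpose so the pair splits into two arguments.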
Dongjoon Hyun	f77c10de38	[SPARK-29923][SQL][TESTS] Set io.netty.tryReflectionSetAccessible for Arrow on JDK9+ ### What changes were proposed in this pull request? This PR aims to add `io.netty.tryReflectionSetAccessible=true` to the testing configuration for JDK11 because this is an officially documented requirement of Apache Arrow. The Apache Arrow community documented this requirement in `0.15.0` ([ARROW-6206](https://github.com/apache/arrow/pull/5078)). > #### For java 9 or later, should set "-Dio.netty.tryReflectionSetAccessible=true". > This fixes `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available` thrown by netty. ### Why are the changes needed? After ARROW-3191, the Arrow Java library requires the property `io.netty.tryReflectionSetAccessible` to be set to true for JDK >= 9. After https://github.com/apache/spark/pull/26133, the JDK11 Jenkins jobs seem to fail. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/676/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/677/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/678/ ```scala Previous exception in task: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473) io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243) io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233) io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245) org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with JDK11. Closes #26552 from dongjoon-hyun/SPARK-ARROW-JDK11. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-15 23:58:15 -08:00
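A minimal sketch of the condition this commit reacts to, assuming only the JDK: it derives the Java feature version from the standard `java.specification.version` property and reports whether `io.netty.tryReflectionSetAccessible` still needs to be set. The class and method names are hypothetical, and the sketch treats only the literal string `true` as "set"; in practice the property is passed to the JVM as `-Dio.netty.tryReflectionSetAccessible=true`, as the Arrow documentation quoted above says.

```java
// Hypothetical helper, not part of Spark or Arrow: checks whether the JVM is 9+
// and the Netty flag required by Arrow's Java library is missing.
public class ArrowNettyFlagCheck {
    static final String FLAG = "io.netty.tryReflectionSetAccessible";

    // java.specification.version is "1.8" on JDK 8 and "11" on JDK 11.
    static int featureVersion(String specVersion) {
        return specVersion.startsWith("1.")
                ? Integer.parseInt(specVersion.substring(2))
                : Integer.parseInt(specVersion);
    }

    // Assumption: only "true" (case-insensitive) counts as the flag being set.
    static boolean needsFlag(int feature, String flagValue) {
        return feature >= 9 && !"true".equalsIgnoreCase(flagValue);
    }

    public static void main(String[] args) {
        int feature = featureVersion(System.getProperty("java.specification.version"));
        if (needsFlag(feature, System.getProperty(FLAG))) {
            System.err.println("JDK " + feature + ": add -D" + FLAG + "=true to the JVM options");
        } else {
            System.out.println("JDK " + feature + ": Netty flag OK");
        }
    }
}
```

For Spark tests the flag belongs in the forked test JVM's options rather than in application code, since Netty reads it when its platform utilities are first loaded.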
Bryan Cutler	65a189c7a1	[SPARK-29376][SQL][PYTHON] Upgrade Apache Arrow to version 0.15.1 ### What changes were proposed in this pull request? Upgrade Apache Arrow to version 0.15.1. This includes Java artifacts and also increases the minimum required version of PyArrow. Versions 0.12.0 to 0.15.1 include the following selected fixes/improvements relevant to Spark users: * ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes * ARROW-6874 - [Python] Memory leak in Table.to_pandas() when conversion to object dtype * ARROW-5579 - [Java] shade flatbuffer dependency * ARROW-5843 - [Java] Improve the readability and performance of BitVectorHelper#getNullCount * ARROW-5881 - [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits * ARROW-5893 - [C++] Remove arrow::Column class from C++ library * ARROW-5970 - [Java] Provide pointer to Arrow buffer * ARROW-6070 - [Java] Avoid creating new schema before IPC sending * ARROW-6279 - [Python] Add Table.slice method or allow slices in \_\_getitem\_\_ * ARROW-6313 - [Format] Tracking for ensuring flatbuffer serialized values are aligned in stream/files. * ARROW-6557 - [Python] Always return pandas.Series from Array/ChunkedArray.to_pandas, propagate field names to Series from RecordBatch, Table * ARROW-2015 - [Java] Use Java Time and Date APIs instead of JodaTime * ARROW-1261 - [Java] Add container type for Map logical type * ARROW-1207 - [C++] Implement Map logical type Changelog can be seen at https://arrow.apache.org/release/0.15.0.html ### Why are the changes needed? Upgrade to get bug fixes and improvements, and to maintain compatibility with future versions of PyArrow. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests, manually tested with Python 3.7, 3.8 Closes #26133 from BryanCutler/arrow-upgrade-015-SPARK-29376. Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-15 13:27:30 +09:00
Dongjoon Hyun	e25cfc4bb8	[SPARK-29528][BUILD] Upgrade scala-maven-plugin to 4.3.0 for Scala 2.13.1 ### What changes were proposed in this pull request? This PR aims to upgrade `scala-maven-plugin` to `4.3.0` for Scala `2.13.1`. We tried 4.2.4, but it was reverted due to a Windows build issue. Now, `4.3.0` has a Windows fix. ### Why are the changes needed? Scala 2.13.1 seems to break binary compatibility. We need to upgrade `scala-maven-plugin` to bring the following fixes for the latest Scala 2.13.1. - https://github.com/davidB/scala-maven-plugin/issues/363 - https://github.com/sbt/zinc/issues/698 Also, `4.3.0` has the following Windows fix. - https://github.com/davidB/scala-maven-plugin/issues/370 (4.2.4 throws an error on Windows) ### Does this PR introduce any user-facing change? No. ### How was this patch tested? - For now, we don't support Scala-2.13. This PR at least needs to pass the existing Jenkins with Maven to get prepared for Scala-2.13. - `AppVeyor` passed. (https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/28745383) Closes #26457 from dongjoon-hyun/SPARK-29528. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-10 08:49:05 -08:00
Liang-Chi Hsieh	ef1abf2e2c	[SPARK-29747][BUILD] Bump joda-time version to 2.10.5 ### What changes were proposed in this pull request? This upgrades joda-time from 2.9 to 2.10.5. ### Why are the changes needed? Joda 2.9 was released almost 4 years ago, and newer releases include bug fixes and tz database updates. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #26389 from viirya/upgrade-joda. Authored-by: Liang-Chi Hsieh <liangchi@uber.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-05 10:08:19 +09:00
angerszhu	e524a3a223	[SPARK-29742][BUILD] Update checkstyle plugin's check dir scope ### What changes were proposed in this pull request? The current checkstyle scope does not cover all source folders. To support multiple Hive versions, we have separate version-specific Hive folders, and those should be checked too. ### Why are the changes needed? To fix a build bug. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? No. Closes #26385 from AngersZhuuuu/SPARK-29742. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 09:08:47 -08:00
Sean Owen	19b8c71436	[SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ ### What changes were proposed in this pull request? Update the version of dropwizard metrics that Spark uses for metrics to 4.1.x, from 3.2.x. ### Why are the changes needed? This helps JDK 9+ support; see, for example, https://github.com/dropwizard/metrics/pull/1236 ### Does this PR introduce any user-facing change? No, although downstream users with custom metrics may be affected. ### How was this patch tested? Existing tests. Closes #26332 from srowen/SPARK-29674. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-03 15:13:06 -08:00
Dongjoon Hyun	1ac6bd9f79	[SPARK-29729][BUILD] Upgrade ASM to 7.2 ### What changes were proposed in this pull request? This PR aims to upgrade ASM to 7.2. - https://issues.apache.org/jira/browse/XBEAN-322 (Upgrade to ASM 7.2) - https://asm.ow2.io/versions.html ### Why are the changes needed? This will bring the following patches. - 317875: Infinite loop when parsing invalid method descriptor - 317873: Add support for RET instruction in AdviceAdapter - 317872: Throw an exception if visitFrame used incorrectly - add support for Java 14 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing UTs. Closes #26373 from dongjoon-hyun/SPARK-29729. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-03 10:42:38 -08:00
Eric Meisel	be022d9aee	[SPARK-29677][DSTREAMS] amazon-kinesis-client 1.12.0 ### What changes were proposed in this pull request? Upgrading the amazon-kinesis-client dependency to 1.12.0. ### Why are the changes needed? The current amazon-kinesis-client version is 1.8.10. This version depends on the use of `describeStream`, which has a hard limit on an AWS account (10 reqs / second). Versions 1.9.0 and up leverage `listShards`, which has no such limit. For large customers, this can be a major problem. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #26333 from etspaceman/kclUpgrade. Authored-by: Eric Meisel <eric.steven.meisel@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-11-02 16:42:49 -05:00
Xingbo Jiang	8207c835b4	Revert "Prepare Spark release v3.0.0-preview-rc2" This reverts commit `007c873ae3`.	2019-10-30 17:45:44 -07:00
Xingbo Jiang	007c873ae3	Prepare Spark release v3.0.0-preview-rc2 ### What changes were proposed in this pull request? To push the built jars to the Maven release repository, we need to remove the 'SNAPSHOT' tag from the version name. Made the following changes in this PR: * Update all the `3.0.0-SNAPSHOT` version names to `3.0.0-preview` * Update the sparkR version number check logic to allow JVM versions like `3.0.0-preview` Please note those changes were generated by the release script in the past, but this time, since we manually add tags on the master branch, we need to apply those changes manually too. We shall revert the changes after the 3.0.0-preview release passes. ### Why are the changes needed? To make the Maven release repository accept the built jars. ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A	2019-10-30 17:42:59 -07:00
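The version handling described above can be illustrated with a small, hypothetical Java helper: a release qualifier such as `-SNAPSHOT` or `-preview` is stripped before comparing the JVM-side version string with a plain package version like `3.0.0`. This is not SparkR's actual check (that logic lives in the R package); the class name, method names, and the list of accepted qualifiers are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Spark code: compare a JVM-side version
// string with a plain package version by first dropping a known qualifier.
public class VersionQualifier {
    // Assumed set of qualifiers; the real release tooling may differ.
    static final List<String> QUALIFIERS = Arrays.asList("-SNAPSHOT", "-preview");

    static String stripQualifier(String version) {
        for (String q : QUALIFIERS) {
            if (version.endsWith(q)) {
                return version.substring(0, version.length() - q.length());
            }
        }
        return version;
    }

    // Per the commit message, -SNAPSHOT versions must be renamed before
    // pushing to the Maven release repository.
    static boolean isSnapshot(String version) {
        return version.endsWith("-SNAPSHOT");
    }

    static boolean matchesPackageVersion(String jvmVersion, String packageVersion) {
        return stripQualifier(jvmVersion).equals(packageVersion);
    }

    public static void main(String[] args) {
        System.out.println(matchesPackageVersion("3.0.0-preview", "3.0.0")); // true
        System.out.println(isSnapshot("3.0.0-SNAPSHOT"));                    // true
    }
}
```

Without the qualifier handling, a strict string comparison of `3.0.0-preview` against `3.0.0` would fail, which is the kind of mismatch the version-check update has to tolerate.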
Xingbo Jiang	b33a58c0c6	Revert "Prepare Spark release v3.0.0-preview-rc1" This reverts commit `5eddbb5f1d`.	2019-10-28 22:32:34 -07:00
Xingbo Jiang	5eddbb5f1d	Prepare Spark release v3.0.0-preview-rc1 ### What changes were proposed in this pull request? To push the built jars to the Maven release repository, we need to remove the 'SNAPSHOT' tag from the version name. Made the following changes in this PR: * Update all the `3.0.0-SNAPSHOT` version names to `3.0.0-preview` * Update the PySpark version from `3.0.0.dev0` to `3.0.0` Please note those changes were generated by the release script in the past, but this time, since we manually add tags on the master branch, we need to apply those changes manually too. We shall revert the changes after the 3.0.0-preview release passes. ### Why are the changes needed? To make the Maven release repository accept the built jars. ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #26243 from jiangxb1987/3.0.0-preview-prepare. Lead-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com> Co-authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>	2019-10-28 22:31:29 -07:00
HyukjinKwon	a8d5134981	Revert "[SPARK-29528][BUILD][TEST-MAVEN] Upgrade scala-maven-plugin to 4.2.4 for Scala 2.13.1" This reverts commit `5fc363b307`.	2019-10-28 20:46:28 +09:00
Dongjoon Hyun	ba9d1610b6	[SPARK-29617][BUILD] Upgrade to ORC 1.5.7 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache ORC 1.5.7. ### Why are the changes needed? This will bring the latest bug fixes. The full release notes are here: - https://issues.apache.org/jira/projects/ORC/versions/12345702 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26276 from dongjoon-hyun/SPARK-29617. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-27 21:11:17 -07:00
Dongjoon Hyun	a43b966f00	[SPARK-29613][BUILD][SS] Upgrade to Kafka 2.3.1 ### What changes were proposed in this pull request? This PR aims to upgrade to the Kafka 2.3.1 client library for client fixes like KAFKA-8950, KAFKA-8570, and KAFKA-8635. The full release notes are here: - https://archive.apache.org/dist/kafka/2.3.1/RELEASE_NOTES.html ### Why are the changes needed? - [KAFKA-8950 KafkaConsumer stops fetching](https://issues.apache.org/jira/browse/KAFKA-8950) - [KAFKA-8570 Downconversion could fail when log contains out of order message formats](https://issues.apache.org/jira/browse/KAFKA-8570) - [KAFKA-8635 Unnecessary wait when looking up coordinator before transactional request](https://issues.apache.org/jira/browse/KAFKA-8635) ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26271 from dongjoon-hyun/SPARK-29613. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-27 16:15:54 -07:00
Luca Canali	5867707835	[SPARK-29557][BUILD] Update dropwizard/codahale metrics library to 3.2.6 ### What changes were proposed in this pull request? This proposes to update the dropwizard/codahale metrics library version used by Spark to `3.2.6`, which is the last version supporting Ganglia. ### Why are the changes needed? Spark is currently using Dropwizard metrics version 3.1.5, a version that is no longer actively developed or maintained, according to the project's Github repo README. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests + manual tests on a YARN cluster. Closes #26212 from LucaCanali/updateDropwizardVersion. Authored-by: Luca Canali <luca.canali@cern.ch> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-23 10:45:11 -07:00
Dongjoon Hyun	5fc363b307	[SPARK-29528][BUILD][TEST-MAVEN] Upgrade scala-maven-plugin to 4.2.4 for Scala 2.13.1 ### What changes were proposed in this pull request? This PR upgrades `scala-maven-plugin` to `4.2.4` for Scala `2.13.1`. ### Why are the changes needed? Scala 2.13.1 seems to break binary compatibility. We need to upgrade `scala-maven-plugin` to bring the following fixes for the latest Scala 2.13.1. - https://github.com/davidB/scala-maven-plugin/issues/363 - https://github.com/sbt/zinc/issues/698 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? For now, we don't support Scala-2.13. This PR at least needs to pass the existing Jenkins with Maven to get prepared for Scala-2.13. Closes #26185 from dongjoon-hyun/SPARK-29528. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-10-21 19:05:27 +09:00
Fokko Driesprong	8eb8f7478c	[SPARK-29483][BUILD] Bump Jackson to 2.10.0 ### What changes were proposed in this pull request? Release blog: https://medium.com/cowtowncoder/jackson-2-10-features-cd880674d8a2 Fixes the following CVEs: https://www.cvedetails.com/cve/CVE-2019-16942/ https://www.cvedetails.com/cve/CVE-2019-16943/ Looking back, there were 3 major goals for this minor release: - Resolve the growing problem of “endless CVE patches”, a stream of fixes for reported CVEs related to “Polymorphic Deserialization” problem (described in “On Jackson CVEs… ”) that resulted in security tools forcing Jackson upgrades. 2.10 now includes “Safe Default Typing” that is hoped to resolve this problem. - Evolve 2.x API towards 3.0, based on changes that were done in master, within limits of 2.x API backwards-compatibility requirements. - Add JDK support for versions beyond Java 8: specifically add “module-info.class” for JDK9+, defining proper module definitions for Jackson components Full changelog: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10 Improved Scala 2.13 support: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10#scala ### Why are the changes needed? Patches CVEs reported by the vulnerability scanner. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Ran `mvn clean install -DskipTests` locally. Closes #26131 from Fokko/SPARK-29483. Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-16 15:38:54 -07:00
Jeff Evans	95de93b24e	[SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read - Updating univocity-parsers version to 2.8.3, which adds support for multiple character delimiters - Moving univocity-parsers version to spark-parent pom dependencyManagement section - Adding new utility method to build multi-char delimiter string, which delegates to existing one - Adding tests for multiple character delimited CSV ### What changes were proposed in this pull request? Adds support for parsing CSV data using multiple-character delimiters. Existing logic for converting the input delimiter string to characters was kept and invoked in a loop. Project dependencies were updated to remove the redundant declaration of the `univocity-parsers` version, and also to change that version to the latest. ### Why are the changes needed? It is quite common for people to have delimited data, where the delimiter is not a single character, but rather a sequence of characters. Currently, it is difficult to handle such data in Spark (it typically requires pre-processing). ### Does this PR introduce any user-facing change? Yes. Specifying the "delimiter" option for the DataFrame read, and providing more than one character, will no longer result in an exception. Instead, it will be converted as before and passed to the underlying library (Univocity), which has accepted multiple character delimiters since 2.8.0. ### How was this patch tested? The `CSVSuite` tests were confirmed passing (including new methods), and `sbt` tests for `sql` were executed. Closes #26027 from jeff303/SPARK-24540. Authored-by: Jeff Evans <jeffrey.wayne.evans@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-15 15:44:51 -05:00
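To make the new behavior concrete, here is a plain-Java sketch of what a multi-character delimiter means for a single record: the whole string `||` acts as one separator. It only illustrates the semantics, assuming unquoted input; it is not how Spark or Univocity parse CSV (quoting, escaping, and multi-line records are ignored), and the class and method names are hypothetical. In Spark itself the delimiter is passed as a reader option, for example `spark.read().option("delimiter", "||").csv(path)`.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustration only: split one unquoted record on a literal multi-character
// delimiter. Real CSV parsing (quotes, escapes) is left to the parser library.
public class MultiCharDelimiter {
    static String[] splitRecord(String line, String delimiter) {
        // Pattern.quote treats "||" literally instead of as a regex alternation,
        // and limit -1 keeps trailing empty fields.
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRecord("a||b||c", "||")));  // [a, b, c]
        System.out.println(Arrays.toString(splitRecord("x||||z||", "||"))); // [x, , z, ]
    }
}
```

Quoting the delimiter matters here: passed unquoted as a regex, `||` would match the empty string and split between every character.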
Dongjoon Hyun	39d53d3e74	[SPARK-29470][BUILD] Update plugins to latest versions ### What changes were proposed in this pull request? This PR updates plugins to the latest versions. ### Why are the changes needed? This brings bug fixes like the following. - https://issues.apache.org/jira/projects/MCOMPILER/versions/12343484 (maven-compiler-plugin) - https://issues.apache.org/jira/projects/MJAVADOC/versions/12345060 (maven-javadoc-plugin) - https://issues.apache.org/jira/projects/MCHECKSTYLE/versions/12342397 (maven-checkstyle-plugin) - https://checkstyle.sourceforge.io/releasenotes.html#Release_8.25 (checkstyle) - https://checkstyle.sourceforge.io/releasenotes.html#Release_8.24 (checkstyle) ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins building and testing with the existing code. Closes #26117 from dongjoon-hyun/SPARK-29470. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-15 11:55:52 -07:00
Fokko Driesprong	b5b1b69f79	[SPARK-29445][CORE] Bump netty-all from 4.1.39.Final to 4.1.42.Final ### What changes were proposed in this pull request? Minor version bump of Netty to patch a reported CVE. Patches: https://www.cvedetails.com/cve/CVE-2019-16869/ ### Why are the changes needed? ### Does this PR introduce any user-facing change? No ### How was this patch tested? Compiled locally using `mvn clean install -DskipTests` Closes #26099 from Fokko/SPARK-29445. Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-12 09:43:16 -05:00
Peter Toth	3a7126cea8	[SPARK-29410][BUILD] Update commons-beanutils to 1.9.4 ### What changes were proposed in this pull request? This PR updates commons-beanutils to 1.9.4. ### Why are the changes needed? CVE fixed in 1.9.4: http://commons.apache.org/proper/commons-beanutils/javadocs/v1.9.4/RELEASE-NOTES.txt ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #26069 from peter-toth/SPARK-29410-update-commons-beanutils-to-1.9.4. Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-12 09:24:06 -05:00

915 commits