ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Dongjoon Hyun	625abca9db	[SPARK-31858][BUILD] Upgrade commons-io to 2.5 in Hadoop 3.2 profile ### What changes were proposed in this pull request? This PR aims to upgrade `commons-io` from 2.4 to 2.5 for Apache Spark 3.1. ### Why are the changes needed? Since Hadoop 3.1, `commons-io` 2.5 is used. - https://issues.apache.org/jira/browse/HADOOP-15261 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with Hadoop-3.2 profile. Maven dependency is verified via `test-dependencies.sh` automatically. SBT dependency can be verified like the following manually. ``` build/sbt -Phadoop-3.2 "core/dependencyTree" \| grep commons-io:commons-io \| head -n1 [info] \| \| +-commons-io:commons-io:2.5 ``` Closes #28665 from dongjoon-hyun/SPARK-31858. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-29 07:46:53 -07:00
Jungtaek Lim (HeartSaVioR)	fe1d1e24bc	[SPARK-31214][BUILD] Upgrade Janino to 3.1.2 ### What changes were proposed in this pull request? This PR proposes to upgrade Janino to 3.1.2 which is released recently. Major changes were done for refactoring, as well as there're lots of "no commit message". Belows are the pairs of (commit title, commit) which seem to deal with some bugs or specific improvements (not purposed to refactor) after 3.0.15. * Issue #119: Guarantee executing popOperand() in popUninitializedVariableOperand() via moving popOperand() out of "assert" * Issue #116: replace operand to final target type if boxing conversion and widening reference conversion happen together * Merged pull request `#114` "Grow the code for relocatables, and do fixup, and relocate". * `367c58e73e` * issue `#107`: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package * `f7d99596d4` * Throw an NYI CompileException when a static interface method is invoked. * `efd3884983` * Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions) * `32fdb5f5f1` * Issue `#104`: ClassLoaderIClassLoader 's ClassNotFoundException handle mechanism enhancement * `6e8a97d609` You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html ### Why are the changes needed? We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: https://github.com/janino-compiler/janino/issues/113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details. Janino 3.1.1 contains the PR https://github.com/janino-compiler/janino/pull/114 which would enable Janino to succeed to compile user's query properly. I've also fixed a couple of more bugs as 3.1.1 made Spark UTs fail - hence we need to upgrade to 3.1.2. 
Furthermore, from my testing, https://github.com/janino-compiler/janino/issues/90 (which Josh Rosen filed before) seems to be resolved in 3.1.2 as well. It looks like Janino is maintained by one person and there are not even version branches or releases/tags, so we can't expect the Janino maintainer to release a new bugfix version; hence we have to try out the new minor version. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #27860 from HeartSaVioR/SPARK-31101. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-29 07:42:57 -07:00
Gabor Somogyi	8f1d77488c	[SPARK-31821][BUILD] Remove mssql-jdbc dependencies from Hadoop 3.2 profile ### What changes were proposed in this pull request? There is an unnecessary dependency for `mssql-jdbc`. In this PR I've removed it. ### Why are the changes needed? Unnecessary dependency. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with the following configuration. - [x] Pass the dependency test. - [x] SBT with Hadoop-3.2 (https://github.com/apache/spark/pull/28640#issuecomment-634192512) - [ ] Maven with Hadoop-3.2 Closes #28640 from gaborgsomogyi/SPARK-31821. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-26 18:21:22 -07:00
Dongjoon Hyun	64ffc66496	[SPARK-31786][K8S][BUILD] Upgrade kubernetes-client to 4.9.2 ### What changes were proposed in this pull request? This PR aims to upgrade `kubernetes-client` library to bring the JDK8 related fixes. Please note that JDK11 works fine without any problem. - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.9.2 - JDK8 always uses http/1.1 protocol (Prevent OkHttp from wrongly enabling http/2) ### Why are the changes needed? OkHttp "wrongly" detects the Platform as Jdk9Platform on JDK 8u251. - https://github.com/fabric8io/kubernetes-client/issues/2212 - https://stackoverflow.com/questions/61565751/why-am-i-not-able-to-run-sparkpi-example-on-a-kubernetes-k8s-cluster Although there is a workaround `export HTTP2_DISABLE=true` and `Downgrade JDK or K8s`, we had better avoid this problematic situation. ### Does this PR introduce _any_ user-facing change? No. This will recover the failures on JDK 8u252. ### How was this patch tested? - [x] Pass the Jenkins UT (https://github.com/apache/spark/pull/28601#issuecomment-632474270) - [x] Pass the Jenkins K8S IT with the K8s 1.13 (https://github.com/apache/spark/pull/28601#issuecomment-632438452) - [x] Manual testing with K8s 1.17.3. (Below) v1.17.6 result (on Minikube) ``` KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - All pods have the same service account by default - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. 
- Start pod creation from template - PVs with local storage - Launcher client dependencies - Test basic decommissioning Run completed in 8 minutes, 27 seconds. Total number of tests run: 19 Suites: completed 2, aborted 0 Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0 All tests passed. ``` Closes #28601 from dongjoon-hyun/SPARK-K8S-CLIENT. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-23 11:07:45 -07:00
angerszhu	0d9faf602e	[SPARK-31655][BUILD] Upgrade snappy-java to 1.1.7.5 ### What changes were proposed in this pull request? snappy-java has released v1.1.7.5; this PR upgrades to that latest version. Fixed in v1.1.7.4 - Caching internal buffers for SnappyFramed streams #234 - Fixed the native lib for ppc64le to work with glibc 2.17 (Previously it depended on 2.22) Fixed in v1.1.7.5 - Fixes java.lang.NoClassDefFoundError: org/xerial/snappy/pool/DefaultPoolFactory in 1.1.7.4 https://github.com/xerial/snappy-java/compare/1.1.7.3...1.1.7.5 v1.1.7.5 release note: `edc4ec28bd` ### Why are the changes needed? Fix bug ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No need Closes #28472 from AngersZhuuuu/spark-31655. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-07 12:01:43 -07:00
Dongjoon Hyun	e7995c2ddc	[SPARK-31633][BUILD] Upgrade SLF4J from 1.7.16 to 1.7.30 ### What changes were proposed in this pull request? This PR aims to upgrade SLF4J from 1.7.16 to 1.7.30. ### Why are the changes needed? SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run under Java 9. Also, this will bring all latest bug fixes. - http://www.slf4j.org/news.html > When running under Java 9, log4j version 1.2.x is unable to correctly parse the "java.version" system property. Assuming an inccorect Java version, it proceeded to disable its MDC functionality. The slf4j-log4j12 module shipping in this release fixes the issue by tweaking MDC internals by reflection, allowing log4j to run under Java 9. See also SLF4J-393. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #28446 from dongjoon-hyun/SPARK-31633. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-04 08:14:12 -07:00
Dongjoon Hyun	79eaaaf6da	[SPARK-31580][BUILD] Upgrade Apache ORC to 1.5.10 ### What changes were proposed in this pull request? This PR aims to upgrade Apache ORC to 1.5.10. ### Why are the changes needed? Apache ORC 1.5.10 is a maintenance release with the following patches. - [ORC-621](https://issues.apache.org/jira/browse/ORC-621) Need reader fix for ORC-569 - [ORC-616](https://issues.apache.org/jira/browse/ORC-616) In Patched Base encoding, the value of headerThirdByte goes beyond the range of byte - [ORC-613](https://issues.apache.org/jira/browse/ORC-613) OrcMapredRecordReader mis-reuse struct object when actual children schema differs - [ORC-610](https://issues.apache.org/jira/browse/ORC-610) Updated Copyright year in the NOTICE file The following is release note. - https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320&version=12346912 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing ORC tests and a newly added test case. - The first commit is already tested in `hive-2.3` profile with both native ORC implementation and Hive 2.3 ORC implementation. (https://github.com/apache/spark/pull/28373#issuecomment-620265114) - The latest run is about to make the test case disable in `hive-1.2` profile which doesn't use Apache ORC. - `hive-1.2`: https://github.com/apache/spark/pull/28373#issuecomment-620325906 Closes #28373 from dongjoon-hyun/SPARK-ORC-1.5.10. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-27 18:56:30 -07:00
Yuming Wang	b11e42663b	[SPARK-31381][SPARK-29245][SQL] Upgrade built-in Hive 2.3.6 to 2.3.7 ### What changes were proposed in this pull request? Hive 2.3.7 fixed these issues: - HIVE-21508: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer - HIVE-21980:Parsing time can be high in case of deeply nested subqueries - HIVE-22249: Support Parquet through HCatalog ### Why are the changes needed? Fix CCE during creating HiveMetaStoreClient in JDK11 environment: [SPARK-29245](https://issues.apache.org/jira/browse/SPARK-29245). ### Does this PR introduce any user-facing change? No. ### How was this patch tested? - [x] Test Jenkins with Hadoop 2.7 (https://github.com/apache/spark/pull/28148#issuecomment-616757840) - [x] Test Jenkins with Hadoop 3.2 on JDK11 (https://github.com/apache/spark/pull/28148#issuecomment-616294353) - [x] Manual test with remote hive metastore. Hive side: ``` export JAVA_HOME=/usr/lib/jdk1.8.0_221 export PATH=$JAVA_HOME/bin:$PATH cd /usr/lib/hive-2.3.6 # Start Hive metastore with Hive 2.3.6 bin/schematool -dbType derby -initSchema --verbose bin/hive --service metastore ``` Spark side: ``` export JAVA_HOME=/usr/lib/jdk-11.0.3 export PATH=$JAVA_HOME/bin:$PATH build/sbt clean package -Phive -Phadoop-3.2 -Phive-thriftserver export SPARK_PREPEND_CLASSES=true bin/spark-sql --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083 ``` Closes #28148 from wangyum/SPARK-31381. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-20 13:38:24 -07:00
Jungtaek Lim (HeartSaVioR)	f55f6b569b	[SPARK-31101][BUILD] Upgrade Janino to 3.0.16 ### What changes were proposed in this pull request? This PR (SPARK-31101) proposes to upgrade Janino to 3.0.16, which was released recently. * Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate". Please see the commit log. - https://github.com/janino-compiler/janino/commits/3.0.16 You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html (though the release note for Janino 3.0.16 is actually incorrect). ### Why are the changes needed? We received a report of a user's query failing because Janino throws an error when compiling the generated code. The issue is here: janino-compiler/janino#113 It contains the generated code, the symptom (error), and an analysis of the bug, so please refer to the link for more details. Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables Janino to compile the user's query properly. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #27932 from HeartSaVioR/SPARK-31101-janino-3.0.16. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-21 19:10:23 -07:00
Dongjoon Hyun	93def95b08	[SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final ### What changes were proposed in this pull request? This PR aims to bring the bug fixes from the latest netty-all. ### Why are the changes needed? - 4.1.47.Final: https://github.com/netty/netty/milestone/222?closed=1 (15 patches or issues) - 4.1.46.Final: https://github.com/netty/netty/milestone/221?closed=1 (80 patches or issues) - 4.1.45.Final: https://github.com/netty/netty/milestone/220?closed=1 (23 patches or issues) - 4.1.44.Final: https://github.com/netty/netty/milestone/218?closed=1 (113 patches or issues) - 4.1.43.Final: https://github.com/netty/netty/milestone/217?closed=1 (63 patches or issues) ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #27869 from dongjoon-hyun/SPARK-31095. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-10 17:50:34 -07:00
HyukjinKwon	5b3277f4fc	[SPARK-30994][BUILD][FOLLOW-UP] Change scope of xml-apis to include it and add xerces in SBT as dependency override ### What changes were proposed in this pull request? This PR proposes to: 1. Explicitly include xml-apis. xml-apis is already a dependency of xerces 2.12.0 (https://repo1.maven.org/maven2/xerces/xercesImpl/2.12.0/xercesImpl-2.12.0.pom). However, we're excluding it by setting `scope` to `test`. This seems to cause `spark-shell`, built from Maven, to fail. It seems that xml-apis was previously not reached for some reason, but after the upgrade it is required. Therefore, this PR proposes to include it. 2. Pin the `xerces` version in SBT as well. This dependency seems to be resolved differently from Maven. Note that Hadoop 3 does not seem to require this, as it replaced xerces as of [HDFS-12221](https://issues.apache.org/jira/browse/HDFS-12221). ### Why are the changes needed? To make `spark-shell` work from a Maven build, and to use the same xerces version. ### Does this PR introduce any user-facing change? No, it's master only. ### How was this patch tested? 1. 
```bash ./build/mvn -DskipTests -Psparkr -Phive clean package ./bin/spark-shell ``` Before: ``` Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.xerces.parsers.AbstractDOMParser.startDocument(Unknown Source) at org.apache.xerces.xinclude.XIncludeHandler.startDocument(Unknown Source) at org.apache.xerces.impl.dtd.XMLDTDValidator.startDocument(Unknown Source) at org.apache.xerces.impl.XMLDocumentScannerImpl.startEntity(Unknown Source) at org.apache.xerces.impl.XMLVersionDetector.startDocumentParsing(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2482) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2470) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2541) at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2494) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2407) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115) at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: org.w3c.dom.ElementTraversal at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 42 more ``` After: ``` ... Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_202) Type in expressions to have them evaluated. Type :help for more information. scala> ``` 2. 
``` ./build/sbt dependencyTree -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver -Phive ./build/sbt dependencyTree -Phadoop-3.2 -Phive-2.3 -Phive-thriftserver -Phive ``` Closes #27808 from HyukjinKwon/SPARK-30994. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-06 09:39:02 +09:00
Sean Owen	97d9a22b04	[SPARK-30994][CORE] Update xerces to 2.12.0 ### What changes were proposed in this pull request? Manage up the version of Xerces that Hadoop uses (and potentially user apps) to 2.12.0 to match https://issues.apache.org/jira/browse/HADOOP-16530 ### Why are the changes needed? Picks up bug and security fixes: https://www.xml.com/news/2018-05-apache-xerces-j-2120/ ### Does this PR introduce any user-facing change? Should be no behavior changes. ### How was this patch tested? Existing tests. Closes #27746 from srowen/SPARK-30994. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-03 09:27:18 -06:00
Dongjoon Hyun	fc4e56a54c	[SPARK-30884][PYSPARK] Upgrade to Py4J 0.10.9 This PR aims to upgrade Py4J to `0.10.9` for better Python 3.7 support in Apache Spark 3.0.0 (master/branch-3.0). This is not for `branch-2.4`. - Apache Spark 3.0.0 is using `Py4J 0.10.8.1` (released on 2018-10-21) because `0.10.8.1` was the first official release to support Python 3.7. - https://www.py4j.org/changelog.html#py4j-0-10-8-and-py4j-0-10-8-1 - `Py4J 0.10.9` was released on January 25th 2020 with better Python 3.7 support and `magic_member` bug fix. - https://github.com/bartdag/py4j/releases/tag/0.10.9 - https://www.py4j.org/changelog.html#py4j-0-10-9 No. Pass the Jenkins with the existing tests. Closes #27641 from dongjoon-hyun/SPARK-30884. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-20 09:09:30 -08:00
Yin Huai	ea626b6acf	[SPARK-30783] Exclude hive-service-rpc ### What changes were proposed in this pull request? Exclude hive-service-rpc from build. ### Why are the changes needed? hive-service-rpc 2.3.6 and spark sql's thrift server module have duplicate classes. Leaving hive-service-rpc 2.3.6 in the class path means that spark can pick up classes defined in hive instead of its thrift server module, which can cause hard to debug runtime errors due to class loading order and compilation errors for applications depend on spark. If you compare hive-service-rpc 2.3.6's jar (https://search.maven.org/remotecontent?filepath=org/apache/hive/hive-service-rpc/2.3.6/hive-service-rpc-2.3.6.jar) and spark thrift server's jar (e.g. https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hive-thriftserver_2.12/3.0.0-SNAPSHOT/spark-hive-thriftserver_2.12-3.0.0-20200207.021914-364.jar), you will see that all of classes provided by hive-service-rpc-2.3.6.jar are covered by spark thrift server's jar. https://issues.apache.org/jira/browse/SPARK-30783 has output of jar tf for both jars. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #27533 from yhuai/SPARK-30783. Authored-by: Yin Huai <yhuai@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-02-12 00:12:45 +08:00
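The SPARK-30783 entry above argues from a comparison of two jar listings (`jar tf`). A minimal sketch of that kind of check follows, assuming both jars are available locally or as file-like objects. The helper names (`class_entries`, `uncovered_classes`, `make_jar`) and the class paths in the demo are hypothetical and are not taken from Spark or Hive.

```python
import io
import zipfile


def class_entries(jar):
    """Return the set of .class entry names in a jar (path or file-like object)."""
    with zipfile.ZipFile(jar) as zf:
        return {name for name in zf.namelist() if name.endswith(".class")}


def uncovered_classes(candidate_jar, covering_jar):
    """Classes present in candidate_jar but missing from covering_jar."""
    return class_entries(candidate_jar) - class_entries(covering_jar)


def make_jar(names):
    """Build a tiny in-memory jar for the demo (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf


# Placeholder class names; real jars would be passed as file paths instead.
rpc_jar = make_jar(["org/example/rpc/A.class", "META-INF/MANIFEST.MF"])
server_jar = make_jar(["org/example/rpc/A.class", "org/example/server/B.class"])

# An empty result means every class in the first jar also exists in the second.
print(uncovered_classes(rpc_jar, server_jar))
```

An empty result only shows that the entry names overlap; it does not show that the class contents are identical.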
Onur Satici	86fdb818bf	[SPARK-30715][K8S] Bump fabric8 to 4.7.1 ### What changes were proposed in this pull request? Bump fabric8 kubernetes-client to 4.7.1 ### Why are the changes needed? New fabric8 version brings support for Kubernetes 1.17 clusters. Full release notes: - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.0 - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.1 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing unit and integration tests cover creation of K8S objects. Adjusted them to work with the new fabric8 version Closes #27443 from onursatici/os/bump-fabric8. Authored-by: Onur Satici <onursatici@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-05 01:17:30 -08:00
Dongjoon Hyun	2fd15a26fb	[SPARK-30695][BUILD] Upgrade Apache ORC to 1.5.9 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache ORC 1.5.9. - For `hive-2.3` profile, we need to upgrade `hive-storage-api` from `2.6.0` to `2.7.1`. - For `hive-1.2` profile, ORC library with classifier `nohive` already shaded it. So, there is no change. ### Why are the changes needed? This will bring the latest bug fixes. The following is the full release note. - https://issues.apache.org/jira/projects/ORC/versions/12346546 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Here is the summary. 1. `Hive 1.2 + Hadoop 2.7` passed. ([here](https://github.com/apache/spark/pull/27421#issuecomment-580924552)) 2. `Hive 2.3 + Hadoop 2.7` passed. ([here](https://github.com/apache/spark/pull/27421#issuecomment-580973391)) Closes #27421 from dongjoon-hyun/SPARK-ORC-1.5.9. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-31 17:41:27 -08:00
Dongjoon Hyun	862959747e	[SPARK-30639][BUILD] Upgrade Jersey to 2.30 ### What changes were proposed in this pull request? For better JDK11 support, this PR aims to upgrade Jersey and javassist to `2.30` and `3.25.0-GA` respectively. ### Why are the changes needed? Jersey: This will bring the following `Jersey` updates. - https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.30.html - https://github.com/eclipse-ee4j/jersey/issues/4245 (Java 11 java.desktop module dependency) javassist: This transitive dependency moves from 3.20.0-CR2 to 3.25.0-GA. - `javassist` officially supports JDK11 from [3.24.0-GA release note](https://github.com/jboss-javassist/javassist/blob/master/Readme.html#L308). ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with both JDK8 and JDK11. Closes #27357 from dongjoon-hyun/SPARK-30639. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-25 15:41:55 -08:00
Takeshi Yamamuro	775fae4640	[SPARK-30486][BUILD] Bump lz4-java version to 1.7.1 ### What changes were proposed in this pull request? This PR upgrades lz4-java from 1.7.0 to 1.7.1. ### Why are the changes needed? This release includes a bug fix for older macOS. See the link below for the changes: https://github.com/lz4/lz4-java/blob/master/CHANGES.md#171 ### Does this PR introduce any user-facing change? ### How was this patch tested? Existing tests. Closes #27271 from maropu/SPARK-30486. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-19 19:05:30 -08:00
Sean Owen	a2081ae4e1	[SPARK-29290][CORE] Update to chill 0.9.5 ### What changes were proposed in this pull request? Update Twitter Chill to 0.9.5. ### Why are the changes needed? Primarily, Scala 2.13 support for later. Other changes from 0.9.3 are apparently just minor fixes and improvements: https://github.com/twitter/chill/releases ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #27227 from srowen/SPARK-29290. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-19 18:39:38 -08:00
Xinrong Meng	f88874194a	[SPARK-30491][INFRA] Enable dependency audit files to tell dependency classifier ### What changes were proposed in this pull request? Enable dependency audit files to tell the value of artifact id, version, and classifier of a dependency. For example, `avro-mapred-1.8.2-hadoop2.jar` should be expanded to `avro-mapred/1.8.2/hadoop2/avro-mapred-1.8.2-hadoop2.jar` where `avro-mapred` is the artifact id, `1.8.2` is the version, and `hadoop2` is the classifier. ### Why are the changes needed? Dependency audit files are expected to be consumed by automated tests or downstream tools. However, current dependency audit files under `dev/deps` only show jar names, and there isn't a simple rule on how to parse the jar name to get the values of different fields. For example, `hadoop2` is the classifier of `avro-mapred-1.8.2-hadoop2.jar`; in contrast, `incubating` is part of the version of `htrace-core-3.1.0-incubating.jar`. Reference: There is a good example of the downstream tool that would be enabled as yhuai suggested, > Say we have a Spark application that depends on a third-party dependency `foo`, which pulls in `jackson` as a transitive dependency. Unfortunately, `foo` depends on a different version of `jackson` than Spark. So, in the pom of this Spark application, we use the dependency management section to pin the version of `jackson`. By doing this, we are lifting `jackson` to the top-level dependency of my application and I want to have a way to keep tracking what Spark uses. What we can do is to cross-check my Spark application's classpath with what Spark uses. Then, with a test written in my code base, whenever my application bumps Spark version, this test will check what we define in the application and what Spark has, and then remind us to change our application's pom if needed. In my case, I am fine to directly access git to get these audit files. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? 
Code changes are verified by generated dependency audit files naturally. Thus, there are no tests added. Closes #27177 from mengCareers/depsOptimize. Lead-authored-by: Xinrong Meng <meng.careers@gmail.com> Co-authored-by: mengCareers <meng.careers@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-15 20:19:44 -08:00
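The expanded audit format described in the SPARK-30491 entry above (`avro-mapred/1.8.2/hadoop2/avro-mapred-1.8.2-hadoop2.jar`) could be consumed by a downstream tool roughly as follows. This is only a sketch: `AuditEntry` and `parse_audit_line` are hypothetical names, and treating an empty third field as "no classifier" is an assumption that the commit message does not spell out.

```python
from typing import NamedTuple, Optional


class AuditEntry(NamedTuple):
    artifact_id: str
    version: str
    classifier: Optional[str]  # None when the classifier field is empty (assumption)
    jar_name: str


def parse_audit_line(line: str) -> AuditEntry:
    """Split an 'artifact/version/classifier/jar' line into its four fields."""
    parts = line.strip().split("/")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 '/'-separated fields, got {len(parts)}: {line!r}"
        )
    artifact_id, version, classifier, jar_name = parts
    return AuditEntry(artifact_id, version, classifier or None, jar_name)


# Example line taken from the commit message above.
entry = parse_audit_line("avro-mapred/1.8.2/hadoop2/avro-mapred-1.8.2-hadoop2.jar")
print(entry.artifact_id, entry.version, entry.classifier)
```

Splitting on explicit separators avoids guessing from the jar name alone, which is the ambiguity the commit message calls out for `hadoop2` versus `incubating`.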
Sean Owen	fac6b9bde8	Revert [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies This reverts commit `709387d660`. See https://issues.apache.org/jira/browse/SPARK-27300?focusedCommentId=16990048&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16990048 and previous mailing list discussions. ### What changes were proposed in this pull request? Revert the addition of skeleton graph API modules for Spark 3.0. ### Why are the changes needed? It does not appear that content will be added to the module for Spark 3, so I propose avoiding committing to the modules, which are no-ops now, in the upcoming major 3.0 release. ### Does this PR introduce any user-facing change? No, the modules were not released. ### How was this patch tested? Existing tests, but mostly N/A. Closes #26928 from srowen/Revert27300. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-17 09:06:23 -08:00
Dongjoon Hyun	cc276f8a6e	[SPARK-30243][BUILD][K8S] Upgrade K8s client dependency to 4.6.4 ### What changes were proposed in this pull request? This PR aims to upgrade K8s client library from 4.6.1 to 4.6.4 for `3.0.0-preview2`. ### Why are the changes needed? This will bring the latest bug fixes. - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.4 - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.3 - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.2 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with K8s integration test. Closes #26874 from dongjoon-hyun/SPARK-30243. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-13 08:25:51 -08:00
Dongjoon Hyun	b709091b4f	[SPARK-30228][BUILD] Update zstd-jni to 1.4.4-3 ### What changes were proposed in this pull request? This PR aims to update zstd-jni library to 1.4.4-3. ### Why are the changes needed? This will bring the latest bug fixes in zstd itself and some performance improvement. - https://github.com/facebook/zstd/releases/tag/v1.4.4 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26856 from dongjoon-hyun/SPARK-ZSTD-144. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-12 14:16:32 +09:00
Takeshi Yamamuro	be867e8a9e	[SPARK-30196][BUILD] Bump lz4-java version to 1.7.0 ### What changes were proposed in this pull request? This PR upgrades lz4-java from 1.6.0 to 1.7.0. ### Why are the changes needed? This release includes a fix by JoshRosen for a performance bug (https://github.com/lz4/lz4-java/pull/143) and some improvements (e.g., LZ4 binary update). See the link below for the changes: https://github.com/lz4/lz4-java/blob/master/CHANGES.md#170 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #26823 from maropu/LZ4_1_7_0. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-10 12:22:03 +09:00
Dongjoon Hyun	afc4fa02bd	[SPARK-30156][BUILD] Upgrade Jersey from 2.29 to 2.29.1 ### What changes were proposed in this pull request? This PR aims to upgrade `Jersey` from 2.29 to 2.29.1. ### Why are the changes needed? This will bring several bug fixes and important dependency upgrades. - https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.29.1.html ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26785 from dongjoon-hyun/SPARK-30156. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-06 18:49:43 -08:00
Dongjoon Hyun	1e0037b5e9	[SPARK-30157][BUILD][TEST-HADOOP3.2][TEST-JAVA11] Upgrade Apache HttpCore from 4.4.10 to 4.4.12 ### What changes were proposed in this pull request? This PR aims to upgrade `Apache HttpCore` from 4.4.10 to 4.4.12. ### Why are the changes needed? `Apache HttpCore v4.4.11` is the first official release for JDK11. > This is a maintenance release that corrects a number of defects in non-blocking SSL session code that caused compatibility issues with TLSv1.3 protocol implementation shipped with Java 11. For the full release note, please see the following. - https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26786 from dongjoon-hyun/SPARK-30157. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-07 10:59:10 +09:00
Dongjoon Hyun	f3abee377d	[SPARK-30051][BUILD] Clean up hadoop-3.2 dependency ### What changes were proposed in this pull request? This PR aims to cut `org.eclipse.jetty:jetty-webapp`and `org.eclipse.jetty:jetty-xml` transitive dependency from `hadoop-common`. ### Why are the changes needed? This will simplify our dependency management by the removal of unused dependencies. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the GitHub Action with all combinations and the Jenkins UT with (Hadoop-3.2). Closes #26742 from dongjoon-hyun/SPARK-30051. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-03 14:33:36 -08:00
Dongjoon Hyun	c2d513f8e9	[SPARK-30035][BUILD] Upgrade to Apache Commons Lang 3.9 ### What changes were proposed in this pull request? This PR aims to upgrade to `Apache Commons Lang 3.9`. ### Why are the changes needed? `Apache Commons Lang 3.9` is the first official release to support JDK9+. The following is the full release note. - https://commons.apache.org/proper/commons-lang/release-notes/RELEASE-NOTES-3.9.txt ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26672 from dongjoon-hyun/SPARK-30035. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-26 21:31:02 +09:00
Dongjoon Hyun	53e19f3678	[SPARK-30032][BUILD] Upgrade to ORC 1.5.8 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache ORC 1.5.8. ### Why are the changes needed? This will bring the latest bug fixes. The following is the full release note. - https://issues.apache.org/jira/projects/ORC/versions/12346462 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26669 from dongjoon-hyun/SPARK-ORC-1.5.8. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-25 20:08:11 -08:00
Dongjoon Hyun	a1706e2fa7	[SPARK-30005][INFRA] Update `test-dependencies.sh` to check `hive-1.2/2.3` profile ### What changes were proposed in this pull request? This PR aims to update `test-dependencies.sh` to validate all available `Hadoop/Hive` combinations. ### Why are the changes needed? Previously, we checked only `Hadoop2.7/Hive1.2` and `Hadoop3.2/Hive2.3`. We also need to validate `Hadoop2.7/Hive2.3` for Apache Spark 3.0. ### Does this PR introduce any user-facing change? No. (This is a dev-only change). ### How was this patch tested? Pass the GitHub Action (Linter) with the newly updated manifest because this is only a dependency check. Closes #26646 from dongjoon-hyun/SPARK-30005. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-24 10:14:02 -08:00
Bryan Cutler	65a189c7a1	[SPARK-29376][SQL][PYTHON] Upgrade Apache Arrow to version 0.15.1 ### What changes were proposed in this pull request? Upgrade Apache Arrow to version 0.15.1. This includes Java artifacts and increases the minimum required version of PyArrow also. Version 0.12.0 to 0.15.1 includes the following selected fixes/improvements relevant to Spark users: * ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes * ARROW-6874 - [Python] Memory leak in Table.to_pandas() when conversion to object dtype * ARROW-5579 - [Java] shade flatbuffer dependency * ARROW-5843 - [Java] Improve the readability and performance of BitVectorHelper#getNullCount * ARROW-5881 - [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits * ARROW-5893 - [C++] Remove arrow::Column class from C++ library * ARROW-5970 - [Java] Provide pointer to Arrow buffer * ARROW-6070 - [Java] Avoid creating new schema before IPC sending * ARROW-6279 - [Python] Add Table.slice method or allow slices in \_\_getitem\_\_ * ARROW-6313 - [Format] Tracking for ensuring flatbuffer serialized values are aligned in stream/files. * ARROW-6557 - [Python] Always return pandas.Series from Array/ChunkedArray.to_pandas, propagate field names to Series from RecordBatch, Table * ARROW-2015 - [Java] Use Java Time and Date APIs instead of JodaTime * ARROW-1261 - [Java] Add container type for Map logical type * ARROW-1207 - [C++] Implement Map logical type Changelog can be seen at https://arrow.apache.org/release/0.15.0.html ### Why are the changes needed? Upgrade to get bug fixes, improvements, and maintain compatibility with future versions of PyArrow. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests, manually tested with Python 3.7, 3.8 Closes #26133 from BryanCutler/arrow-upgrade-015-SPARK-29376. 
Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-15 13:27:30 +09:00
Liang-Chi Hsieh	ef1abf2e2c	[SPARK-29747][BUILD] Bump joda-time version to 2.10.5 ### What changes were proposed in this pull request? This upgrades joda-time from 2.9 to 2.10.5. ### Why are the changes needed? Joda 2.9 was released almost 4 years ago, and there have been bug fixes and tz database updates since then. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #26389 from viirya/upgrade-joda. Authored-by: Liang-Chi Hsieh <liangchi@uber.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-05 10:08:19 +09:00
Sean Owen	19b8c71436	[SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ ### What changes were proposed in this pull request? Update the version of dropwizard metrics that Spark uses for metrics to 4.1.x, from 3.2.x. ### Why are the changes needed? This helps JDK 9+ support, per for example https://github.com/dropwizard/metrics/pull/1236 ### Does this PR introduce any user-facing change? No, although downstream users with custom metrics may be affected. ### How was this patch tested? Existing tests. Closes #26332 from srowen/SPARK-29674. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-03 15:13:06 -08:00
Dongjoon Hyun	1ac6bd9f79	[SPARK-29729][BUILD] Upgrade ASM to 7.2 ### What changes were proposed in this pull request? This PR aims to upgrade ASM to 7.2. - https://issues.apache.org/jira/browse/XBEAN-322 (Upgrade to ASM 7.2) - https://asm.ow2.io/versions.html ### Why are the changes needed? This will bring the following patches. - 317875: Infinite loop when parsing invalid method descriptor - 317873: Add support for RET instruction in AdviceAdapter - 317872: Throw an exception if visitFrame used incorrectly - add support for Java 14 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing UTs. Closes #26373 from dongjoon-hyun/SPARK-29729. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-03 10:42:38 -08:00
Dongjoon Hyun	ba9d1610b6	[SPARK-29617][BUILD] Upgrade to ORC 1.5.7 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache ORC 1.5.7. ### Why are the changes needed? This will bring the latest bug fixes. The following is the full release note. - https://issues.apache.org/jira/projects/ORC/versions/12345702 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26276 from dongjoon-hyun/SPARK-29617. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-27 21:11:17 -07:00
Luca Canali	5867707835	[SPARK-29557][BUILD] Update dropwizard/codahale metrics library to 3.2.6 ### What changes were proposed in this pull request? This proposes to update the dropwizard/codahale metrics library version used by Spark to `3.2.6`, which is the last version supporting Ganglia. ### Why are the changes needed? Spark is currently using Dropwizard metrics version 3.1.5, a version that is no longer actively developed or maintained, according to the project's GitHub repo README. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests + manual tests on a YARN cluster. Closes #26212 from LucaCanali/updateDropwizardVersion. Authored-by: Luca Canali <luca.canali@cern.ch> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-23 10:45:11 -07:00
igor.calabria	78bdcfade1	[SPARK-27812][K8S] Bump K8S client version to 4.6.1 ### What changes were proposed in this pull request? Updated kubernetes client. ### Why are the changes needed? https://issues.apache.org/jira/browse/SPARK-27812 https://issues.apache.org/jira/browse/SPARK-27927 We need this fix https://github.com/fabric8io/kubernetes-client/pull/1768 that was released in version 4.6 of the client. The root cause of the problem is better explained in https://github.com/apache/spark/pull/25785 ### Does this PR introduce any user-facing change? No, it should be transparent to users. ### How was this patch tested? This patch was tested manually using a simple pyspark job ```python from pyspark.sql import SparkSession if __name__ == '__main__': spark = SparkSession.builder.getOrCreate() ``` The expected behaviour of this "job" is that both the Python and JVM processes exit automatically after the main runs. This is the case for spark versions <= 2.4. On version 2.4.3, the jvm process hangs because there's a non-daemon thread running ``` "OkHttp WebSocket https://10.96.0.1/..." #121 prio=5 os_prio=0 tid=0x00007fb27c005800 nid=0x24b waiting on condition [0x00007fb300847000] "OkHttp WebSocket https://10.96.0.1/..." #117 prio=5 os_prio=0 tid=0x00007fb28c004000 nid=0x247 waiting on condition [0x00007fb300e4b000] ``` This is caused by a bug in the `kubernetes-client` library, which is fixed in the version that we are upgrading to. When the mentioned job is run with this patch applied, the behaviour from spark <= 2.4.3 is restored and both processes terminate successfully. Closes #26093 from igorcalabria/k8s-client-update. Authored-by: igor.calabria <igor.calabria@ubee.in> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-17 12:23:24 -07:00
Fokko Driesprong	8eb8f7478c	[SPARK-29483][BUILD] Bump Jackson to 2.10.0 ### What changes were proposed in this pull request? Release blog: https://medium.com/cowtowncoder/jackson-2-10-features-cd880674d8a2 Fixes the following CVEs: https://www.cvedetails.com/cve/CVE-2019-16942/ https://www.cvedetails.com/cve/CVE-2019-16943/ Looking back, there were 3 major goals for this minor release: - Resolve the growing problem of “endless CVE patches”, a stream of fixes for reported CVEs related to “Polymorphic Deserialization” problem (described in “On Jackson CVEs… ”) that resulted in security tools forcing Jackson upgrades. 2.10 now includes “Safe Default Typing” that is hoped to resolve this problem. - Evolve 2.x API towards 3.0, based on changes that were done in master, within limits of 2.x API backwards-compatibility requirements. - Add JDK support for versions beyond Java 8: specifically add “module-info.class” for JDK9+, defining proper module definitions for Jackson components Full changelog: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10 Improved Scala 2.13 support: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10#scala ### Why are the changes needed? Patches CVEs reported by the vulnerability scanner. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Ran `mvn clean install -DskipTests` locally. Closes #26131 from Fokko/SPARK-29483. Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-16 15:38:54 -07:00
Jeff Evans	95de93b24e	[SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read Updating univocity-parsers version to 2.8.3, which adds support for multiple character delimiters Moving univocity-parsers version to spark-parent pom dependencyManagement section Adding new utility method to build multi-char delimiter string, which delegates to existing one Adding tests for multiple character delimited CSV ### What changes were proposed in this pull request? Adds support for parsing CSV data using multiple-character delimiters. Existing logic for converting the input delimiter string to characters was kept and invoked in a loop. Project dependencies were updated to remove redundant declaration of `univocity-parsers` version, and also to change that version to the latest. ### Why are the changes needed? It is quite common for people to have delimited data, where the delimiter is not a single character, but rather a sequence of characters. Currently, it is difficult to handle such data in Spark (typically needs pre-processing). ### Does this PR introduce any user-facing change? Yes. Specifying the "delimiter" option for the DataFrame read, and providing more than one character, will no longer result in an exception. Instead, it will be converted as before and passed to the underlying library (Univocity), which has accepted multiple character delimiters since 2.8.0. ### How was this patch tested? The `CSVSuite` tests were confirmed passing (including new methods), and `sbt` tests for `sql` were executed. Closes #26027 from jeff303/SPARK-24540. Authored-by: Jeff Evans <jeffrey.wayne.evans@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-15 15:44:51 -05:00
angerszhu	ef81525a1a	[SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2 ### What changes were proposed in this pull request? The current dev/deps/spark-deps-hadoop-3.2 lists some wrong deps, caused by how `dev/test-dependencies.sh` builds the assembly dependencies. This PR adds the maven compile parameter `-am` so that the build includes all deps and produces the right result. It also updates NOTICE-binary & NOTICE-binary for the updated result. ### Why are the changes needed? Update dev/deps/spark-hadoop-3.2 ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #25984 from AngersZhuuuu/SPARK=29308. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-13 12:53:12 -05:00
Fokko Driesprong	b5b1b69f79	[SPARK-29445][CORE] Bump netty-all from 4.1.39.Final to 4.1.42.Final ### What changes were proposed in this pull request? Minor version bump of Netty to patch reported CVE. Patches: https://www.cvedetails.com/cve/CVE-2019-16869/ ### Why are the changes needed? ### Does this PR introduce any user-facing change? No ### How was this patch tested? Compiled locally using `mvn clean install -DskipTests` Closes #26099 from Fokko/SPARK-29445. Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-12 09:43:16 -05:00
Peter Toth	3a7126cea8	[SPARK-29410][BUILD] Update commons-beanutils to 1.9.4 ### What changes were proposed in this pull request? This PR updates commons-beanutils to 1.9.4. ### Why are the changes needed? CVE fixed in 1.9.4: http://commons.apache.org/proper/commons-beanutils/javadocs/v1.9.4/RELEASE-NOTES.txt ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #26069 from peter-toth/SPARK-29410-update-commons-beanutils-to-1.9.4. Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-10-12 09:24:06 -05:00
Dongjoon Hyun	9a84fae216	[SPARK-29332][BUILD] Update zstd-jni to 1.4.3-1 ### What changes were proposed in this pull request? This PR aims to update zstd-jni library to 1.4.3-1. ### Why are the changes needed? This will bring the latest bug fixes in zstd itself. This is independent from another on-going Spark fix. - https://github.com/facebook/zstd/releases/tag/v1.4.3 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #26002 from dongjoon-hyun/SPARK-29332. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-02 11:37:02 -07:00
gengjiaan	1018390542	[SPARK-29252][BUILD] Upgrade zookeeper to 3.4.14 and fix vulnerabilities ### What changes were proposed in this pull request? The current code uses org.apache.zookeeper:zookeeper:jar:3.4.6, which has security vulnerabilities. Security details are available at https://www.tenable.com/cve/CVE-2019-0201 This reference recommends upgrading `zookeeper` to 3.4.14 or later. ### Why are the changes needed? This PR fixes the security vulnerabilities. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #25933 from beliefer/upgrade-zookeeper. Authored-by: gengjiaan <gengjiaan@360.cn> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-09-30 08:16:32 -05:00
Sean Owen	28b8383a6c	[SPARK-29289][BUILD] Update scalatest, scalacheck, scopt, clapper, scala-parser-combinators for 2.13 ### What changes were proposed in this pull request? Update scalatest, scalacheck, scopt, clapper, scala-parser-combinators to latest maintenance release that is also cross-published for Scala 2.13. ### Why are the changes needed? To build in the future for Scala 2.13 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #25967 from srowen/SPARK-29289. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-09-30 08:13:57 -05:00
gengjiaan	eef3abbb90	[SPARK-29226][BUILD] Upgrade jackson-databind to 2.9.10 and fix vulnerabilities ### What changes were proposed in this pull request? The current code uses com.fasterxml.jackson.core:jackson-databind:jar:2.9.9.3, which has security vulnerabilities. Security details are available at https://www.tenable.com/cve/CVE-2019-16335 and https://www.tenable.com/cve/CVE-2019-14540 These references recommend upgrading `jackson-databind` to 2.9.10 or later. This PR also upgrades jackson to 2.9.10. ### Why are the changes needed? This PR fixes the security vulnerabilities. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #25912 from beliefer/upgrade-jackson. Authored-by: gengjiaan <gengjiaan@360.cn> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-24 22:05:13 -07:00
Sean Owen	a9ae262cf2	[SPARK-28772][BUILD][MLLIB] Update breeze to 1.0 ### What changes were proposed in this pull request? Update breeze dependency to 1.0. ### Why are the changes needed? Breeze 1.0 supports Scala 2.13 and has a few bug fixes. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests. Closes #25874 from srowen/SPARK-28772. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-20 20:31:26 -07:00
Yuming Wang	8c3f27ceb4	[SPARK-28683][BUILD] Upgrade Scala to 2.12.10 ## What changes were proposed in this pull request? This PR upgrades Scala to 2.12.10. Release notes: - Fix regression in large string interpolations with non-String typed splices - Revert "Generate shallower ASTs in pattern translation" - Fix regression in classpath when JARs have 'a.b' entries beside 'a/b' - Faster compiler: 5–10% faster since 2.12.8 - Improved compatibility with JDK 11, 12, and 13 - Experimental support for build pipelining and outline type checking More details: https://github.com/scala/scala/releases/tag/v2.12.10 https://github.com/scala/scala/releases/tag/v2.12.9 ## How was this patch tested? Existing tests Closes #25404 from wangyum/SPARK-28683. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-18 13:30:36 -07:00
Owen O'Malley	dfb0a8bb04	[SPARK-28208][BUILD][SQL] Upgrade to ORC 1.5.6 including closing the ORC readers ## What changes were proposed in this pull request? It upgrades ORC from 1.5.5 to 1.5.6 and closes the ORC readers when they aren't used to create RecordReaders. ## How was this patch tested? The changed unit tests were run. Closes #25006 from omalley/spark-28208. Lead-authored-by: Owen O'Malley <omalley@apache.org> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-18 09:32:43 -07:00
Nicholas Marion	6fb5ef108e	[SPARK-29011][BUILD] Update netty-all from 4.1.30-Final to 4.1.39-Final ### What changes were proposed in this pull request? Upgrade netty-all to latest in the 4.1.x line which is 4.1.39-Final. ### Why are the changes needed? Currency of dependencies. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing unit-tests against master branch. Closes #25712 from n-marion/master. Authored-by: Nicholas Marion <nmarion@us.ibm.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-06 17:48:53 -07:00
Andy Grove	35d4edffa2	[SPARK-28921][BUILD][K8S] Upgrade kubernetes client to 4.4.2 ### What changes were proposed in this pull request? Upgrade kubernetes client from 4.1.2 to 4.4.2 ### Why are the changes needed? To fix compatibility issue with EKS since Amazon rolled out some security patches over the past week; 1.15.3, 1.14.6, 1.13.10, 1.12.10, and 1.11.10. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins and manually test on EKS. Closes #25640 from andygrove/SPARK-28921. Authored-by: Andy Grove <andygrove73@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-02 16:50:58 -07:00
Sean Owen	9ea37b09cf	[SPARK-17875][CORE][BUILD] Remove dependency on Netty 3 ### What changes were proposed in this pull request? Spark uses Netty 4 directly, but also includes Netty 3 only because transitive dependencies do. The dependencies (Hadoop HDFS, Zookeeper, Avro) don't seem to need this dependency as used in Spark. I think we can forcibly remove it to slim down the dependencies. Previous attempts were blocked by its usage in Flume, but that dependency has gone away. https://github.com/apache/spark/pull/15436 ### Why are the changes needed? Mostly to reduce the transitive dependency size and complexity a little bit and avoid triggering spurious security alerts on Netty 3.x usage. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #25544 from srowen/SPARK-17875. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-21 21:27:56 -07:00
Sean Owen	c9b49f3978	[SPARK-28737][CORE] Update Jersey to 2.29 ## What changes were proposed in this pull request? Update Jersey to 2.27+, ideally 2.29, for possible JDK 11 fixes. ## How was this patch tested? Existing tests. Closes #25455 from srowen/SPARK-28737. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-16 15:08:04 -07:00
Dongjoon Hyun	43101c7328	[SPARK-28758][BUILD][SQL] Upgrade Janino to 3.0.15 ### What changes were proposed in this pull request? This PR aims to upgrade `Janino` from `3.0.13` to `3.0.15` in order to bring the bug fixes. Please note that `3.1.0` is a major refactoring instead of bug fixes. We had better use `3.0.15` and wait for the stabler 3.1.x. ### Why are the changes needed? This brings the following bug fixes. 3.0.15 (2019-07-28) - Fix overloaded single static method import 3.0.14 (2019-07-05) - Conflict in sbt-assembly - Overloaded static on-demand imported methods cause a CompileException: Ambiguous static method import - Handle overloaded static on-demand imports - Major refactoring of the Java 8 and Java 9 retrofit mechanism - Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static Initializers" - Local variables in instance initializers don't work - Provide an option to keep generated code files - Added compile error handler and warning handler to ICompiler ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #25474 from dongjoon-hyun/SPARK-28758. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-16 11:33:02 -07:00
Fokko Driesprong	babdba0f9e	[SPARK-28728][BUILD] Bump Jackson Databind to 2.9.9.3 ## What changes were proposed in this pull request? Update Jackson databind to the latest version for some latest changes. ## How was this patch tested? Pass the Jenkins. Closes #25451 from Fokko/fd-bump-jackson-databind. Lead-authored-by: Fokko Driesprong <fokko@apache.org> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-16 03:40:41 -07:00
Dongjoon Hyun	a428f40669	[SPARK-28549][BUILD][CORE][SQL] Use `text.StringEscapeUtils` instead `lang3.StringEscapeUtils` ## What changes were proposed in this pull request? `org.apache.commons.lang3.StringEscapeUtils` was deprecated over two years ago at [LANG-1316](https://issues.apache.org/jira/browse/LANG-1316). There have been no bug fixes since then. ```java /** * <p>Escapes and unescapes {@code String}s for * Java, Java Script, HTML and XML.</p> * * <p>#ThreadSafe#</p> * @since 2.0 * @deprecated as of 3.6, use commons-text * <a href="https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html"> * StringEscapeUtils</a> instead */ @Deprecated public class StringEscapeUtils { ``` This PR aims to use the latest one from the `commons-text` module, which has more bug fixes like [TEXT-100](https://issues.apache.org/jira/browse/TEXT-100), [TEXT-118](https://issues.apache.org/jira/browse/TEXT-118) and [TEXT-120](https://issues.apache.org/jira/browse/TEXT-120), by the following replacement. ```scala -import org.apache.commons.lang3.StringEscapeUtils +import org.apache.commons.text.StringEscapeUtils ``` This will add a new dependency to the `hadoop-2.7` profile distribution. In the `hadoop-3.2` profile, we already have it. ``` +commons-text-1.6.jar ``` ## How was this patch tested? Pass the Jenkins with the existing tests. - [Hadoop 2.7](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108281) - [Hadoop 3.2](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108282) Closes #25281 from dongjoon-hyun/SPARK-28549. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-07-29 11:45:29 +09:00
Dongjoon Hyun	33e6e4703d	[SPARK-28544][BUILD] Update zstd-jni to 1.4.2-1 ## What changes were proposed in this pull request? This PR aims to update `zstd-jni` library to bring the latest improvement and bug fixes in `1.4.1` and `1.4.2`. - https://github.com/facebook/zstd/releases/tag/v1.4.1 (4.5 ~ 11.8% performance improvement from v1.4.0 and bug fixes) - https://github.com/facebook/zstd/releases/tag/v1.4.2 (bug fixes) ## How was this patch tested? Pass the Jenkins. Closes #25275 from dongjoon-hyun/SPARK-28544. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-07-27 18:08:20 -07:00
Liang-Chi Hsieh	591de42351	[SPARK-28381][PYSPARK] Upgraded version of Pyrolite to 4.30 ## What changes were proposed in this pull request? This upgrades to a newer version of Pyrolite. Most updates [1] in the newer version are for dotnet. For Java, it includes a bug fix to Unpickler regarding cleaning up the Unpickler memo, and support of protocol 5. After upgrading, we can remove the fix at SPARK-27629 for the bug in Unpickler. [1] https://github.com/irmen/Pyrolite/compare/pyrolite-4.23...master ## How was this patch tested? Manually ran the existing tests locally on Python 3.6. Closes #25143 from viirya/upgrade-pyrolite. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-07-15 12:29:58 +09:00
Yuming Wang	4ad0c33be4	[SPARK-28221][BUILD] Upgrade janino to 3.0.13 ## What changes were proposed in this pull request? Main changelog entries: ### Version 3.0.13: - Support for JDK 9/10 in Full Compiler - The syntax elements that can have modifiers now all have sets of "is...()" methods that check for each modifier. Some also have methods "getAccess()" and/or "getAnnotations()". - Implement "type annotations" (JLS8 9.7.4) - Implemented parsing (but not compilation) of "modular compilation units" (JLS11 7.3). - Replaced all "assert...Uncookable(..., Pattern messageRegex)" and "assert...Uncookable(..., String messageInfix)" method pairs with a single "assert...Uncookable(..., String messageRegex)" method. Minor refactoring: Allowed modifiers are now checked in the Parser, not in Java.*. This saves a lot of THROWS clauses. - Parse Type inference syntax: Type inference for generic instance creation implemented, test cases added. - Parse MethodReference, ClassInstanceCreationReference and ArrayCreationReference ### Version 3.0.12 - Fixed: Operator "&" not defined on types "java.lang.Long" and "int" - Major bug in JavaSourceClassLoader: When loading the second and following classes, CUs were compiled again, leading to an inconsistent class hierarchy. - Fixed: Java 9 added "@Override public final CharBuffer CharBuffer.rewind() { ..." -- leads easily to a java.lang.NoSuchMethodError - Changed all occurrences of the words "Java bytecode" to "JVM bytecode" to make clearer that the generated bytecode is for the JVMS and not suitable for, e.g. DALVIK. http://janino-compiler.github.io/janino/changelog.html ## How was this patch tested? Existing test Closes #25021 from wangyum/SPARK-28221. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-07-06 10:02:42 -07:00
Dongjoon Hyun	ea0e119f84	[SPARK-28111][BUILD] Upgrade `xbean-asm7-shaded` to 4.14 ## What changes were proposed in this pull request? This PR aims to update `xbean-asm7-shaded` to bring [XBEAN-318](https://issues.apache.org/jira/browse/XBEAN-318) which is helpful to log the class definition reading failures. - https://issues.apache.org/jira/projects/XBEAN/versions/12345220 ## How was this patch tested? Pass the Jenkins. Closes #24914 from dongjoon-hyun/SPARK-28111. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-06-20 07:59:59 -07:00
Martin Junghanns	709387d660	[SPARK-27300][GRAPH] Add Spark Graph modules and dependencies ## What changes were proposed in this pull request? This PR introduces the necessary Maven modules for the new [Spark Graph](https://issues.apache.org/jira/browse/SPARK-25994) feature for Spark 3.0. * `spark-graph` is a parent module that users depend on to get all graph functionalities (Cypher and Graph Algorithms) * `spark-graph-api` defines the [Property Graph API](https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI) that is being shared between Cypher and Algorithms * `spark-cypher` contains a Cypher query engine implementation Both `spark-graph-api` and `spark-cypher` depend on Spark SQL. Note that the Maven module for Graph Algorithms is not part of this PR and will be introduced in https://issues.apache.org/jira/browse/SPARK-27302 A PoC for a running Cypher implementation can be found in this WIP PR https://github.com/apache/spark/pull/24297 ## How was this patch tested? Pass the Jenkins with all profiles, and manually build and check the following. ``` $ ls assembly/target/scala-2.12/jars/spark-cypher* assembly/target/scala-2.12/jars/spark-cypher_2.12-3.0.0-SNAPSHOT.jar $ ls assembly/target/scala-2.12/jars/spark-graph* \| grep -v graphx assembly/target/scala-2.12/jars/spark-graph-api_2.12-3.0.0-SNAPSHOT.jar assembly/target/scala-2.12/jars/spark-graph_2.12-3.0.0-SNAPSHOT.jar ``` Closes #24490 from s1ck/SPARK-27300. Lead-authored-by: Martin Junghanns <martin.junghanns@neotechnology.com> Co-authored-by: Max Kießling <max@kopfueber.org> Co-authored-by: Martin Junghanns <martin.junghanns@neo4j.com> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-06-09 00:26:26 -07:00
Izek Greenfield	c647f9011c	[SPARK-27862][BUILD] Move to json4s 3.6.6 ## What changes were proposed in this pull request? Move to json4s version 3.6.6 Add scala-xml 1.2.0 ## How was this patch tested? Pass the Jenkins Closes #24736 from igreenfield/master. Authored-by: Izek Greenfield <igreenfield@axiomsl.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-05-30 19:42:56 -05:00
Fokko Driesprong	bd87323003	[SPARK-27757][CORE] Bump Jackson to 2.9.9 ## What changes were proposed in this pull request? This fixes CVE-2019-12086 on Databind: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.9 ## How was this patch tested? Existing tests Closes #24646 from Fokko/SPARK-27757. Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-05-30 09:35:20 -05:00
Dongjoon Hyun	141a3bfc8d	[SPARK-27755][BUILD] Update zstd-jni to 1.4.0-1 ## What changes were proposed in this pull request? This PR aims to update `zstd-jni` library to `1.4.0-1` which improves the `level 1 compression speed` performance by 6% in most scenarios. The following is the full release note. - https://github.com/facebook/zstd/releases/tag/v1.4.0 ## How was this patch tested? Pass the Jenkins. Closes #24632 from dongjoon-hyun/SPARK-27755. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-17 08:34:45 -07:00
Kazuaki Ishizaki	9e0d8c6ce2	[SPARK-27752][CORE] Upgrade lz4-java from 1.5.1 to 1.6.0 ## What changes were proposed in this pull request? This PR upgrades lz4-java from 1.5.1 to 1.6.0. Lz4-java is available at https://github.com/lz4/lz4-java. Changes from 1.5.1: - Upgraded LZ4 to 1.9.1. Updated the JNI bindings, except for the one for Linux/i386. Decompression speed is improved on amd64. - Deprecated use of LZ4FastDecompressor of a native instance because the corresponding C API function is deprecated. See the release note of LZ4 1.9.0 for details. Updated javadoc accordingly. - Changed the module name from org.lz4.lz4-java to org.lz4.java to avoid using - in the module name. (severn-everett, Oliver Eikemeier, Rei Odaira) - Enabled build with Java 11. Note that the distribution is still built with Java 7. (Rei Odaira) ## How was this patch tested? Existing tests. Closes #24629 from kiszk/SPARK-27752. Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-16 20:45:13 -07:00
Yuming Wang	875e7e1d97	[SPARK-27620][BUILD] Upgrade jetty to 9.4.18.v20190429 ## What changes were proposed in this pull request? This PR upgrades Jetty to [9.4.18.v20190429](https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.18.v20190429) because of [CVE-2019-10247](https://nvd.nist.gov/vuln/detail/CVE-2019-10247). ## How was this patch tested? Existing tests. Closes #24513 from wangyum/SPARK-27620. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-05-03 09:25:54 +09:00
Yuming Wang	3ecafb0e14	[SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6 ## What changes were proposed in this pull request? [stream-lib 2.9.6](https://github.com/addthis/stream-lib/commits/v2.9.6) include several improvements: ![image](https://user-images.githubusercontent.com/5399861/56938062-7eb77580-6b32-11e9-8c36-711ab943d657.png) ## How was this patch tested? N/A Closes #24492 from wangyum/SPARK-27601. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-05-02 15:21:57 -05:00
Cheng Lian	b73744a147	[SPARK-27611][BUILD] Exclude jakarta.activation:jakarta.activation-api from org.glassfish.jaxb:jaxb-runtime:2.3.2 PR #23890 introduced `org.glassfish.jaxb:jaxb-runtime:2.3.2` as a runtime dependency. As an unexpected side effect, `jakarta.activation:jakarta.activation-api:1.2.1` was also pulled in as a transitive dependency. As a result, for the Maven build, both of the following two jars can be found under `assembly/target/scala-2.12/jars/`: ``` activation-1.1.1.jar jakarta.activation-api-1.2.1.jar ``` This PR excludes the Jakarta one. Manually built Spark using Maven and checked files under `assembly/target/scala-2.12/jars/`. After this change, only `activation-1.1.1.jar` is there. Closes #24507 from liancheng/spark-27611. Authored-by: Cheng Lian <lian@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-01 20:12:17 -07:00
Yuming Wang	fe99305101	[SPARK-27556][BUILD] Exclude com.zaxxer:HikariCP-java7 from hadoop-yarn-server-web-proxy ## What changes were proposed in this pull request? There are two HikariCP packages on the classpath when building with `-Phive -Pyarn -Phadoop-3.2`. The HikariCP dependency tree: ``` [INFO] \| +- org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.2.0:compile [INFO] \| \| \- org.apache.hadoop:hadoop-yarn-server-common:jar:3.2.0:compile [INFO] \| \| +- org.apache.hadoop:hadoop-yarn-registry:jar:3.2.0:compile [INFO] \| \| \| \- commons-daemon:commons-daemon:jar:1.0.13:compile [INFO] \| \| +- org.apache.geronimo.specs:geronimo-jcache_1.0_spec:jar:1.0-alpha-1:compile [INFO] \| \| +- org.ehcache:ehcache:jar:3.3.1:compile [INFO] \| \| +- com.zaxxer:HikariCP-java7:jar:2.4.12:compile ``` ``` [INFO] +- org.apache.hive:hive-metastore:jar:2.3.4:compile [INFO] \| +- javolution:javolution:jar:5.5.1:compile [INFO] \| +- com.google.protobuf:protobuf-java:jar:2.5.0:compile [INFO] \| +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile [INFO] \| +- com.zaxxer:HikariCP:jar:2.5.1:compile ``` This PR excludes `com.zaxxer:HikariCP-java7` from `hadoop-yarn-server-web-proxy`. ## How was this patch tested? manual tests Closes #24450 from wangyum/SPARK-27556. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-04-26 12:15:39 -05:00
Yuming Wang	777b4502b2	[SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2 ## What changes were proposed in this pull request? When we compile and test Hadoop 3.2, we hit the following two issues: 1. JobSummaryLevel is not a member of object org.apache.parquet.hadoop.ParquetOutputFormat. Fixed by [PARQUET-381](https://issues.apache.org/jira/browse/PARQUET-381)(Parquet 1.9.0) 2. java.lang.NoSuchFieldError: BROTLI at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31). Fixed by [PARQUET-1143](https://issues.apache.org/jira/browse/PARQUET-1143)(Parquet 1.10.0) The reason is that the `parquet-hadoop-bundle-1.8.1.jar` conflicts with Parquet 1.10.1. I think it would be safe to upgrade Hive's parquet to 1.10.1 to work around this issue. This is what Hive did when upgrading Parquet 1.8.1 to 1.10.0: [HIVE-17000](https://issues.apache.org/jira/browse/HIVE-17000) and [HIVE-19464](https://issues.apache.org/jira/browse/HIVE-19464). We can see that all changes are related to vectors, and vectors are disabled by default: see [HIVE-14826](https://issues.apache.org/jira/browse/HIVE-14826) and [HiveConf.java#L2723](https://github.com/apache/hive/blob/rel/release-2.3.4/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2723). This PR removes [parquet-hadoop-bundle-1.8.1.jar](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop-bundle) , so Hive serde will use [parquet-common-1.10.1.jar, parquet-column-1.10.1.jar and parquet-hadoop-1.10.1.jar](https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2#L185-L189). ## How was this patch tested? 1. manual tests 2. [upgrade Hive Parquet to 1.10.1 and run Hadoop 3.2 test on jenkins](https://github.com/apache/spark/pull/24044#commits-pushed-0c3f962) Closes #24346 from wangyum/SPARK-27176. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-04-19 08:59:08 -07:00
Dongjoon Hyun	f93460dae9	[SPARK-27493][BUILD] Upgrade ASM to 7.1 ## What changes were proposed in this pull request? [SPARK-25946](https://issues.apache.org/jira/browse/SPARK-25946) upgraded ASM to 7.0 to support JDK11. This PR aims to update ASM to 7.1 to bring the bug fixes. - https://asm.ow2.io/versions.html - https://issues.apache.org/jira/browse/XBEAN-316 ## How was this patch tested? Pass the Jenkins. Closes #24395 from dongjoon-hyun/SPARK-27493. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-04-18 13:36:52 +09:00
Dongjoon Hyun	a8f20c95ab	[SPARK-27452][BUILD] Update zstd-jni to 1.3.8-9 ## What changes were proposed in this pull request? This PR aims to update `zstd-jni` from 1.3.2-2 to 1.3.8-9 to be aligned with the latest Zstd 1.3.8 in Apache Spark 3.0.0. Currently, Apache Spark is aligned with the old Zstd used in the first PR and there are many bugfix and improvement updates in `zstd-jni` until now. - https://github.com/facebook/zstd/releases/tag/v1.3.8 - https://github.com/facebook/zstd/releases/tag/v1.3.7 - https://github.com/facebook/zstd/releases/tag/v1.3.6 - https://github.com/facebook/zstd/releases/tag/v1.3.4 - https://github.com/facebook/zstd/releases/tag/v1.3.3 ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #24364 from dongjoon-hyun/SPARK-ZSTD. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-16 08:54:16 -07:00
Sean Owen	8718367e2e	[SPARK-27470][PYSPARK] Update pyrolite to 4.23 ## What changes were proposed in this pull request? Update pyrolite to 4.23 to pick up bug and security fixes. ## How was this patch tested? Existing tests. Closes #24381 from srowen/SPARK-27470. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-04-16 19:41:40 +09:00
Sean Owen	a4cf1a4f4e	[SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3 ## What changes were proposed in this pull request? Unify commons-beanutils deps to latest 1.9.3. This resolves the version inconsistency in Hadoop 2.7's build and also picks up security and bug fixes. ## How was this patch tested? Existing tests. Closes #24378 from srowen/SPARK-27469. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-15 19:18:37 -07:00
Dongjoon Hyun	0881f648cf	[SPARK-27451][BUILD] Upgrade lz4-java to 1.5.1 ## What changes were proposed in this pull request? This PR upgrades `lz4-java` to 1.5.1 in order to get a patch for avoiding racing with GC. - https://github.com/lz4/lz4-java/blob/master/CHANGES.md#151 ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #24363 from dongjoon-hyun/SPARK-LZ4. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-12 19:21:43 -07:00
Yuming Wang	33f3c48cac	[SPARK-27176][SQL] Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4 ## What changes were proposed in this pull request? This PR mainly contains: 1. Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4. 2. Resolve compatibility issues between Hive 1.2.1 and Hive 2.3.4 in the `sql/hive` module. ## How was this patch tested? jenkins test hadoop-2.7 manual test hadoop-3: ```shell build/sbt clean package -Phadoop-3.2 -Phive export SPARK_PREPEND_CLASSES=true # rm -rf metastore_db cat <<EOF > test_hadoop3.scala spark.range(10).write.saveAsTable("test_hadoop3") spark.table("test_hadoop3").show EOF bin/spark-shell --conf spark.hadoop.hive.metastore.schema.verification=false --conf spark.hadoop.datanucleus.schema.autoCreateAll=true -i test_hadoop3.scala ``` Closes #23788 from wangyum/SPARK-23710-hadoop3. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-04-08 08:42:21 -07:00
LantaoJin	69dd44af19	[SPARK-27216][CORE] Upgrade RoaringBitmap to 0.7.45 to fix Kryo unsafe ser/dser issue ## What changes were proposed in this pull request? HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. However, RoaringBitmap could not be serialized/deserialized with the unsafe KryoSerializer. This is a bug in RoaringBitmap 0.5.11 that is fixed in the latest version. This is an update of #24157 ## How was this patch tested? Add a UT Closes #24264 from LantaoJin/SPARK-27216. Lead-authored-by: LantaoJin <jinlantao@gmail.com> Co-authored-by: Lantao Jin <jinlantao@gmail.com> Signed-off-by: Imran Rashid <irashid@cloudera.com>	2019-04-03 20:09:50 -05:00
Sean Owen	2ec650d843	[SPARK-27267][CORE] Update snappy to avoid error when decompressing empty serialized data ## What changes were proposed in this pull request? (See JIRA for problem statement) Update snappy 1.1.7.1 -> 1.1.7.3 to pick up an empty-stream and Java 9 fix. There appear to be no other changes of consequence: https://github.com/xerial/snappy-java/blob/master/Milestone.md ## How was this patch tested? Existing tests Closes #24242 from srowen/SPARK-27267. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-30 02:41:24 -05:00
Yuming Wang	9c0af746e5	[SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0 ## What changes were proposed in this pull request? This PR upgrades `hadoop-3` to `3.2.0` to work around [HADOOP-16086](https://issues.apache.org/jira/browse/HADOOP-16086). Otherwise, some test cases will throw IllegalArgumentException: ```java 02:44:34.707 ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)' java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:109) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:102) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475) at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:369) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$runHive$1(HiveClientImpl.scala:730) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221) at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266) at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:719) at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:709) at org.apache.spark.sql.hive.StatisticsSuite.createNonPartitionedTable(StatisticsSuite.scala:719) at org.apache.spark.sql.hive.StatisticsSuite.$anonfun$testAlterTableProperties$2(StatisticsSuite.scala:822) ``` ## How was this patch tested? manual tests Closes #24106 from wangyum/SPARK-27175. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-16 19:42:05 -05:00
Dongjoon Hyun	f26a1f3d37	[SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 ## What changes were proposed in this pull request? This PR aims to update Apache ORC dependency to fix [SPARK-27107](https://issues.apache.org/jira/browse/SPARK-27107) . ``` [ORC-452] Support converting MAP column from JSON to ORC Improvement [ORC-447] Change the docker scripts to keep a persistent m2 cache [ORC-463] Add `version` command [ORC-475] ORC reader should lazily get filesystem [ORC-476] Make SearchAgument kryo buffer size configurable ``` ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #24096 from dongjoon-hyun/SPARK-27165. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-14 20:14:31 -07:00
Jiaxin Shan	2d0b7cfe44	[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2 ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/23814 was reverted because of Jenkins integration test failures. After the minikube upgrade, Kubernetes client SDK v4.1.2 works with Kubernetes v1.13. We can bring this change back. Reference: [Bump Kubernetes Client Version to 4.1.2](https://issues.apache.org/jira/browse/SPARK-26742) [Original PR against master](https://github.com/apache/spark/pull/23814) [Kubernetes client upgrade for Spark 2.4](https://github.com/apache/spark/pull/23993) ## How was this patch tested? Unit Tests: ``` All tests passed. [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: [INFO] [INFO] Spark Project Parent POM ........................... SUCCESS [ 2.343 s] [INFO] Spark Project Tags ................................. SUCCESS [ 2.039 s] [INFO] Spark Project Sketch ............................... SUCCESS [ 12.714 s] [INFO] Spark Project Local DB ............................. SUCCESS [ 2.185 s] [INFO] Spark Project Networking ........................... SUCCESS [ 38.154 s] [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 7.989 s] [INFO] Spark Project Unsafe ............................... SUCCESS [ 2.297 s] [INFO] Spark Project Launcher ............................. SUCCESS [ 2.813 s] [INFO] Spark Project Core ................................. SUCCESS [38:03 min] [INFO] Spark Project ML Local Library ..................... SUCCESS [ 3.848 s] [INFO] Spark Project GraphX ............................... SUCCESS [ 56.084 s] [INFO] Spark Project Streaming ............................ 
SUCCESS [04:58 min] [INFO] Spark Project Catalyst ............................. SUCCESS [06:39 min] [INFO] Spark Project SQL .................................. SUCCESS [37:12 min] [INFO] Spark Project ML Library ........................... SUCCESS [18:59 min] [INFO] Spark Project Tools ................................ SUCCESS [ 0.767 s] [INFO] Spark Project Hive ................................. SUCCESS [33:45 min] [INFO] Spark Project REPL ................................. SUCCESS [01:14 min] [INFO] Spark Project Assembly ............................. SUCCESS [ 1.444 s] [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:12 min] [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 6.719 s] [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [07:00 min] [INFO] Spark Project Examples ............................. SUCCESS [ 21.805 s] [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 0.906 s] [INFO] Spark Avro ......................................... SUCCESS [ 50.486 s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 02:32 h [INFO] Finished at: 2019-03-07T08:39:34Z [INFO] ------------------------------------------------------------------------ ``` Closes #24002 from Jeffwan/update_k8s_sdk_master. Authored-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-03-13 15:04:27 -07:00
Yuming Wang	eed3091a60	[SPARK-27120][BUILD][TEST] Upgrade scalatest version to 3.0.5 ## What changes were proposed in this pull request? ScalaTest 3.0.5 Release Notes Bug Fixes - Fixed the implicit view not available problem when used with compile macro. - Fixed a stack depth problem in RefSpecLike and fixture.SpecLike under Scala 2.13. - Changed Framework and ScalaTestFramework to set spanScaleFactor for Runner object instances for different Runners using different class loaders. This fixed a problem whereby an incorrect Runner.spanScaleFactor could be used when the tests for multiple sbt project's were run concurrently. - Fixed a bug in endsWith regex matcher. Improvements - Removed duplicated parsing code for -C in ArgsParser. - Improved performance in WebBrowser. - Documentation typo rectification. - Improve validity of Junit XML reports. - Improved performance by replacing all .size == 0 and .length == 0 to .isEmpty. Enhancements - Added 'C' option to -P, which will tell -P to use cached thread pool. - External Dependencies Update - Bumped up scala-js version to 0.6.22. - Changed to depend on mockito-core, not mockito-all. - Bumped up jmock version to 2.8.3. - Bumped up junit version to 4.12. - Removed dependency to scala-parser-combinators. More details: http://www.scalatest.org/release_notes/3.0.5 ## How was this patch tested? manual tests on local machine: ``` nohup build/sbt clean -Djline.terminal=jline.UnsupportedTerminal -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos test > run.scalatest.log & ``` Closes #24042 from wangyum/SPARK-27120. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-10 15:22:52 -07:00
Yuming Wang	f732647ae4	[SPARK-27054][BUILD][SQL] Remove the Calcite dependency ## What changes were proposed in this pull request? Calcite is only used for [runSqlHive](`02bbe977ab/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (L699-L705)`) when `hive.cbo.enable=true`([SemanticAnalyzer](https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java#L278-L280)). So we can disable `hive.cbo.enable` and remove the Calcite dependency. ## How was this patch tested? Existing tests Closes #23970 from wangyum/SPARK-27054. Lead-authored-by: Yuming Wang <yumwang@ebay.com> Co-authored-by: Yuming Wang <wgyumg@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-09 16:34:24 -08:00
Yanbo Liang	7857c6d633	[SPARK-27051][CORE] Bump Jackson version to 2.9.8 ## What changes were proposed in this pull request? FasterXML Jackson versions before 2.9.8 are affected by multiple [CVEs](https://github.com/FasterXML/jackson-databind/issues/2186), so we need to bump the Jackson dependency to 2.9.8. ## How was this patch tested? Existing tests and offline benchmark. I have run ```SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmark"``` to check there is no performance degradation for this upgrade. Closes #23965 from yanboliang/SPARK-27051. Authored-by: Yanbo Liang <ybliang8@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-05 11:46:51 +09:00
Sean Owen	d8754df2bf	[SPARK-27029][BUILD] Update Thrift to 0.12.0 ## What changes were proposed in this pull request? Update Thrift to 0.12.0 to pick up bug and security fixes. Changes: https://github.com/apache/thrift/blob/master/CHANGES.md The important one is for https://issues.apache.org/jira/browse/THRIFT-4506 ## How was this patch tested? Existing tests. A quick local test suggests this works. Closes #23935 from srowen/SPARK-27029. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-02 17:28:37 -08:00
Sean Owen	131b464d0c	[SPARK-26986][ML][FOLLOWUP] Add JAXB reference impl to build for Java 9+ ## What changes were proposed in this pull request? Remove a few new JAXB dependencies that shouldn't be necessary now. See https://github.com/apache/spark/pull/23890#issuecomment-468299922 ## How was this patch tested? Existing tests Closes #23923 from srowen/SPARK-26986.2. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-01 11:23:40 -06:00
Sean Owen	9c283662c6	[SPARK-26986][ML] Add JAXB reference impl to build for Java 9+ ## What changes were proposed in this pull request? Add reference JAXB impl for Java 9+ from Glassfish. Right now it's only apparently necessary in MLlib but can be expanded later. ## How was this patch tested? Existing tests particularly PMML-related ones, which use JAXB. This works on Java 11. Closes #23890 from srowen/SPARK-26986. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-02-26 18:26:49 -06:00
Marcelo Vanzin	afbff6446f	Revert "[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2" This reverts commit `a3192d966a`.	2019-02-26 13:42:07 -08:00
Jiaxin Shan	a3192d966a	[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2 ## What changes were proposed in this pull request? Changed the `kubernetes-client` version to 4.1.2. The latest version fixes an error with exec credentials (used by AWS EKS), and it will be used to talk to the Kubernetes API server. With this patch, users can now submit Spark jobs to an EKS API endpoint. ## How was this patch tested? unit tests and manual tests. Closes #23814 from Jeffwan/update_k8s_sdk. Authored-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-02-25 04:56:04 -06:00
Ryan Blue	f72d217788	[SPARK-26677][BUILD] Update Parquet to 1.10.1 with notEq pushdown fix. ## What changes were proposed in this pull request? Update to Parquet Java 1.10.1. ## How was this patch tested? Added a test from HyukjinKwon that validates the notEq case from SPARK-26677. Closes #23704 from rdblue/SPARK-26677-fix-noteq-parquet-bug. Lead-authored-by: Ryan Blue <blue@apache.org> Co-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: Ryan Blue <rdblue@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-02-02 09:17:52 -08:00
Bryan Cutler	16990f9299	[SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0 ## What changes were proposed in this pull request? Upgrade Apache Arrow to version 0.12.0. This includes the Java artifacts and fixes to enable usage with pyarrow 0.12.0 Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users: * Safe cast fails from numpy float64 array with nans to integer, ARROW-4258 * Java, Reduce heap usage for variable width vectors, ARROW-4147 * Binary identity cast not implemented, ARROW-4101 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098 * conversion to date object no longer needed, ARROW-3910 * Error reading IPC file with no record batches, ARROW-3894 * Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790 * from_pandas gives incorrect results when converting floating point to bool, ARROW-3428 * Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048 * Java update to official Flatbuffers version 1.9.0, ARROW-3175 complete list [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0) PySpark requires the following fixes to work with PyArrow 0.12.0 * Encrypted pyspark worker fails due to ChunkedStream missing closed property * pyarrow now converts dates as objects by default, which causes error because type is assumed datetime64 * ArrowTests fails due to difference in raised error message * pyarrow.open_stream deprecated * tests fail because groupby adds index column with duplicate name ## How was this patch tested? Ran unit tests with pyarrow versions 0.8.0, 0.10.0, 0.11.1, 0.12.0 Closes #23657 from BryanCutler/arrow-upgrade-012. Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-29 14:18:45 +08:00
Dongjoon Hyun	81addaa6b7	[SPARK-26427][BUILD] Upgrade Apache ORC to 1.5.4 ## What changes were proposed in this pull request? This PR aims to update Apache ORC dependency to the latest version 1.5.4 released on Dec. 20. ([Release Notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320&version=12344187)) ``` [ORC-237] OrcFile.mergeFiles Specified block size is less than configured minimum value [ORC-409] Changes for extending MemoryManagerImpl [ORC-410] Fix a locale-dependent test in TestCsvReader [ORC-416] Avoid opening data reader when there is no stripe [ORC-417] Use dynamic Apache Maven mirror link [ORC-419] Ensure to call `close` at RecordReaderImpl constructor exception [ORC-432] openjdk 8 has a bug that prevents surefire from working [ORC-435] Ability to read stripes that are greater than 2GB [ORC-437] Make acid schema checks case insensitive [ORC-411] Update build to work with Java 10. [ORC-418] Fix broken docker build script ``` ## How was this patch tested? Build and pass Jenkins. Closes #23364 from dongjoon-hyun/SPARK-26427. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-12-22 00:41:21 -08:00
Sean Owen	2ea9792fde	[SPARK-26266][BUILD] Update to Scala 2.12.8 ## What changes were proposed in this pull request? Update to Scala 2.12.8 ## How was this patch tested? Existing tests. Closes #23218 from srowen/SPARK-26266. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-12-08 05:59:53 -06:00
Dongjoon Hyun	4772265203	[SPARK-26298][BUILD] Upgrade Janino to 3.0.11 ## What changes were proposed in this pull request? This PR aims to upgrade Janino compiler to the latest version 3.0.11. The following are the changes from the [release note](http://janino-compiler.github.io/janino/changelog.html). - Script with many "helper" variables. - Java 9+ compatibility - Compilation Error Messages Generated by JDK. - Added experimental support for the "StackMapFrame" attribute; not active yet. - Make Unparser more flexible. - Fixed NPEs in various "toString()" methods. - Optimize static method invocation with rvalue target expression. - Added all missing "ClassFile.getConstant*Info()" methods, removing the necessity for many type casts. ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #23250 from dongjoon-hyun/SPARK-26298. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-12-06 20:50:57 -08:00
Takanobu Asanuma	15c0384977	[SPARK-26134][CORE] Upgrading Hadoop to 2.7.4 to fix java.version problem ## What changes were proposed in this pull request? When I ran spark-shell on JDK 11+28 (2018-09-25), it failed with the error below. ``` Exception in thread "main" java.lang.ExceptionInInitializerError at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80) at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79) at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) at scala.Option.map(Option.scala:146) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) at java.base/java.lang.String.substring(String.java:1874) at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52) ``` This is a Hadoop issue: it fails to parse some `java.version` values. It has been fixed since Hadoop 2.7.4 (see [HADOOP-14586](https://issues.apache.org/jira/browse/HADOOP-14586)). Note that Hadoop 2.7.5 and later have another problem with Spark ([SPARK-25330](https://issues.apache.org/jira/browse/SPARK-25330)), so upgrading to 2.7.4 is fine for now. ## How was this patch tested? Existing tests. Closes #23101 from tasanuma/SPARK-26134. Authored-by: Takanobu Asanuma <tasanuma@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-11-21 23:09:57 -08:00
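The failure mode in the trace above (`substring` with `begin 0, end 3` on a string of length 2) can be reproduced in isolation. The following is a minimal sketch, not Hadoop's code: the class `JavaVersionParse` and both method names are hypothetical, `naiveIsJava7OrAbove` only mimics the three-character prefix check that the trace suggests, and `majorVersion` is one possible tolerant alternative rather than the fix applied in HADOOP-14586.

```java
public class JavaVersionParse {
    // Hypothetical stand-in for the failing pattern: assumes java.version
    // always has at least three characters (true for "1.8.0_181", false for "11").
    static boolean naiveIsJava7OrAbove(String javaVersion) {
        return javaVersion.substring(0, 3).compareTo("1.7") >= 0;
    }

    // Hypothetical tolerant parser: splits on runs of non-digits and handles
    // both the legacy "1.x" scheme and the JDK 9+ scheme ("9", "11", "11.0.1").
    static int majorVersion(String javaVersion) {
        String[] parts = javaVersion.split("[^0-9]+");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]); // "1.8.0_181" is Java 8
        }
        return first; // "11" is Java 11
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.8.0_181", "11", "11.0.1"}) {
            String naive;
            try {
                naive = String.valueOf(naiveIsJava7OrAbove(v));
            } catch (StringIndexOutOfBoundsException e) {
                // A two-character version string such as "11" ends up here.
                naive = "throws " + e.getClass().getSimpleName();
            }
            System.out.println(v + " -> naive: " + naive + ", major: " + majorVersion(v));
        }
    }
}
```

Parsing the version into an integer avoids both the length assumption and the lexicographic comparison, which is why a sketch like `majorVersion` does not depend on the shape of the version string beyond its leading digits.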
DB Tsai	ad853c5678	[SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 ## What changes were proposed in this pull request? This PR makes 2.12 Spark's default Scala version, with Scala 2.11 as the alternative version. This implies that Scala 2.12 will be used by our CI builds, including pull request builds. We'll update Jenkins to include a new compile-only job for Scala 2.11 to ensure the code can still be compiled with Scala 2.11. ## How was this patch tested? Existing tests. Closes #22967 from dbtsai/scala2.12. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-11-14 16:22:23 -08:00
Sean Owen	722369ee55	[SPARK-24421][BUILD][CORE] Accessing sun.misc.Cleaner in JDK11 …. Other related changes to get JDK 11 working, to test ## What changes were proposed in this pull request? - Access `sun.misc.Cleaner` (Java 8) and `jdk.internal.ref.Cleaner` (JDK 9+) by reflection (note: the latter only works if illegal reflective access is allowed) - Access `sun.misc.Unsafe.invokeCleaner` in Java 9+ instead of `sun.misc.Cleaner` (Java 8) In order to test anything on JDK 11, I also fixed a few small things, which I include here: - Fix minor JDK 11 compile issues - Update scala plugin, Jetty for JDK 11, to facilitate tests too This doesn't mean JDK 11 tests all pass now, but lots do. Note also that the JDK 9+ solution for the Cleaner has a big caveat. ## How was this patch tested? Existing tests. Manually tested JDK 11 build and tests, and tests covering this change appear to pass. All Java 8 tests should still pass, but this change alone does not achieve full JDK 11 compatibility. Closes #22993 from srowen/SPARK-24421. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-11-14 12:52:54 -08:00
gatorsmile	0ba9715c7d	[SPARK-26005][SQL] Upgrade ANTLR from 4.7 to 4.7.1 ## What changes were proposed in this pull request? Based on the release description of ANTLR 4.7.1 (https://github.com/antlr/antlr4/releases), let us upgrade our parser to 4.7.1. ## How was this patch tested? N/A Closes #23005 from gatorsmile/upgradeAntlr4.7. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2018-11-11 23:21:47 -08:00
DB Tsai	3ed91c9b89	[SPARK-25946][BUILD] Upgrade ASM to 7.x to support JDK11 ## What changes were proposed in this pull request? Upgrade ASM to 7.x to support JDK11 ## How was this patch tested? Existing tests. Closes #22953 from dbtsai/asm7. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2018-11-06 05:38:59 +00:00
Dongjoon Hyun	e4cb42ad89	[SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1 ## What changes were proposed in this pull request? Py4J 0.10.8.1 was released on October 21st and is the first Py4J release to officially support Python 3.7. We should upgrade to get that official support. It also includes some patches related to garbage collection. https://www.py4j.org/changelog.html#py4j-0-10-8-and-py4j-0-10-8-1 ## How was this patch tested? Pass the Jenkins. Closes #22901 from dongjoon-hyun/SPARK-25891. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-10-31 09:55:03 -07:00

280 commits