## What changes were proposed in this pull request?
We're using an old-ish jQuery, 1.12.4, and should probably update for Spark 3 to keep up in general, but also to keep up with CVEs. In fact, we know of at least one resolved in only 3.4.0+ (https://nvd.nist.gov/vuln/detail/CVE-2019-11358). They may not affect Spark, but, if the update isn't painful, maybe worthwhile in order to make future 3.x updates easier.
jQuery 1 -> 2 doesn't sound like a breaking change, as 2.0 is supposed to maintain compatibility with 1.9+ (https://blog.jquery.com/2013/04/18/jquery-2-0-released/)
2 -> 3 has breaking changes: https://jquery.com/upgrade-guide/3.0/. It's hard to evaluate each one, but the most likely area for problems is in ajax(). However, our usage of jQuery (and plugins) is pretty simple.
Update jquery to 3.4.1; update jquery blockUI and mustache to latest
## How was this patch tested?
Manual testing of docs build (except R docs), worker/master UI, spark application UI.
Note: this really doesn't guarantee it works, as our tests can't test javascript, and this is merely anecdotal testing, although I clicked about every link I could find. There's a risk this breaks a minor part of the UI; it does seem to work fine in the main.
Closes#24843 from srowen/SPARK-28004.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
For Apache Spark 3.0.0 release, this PR aims to update Kafka dependency to 2.2.1 to bring the following improvement and bug fixes like [KAFKA-8134](https://issues.apache.org/jira/browse/KAFKA-8134) (`'linger.ms' must be a long`).
https://issues.apache.org/jira/projects/KAFKA/versions/12345010
## How was this patch tested?
Pass the Jenkins.
Closes#24847 from dongjoon-hyun/SPARK-28013.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR introduces the necessary Maven modules for the new [Spark Graph](https://issues.apache.org/jira/browse/SPARK-25994) feature for Spark 3.0.
* `spark-graph` is a parent module that users depend on to get all graph functionalities (Cypher and Graph Algorithms)
* `spark-graph-api` defines the [Property Graph API](https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI) that is being shared between Cypher and Algorithms
* `spark-cypher` contains a Cypher query engine implementation
Both, `spark-graph-api` and `spark-cypher` depend on Spark SQL.
Note, that the Maven module for Graph Algorithms is not part of this PR and will be introduced in https://issues.apache.org/jira/browse/SPARK-27302
A PoC for a running Cypher implementation can be found in this WIP PR https://github.com/apache/spark/pull/24297
## How was this patch tested?
Pass the Jenkins with all profiles and manually build and check the followings.
```
$ ls assembly/target/scala-2.12/jars/spark-cypher*
assembly/target/scala-2.12/jars/spark-cypher_2.12-3.0.0-SNAPSHOT.jar
$ ls assembly/target/scala-2.12/jars/spark-graph* | grep -v graphx
assembly/target/scala-2.12/jars/spark-graph-api_2.12-3.0.0-SNAPSHOT.jar
assembly/target/scala-2.12/jars/spark-graph_2.12-3.0.0-SNAPSHOT.jar
```
Closes#24490 from s1ck/SPARK-27300.
Lead-authored-by: Martin Junghanns <martin.junghanns@neotechnology.com>
Co-authored-by: Max Kießling <max@kopfueber.org>
Co-authored-by: Martin Junghanns <martin.junghanns@neo4j.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This pr moves Hive test jars(`hive-contrib-0.13.1.jar`, `hive-hcatalog-core-0.13.1.jar`, `hive-contrib-2.3.5.jar` and `hive-hcatalog-core-2.3.5.jar`) to maven dependency.
## How was this patch tested?
Existing test
Please note that this pr need test with `maven` and `sbt`.
Closes#24751 from wangyum/SPARK-27831.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Move to json4s version 3.6.6
Add scala-xml 1.2.0
## How was this patch tested?
Pass the Jenkins
Closes#24736 from igreenfield/master.
Authored-by: Izek Greenfield <igreenfield@axiomsl.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This fixes CVE-2019-12086 on Databind: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.9
## How was this patch tested?
Existing tests
Closes#24646 from Fokko/SPARK-27757.
Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This pr moves Hive test jars(`hive-contrib-0.13.1.jar`, `hive-hcatalog-core-0.13.1.jar`, `hive-contrib-2.3.5.jar` and `hive-hcatalog-core-2.3.5.jar`) to maven dependency.
## How was this patch tested?
Existing test
Closes#24695 from wangyum/SPARK-27831.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Leave avro, avro-ipc dependendencies as compile scope even for hadoop-provided usages, to ensure 1.8 is used. Hadoop 2.7 has Avro 1.7, and Spark won't generally work with that. Reports from the field are that this works, to include avro 1.8 with the Spark distro on Hadoop 2.7.
## How was this patch tested?
Existing tests
Closes#24680 from srowen/SPARK-26045.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR aims to update `zstd-jni` library to `1.4.0-1` which improves the `level 1 compression speed` performance by 6% in most scenarios. The following is the full release note.
- https://github.com/facebook/zstd/releases/tag/v1.4.0
## How was this patch tested?
Pass the Jenkins.
Closes#24632 from dongjoon-hyun/SPARK-27755.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR upgrades lz4-java from 1.5.1 to 1.6.0. Lz4-java is available at https://github.com/lz4/lz4-java.
Changes from 1.5.1:
- Upgraded LZ4 to 1.9.1. Updated the JNI bindings, except for the one for Linux/i386. Decompression speed is improved on amd64.
- Deprecated use of LZ4FastDecompressor of a native instance because the corresponding C API function is deprecated. See the release note of LZ4 1.9.0 for details. Updated javadoc accordingly.
- Changed the module name from org.lz4.lz4-java to org.lz4.java to avoid using - in the module name. (severn-everett, Oliver Eikemeier, Rei Odaira)
- Enabled build with Java 11. Note that the distribution is still built with Java 7. (Rei Odaira)
## How was this patch tested?
Existing tests.
Closes#24629 from kiszk/SPARK-27752.
Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Fixed the `spark-<version>-yarn-shuffle.jar` artifact packaging to shade the native netty libraries:
- shade the `META-INF/native/libnetty_*` native libraries when packagin
the yarn shuffle service jar. This is required as netty library loader
derives that based on shaded package name.
- updated the `org/spark_project` shade package prefix to `org/sparkproject`
(i.e. removed underscore) as the former breaks the netty native lib loading.
This was causing the yarn external shuffle service to fail
when spark.shuffle.io.mode=EPOLL
## How was this patch tested?
Manual tests
Closes#24502 from amuraru/SPARK-27610_master.
Authored-by: Adi Muraru <amuraru@adobe.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following.
- https://maven.apache.org/docs/3.6.1/release-notes.html
This was committed and reverted due to AppVeyor failure. It turns out that the root cause is `PATH` issue. With the updated AppVeyor script, it passed.
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/24273412
## How was this patch tested?
Pass the Jenkins and AppVoyer
Closes#24481 from dongjoon-hyun/SPARK-R.
Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
PR #23890 introduced `org.glassfish.jaxb:jaxb-runtime:2.3.2` as a runtime dependency. As an unexpected side effect, `jakarta.activation:jakarta.activation-api:1.2.1` was also pulled in as a transitive dependency. As a result, for the Maven build, both of the following two jars can be found under `assembly/target/scala-2.12/jars/`:
```
activation-1.1.1.jar
jakarta.activation-api-1.2.1.jar
```
This PR exludes the Jakarta one.
Manually built Spark using Maven and checked files under `assembly/target/scala-2.12/jars/`. After this change, only `activation-1.1.1.jar` is there.
Closes#24507 from liancheng/spark-27611.
Authored-by: Cheng Lian <lian@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
One more place to update ASM 7.0 -> 7.1
## How was this patch tested?
Existing tests
Closes#24508 from srowen/SPARK-27493.3.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR aims to upgrade Surefire plugin to 3.0.0-M3 to bring [SUREFIRE-1613](https://issues.apache.org/jira/browse/SUREFIRE-1613).
## How was this patch tested?
Pass the Jenkins with the existing tests.
Closes#24501 from dongjoon-hyun/SPARK-27608.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
hadoop-2.7 gets `commons-logging` version from `hive-metastore`:
```
[INFO] +- org.spark-project.hive:hive-metastore:jar:1.2.1.spark2:compile
[INFO] | +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
[INFO] | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | +- commons-logging:commons-logging:jar:1.1.3:compile
```
But Hive removes `commons-logging` since [HIVE-12237(Hive 2.0.0)](https://issues.apache.org/jira/browse/HIVE-12237), so hadoop-3.2 gets `commons-logging` from `commons-httpclient`:
```
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | \- commons-logging:commons-logging:jar:1.0.4:compile
```
Thus. we may hint `LogConfigurationException`:
```
bin/spark-sql --conf spark.sql.hive.metastore.version=1.2.2 --conf spark.sql.hive.metastore.jars=file:///apache/hive-1.2.2-bin/lib/*
...
Caused by: org.apache.commons.logging.LogConfigurationException: Invalid class loader hierarchy. You have more than one version of 'org.apache.commons.logging.Log' visible, which is not allowed.
at org.apache.commons.logging.impl.LogFactoryImpl.getLogConstructor(LogFactoryImpl.java:385)
... 43 more
```
This pr upgrade `commons-logging` to 1.1.3 for hadoop-3.2 to fix this issue.
## How was this patch tested?
manual tests
Closes#24388 from wangyum/SPARK-27481.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following.
- https://maven.apache.org/docs/3.6.1/release-notes.html
## How was this patch tested?
Pass the Jenkins with `[test-maven]` tag.
Closes#24377 from dongjoon-hyun/SPARK-27467.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Unify commons-beanutils deps to latest 1.9.3. This resolves the version inconsistency in Hadoop 2.7's build and also picks up security and bug fixes.
## How was this patch tested?
Existing tests.
Closes#24378 from srowen/SPARK-27469.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR upgrades `lz4-java` to 1.5.1 in order to get a patch for avoiding racing with GC.
- https://github.com/lz4/lz4-java/blob/master/CHANGES.md#151
## How was this patch tested?
Pass the Jenkins with the existing tests.
Closes#24363 from dongjoon-hyun/SPARK-LZ4.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR mainly contains:
1. Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4.
2. Resolve compatibility issues between Hive 1.2.1 and Hive 2.3.4 in the `sql/hive` module.
## How was this patch tested?
jenkins test hadoop-2.7
manual test hadoop-3:
```shell
build/sbt clean package -Phadoop-3.2 -Phive
export SPARK_PREPEND_CLASSES=true
# rm -rf metastore_db
cat <<EOF > test_hadoop3.scala
spark.range(10).write.saveAsTable("test_hadoop3")
spark.table("test_hadoop3").show
EOF
bin/spark-shell --conf spark.hadoop.hive.metastore.schema.verification=false --conf spark.hadoop.datanucleus.schema.autoCreateAll=true -i test_hadoop3.scala
```
Closes#23788 from wangyum/SPARK-23710-hadoop3.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?
HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer.
It's a bug of RoaringBitmap-0.5.11 and fixed in latest version.
This is an update of #24157
## How was this patch tested?
Add a UT
Closes#24264 from LantaoJin/SPARK-27216.
Lead-authored-by: LantaoJin <jinlantao@gmail.com>
Co-authored-by: Lantao Jin <jinlantao@gmail.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
## What changes were proposed in this pull request?
(See JIRA for problem statement)
Update snappy 1.1.7.1 -> 1.1.7.3 to pick up an empty-stream and Java 9 fix.
There appear to be no other changes of consequence:
https://github.com/xerial/snappy-java/blob/master/Milestone.md
## How was this patch tested?
Existing tests
Closes#24242 from srowen/SPARK-27267.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below.
## How was this patch tested?
Existing tests.
Closes#23098 from srowen/SPARK-26132.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This PR aims to update Kafka dependency to 2.2.0 to bring the following improvement and bug fixes.
- https://issues.apache.org/jira/projects/KAFKA/versions/12344063
Due to [KAFKA-4453](https://issues.apache.org/jira/browse/KAFKA-4453), data plane API and controller plane API are separated. Apache Spark needs the following changes.
```scala
- servers.head.apis.metadataCache
+ servers.head.dataPlaneRequestProcessor.metadataCache
```
## How was this patch tested?
Pass the Jenkins with the existing tests.
Closes#24190 from dongjoon-hyun/SPARK-27260.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Fix Scala 2.11 maven build issue after merging SPARK-26946.
## How was this patch tested?
Maven Scala 2.11 and 2.12 builds with `-Phadoop-provided -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver`.
Closes#24184 from jzhuge/SPARK-26946-1.
Authored-by: John Zhuge <jzhuge@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Since [YARN-7113](https://issues.apache.org/jira/browse/YARN-7113)(Hadoop-3.1.0), `hadoop-client` add `javax.ws.rs:jsr311-api` to its dependency. This conflict with [javax.ws.rs-api-2.0.1.jar](f26a1f3d37/dev/deps/spark-deps-hadoop-3.1 (L105)).
```shell
build/sbt "core/testOnly *.UISeleniumSuite *.HistoryServerSuite" -Phadoop-3.2
...
[info] <pre> Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.NoSuchMethodError: javax.ws.rs.core.Application.getProperties()Ljava/util/Map;
...
```
This pr exclude `javax.ws.rs:jsr311-api` from hadoop-client.
## How was this patch tested?
manual tests:
```shell
build/sbt "core/testOnly *.UISeleniumSuite *.HistoryServerSuite" -Phadoop-3.2
```
Closes#24114 from wangyum/SPARK-27179.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This PR upgrade `hadoop-3` to `3.2.0` to workaround [HADOOP-16086](https://issues.apache.org/jira/browse/HADOOP-16086). Otherwise some test case will throw IllegalArgumentException:
```java
02:44:34.707 ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)'
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:109)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:102)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:369)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$runHive$1(HiveClientImpl.scala:730)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:719)
at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:709)
at org.apache.spark.sql.hive.StatisticsSuite.createNonPartitionedTable(StatisticsSuite.scala:719)
at org.apache.spark.sql.hive.StatisticsSuite.$anonfun$testAlterTableProperties$2(StatisticsSuite.scala:822)
```
## How was this patch tested?
manual tests
Closes#24106 from wangyum/SPARK-27175.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This PR aims to update Apache ORC dependency to fix [SPARK-27107](https://issues.apache.org/jira/browse/SPARK-27107) .
```
[ORC-452] Support converting MAP column from JSON to ORC Improvement
[ORC-447] Change the docker scripts to keep a persistent m2 cache
[ORC-463] Add `version` command
[ORC-475] ORC reader should lazily get filesystem
[ORC-476] Make SearchAgument kryo buffer size configurable
```
## How was this patch tested?
Pass the Jenkins with the existing tests.
Closes#24096 from dongjoon-hyun/SPARK-27165.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
**ScalaTest 3.0.5 Release Notes**
**Bug Fixes**
- Fixed the implicit view not available problem when used with compile macro.
- Fixed a stack depth problem in RefSpecLike and fixture.SpecLike under Scala 2.13.
- Changed Framework and ScalaTestFramework to set spanScaleFactor for Runner object instances for different Runners using different class loaders. This fixed a problem whereby an incorrect Runner.spanScaleFactor could be used when the tests for multiple sbt project's were run concurrently.
- Fixed a bug in endsWith regex matcher.
**Improvements**
- Removed duplicated parsing code for -C in ArgsParser.
- Improved performance in WebBrowser.
- Documentation typo rectification.
- Improve validity of Junit XML reports.
- Improved performance by replacing all .size == 0 and .length == 0 to .isEmpty.
**Enhancements**
- Added 'C' option to -P, which will tell -P to use cached thread pool.
- External Dependencies Update
- Bumped up scala-js version to 0.6.22.
- Changed to depend on mockito-core, not mockito-all.
- Bumped up jmock version to 2.8.3.
- Bumped up junit version to 4.12.
- Removed dependency to scala-parser-combinators.
More details:
http://www.scalatest.org/release_notes/3.0.5
## How was this patch tested?
manual tests on local machine:
```
nohup build/sbt clean -Djline.terminal=jline.UnsupportedTerminal -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos test > run.scalatest.log &
```
Closes#24042 from wangyum/SPARK-27120.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This pr adds 2 maven properties to help us upgrade the built-in Hive.
| Property Name | Default | In future |
| ------ | ------ | ------ |
| hive.classifier | (none) | core |
| hive.parquet.group | com.twitter | org.apache.parquet |
## How was this patch tested?
existing tests
Closes#23996 from wangyum/add_2_maven_properties.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Fasterxml Jackson version before 2.9.8 is affected by multiple [CVEs](https://github.com/FasterXML/jackson-databind/issues/2186), we need to fix bump the dependent Jackson to 2.9.8.
## How was this patch tested?
Existing tests and offline benchmark.
I have run ```SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmark"``` to check there is no performance degradation for this upgrade.
Closes#23965 from yanboliang/SPARK-27051.
Authored-by: Yanbo Liang <ybliang8@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Update Thrift to 0.12.0 to pick up bug and security fixes.
Changes: https://github.com/apache/thrift/blob/master/CHANGES.md
The important one is for https://issues.apache.org/jira/browse/THRIFT-4506
## How was this patch tested?
Existing tests. A quick local test suggests this works.
Closes#23935 from srowen/SPARK-27029.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Remove a few new JAXB dependencies that shouldn't be necessary now.
See https://github.com/apache/spark/pull/23890#issuecomment-468299922
## How was this patch tested?
Existing tests
Closes#23923 from srowen/SPARK-26986.2.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Add reference JAXB impl for Java 9+ from Glassfish. Right now it's only apparently necessary in MLlib but can be expanded later.
## How was this patch tested?
Existing tests particularly PMML-related ones, which use JAXB.
This works on Java 11.
Closes#23890 from srowen/SPARK-26986.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Since Spark 2.0, Akka is not used anymore and Akka related stuff were removed. However there are still some leftover. This PR aims to remove these leftover.
* `/pom.xml` has a comment about Akka, which is not needed anymore.
## How was this patch tested?
Existing tests.
Closes#23885 from seancxmao/remove-akka-leftover.
Authored-by: seancxmao <seancxmao@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
JPMML apparently only supports Java 9 in 1.4.2+. We are seeing text failures from JPMML relating to JAXB when running on Java 11. It's shaded and not a big change, so should be safe.
## How was this patch tested?
Existing tests.
Closes#23868 from srowen/SPARK-26966.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
As a part of preparing the official JDK 11 support ([SPARK-24417](https://issues.apache.org/jira/browse/SPARK-24417)), Spark 3.0.0 upgraded KAFKA version to 2.1.0. This PR updates Kafka dependency to 2.1.1 to bring the following 42 bug fixes.
- https://issues.apache.org/jira/projects/KAFKA/versions/12344250
## How was this patch tested?
Pass the Jenkins with the existing tests.
Closes#23831 from dongjoon-hyun/SPARK-26916.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Update to Parquet Java 1.10.1.
## How was this patch tested?
Added a test from HyukjinKwon that validates the notEq case from SPARK-26677.
Closes#23704 from rdblue/SPARK-26677-fix-noteq-parquet-bug.
Lead-authored-by: Ryan Blue <blue@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Ryan Blue <rdblue@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request?
Upgrade Apache Arrow to version 0.12.0. This includes the Java artifacts and fixes to enable usage with pyarrow 0.12.0
Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users:
* Safe cast fails from numpy float64 array with nans to integer, ARROW-4258
* Java, Reduce heap usage for variable width vectors, ARROW-4147
* Binary identity cast not implemented, ARROW-4101
* pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
* conversion to date object no longer needed, ARROW-3910
* Error reading IPC file with no record batches, ARROW-3894
* Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790
* from_pandas gives incorrect results when converting floating point to bool, ARROW-3428
* Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048
* Java update to official Flatbuffers version 1.9.0, ARROW-3175
complete list [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0)
PySpark requires the following fixes to work with PyArrow 0.12.0
* Encrypted pyspark worker fails due to ChunkedStream missing closed property
* pyarrow now converts dates as objects by default, which causes error because type is assumed datetime64
* ArrowTests fails due to difference in raised error message
* pyarrow.open_stream deprecated
* tests fail because groupby adds index column with duplicate name
## How was this patch tested?
Ran unit tests with pyarrow versions 0.8.0, 0.10.0, 0.11.1, 0.12.0
Closes#23657 from BryanCutler/arrow-upgrade-012.
Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
There are ugly provided dependencies inside core for the following:
* Hive
* Kafka
In this PR I've extracted them out. This PR contains the following:
* Token providers are now loaded with service loader
* Hive token provider moved to hive project
* Kafka token provider extracted into a new project
## How was this patch tested?
Existing + newly added unit tests.
Additionally tested on cluster.
Closes#23499 from gaborgsomogyi/SPARK-26254.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>