Commit graph

1023 commits

Author SHA1 Message Date
Kent Yao 2507301705 [SPARK-33159][SQL] Use hive-service-rpc as dependency instead of inlining the generated code
### What changes were proposed in this pull request?

Hive's `hive-service-rpc` module started since hive-2.1.0 and it contains only the thrift IDL file and the code generated by it.

Removing the inlined code will help maintain and upgrade builtin hive versions

### Why are the changes needed?

to simply the code.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing CI

Closes #30055 from yaooqinn/SPARK-33159.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-10-16 09:37:54 -07:00
Dongjoon Hyun 9896288b88 [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
### What changes were proposed in this pull request?

This PR aims to upgrade ZStandard library for Apache Spark 3.1.0.

### Why are the changes needed?

This will bring the latest bug fixes.
- 2662fbdc32
- bbe140b758

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CI.

Closes #30010 from dongjoon-hyun/SPARK-33117.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-12 00:27:53 -07:00
Yuming Wang 543d59dfbf [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier
### What changes were proposed in this pull request?

This pr removes `com.twitter:parquet-hadoop-bundle:1.6.0` and `orc.classifier`.

### Why are the changes needed?

To make code more clear and readable.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test.

Closes #30005 from wangyum/SPARK-33107.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-11 21:54:56 -07:00
Dongjoon Hyun 008a2ad1f8 [SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1)
### What changes were proposed in this pull request?

As of today,
- SPARK-30034 Apache Spark 3.0.0 switched its default Hive execution engine from Hive 1.2 to Hive 2.3. This removes the direct dependency to the forked Hive 1.2.1 in maven repository.
- SPARK-32981 Apache Spark 3.1.0(`master` branch) removed Hive 1.2 related artifacts from Apache Spark binary distributions.

This PR(SPARK-20202) aims to remove the following usage of unofficial Apache Hive fork completely from Apache Spark master for Apache Spark 3.1.0.
```
<hive.group>org.spark-project.hive</hive.group>
<hive.version>1.2.1.spark2</hive.version>
```

For the forked Hive 1.2.1.spark2 users, Apache Spark 2.4(LTS) and 3.0 (~ 2021.12) will provide it.

### Why are the changes needed?

- First, Apache Spark community should not use the unofficial forked release of another Apache project.
- Second, Apache Hive 1.2.1 was released at 2015-06-26 and the forked Hive `1.2.1.spark2` exposed many unfixable bugs in Apache because the forked `1.2.1.spark2` is not maintained at all. Apache Hive 2.3.0 was released at 2017-07-19 and it has been used with less number of bugs compared with `1.2.1.spark2`. Many bugs still exist in `hive-1.2` profile and new Apache Spark unit tests are added with `HiveUtils.isHive23` condition so far.

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change. PRBuilder will not accept `[test-hive1.2]` on master and `branch-3.1`.

### How was this patch tested?

1. SBT/Hadoop 3.2/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129366)
2. SBT/Hadoop 2.7/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129382)
3. SBT/Hadoop 3.2/Hive 1.2 (This has not been supported already due to Hive 1.2 doesn't work with Hadoop 3.2.)
4. SBT/Hadoop 2.7/Hive 1.2 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129383, This is rejected)

Closes #29936 from dongjoon-hyun/SPARK-REMOVE-HIVE1.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-05 15:29:56 -07:00
Dongjoon Hyun aa6657981a [SPARK-33050][BUILD] Upgrade Apache ORC to 1.5.12
### What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC to 1.5.12.

### Why are the changes needed?

This brings us the latest bug patches like the followings.
- ORC-644 nested struct evolution does not respect to orc.force.positional.evolution
- ORC-667 Positional mapping for nested struct types should not applied by default

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CI.

Closes #29930 from dongjoon-hyun/SPARK-ORC-1.5.12.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-02 00:06:03 -07:00
Dongjoon Hyun 9c618b3308 [SPARK-33047][BUILD] Upgrade hive-storage-api to 2.7.2
### What changes were proposed in this pull request?

This PR aims to upgrade Apache Hive `hive-storage-api` library from 2.7.1 to 2.7.2.

### Why are the changes needed?

[storage-api 2.7.2](https://github.com/apache/hive/commits/rel/storage-release-2.7.2/storage-api) has the following extension and can be used when users uses a provided orc dependency.

[HIVE-22959](dade9919d9 (diff-ccfc9dd7584117f531322cda3a29f3c3)) : Extend storage-api to expose FilterContext
[HIVE-23215](361925d2f3 (diff-ccfc9dd7584117f531322cda3a29f3c3)) : Make FilterContext and MutableFilterContext interfaces

### Does this PR introduce _any_ user-facing change?

Yes. This is a dependency change.

### How was this patch tested?

Pass the existing tests.

Closes #29923 from dongjoon-hyun/SPARK-33047.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-01 12:41:40 -07:00
Bryan Cutler e0538bd38c [SPARK-32312][SQL][PYTHON][TEST-JAVA11] Upgrade Apache Arrow to version 1.0.1
### What changes were proposed in this pull request?

Upgrade Apache Arrow to version 1.0.1 for the Java dependency and increase minimum version of PyArrow to 1.0.0.

This release marks a transition to binary stability of the columnar format (which was already informally backward-compatible going back to December 2017) and a transition to Semantic Versioning for the Arrow software libraries. Also note that the Java arrow-memory artifact has been split to separate dependence on netty-buffer and allow users to select an allocator. Spark will continue to use `arrow-memory-netty` to maintain performance benefits.

Version 1.0.0 - 1.0.0 include the following selected fixes/improvements relevant to Spark users:

ARROW-9300 - [Java] Separate Netty Memory to its own module
ARROW-9272 - [C++][Python] Reduce complexity in python to arrow conversion
ARROW-9016 - [Java] Remove direct references to Netty/Unsafe Allocators
ARROW-8664 - [Java] Add skip null check to all Vector types
ARROW-8485 - [Integration][Java] Implement extension types integration
ARROW-8434 - [C++] Ipc RecordBatchFileReader deserializes the Schema multiple times
ARROW-8314 - [Python] Provide a method to select a subset of columns of a Table
ARROW-8230 - [Java] Move Netty memory manager into a separate module
ARROW-8229 - [Java] Move ArrowBuf into the Arrow package
ARROW-7955 - [Java] Support large buffer for file/stream IPC
ARROW-7831 - [Java] unnecessary buffer allocation when calling splitAndTransferTo on variable width vectors
ARROW-6111 - [Java] Support LargeVarChar and LargeBinary types and add integration test with C++
ARROW-6110 - [Java] Support LargeList Type and add integration test with C++
ARROW-5760 - [C++] Optimize Take implementation
ARROW-300 - [Format] Add body buffer compression option to IPC message protocol using LZ4 or ZSTD
ARROW-9098 - RecordBatch::ToStructArray cannot handle record batches with 0 column
ARROW-9066 - [Python] Raise correct error in isnull()
ARROW-9223 - [Python] Fix to_pandas() export for timestamps within structs
ARROW-9195 - [Java] Wrong usage of Unsafe.get from bytearray in ByteFunctionsHelper class
ARROW-7610 - [Java] Finish support for 64 bit int allocations
ARROW-8115 - [Python] Conversion when mixing NaT and datetime objects not working
ARROW-8392 - [Java] Fix overflow related corner cases for vector value comparison
ARROW-8537 - [C++] Performance regression from ARROW-8523
ARROW-8803 - [Java] Row count should be set before loading buffers in VectorLoader
ARROW-8911 - [C++] Slicing a ChunkedArray with zero chunks segfaults

View release notes here:
https://arrow.apache.org/release/1.0.1.html
https://arrow.apache.org/release/1.0.0.html

### Why are the changes needed?

Upgrade brings fixes, improvements and stability guarantees.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests with pyarrow 1.0.0 and 1.0.1

Closes #29686 from BryanCutler/arrow-upgrade-100-SPARK-32312.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-09-10 14:16:19 +09:00
Sean Owen a9d4e60a90 [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
### What changes were proposed in this pull request?

Spark's CSV source can optionally ignore lines starting with a comment char. Some code paths check to see if it's set before applying comment logic (i.e. not set to default of `\0`), but many do not, including the one that passes the option to Univocity. This means that rows beginning with a null char were being treated as comments even when 'disabled'.

### Why are the changes needed?

To avoid dropping rows that start with a null char when this is not requested or intended. See JIRA for an example.

### Does this PR introduce _any_ user-facing change?

Nothing beyond the effect of the bug fix.

### How was this patch tested?

Existing tests plus new test case.

Closes #29516 from srowen/SPARK-32614.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-08-26 00:25:58 +09:00
Wenchen Fan 8b119f1663 [SPARK-32640][SQL] Downgrade Janino to fix a correctness bug
### What changes were proposed in this pull request?

This PR reverts https://github.com/apache/spark/pull/27860 to downgrade Janino, as the new version has a bug.

### Why are the changes needed?

The symptom is about NaN comparison. For code below
```
if (double_value <= 0.0) {
  ...
} else {
  ...
}
```

If `double_value` is NaN, `NaN <= 0.0` is false and we should go to the else branch. However, current Spark goes to the if branch and causes correctness issues like SPARK-32640.

One way to fix it is:
```
boolean cond = double_value <= 0.0;
if (cond) {
  ...
} else {
  ...
}
```

I'm not familiar with Janino so I don't know what's going on there.

### Does this PR introduce _any_ user-facing change?

Yes, fix correctness bugs.

### How was this patch tested?

a new test

Closes #29495 from cloud-fan/revert.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-20 13:26:39 -07:00
Kousuke Saruta 9a79bbc8b6 [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version
### What changes were proposed in this pull request?

This PR fixes the link to metrics.dropwizard.io in monitoring.md to refer the proper version of the library.

### Why are the changes needed?

There are links to metrics.dropwizard.io in monitoring.md but the link targets refer the version 3.1.0, while we use 4.1.1.
Now that users can create their own metrics using the dropwizard library, it's better to fix the links to refer the proper version.

### Does this PR introduce _any_ user-facing change?

Yes. The modified links refer the version 4.1.1.

### How was this patch tested?

Build the docs and visit all the modified links.

Closes #29426 from sarutak/fix-dropwizard-url.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-08-16 12:07:37 -05:00
Dongjoon Hyun eb74d55fb5 [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0
### What changes were proposed in this pull request?

This PR aims to update Kafka client library to 2.6.0 for Apache Spark 3.1.0.

### Why are the changes needed?

This will bring client-side bug fixes like KAFKA-10134 and KAFKA-10223.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the existing tests.

Closes #29386 from dongjoon-hyun/SPARK-32568.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-08-08 10:31:36 +09:00
yangjie01 0693d8bbf2 [SPARK-32490][BUILD] Upgrade netty-all to 4.1.51.Final
### What changes were proposed in this pull request?
This PR aims to bring the bug fixes from the latest netty version.

### Why are the changes needed?
- 4.1.48.Final: [https://github.com/netty/netty/milestone/223?closed=1](https://github.com/netty/netty/milestone/223?closed=1)(14 patches or issues)
- 4.1.49.Final: [https://github.com/netty/netty/milestone/224?closed=1](https://github.com/netty/netty/milestone/224?closed=1)(48 patches or issues)
- 4.1.50.Final: [https://github.com/netty/netty/milestone/225?closed=1](https://github.com/netty/netty/milestone/225?closed=1)(38 patches or issues)
- 4.1.51.Final: [https://github.com/netty/netty/milestone/226?closed=1](https://github.com/netty/netty/milestone/226?closed=1)(53 patches or issues)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins with the existing tests.

Closes #29299 from LuciferYang/upgrade-netty-version.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-02 16:46:11 -07:00
Holden Karau 50911df08e [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
### What changes were proposed in this pull request?

Upgrade codehaus maven build helper to allow people to specify a time during the build to avoid snapshot artifacts with different version strings.

### Why are the changes needed?

During builds of snapshots the maven may assign different versions to different artifacts based on the time each individual sub-module starts building.

The timestamp is used as part of the version string when run `maven deploy` on a snapshot build. This results in different sub-modules having different version strings.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manual build while specifying the current time, ensured the time is consistent in the sub components.

Open question: Ideally I'd like to backport this as well since it's sort of a bug fix and while it does change a dependency version it's not one that is propagated. I'd like to hear folks thoughts about this.

Closes #29274 from holdenk/SPARK-32397-snapshot-artifact-timestamp-differences.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2020-07-29 21:39:14 +00:00
Dongjoon Hyun 13c64c2980 [SPARK-32448][K8S][TESTS] Use single version for exec-maven-plugin/scalatest-maven-plugin
### What changes were proposed in this pull request?

Two different versions are used for the same artifacts, `exec-maven-plugin` and `scalatest-maven-plugin`. This PR aims to use the same versions for `exec-maven-plugin` and `scalatest-maven-plugin`. In addition, this PR removes `scala-maven-plugin.version` from `K8s` integration suite because it's unused.

### Why are the changes needed?

This will prevent the mistake which upgrades only one place and forgets the others.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins K8S IT.

Closes #29248 from dongjoon-hyun/SPARK-32448.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-26 19:25:41 -07:00
Dongjoon Hyun 83ffef7ffb [SPARK-32441][BUILD][CORE] Update json4s to 3.7.0-M5 for Scala 2.13
### What changes were proposed in this pull request?

This PR aims to upgrade `json4s` to from 3.6.6 to 3.7.0-M5 for Scala 2.13 support at Apache Spark 3.1.0 on December. We will upgrade to the latest `json4s` around November.

### Why are the changes needed?

`json4s` starts to support Scala 2.13 since v3.7.0-M4.
- https://github.com/json4s/json4s/issues/660
- b013af8e75

Old `json4s` causes many UT failures with `NoSuchMethodException`.
```scala
 Cause: java.lang.NoSuchMethodException: scala.collection.immutable.Seq$.apply(scala.collection.Seq)
  at java.lang.Class.getMethod(Class.java:1786)
```

The following is one example.
```scala
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.executor.CoarseGrainedExecutorBackendSuite
...
Tests: succeeded 4, failed 9, canceled 0, ignored 0, pending 0
*** 9 TESTS FAILED ***
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

1. **Scala 2.12**: Pass the Jenkins or GitHub Action with the existing tests.
2. **Scala 2.13**: Do the following manually at least.
```scala
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.executor.CoarseGrainedExecutorBackendSuite
...
Tests: succeeded 13, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #29239 from dongjoon-hyun/SPARK-32441.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-25 20:34:31 -07:00
Dongjoon Hyun f642234d85 [SPARK-32437][CORE] Improve MapStatus deserialization speed with RoaringBitmap 0.9.0
### What changes were proposed in this pull request?

This PR aims to speed up `MapStatus` deserialization by 5~18% with the latest RoaringBitmap `0.9.0` and new APIs. Note that we focus on `deserialization` time because `serialization` occurs once while `deserialization` occurs many times.

### Why are the changes needed?

The current version is too old. We had better upgrade it to get the performance improvement and bug fixes.
Although `MapStatusesSerDeserBenchmark` is synthetic, the benchmark result is updated with this patch.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins or GitHub Action.

Closes #29233 from dongjoon-hyun/SPARK-ROAR.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-25 08:07:28 -07:00
Sean Owen be2eca22e9 [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
### What changes were proposed in this pull request?

Updates to scalatest 3.2.0. Though it looks large, it is 99% changes to the new location of scalatest classes.

### Why are the changes needed?

3.2.0+ has a fix that is required for Scala 2.13.3+ compatibility.

### Does this PR introduce _any_ user-facing change?

No, only affects tests.

### How was this patch tested?

Existing tests.

Closes #29196 from srowen/SPARK-32398.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-23 16:20:17 -07:00
yangjie01 5e0cb3ee16 [SPARK-32305][BUILD] Make mvn clean remove metastore_db and spark-warehouse
### What changes were proposed in this pull request?

Add additional configuration to `maven-clean-plugin` to ensure cleanup `metastore_db` and `spark-warehouse` directory when execute `mvn clean` command.

### Why are the changes needed?
Now Spark support two version of build-in hive and there are some test generated meta data not in target dir like `metastore_db`,  they don't clean up automatically when we run `mvn clean` command.

So if we run `mvn clean test -pl sql/hive -am -Phadoop-2.7 -Phive -Phive-1.2 ` , the `metastore_db` dir will created and meta data will remains after test complete.

Then we need manual cleanup `metastore_db` directory to ensure `mvn clean test -pl sql/hive -am -Phadoop-2.7 -Phive` command use hive2.3 profile can succeed because the residual metastore data is not compatible.

`spark-warehouse` will also cause test failure in some data residual scenarios because test case thinks that meta data should not exist.

This pr is used to simplify manual cleanup `metastore_db` and `spark-warehouse` directory operation.

### How was this patch tested?

Manual execute `mvn clean test -pl sql/hive -am -Phadoop-2.7 -Phive -Phive-1.2`, then execute `mvn clean test -pl sql/hive -am -Phadoop-2.7 -Phive`, both commands should succeed.

Closes #29103 from LuciferYang/add-clean-directory.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-14 12:40:47 -07:00
Sean Owen 3ad4863673 [SPARK-29292][SPARK-30010][CORE] Let core compile for Scala 2.13
### What changes were proposed in this pull request?

The purpose of this PR is to partly resolve SPARK-29292, and fully resolve SPARK-30010, which should allow Spark to compile vs Scala 2.13 in Spark Core and up through GraphX (not SQL, Streaming, etc).

Note that we are not trying to determine here whether this makes Spark work on 2.13 yet, just compile, as a prerequisite for assessing test outcomes. However, of course, we need to ensure that the change does not break 2.12.

The changes are, in the main, adding .toSeq and .toMap calls where mutable collections / maps are returned as Seq / Map, which are immutable by default in Scala 2.13. The theory is that it should be a no-op for Scala 2.12 (these return themselves), and required for 2.13.

There are a few non-trivial changes highlighted below.
In particular, to get Core to compile, we need to resolve SPARK-30010 which removes a deprecated SparkConf method

### Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

### Does this PR introduce _any_ user-facing change?

Yes, removal of the deprecated SparkConf.setAll overload, which isn't legal in Scala 2.13 anymore.

### How was this patch tested?

Existing tests. (2.13 was not _tested_; this is about getting it to compile without breaking 2.12)

Closes #28971 from srowen/SPARK-29292.1.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-11 14:34:02 -07:00
William Hyun 10a65ee9b4 [SPARK-32150][BUILD] Upgrade to ZStd 1.4.5-4
### What changes were proposed in this pull request?

This PR aims to upgrade to ZStd 1.4.5-4.

### Why are the changes needed?

ZStd 1.4.5-4 fixes the following.

- 3d16e51525
- 3d51bdcb82
### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #28969 from williamhyun/zstd2.

Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-07-12 00:45:48 +09:00
Dongjoon Hyun 9c134b57bf [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
### What changes were proposed in this pull request?

According to the dev mailing list discussion, this PR aims to switch the default Apache Hadoop dependency from 2.7.4 to 3.2.0 for Apache Spark 3.1.0 on December 2020.

| Item | Default Hadoop Dependency |
|------|-----------------------------|
| Apache Spark Website | 3.2.0 |
| Apache Download Site | 3.2.0 |
| Apache Snapshot | 3.2.0 |
| Maven Central | 3.2.0 |
| PyPI | 2.7.4 (We will switch later) |
| CRAN | 2.7.4 (We will switch later) |
| Homebrew | 3.2.0 (already) |

In Apache Spark 3.0.0 release, we focused on the other features. This PR targets for [Apache Spark 3.1.0 scheduled on December 2020](https://spark.apache.org/versioning-policy.html).

### Why are the changes needed?

Apache Hadoop 3.2 has many fixes and new cloud-friendly features.

**Reference**
- 2017-08-04: https://hadoop.apache.org/release/2.7.4.html
- 2019-01-16: https://hadoop.apache.org/release/3.2.0.html

### Does this PR introduce _any_ user-facing change?

Since the default Hadoop dependency changes, the users will get a better support in a cloud environment.

### How was this patch tested?

Pass the Jenkins.

Closes #28897 from dongjoon-hyun/SPARK-32058.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-26 19:43:29 -07:00
Gabor Somogyi 2dbfae8775 [SPARK-32049][SQL][TESTS] Upgrade Oracle JDBC Driver 8
### What changes were proposed in this pull request?
`OracleIntegrationSuite` is not using the latest oracle JDBC driver. In this PR I've upgraded the driver to the latest which supports JDK8, JDK9, and JDK11.

### Why are the changes needed?
Old JDBC driver.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests.
Existing integration tests (especially `OracleIntegrationSuite`)

Closes #28893 from gaborgsomogyi/SPARK-32049.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-23 03:58:40 -07:00
William Hyun 2bcbe3dd9a [SPARK-32045][BUILD] Upgrade to Apache Commons Lang 3.10
### What changes were proposed in this pull request?

This PR aims to upgrade to Apache Commons Lang 3.10.

### Why are the changes needed?

This will bring the latest bug fixes like [LANG-1453](https://issues.apache.org/jira/browse/LANG-1453).
https://commons.apache.org/proper/commons-lang/release-notes/RELEASE-NOTES-3.10.txt

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the Jenkins.

Closes #28889 from williamhyun/commons-lang-3.10.

Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-06-23 16:59:55 +09:00
Gabor Somogyi eeb81200e2 [SPARK-31337][SQL] Support MS SQL Kerberos login in JDBC connector
### What changes were proposed in this pull request?
When loading DataFrames from JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it.

This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environment where exposing simple authentication access is not an option due to IT policy issues.

In this PR I've added MS SQL support.

What this PR contains:
* Added `MSSQLConnectionProvider`
* Added `MSSQLConnectionProviderSuite`
* Changed MS SQL JDBC driver to use the latest (test scope only)
* Changed `MsSqlServerIntegrationSuite` docker image to use the latest
* Added a version comment to `MariaDBConnectionProvider` to increase trackability

### Why are the changes needed?
Missing JDBC kerberos support.

### Does this PR introduce _any_ user-facing change?
Yes, now user is able to connect to MS SQL using kerberos.

### How was this patch tested?
* Additional + existing unit tests
* Existing integration tests
* Test on cluster manually

Closes #28635 from gaborgsomogyi/SPARK-31337.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@apache.org>
2020-06-16 18:22:12 -07:00
Kousuke Saruta 88a4e55fae [SPARK-31765][WEBUI][TEST-MAVEN] Upgrade HtmlUnit >= 2.37.0
### What changes were proposed in this pull request?

This PR upgrades HtmlUnit.
Selenium and Jetty also upgraded because of dependency.
### Why are the changes needed?

Recently, a security issue which affects HtmlUnit is reported.
https://nvd.nist.gov/vuln/detail/CVE-2020-5529
According to the report, arbitrary code can be run by malicious users.
HtmlUnit is used for test so the impact might not be large but it's better to upgrade it just in case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing testcases.

Closes #28585 from sarutak/upgrade-htmlunit.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-06-11 18:27:53 -05:00
HyukjinKwon baafd4386c Revert "[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0"
This reverts commit e5c3463910.
2020-06-03 14:15:30 +09:00
William Hyun 367d94a30d [SPARK-31876][BUILD] Upgrade to Zstd 1.4.5
### What changes were proposed in this pull request?

This PR aims to upgrade to Zstd 1.4.5.

### Why are the changes needed?

Zstd 1.4.5 improves performance.

https://github.com/facebook/zstd/releases/tag/v1.4.5

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Passed the Jenkins.

Closes #28682 from williamhyun/zstd.

Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2020-06-02 21:58:33 +00:00
Kousuke Saruta e5c3463910 [SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0
### What changes were proposed in this pull request?

This PR upgrades HtmlUnit.
Selenium and Jetty also upgraded because of dependency.
### Why are the changes needed?

Recently, a security issue which affects HtmlUnit is reported.
https://nvd.nist.gov/vuln/detail/CVE-2020-5529
According to the report, arbitrary code can be run by malicious users.
HtmlUnit is used for test so the impact might not be large but it's better to upgrade it just in case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing testcases.

Closes #28585 from sarutak/upgrade-htmlunit.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-06-02 08:29:07 -05:00
Kousuke Saruta d3eba5bc8c
[SPARK-31756][WEBUI] Add real headless browser support for UI test
### What changes were proposed in this pull request?

This PR mainly adds two things.

1. Real headless browser support for UI test
2. A test suite using headless Chrome as one instance of  those browsers.

Also, for environment where Chrome and Chrome driver is not installed, `ChromeUITest` tag is added to filter out the test suite.
By default, test suites with `ChromeUITest` is disabled.

### Why are the changes needed?

In the current master, there are two problems for UI test.
1. Lots of tests especially JavaScript related ones are done manually.
Appearance is better to be confirmed by our eyes but logic should be tested by test cases ideally.

2. Compared to the real web browsers, HtmlUnit doesn't seem to support JavaScript enough.
I added a JavaScript related test before for SPARK-31534 using HtmlUnit which is simple library based headless browser for test.
The test I added works somehow but some JavaScript related error is shown in unit-tests.log.

```
======= EXCEPTION START ========
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:904)
        at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:835)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:807)
        at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
        at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52)
        at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102)
        at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426)
        at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157)
        at java.lang.Thread.run(Thread.java:748)
Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1009)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800)
        at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413)
        at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:828)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:889)
        ... 10 more
JavaScriptException value = Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)".
== CALLING JAVASCRIPT ==
  function () {
      throw e;
  }
======= EXCEPTION END ========
```
I tried to upgrade HtmlUnit to 2.40.0 but what is worse, the test become not working even though it works on real browsers like Chrome, Safari and Firefox without error.
```
[info] UISeleniumSuite:
[info] - SPARK-31534: text for tooltip should be escaped *** FAILED *** (17 seconds, 745 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 2 times over 12.910785232 seconds. Last failure message: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (http://192.168.1.209:62132/static/vis-timeline-graph2d.min.js#52(Function)#1)
```
To resolve those problems, it's better to support headless browser for UI test.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I tested with following patterns. Both Chrome and Chrome driver should be installed to test.

1. sbt / with default excluded tags (ChromeUISeleniumSuite is expected to be skipped and SQLQueryTestSuite is expected to succeed)
`build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite org.apache.spark.sql.SQLQueryTestSuite"

2. sbt / overwrite default excluded tags as empty string (Both suites are expected to succeed)
`build/sbt -Dtest.default.exclude.tags= -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite org.apache.spark.sql.SQLQueryTestSuite"

3. sbt / set `test.exclude.tags` to `org.apache.spark.tags.ExtendedSQLTest` (Both suites are expected to be skipped)
`build/sbt -Dtest.exclude.tags=org.apache.spark.tags.ExtendedSQLTest -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite org.apache.spark.sql.SQLQueryTestSuite"

4. Maven / with default excluded tags (ChromeUISeleniumSuite is expected to be skipped and SQLQueryTestSuite is expected to succeed)
`build/mvn -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite,org.apache.spark.sql.SQLQueryTestSuite test`

5. Maven / overwrite default excluded tags as empty string (Both suites are expected to succeed)
`build/mvn -Dtest.default.exclude.tags= -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite,org.apache.spark.sql.SQLQueryTestSuite test`

6. Maven / set `test.exclude.tags` to `org.apache.spark.tags.ExtendedSQLTest` (Both suites are expected to be skipped)
`build/mvn -Dtest.exclude.tags=org.apache.spark.tags.ExtendedSQLTest  -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite,org.apache.spark.sql.SQLQueryTestSuite test`

Closes #28627 from sarutak/real-headless-browser-support-take2.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-29 10:41:29 -07:00
Dongjoon Hyun 625abca9db [SPARK-31858][BUILD] Upgrade commons-io to 2.5 in Hadoop 3.2 profile
### What changes were proposed in this pull request?

This PR aims to upgrade `commons-io` from 2.4 to 2.5 for Apache Spark 3.1.

### Why are the changes needed?

Since Hadoop 3.1, `commons-io` 2.5 is used.
- https://issues.apache.org/jira/browse/HADOOP-15261

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with Hadoop-3.2 profile.

Maven dependency is verified via `test-dependencies.sh` automatically. SBT dependency can be verified like the following manually.
```
build/sbt -Phadoop-3.2 "core/dependencyTree" | grep commons-io:commons-io | head -n1
[info]   | | +-commons-io:commons-io:2.5
```

Closes #28665 from dongjoon-hyun/SPARK-31858.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-29 07:46:53 -07:00
Jungtaek Lim (HeartSaVioR) fe1d1e24bc [SPARK-31214][BUILD] Upgrade Janino to 3.1.2
### What changes were proposed in this pull request?

This PR proposes to upgrade Janino to 3.1.2 which is released recently.

Major changes were done for refactoring, as well as there're lots of "no commit message". Belows are the pairs of (commit title, commit) which seem to deal with some bugs or specific improvements (not purposed to refactor) after 3.0.15.

* Issue #119: Guarantee executing popOperand() in popUninitializedVariableOperand() via moving popOperand() out of "assert"
* Issue #116: replace operand to final target type if boxing conversion and widening reference conversion happen together
* Merged pull request `#114` "Grow the code for relocatables, and do fixup, and relocate".
  * 367c58e73e
* issue `#107`: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package
  * f7d99596d4
* Throw an NYI CompileException when a static interface method is invoked.
  * efd3884983
* Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions)
  * 32fdb5f5f1
* Issue `#104`: ClassLoaderIClassLoader 's ClassNotFoundException handle mechanism enhancement
  * 6e8a97d609

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html

### Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: https://github.com/janino-compiler/janino/issues/113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.1.1 contains the PR https://github.com/janino-compiler/janino/pull/114 which would enable Janino to succeed to compile user's query properly. I've also fixed a couple of more bugs as 3.1.1 made Spark UTs fail - hence we need to upgrade to 3.1.2.

Furthermore, from my testing, https://github.com/janino-compiler/janino/issues/90 (which Josh Rosen filed before) seems to be also resolved in 3.1.2 as well.

Looks like Janino is maintained by one person and there's no even version branches and releases/tags so we can't expect Janino maintainer to release a new bugfix version - hence have to try out new minor version.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs.

Closes #27860 from HeartSaVioR/SPARK-31101.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-29 07:42:57 -07:00
Gabor Somogyi 8f1d77488c
[SPARK-31821][BUILD] Remove mssql-jdbc dependencies from Hadoop 3.2 profile
### What changes were proposed in this pull request?
There is an unnecessary dependency for `mssql-jdbc`. In this PR I've removed it.

### Why are the changes needed?
Unnecessary dependency.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

Pass the Jenkins with the following configuration.
- [x] Pass the dependency test.
- [x] SBT with Hadoop-3.2 (https://github.com/apache/spark/pull/28640#issuecomment-634192512)
- [ ] Maven with Hadoop-3.2

Closes #28640 from gaborgsomogyi/SPARK-31821.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-26 18:21:22 -07:00
Kousuke Saruta 8441e936fc Revert "[SPARK-31756][WEBUI] Add real headless browser support for UI test
This reverts commit d95570864a.

Closes #28624 from sarutak/revert-real-headless-browser-support.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
2020-05-24 09:13:38 +09:00
Kousuke Saruta d95570864a [SPARK-31756][WEBUI] Add real headless browser support for UI test
### What changes were proposed in this pull request?

This PR mainly adds two things.

1. Real headless browser support for UI test
2. A test suite using headless Chrome as one instance of  those browsers.

Also, for environment where Chrome and Chrome driver is not installed, `ChromeUITest` tag is added to filter out the test suite.

### Why are the changes needed?

In the current master, there are two problems for UI test.
1. Lots of tests especially JavaScript related ones are done manually.
Appearance is better to be confirmed by our eyes but logic should be tested by test cases ideally.

2. Compared to the real web browsers, HtmlUnit doesn't seem to support JavaScript enough.
I added a JavaScript related test before for SPARK-31534 using HtmlUnit which is simple library based headless browser for test.
The test I added works somehow but some JavaScript related error is shown in unit-tests.log.

```
======= EXCEPTION START ========
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:904)
        at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:835)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:807)
        at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
        at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52)
        at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102)
        at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426)
        at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157)
        at java.lang.Thread.run(Thread.java:748)
Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1009)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800)
        at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413)
        at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:828)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:889)
        ... 10 more
JavaScriptException value = Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)".
== CALLING JAVASCRIPT ==
  function () {
      throw e;
  }
======= EXCEPTION END ========
```
I tried to upgrade HtmlUnit to 2.40.0 but what is worse, the test become not working even though it works on real browsers like Chrome, Safari and Firefox without error.
```
[info] UISeleniumSuite:
[info] - SPARK-31534: text for tooltip should be escaped *** FAILED *** (17 seconds, 745 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 2 times over 12.910785232 seconds. Last failure message: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (http://192.168.1.209:62132/static/vis-timeline-graph2d.min.js#52(Function)#1)
```
To resolve those problems, it's better to support headless browser for UI test.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I tested with following patterns. Both Chrome and Chrome driver should be installed to test.

1. sbt / with chromedriver / include tag (expect to succeed)
`build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"`
2. sbt / with chromedriver / exclude tag (expect to be ignored)
`build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite -l org.apache.spark.tags.ChromeUITest"`
3. sbt / without chromedriver / include tag (expect to be failed)
`build/sbt  "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"`
4. sbt / without chromedriver / exclude tag (expect to be skipped)
`build/sbt  -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"`
5. Maven / wth chromedriver / include tag (expect to succeed)
`build/mvn -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
6. Maven / with chromedriver / exclude tag (expect to be skipped)
`build/mvn -Dtest.exclude.tags="org.apache.spark.tags.ChromeUITest" -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
7. Maven / without chromedriver / include tag (expect to be failed)
`build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
8. Maven / without chromedriver / exclude tag (expect to be skipped)
`build/mvn -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest  -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`

Closes #28578 from sarutak/real-headless-browser-support.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-05-22 08:24:31 -05:00
Gengliang Wang db5e5fce68 Revert "[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0"
This reverts commit 92877c4ef2.

Closes #28602 from gengliangwang/revertSPARK-31765.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-05-21 16:00:58 -07:00
Kousuke Saruta 92877c4ef2 [SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0
### What changes were proposed in this pull request?

This PR upgrades HtmlUnit.
Selenium and Jetty also upgraded because of dependency.
### Why are the changes needed?

Recently, a security issue which affects HtmlUnit is reported.
https://nvd.nist.gov/vuln/detail/CVE-2020-5529
According to the report, arbitrary code can be run by malicious users.
HtmlUnit is used for test so the impact might not be large but it's better to upgrade it just in case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing testcases.

Closes #28585 from sarutak/upgrade-htmlunit.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-05-21 11:43:25 -07:00
angerszhu 0d9faf602e
[SPARK-31655][BUILD] Upgrade snappy-java to 1.1.7.5
### What changes were proposed in this pull request?

snappy-java have release v1.1.7.5, upgrade to latest version.

Fixed in v1.1.7.4
- Caching internal buffers for SnappyFramed streams #234
- Fixed the native lib for ppc64le to work with glibc 2.17 (Previously it depended on 2.22)

Fixed in v1.1.7.5
- Fixes java.lang.NoClassDefFoundError: org/xerial/snappy/pool/DefaultPoolFactory in 1.1.7.4

https://github.com/xerial/snappy-java/compare/1.1.7.3...1.1.7.5

v 1.1.7.5 release note:
edc4ec28bd
### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No need

Closes #28472 from AngersZhuuuu/spark-31655.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-07 12:01:43 -07:00
Steve Loughran 86c4e43525
[SPARK-31644][BUILD] Make Spark's guava version configurable from the command line
### What changes were proposed in this pull request?

This adds the maven property guava.version which can be
used to control the guava version for a build.

It does not change the current version.

### Why are the changes needed?

All future Hadoop releases are going to be built with a later guava version, including Hadoop 3.1.4. This means to run the spark tests with that release you need to update the spark guava version. This patch lets whoever builds spark do this locally.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

ran the hadoop-cloud module tests with the 3.1.4 RC0

```
mvn -T 1  -Phadoop-3.2 -Dhadoop.version=3.1.4 -Psnapshots-and-staging -Phadoop-cloud,yarn,kinesis-asl test --pl hadoop-cloud
```

observed the linkage problem

```
  java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
  at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
```

made the version configurable, retested with

```
-Phadoop-3.2 -Dhadoop.version=3.1.4 -Psnapshots-and-staging Dguava.version=27.0-jre
```

all good.

Closes #28455 from steveloughran/SPARK-31644-guava-version.

Authored-by: Steve Loughran <stevel@cloudera.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-05 12:17:24 -07:00
Dongjoon Hyun e7995c2ddc
[SPARK-31633][BUILD] Upgrade SLF4J from 1.7.16 to 1.7.30
### What changes were proposed in this pull request?

This PR aims to upgrade SLF4J from 1.7.16 to 1.7.30.

### Why are the changes needed?

SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run under Java 9. Also, this will bring all latest bug fixes.
- http://www.slf4j.org/news.html

> When running under Java 9, log4j version 1.2.x is unable to correctly parse the "java.version" system property. Assuming an inccorect Java version, it proceeded to disable its MDC functionality. The slf4j-log4j12 module shipping in this release fixes the issue by tweaking MDC internals by reflection, allowing log4j to run under Java 9. See also SLF4J-393.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #28446 from dongjoon-hyun/SPARK-31633.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-04 08:14:12 -07:00
Dongjoon Hyun 79eaaaf6da
[SPARK-31580][BUILD] Upgrade Apache ORC to 1.5.10
### What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC to 1.5.10.

### Why are the changes needed?

Apache ORC 1.5.10 is a maintenance release with the following patches.

- [ORC-621](https://issues.apache.org/jira/browse/ORC-621) Need reader fix for ORC-569
- [ORC-616](https://issues.apache.org/jira/browse/ORC-616) In Patched Base encoding, the value of headerThirdByte goes beyond the range of byte
- [ORC-613](https://issues.apache.org/jira/browse/ORC-613) OrcMapredRecordReader mis-reuse struct object when actual children schema differs
- [ORC-610](https://issues.apache.org/jira/browse/ORC-610) Updated Copyright year in the NOTICE file

The following is release note.
- https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320&version=12346912

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing ORC tests and a newly added test case.

- The first commit is already tested in `hive-2.3` profile with both native ORC implementation and Hive 2.3 ORC implementation. (https://github.com/apache/spark/pull/28373#issuecomment-620265114)
- The latest run is about to make the test case disable in `hive-1.2` profile which doesn't use Apache ORC.
  - `hive-1.2`: https://github.com/apache/spark/pull/28373#issuecomment-620325906

Closes #28373 from dongjoon-hyun/SPARK-ORC-1.5.10.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-27 18:56:30 -07:00
Gabor Somogyi c619990c1d [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector
### What changes were proposed in this pull request?
When loading DataFrames from JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it.

This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environment where exposing simple authentication access is not an option due to IT policy issues.

In this PR I've added DB2 support (other supported databases will come in later PRs).

What this PR contains:
* Added `DB2ConnectionProvider`
* Added `DB2ConnectionProviderSuite`
* Added `DB2KrbIntegrationSuite` docker integration test
* Changed DB2 JDBC driver to use the latest (test scope only)
* Changed test table data type to a type which is supported by all the databases
* Removed double connection creation on test side
* Increased connection timeout in docker tests because DB2 docker takes quite a time to start

### Why are the changes needed?
Missing JDBC kerberos support.

### Does this PR introduce any user-facing change?
Yes, now user is able to connect to DB2 using kerberos.

### How was this patch tested?
* Additional + existing unit tests
* Additional + existing integration tests
* Test on cluster manually

Closes #28215 from gaborgsomogyi/SPARK-31272.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@apache.org>
2020-04-22 17:10:30 -07:00
Yuming Wang b11e42663b
[SPARK-31381][SPARK-29245][SQL] Upgrade built-in Hive 2.3.6 to 2.3.7
### What changes were proposed in this pull request?

**Hive 2.3.7** fixed these issues:
- HIVE-21508: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
- HIVE-21980:Parsing time can be high in case of deeply nested subqueries
- HIVE-22249: Support Parquet through HCatalog

### Why are the changes needed?
Fix CCE during creating HiveMetaStoreClient in JDK11 environment: [SPARK-29245](https://issues.apache.org/jira/browse/SPARK-29245).

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

- [x] Test Jenkins with Hadoop 2.7 (https://github.com/apache/spark/pull/28148#issuecomment-616757840)
- [x] Test Jenkins with Hadoop 3.2 on JDK11 (https://github.com/apache/spark/pull/28148#issuecomment-616294353)
- [x] Manual test with remote hive metastore.

Hive side:

```
export JAVA_HOME=/usr/lib/jdk1.8.0_221
export PATH=$JAVA_HOME/bin:$PATH
cd /usr/lib/hive-2.3.6 # Start Hive metastore with Hive 2.3.6
bin/schematool -dbType derby -initSchema --verbose
bin/hive --service metastore
```

Spark side:

```
export JAVA_HOME=/usr/lib/jdk-11.0.3
export PATH=$JAVA_HOME/bin:$PATH
build/sbt clean package -Phive -Phadoop-3.2 -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083
```

Closes #28148 from wangyum/SPARK-31381.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-20 13:38:24 -07:00
Dongjoon Hyun c6e39dffd6
[SPARK-31464][BUILD][SS] Upgrade Kafka to 2.5.0
### What changes were proposed in this pull request?

This PR aims to upgrade Kafka library to 2.5.0 for Apache Spark 3.1.0.

### Why are the changes needed?

Apache Kafka 2.5.0 client has improvements and bug fixes like [KAFKA-9241](https://issues.apache.org/jira/browse/KAFKA-9241)
- https://downloads.apache.org/kafka/2.5.0/RELEASE_NOTES.html

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

- [x] SBT https://github.com/apache/spark/pull/28235#issuecomment-615936382
- [x] Maven https://github.com/apache/spark/pull/28235#issuecomment-616138840 (All Scala/Java/Python/R UT tests passed. It's timeout during R installation testing which is already covered by SBT.)

Closes #28235 from dongjoon-hyun/SPARK-KAFKA-2.5.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-19 10:51:09 -07:00
Gabor Somogyi 1354d2d0de [SPARK-31021][SQL] Support MariaDB Kerberos login in JDBC connector
### What changes were proposed in this pull request?
When loading DataFrames from JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it.

This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environment where exposing simple authentication access is not an option due to IT policy issues.

In this PR I've added MariaDB support (other supported databases will come in later PRs).

What this PR contains:
* Introduced `SecureConnectionProvider` and added basic secure functionalities
* Added `MariaDBConnectionProvider`
* Added `MariaDBConnectionProviderSuite`
* Added `MariaDBKrbIntegrationSuite` docker integration test
* Added some missing code documentation

### Why are the changes needed?
Missing JDBC kerberos support.

### Does this PR introduce any user-facing change?
Yes, now user is able to connect to MariaDB using kerberos.

### How was this patch tested?
* Additional + existing unit tests
* Additional + existing integration tests
* Test on cluster manually

Closes #28019 from gaborgsomogyi/SPARK-31021.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@apache.org>
2020-04-09 09:20:02 -07:00
Jungtaek Lim (HeartSaVioR) f55f6b569b
[SPARK-31101][BUILD] Upgrade Janino to 3.0.16
### What changes were proposed in this pull request?

This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

### Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs.

Closes #27932 from HeartSaVioR/SPARK-31101-janino-3.0.16.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-21 19:10:23 -07:00
Gabor Somogyi b0d2956a35
[SPARK-31135][BUILD][TESTS] Upgrdade docker-client version to 8.14.1
### What changes were proposed in this pull request?
Upgrdade `docker-client` version.

### Why are the changes needed?
`docker-client` what Spark uses is super old. Snippet from the project page:
```
Spotify no longer uses recent versions of this project internally.
The version of docker-client we're using is whatever helios has in its pom.xml. => 8.14.1
```

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
```
build/mvn install -DskipTests
build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.DB2IntegrationSuite test`
build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.MsSqlServerIntegrationSuite test`
build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test`
```

Closes #27892 from gaborgsomogyi/docker-client.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-15 23:55:04 -07:00
Dongjoon Hyun 614323d326
[SPARK-31126][SS] Upgrade Kafka to 2.4.1
### What changes were proposed in this pull request?

This PR (SPARK-31126) aims to upgrade Kafka library to bring a client-side bug fix like KAFKA-8933

### Why are the changes needed?

The following is the full release note.
- https://downloads.apache.org/kafka/2.4.1/RELEASE_NOTES.html

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Pass the Jenkins with the existing test.

Closes #27881 from dongjoon-hyun/SPARK-KAFKA-2.4.1.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-11 19:26:15 -07:00
Dongjoon Hyun 93def95b08
[SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final
### What changes were proposed in this pull request?

This PR aims to bring the bug fixes from the latest netty-all.

### Why are the changes needed?

- 4.1.47.Final: https://github.com/netty/netty/milestone/222?closed=1 (15 patches or issues)
- 4.1.46.Final: https://github.com/netty/netty/milestone/221?closed=1 (80 patches or issues)
- 4.1.45.Final: https://github.com/netty/netty/milestone/220?closed=1 (23 patches or issues)
- 4.1.44.Final: https://github.com/netty/netty/milestone/218?closed=1 (113 patches or issues)
- 4.1.43.Final: https://github.com/netty/netty/milestone/217?closed=1 (63 patches or issues)

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #27869 from dongjoon-hyun/SPARK-31095.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-10 17:50:34 -07:00
HyukjinKwon 5b3277f4fc [SPARK-30994][BUILD][FOLLOW-UP] Change scope of xml-apis to include it and add xerces in SBT as dependency override
### What changes were proposed in this pull request?

This PR propose

1. Explicitly include xml-apis. xml-apis is already the part of xerces 2.12.0 (https://repo1.maven.org/maven2/xerces/xercesImpl/2.12.0/xercesImpl-2.12.0.pom). However, we're excluding it by setting `scope` to `test`. This seems causing `spark-shell`, built from Maven, to fail.

    Seems like previously xml-apis wasn't reached for some reasons but after we upgrade, it seems requiring. Therefore, this PR proposes to include it.

2. Pins `xerces` version in SBT as well. Seems this dependency is resolved differently from Maven.

Note that Hadoop 3 does not looks requiring this as they replaced xerces as of [HDFS-12221](https://issues.apache.org/jira/browse/HDFS-12221).

### Why are the changes needed?

To make `spark-shell` working from Maven build, and uses the same xerces version.

### Does this PR introduce any user-facing change?

No, it's master only.

### How was this patch tested?

**1.**

```bash
./build/mvn -DskipTests -Psparkr -Phive clean package
./bin/spark-shell
```

Before:

```
Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.xerces.parsers.AbstractDOMParser.startDocument(Unknown Source)
	at org.apache.xerces.xinclude.XIncludeHandler.startDocument(Unknown Source)
	at org.apache.xerces.impl.dtd.XMLDTDValidator.startDocument(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentScannerImpl.startEntity(Unknown Source)
	at org.apache.xerces.impl.XMLVersionDetector.startDocumentParsing(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2482)
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2470)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2541)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2494)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2407)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
	at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
	at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
	at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.w3c.dom.ElementTraversal
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 42 more
```

After:

```
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_202)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
```

**2.**

```
./build/sbt dependencyTree -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver -Phive
./build/sbt dependencyTree -Phadoop-3.2 -Phive-2.3 -Phive-thriftserver -Phive
```

Closes #27808 from HyukjinKwon/SPARK-30994.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-06 09:39:02 +09:00
Sean Owen 97d9a22b04 [SPARK-30994][CORE] Update xerces to 2.12.0
### What changes were proposed in this pull request?

Manage up the version of Xerces that Hadoop uses (and potentially user apps) to 2.12.0 to match https://issues.apache.org/jira/browse/HADOOP-16530

### Why are the changes needed?

Picks up bug and security fixes: https://www.xml.com/news/2018-05-apache-xerces-j-2120/

### Does this PR introduce any user-facing change?

Should be no behavior changes.

### How was this patch tested?

Existing tests.

Closes #27746 from srowen/SPARK-30994.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-03-03 09:27:18 -06:00
Dongjoon Hyun 3995728c3c [SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655
### What changes were proposed in this pull request?

This PR aims to upgrade `aws-java-sdk-sts` to `1.11.655`.

### Why are the changes needed?

[SPARK-29677](https://github.com/apache/spark/pull/26333) upgrades AWS Kinesis Client to 1.12.0 for Apache Spark 2.4.5 and 3.0.0.

Since AWS Kinesis Client 1.12.0 is using AWS SDK 1.11.665, `aws-java-sdk-sts` should be consistent with Kinesis client dependency.
- https://github.com/awslabs/amazon-kinesis-client/releases/tag/v1.12.0

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #27720 from dongjoon-hyun/SPARK-30968.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-27 17:05:56 -08:00
gatorsmile 28b8713036 [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT
### What changes were proposed in this pull request?
This patch is to bump the master branch version to 3.1.0-SNAPSHOT.

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27698 from gatorsmile/updateVersion.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 19:44:31 -08:00
Josh Rosen f152d2a0a8 [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central
### What changes were proposed in this pull request?

This PR is a followup to #27307: per https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926, the Google Cloud Storage mirror of Maven Central has updated its URLs: the new paths are updated more frequently. The new paths are listed on https://storage-download.googleapis.com/maven-central/index.html

This patch updates our build files to use these new URLs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing build + tests.

Closes #27688 from JoshRosen/update-gcs-mirror-url.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-25 17:04:13 +09:00
Yin Huai ea626b6acf [SPARK-30783] Exclude hive-service-rpc
### What changes were proposed in this pull request?
Exclude hive-service-rpc from build.

### Why are the changes needed?
hive-service-rpc 2.3.6 and spark sql's thrift server module have duplicate classes. Leaving hive-service-rpc 2.3.6 in the class path means that spark can pick up classes defined in hive instead of its thrift server module, which can cause hard to debug runtime errors due to class loading order and compilation errors for applications depend on spark.

 If you compare hive-service-rpc 2.3.6's jar (https://search.maven.org/remotecontent?filepath=org/apache/hive/hive-service-rpc/2.3.6/hive-service-rpc-2.3.6.jar) and spark thrift server's jar (e.g. https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hive-thriftserver_2.12/3.0.0-SNAPSHOT/spark-hive-thriftserver_2.12-3.0.0-20200207.021914-364.jar), you will see that all of classes provided by hive-service-rpc-2.3.6.jar are covered by spark thrift server's jar. https://issues.apache.org/jira/browse/SPARK-30783 has output of jar tf for both jars.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests.

Closes #27533 from yhuai/SPARK-30783.

Authored-by: Yin Huai <yhuai@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-12 00:12:45 +08:00
Dongjoon Hyun 41bdb7ad39 [SPARK-30718][BUILD] Exclude jdk.tools dependency from hadoop-yarn-api
### What changes were proposed in this pull request?

This PR removes the `jdk.tools:jdk.tools` transitive dependency from `hadoop-yarn-api`.
- This is only used in `hadoop-annotation` project in some `*Doclet.java`.

### Why are the changes needed?

Although this is not used in Apache Spark, this can cause a resolve failure in JDK11 environment.

<img width="530" alt="jdk tools" src="https://user-images.githubusercontent.com/9700541/73697745-2f3f4080-4694-11ea-95a7-228638e31cf7.png">

### Does this PR introduce any user-facing change?

No. This is a dev-only change.
From developers, this will remove the `Cannot resolve` error in IDE environment.

### How was this patch tested?

- Pass the Jenkins in JDK8
- Manually, import the project with JDK11.

Closes #27445 from dongjoon-hyun/SPARK-30718.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-03 19:57:16 -08:00
jiaan geng 35380958b8 [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29
### What changes were proposed in this pull request?
I found checkstyle have a new release https://checkstyle.org/releasenotes.html#Release_8.29
Bumps checkstyle from 8.25 to 8.29.

### Why are the changes needed?
I have bump  checkstyle from 8.25 to 8.29 on my fork branch and test to build.
It's OK

8.29 added some new features:

- New Check: AvoidNoArgumentSuperConstructorCall.

- New Check NoEnumTrailingComma.

- ENUM_DEF token support in RightCurlyCheck.

- FallThrough module does not support the spelling "fall-through" by default.

8.29 fix some bugs:

- Java 8 Grammar: annotations on varargs parameters.

- Sonar violation: Disable XML external entity (XXE) processing.

- Disable instantiation of modules with private ctor.

- Sonar violation: "ThreadLocal" variables should be cleaned up when no longer used.

- Indentation incorrect level for chained method with bracket on new line.

- InvalidJavadocPosition: false positive when comment is between javadoc and package.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
No UT

Closes #27426 from beliefer/bump-checkstyle.

Authored-by: jiaan geng <jiaan.geng@jiaandeMacBook-Air.local>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-31 21:14:11 -08:00
Dongjoon Hyun 2fd15a26fb [SPARK-30695][BUILD] Upgrade Apache ORC to 1.5.9
### What changes were proposed in this pull request?

This PR aims to upgrade to Apache ORC 1.5.9.
- For `hive-2.3` profile, we need to upgrade `hive-storage-api` from `2.6.0` to `2.7.1`.
- For `hive-1.2` profile, ORC library with classifier `nohive` already shaded it. So, there is no change.

### Why are the changes needed?

This will bring the latest bug fixes. The following is the full release note.
- https://issues.apache.org/jira/projects/ORC/versions/12346546

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Here is the summary.
1. `Hive 1.2 + Hadoop 2.7` passed. ([here](https://github.com/apache/spark/pull/27421#issuecomment-580924552))
2. `Hive 2.3 + Hadoop 2.7` passed. ([here](https://github.com/apache/spark/pull/27421#issuecomment-580973391))

Closes #27421 from dongjoon-hyun/SPARK-ORC-1.5.9.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-31 17:41:27 -08:00
Dongjoon Hyun 862959747e [SPARK-30639][BUILD] Upgrade Jersey to 2.30
### What changes were proposed in this pull request?

For better JDK11 support, this PR aims to upgrade **Jersey** and **javassist** to `2.30` and `3.35.0-GA` respectively.

### Why are the changes needed?

**Jersey**: This will bring the following `Jersey` updates.
- https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.30.html
  - https://github.com/eclipse-ee4j/jersey/issues/4245 (Java 11 java.desktop module dependency)

**javassist**: This is a transitive dependency from 3.20.0-CR2 to 3.25.0-GA.
- `javassist` officially supports JDK11 from [3.24.0-GA release note](https://github.com/jboss-javassist/javassist/blob/master/Readme.html#L308).

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with both JDK8 and JDK11.

Closes #27357 from dongjoon-hyun/SPARK-30639.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-25 15:41:55 -08:00
cody koeninger 843224ebd4 [SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature
### What changes were proposed in this pull request?
Update the scalafmt plugin to 1.0.3 and use its new onlyChangedFiles feature rather than --diff

### Why are the changes needed?
Older versions of the plugin either didn't work with scala 2.13, or got rid of the --diff argument and didn't allow for formatting only changed files

### Does this PR introduce any user-facing change?
The /dev/scalafmt script no longer passes through arbitrary args, instead using the arg to select scala version.  The issue here is the plugin name literally contains the scala version, and doesn't appear to have a shorter way to refer to it.   If srowen or someone else with better maven-fu has an idea I'm all ears.

### How was this patch tested?
Manually, e.g. edited a file and ran

dev/scalafmt

or

dev/scalafmt 2.13

Closes #27279 from koeninger/SPARK-30570.

Authored-by: cody koeninger <cody@koeninger.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-23 12:44:43 -08:00
HyukjinKwon cd9ccdc0ac [SPARK-30601][BUILD] Add a Google Maven Central as a primary repository
### What changes were proposed in this pull request?

This PR proposes to address four things. Three issues and fixes were a bit mixed so this PR sorts it out. See also http://apache-spark-developers-list.1001551.n3.nabble.com/Adding-Maven-Central-mirror-from-Google-to-the-build-td28728.html for the discussion in the mailing list.

1. Add the Google Maven Central mirror (GCS) as a primary repository. This will not only help development more stable but also in order to make Github Actions build (where it is always required to download jars) stable. In case of Jenkins PR builder, it wouldn't be affected too much as it uses the pre-downloaded jars under `.m2`.

    - Google Maven Central seems stable for heavy workload but not synced very quickly (e.g., new release is missing)
    - Maven Central (default) seems less stable but synced quickly.

    We already added this GCS mirror as a default additional remote repository at SPARK-29175. So I don't see an issue to add it as a repo.
    abf759a91e/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (L2111-L2118)

2. Currently, we have the hard-corded repository in [`sbt-pom-reader`](https://github.com/JoshRosen/sbt-pom-reader/blob/v1.0.0-spark/src/main/scala/com/typesafe/sbt/pom/MavenPomResolver.scala#L32) and this seems overwriting Maven's existing resolver by the same ID `central` with `http://` when initially the pom file is ported into SBT instance. This uses `http://` which latently Maven Central disallowed (see https://github.com/apache/spark/pull/27242)

    My speculation is that we just need to be able to load plugin and let it convert POM to SBT instance with another fallback repo. After that, it _seems_ using `central` with `https` properly. See also https://github.com/apache/spark/pull/27307#issuecomment-576720395.

    I double checked that we use `https` properly from the SBT build as well:

    ```
    [debug] downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom ...
    [debug] 	public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom
    [debug] 	public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom.sha1
    ```

    This was fixed by adding the same repo (https://github.com/apache/spark/pull/27281), `central_without_mirror`, which is a bit awkward. Instead, this PR adds GCS as a main repo, and community Maven central as a fallback repo. So, presumably the community Maven central repo is used when the plugin is loaded as a fallback.

3. While I am here, I fix another issue. Github Action at https://github.com/apache/spark/pull/27279 is being failed. The reason seems to be scalafmt 1.0.3 is in Maven central but not in GCS.

    ```
    org.apache.maven.plugin.PluginResolutionException: Plugin org.antipathy:mvn-scalafmt_2.12:1.0.3 or one of its dependencies could not be resolved: Could not find artifact org.antipathy:mvn-scalafmt_2.12🫙1.0.3 in google-maven-central (https://maven-central.storage-download.googleapis.com/repos/central/data/)
        at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve     (DefaultPluginDependenciesResolver.java:131)
    ```

   `mvn-scalafmt` exists in Maven central:

    ```bash
    $ curl https://repo.maven.apache.org/maven2/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom
    ```

    ```xml
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        <modelVersion>4.0.0</modelVersion>
        ...
    ```

    whereas not in GCS mirror:

    ```bash
    $ curl https://maven-central.storage-download.googleapis.com/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom
    ```
    ```xml
    <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: maven-central/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom</Details></Error>%
    ```

    In this PR, simply make both repos accessible by adding to `pluginRepositories`.

4. Remove the workarounds in Github Actions to switch mirrors because now we have same repos in the same order (Google Maven Central first, and Maven Central second)

### Why are the changes needed?

To make the build and Github Action more stable.

### Does this PR introduce any user-facing change?

No, dev only change.

### How was this patch tested?

I roughly checked local and PR against my fork (https://github.com/HyukjinKwon/spark/pull/2 and https://github.com/HyukjinKwon/spark/pull/3).

Closes #27307 from HyukjinKwon/SPARK-30572.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-01-23 16:00:21 +09:00
HyukjinKwon e170422f74 Revert "[SPARK-30534][INFRA] Use mvn in dev/scalastyle"
This reverts commit 384899944b.
2020-01-21 18:23:03 +09:00
Takeshi Yamamuro 775fae4640 [SPARK-30486][BUILD] Bump lz4-java version to 1.7.1
### What changes were proposed in this pull request?

This pr intends to upgrade lz4-java from 1.7.0 to 1.7.1.

### Why are the changes needed?

This release includes a bug fix for older macOS. You can see the link below for the changes;
https://github.com/lz4/lz4-java/blob/master/CHANGES.md#171

### Does this PR introduce any user-facing change?

### How was this patch tested?

Existing tests.

Closes #27271 from maropu/SPARK-30486.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-19 19:05:30 -08:00
Sean Owen a2081ae4e1 [SPARK-29290][CORE] Update to chill 0.9.5
### What changes were proposed in this pull request?

Update Twitter Chill to 0.9.5.

### Why are the changes needed?

Primarily, Scala 2.13 support for later.
Other changes from 0.9.3 are apparently just minor fixes and improvements:
https://github.com/twitter/chill/releases

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests

Closes #27227 from srowen/SPARK-29290.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-19 18:39:38 -08:00
Dongjoon Hyun c992716a33 [SPARK-30572][BUILD] Add a fallback Maven repository
### What changes were proposed in this pull request?

This PR aims to add a fallback Maven repository when a mirror to `central` fail.

### Why are the changes needed?

We use `Google Maven Central` in GitHub Action as a mirror of `central`.
However, `Google Maven Central` sometimes doesn't have newly published artifacts
and there is no guarantee when we get the newly published artifacts.

By duplicating `Maven Central` with a new ID, we can add a fallback Maven repository
which is not mirrored by `Google Maven Central`.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually testing with the new `Twitter` chill artifacts by switching `chill.version` from `0.9.3` to `0.9.5`.

```
$ rm -rf ~/.m2/repository/com/twitter/chill*
$ mvn compile | grep chill
Downloading from google-maven-central: https://maven-central.storage-download.googleapis.com/repos/central/data/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom
Downloading from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom
Downloaded from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom (2.8 kB at 11 kB/s)
```

Closes #27281 from dongjoon-hyun/SPARK-30572.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-19 17:42:34 -08:00
Dongjoon Hyun 384899944b [SPARK-30534][INFRA] Use mvn in dev/scalastyle
### What changes were proposed in this pull request?

This PR aims to use `mvn` instead of `sbt` in `dev/scalastyle` to recover GitHub Action.

### Why are the changes needed?

As of now, Apache Spark sbt build is broken by the Maven Central repository policy.
https://stackoverflow.com/questions/59764749/requests-to-http-repo1-maven-org-maven2-return-a-501-https-required-status-an

> Effective January 15, 2020, The Central Maven Repository no longer supports insecure
> communication over plain HTTP and requires that all requests to the repository are
> encrypted over HTTPS.

We can reproduce this locally by the following.
```
$ rm -rf ~/.m2/repository/org/apache/apache/18/
$ build/sbt clean
```

And, in GitHub Action, `lint-scala` is the only one which is using `sbt`.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

First of all, GitHub Action should be recovered.

Also, manually, do the following.

**Without Scalastyle violation**
```
$ dev/scalastyle
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=384m; support was removed in 8.0
Using `mvn` from path: /usr/local/bin/mvn
Scalastyle checks passed.
```

**With Scalastyle violation**
```
$ dev/scalastyle
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=384m; support was removed in 8.0
Using `mvn` from path: /usr/local/bin/mvn
Scalastyle checks failed at following occurrences:
error file=/Users/dongjoon/PRS/SPARK-HTTP-501/core/src/main/scala/org/apache/spark/SparkConf.scala message=There should be no empty line separating imports in the same group. line=22 column=0
error file=/Users/dongjoon/PRS/SPARK-HTTP-501/core/src/test/scala/org/apache/spark/resource/ResourceProfileSuite.scala message=There should be no empty line separating imports in the same group. line=22 column=0
```

Closes #27242 from dongjoon-hyun/SPARK-30534.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-16 16:00:58 -08:00
Jungtaek Lim (HeartSaVioR) 8384ff4c9d [SPARK-28144][SPARK-29294][SS] Upgrade Kafka to 2.4.0
### What changes were proposed in this pull request?

This patch upgrades the version of Kafka to 2.4, which supports Scala 2.13.

There're some incompatible changes in Kafka 2.4 which the patch addresses as well:

* `ZkUtils` is removed -> Replaced with `KafkaZkClient`
* Majority of methods are removed in `AdminUtils` -> Replaced with `AdminZkClient`
* Method signature of `Scheduler.schedule` is changed (return type) -> leverage `DeterministicScheduler` to avoid implementing `ScheduledFuture`

### Why are the changes needed?

* Kafka 2.4 supports Scala 2.13

### Does this PR introduce any user-facing change?

No, as Kafka API is known to be compatible across versions.

### How was this patch tested?

Existing UTs

Closes #26960 from HeartSaVioR/SPARK-29294.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-21 14:01:25 -08:00
Sean Owen fac6b9bde8 Revert [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
This reverts commit 709387d660.

See https://issues.apache.org/jira/browse/SPARK-27300?focusedCommentId=16990048&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16990048 and previous mailing list discussions.

### What changes were proposed in this pull request?

Revert the addition of skeleton graph API modules for Spark 3.0.

### Why are the changes needed?

It does not appear that content will be added to the module for Spark 3, so I propose avoiding committing to the modules, which are no-ops now, in the upcoming major 3.0 release.

### Does this PR introduce any user-facing change?

No, the modules were not released.

### How was this patch tested?

Existing tests, but mostly N/A.

Closes #26928 from srowen/Revert27300.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-17 09:06:23 -08:00
Yuming Wang 696288f623 [INFRA] Reverts commit 56dcd79 and c216ef1
### What changes were proposed in this pull request?
1. Revert "Preparing development version 3.0.1-SNAPSHOT": 56dcd79

2. Revert "Preparing Spark release v3.0.0-preview2-rc2": c216ef1

### Why are the changes needed?
Shouldn't change master.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master

Closes #26915 from wangyum/revert-master.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-16 19:57:44 -07:00
Yuming Wang 56dcd79992 Preparing development version 3.0.1-SNAPSHOT 2019-12-17 01:57:27 +00:00
Yuming Wang c216ef1d03 Preparing Spark release v3.0.0-preview2-rc2 2019-12-17 01:57:21 +00:00
Dongjoon Hyun b709091b4f [SPARK-30228][BUILD] Update zstd-jni to 1.4.4-3
### What changes were proposed in this pull request?

This PR aims to update zstd-jni library to 1.4.4-3.

### Why are the changes needed?

This will bring the latest bug fixes in zstd itself and some performance improvement.
- https://github.com/facebook/zstd/releases/tag/v1.4.4

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #26856 from dongjoon-hyun/SPARK-ZSTD-144.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-12 14:16:32 +09:00
Takeshi Yamamuro be867e8a9e [SPARK-30196][BUILD] Bump lz4-java version to 1.7.0
### What changes were proposed in this pull request?

This pr intends to upgrade lz4-java from 1.6.0 to 1.7.0.

### Why are the changes needed?

This release includes a performance bug (https://github.com/lz4/lz4-java/pull/143) fixed by JoshRosen and some improvements (e.g., LZ4 binary update). You can see the link below for the changes;
https://github.com/lz4/lz4-java/blob/master/CHANGES.md#170

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #26823 from maropu/LZ4_1_7_0.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-10 12:22:03 +09:00
Dongjoon Hyun afc4fa02bd [SPARK-30156][BUILD] Upgrade Jersey from 2.29 to 2.29.1
### What changes were proposed in this pull request?

This PR aims to upgrade `Jersey` from 2.29 to 2.29.1.

### Why are the changes needed?

This will bring several bug fixes and important dependency upgrades.
- https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.29.1.html

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #26785 from dongjoon-hyun/SPARK-30156.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-06 18:49:43 -08:00
Dongjoon Hyun 1e0037b5e9 [SPARK-30157][BUILD][TEST-HADOOP3.2][TEST-JAVA11] Upgrade Apache HttpCore from 4.4.10 to 4.4.12
### What changes were proposed in this pull request?

This PR aims to upgrade `Apache HttpCore` from 4.4.10 to 4.4.12.

### Why are the changes needed?

`Apache HttpCore v4.4.11` is the first official release for JDK11.
> This is a maintenance release that corrects a number of defects in non-blocking SSL session code that caused compatibility issues with TLSv1.3 protocol implementation shipped with Java 11.

For the full release note, please see the following.
- https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #26786 from dongjoon-hyun/SPARK-30157.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-07 10:59:10 +09:00
Dongjoon Hyun 1595e46a4e [SPARK-30142][TEST-MAVEN][BUILD] Upgrade Maven to 3.6.3
### What changes were proposed in this pull request?

This PR aims to upgrade Maven from 3.6.2 to 3.6.3.

### Why are the changes needed?

This will bring bug fixes like the following.
- MNG-6759 Maven fails to use <repositories> section from dependency when resolving transitive dependencies in some cases
- MNG-6760 ExclusionArtifactFilter result invalid when wildcard exclusion is followed by other exclusions

The following is the full release note.
- https://maven.apache.org/docs/3.6.3/release-notes.html

### Does this PR introduce any user-facing change?

No. (This is a dev-environment change.)

### How was this patch tested?

Pass the Jenkins with both SBT and Maven.

Closes #26770 from dongjoon-hyun/SPARK-30142.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-06 23:41:59 +09:00
Dongjoon Hyun f3abee377d [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency
### What changes were proposed in this pull request?

This PR aims to cut `org.eclipse.jetty:jetty-webapp`and `org.eclipse.jetty:jetty-xml` transitive dependency from `hadoop-common`.

### Why are the changes needed?

This will simplify our dependency management by the removal of unused dependencies.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action with all combinations and the Jenkins UT with (Hadoop-3.2).

Closes #26742 from dongjoon-hyun/SPARK-30051.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-03 14:33:36 -08:00
huangtianhua e842033acc [SPARK-27721][BUILD] Switch to use right leveldbjni according to the platforms
This change adds a profile to switch to use the right leveldbjni package according to the platforms:
aarch64 uses org.openlabtesting.leveldbjni:leveldbjni-all.1.8, and other platforms use the old one org.fusesource.leveldbjni:leveldbjni-all.1.8.
And because some hadoop dependencies packages are also depend on org.fusesource.leveldbjni:leveldbjni-all, but hadoop merge the similar change on trunk, details see
https://issues.apache.org/jira/browse/HADOOP-16614, so exclude the dependency of org.fusesource.leveldbjni for these hadoop packages related.
Then Spark can build/test on aarch64 platform successfully.

Closes #26636 from huangtianhua/add-aarch64-leveldbjni.

Authored-by: huangtianhua <huangtianhua@huawei.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-12-02 09:04:00 -06:00
HyukjinKwon f32ca4b279 [SPARK-30076][BUILD][TESTS] Upgrade Mockito to 3.1.0
### What changes were proposed in this pull request?

We used 2.28.2 of Mockito as of https://github.com/apache/spark/pull/25139 because 3.0.0 might be unstable. Now 3.1.0 is released.

See release notes - https://github.com/mockito/mockito/blob/v3.1.0/doc/release-notes/official.md

### Why are the changes needed?

To bring the fixes made in the dependency.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Jenkins will test.

Closes #26707 from HyukjinKwon/upgrade-Mockito.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-30 12:23:11 -06:00
Sean Owen e23c135e56 [SPARK-29293][BUILD] Move scalafmt to Scala 2.12 profile; bump to 0.12
### What changes were proposed in this pull request?

Move scalafmt to Scala 2.12 profile; bump to 0.12.

### Why are the changes needed?

To facilitate a future Scala 2.13 build.

### Does this PR introduce any user-facing change?

None.

### How was this patch tested?

This isn't covered by tests, it's a convenience for contributors.

Closes #26655 from srowen/SPARK-29293.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-26 09:59:19 -08:00
Dongjoon Hyun c2d513f8e9 [SPARK-30035][BUILD] Upgrade to Apache Commons Lang 3.9
### What changes were proposed in this pull request?

This PR aims to upgrade to `Apache Commons Lang 3.9`.

### Why are the changes needed?

`Apache Commons Lang 3.9` is the first official release to support JDK9+. The following is the full release note.
- https://commons.apache.org/proper/commons-lang/release-notes/RELEASE-NOTES-3.9.txt

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #26672 from dongjoon-hyun/SPARK-30035.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-26 21:31:02 +09:00
Dongjoon Hyun 53e19f3678 [SPARK-30032][BUILD] Upgrade to ORC 1.5.8
### What changes were proposed in this pull request?

This PR aims to upgrade to Apache ORC 1.5.8.

### Why are the changes needed?

This will bring the latest bug fixes. The following is the full release note.
- https://issues.apache.org/jira/projects/ORC/versions/12346462

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #26669 from dongjoon-hyun/SPARK-ORC-1.5.8.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-25 20:08:11 -08:00
Dongjoon Hyun 2a28c73d81 [SPARK-30031][BUILD][SQL] Remove hive-2.3 profile from sql/hive module
### What changes were proposed in this pull request?

This PR aims to remove `hive-2.3` profile from `sql/hive` module.

### Why are the changes needed?

Currently, we need `-Phive-1.2` or `-Phive-2.3` additionally to build `hive` or `hive-thriftserver` module. Without specifying it, the build fails like the following. This PR will recover it.
```
$ build/mvn -DskipTests compile --pl sql/hive
...
[ERROR] [Error] /Users/dongjoon/APACHE/spark-merge/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala:32: object serde is not a member of package org.apache.hadoop.hive
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

1. Pass GitHub Action dependency check with no manifest change.
2. Pass GitHub Action build for all combinations.
3. Pass the Jenkins UT.

Closes #26668 from dongjoon-hyun/SPARK-30031.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-25 15:17:27 -08:00
Dongjoon Hyun 1466863cee [SPARK-30015][BUILD] Move hive-storage-api dependency from hive-2.3 to sql/core
# What changes were proposed in this pull request?

This PR aims to relocate the following internal dependencies to compile `sql/core` without `-Phive-2.3` profile.

1. Move the `hive-storage-api` to `sql/core` which is using `hive-storage-api` really.

**BEFORE (sql/core compilation)**
```
$ ./build/mvn -DskipTests --pl sql/core --am compile
...
[ERROR] [Error] /Users/dongjoon/APACHE/spark/sql/core/v2.3/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala:21: object hive is not a member of package org.apache.hadoop
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
```
**AFTER (sql/core compilation)**
```
$ ./build/mvn -DskipTests --pl sql/core --am compile
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  02:04 min
[INFO] Finished at: 2019-11-25T00:20:11-08:00
[INFO] ------------------------------------------------------------------------
```

2. For (1), add `commons-lang:commons-lang` test dependency to `spark-core` module to manage the dependency explicitly. Without this, `core` module fails to build the test classes.

```
$ ./build/mvn -DskipTests --pl core --am package -Phadoop-3.2
...
[INFO] --- scala-maven-plugin:4.3.0:testCompile (scala-test-compile-first)  spark-core_2.12 ---
[INFO] Using incremental compilation using Mixed compile order
[INFO] Compiler bridge file: /Users/dongjoon/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.3.1-bin_2.12.10__52.0-1.3.1_20191012T045515.jar
[INFO] Compiling 271 Scala sources and 26 Java sources to /spark/core/target/scala-2.12/test-classes ...
[ERROR] [Error] /spark/core/src/test/scala/org/apache/spark/util/PropertiesCloneBenchmark.scala:23: object lang is not a member of package org.apache.commons
[ERROR] [Error] /spark/core/src/test/scala/org/apache/spark/util/PropertiesCloneBenchmark.scala:49: not found: value SerializationUtils
[ERROR] two errors found
```

**BEFORE (commons-lang:commons-lang)**
The following is the previous `core` module's `commons-lang:commons-lang` dependency.

1. **branch-2.4**
```
$ mvn dependency:tree -Dincludes=commons-lang:commons-lang
[INFO] --- maven-dependency-plugin:3.0.2:tree (default-cli)  spark-core_2.11 ---
[INFO] org.apache.spark:spark-core_2.11🫙2.4.5-SNAPSHOT
[INFO] \- org.spark-project.hive:hive-exec:jar:1.2.1.spark2:provided
[INFO]    \- commons-lang:commons-lang:jar:2.6:compile
```

2. **v3.0.0-preview (-Phadoop-3.2)**
```
$ mvn dependency:tree -Dincludes=commons-lang:commons-lang -Phadoop-3.2
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli)  spark-core_2.12 ---
[INFO] org.apache.spark:spark-core_2.12🫙3.0.0-preview
[INFO] \- org.apache.hive:hive-storage-api:jar:2.6.0:compile
[INFO]    \- commons-lang:commons-lang:jar:2.6:compile
```

3. **v3.0.0-preview(default)**
```
$ mvn dependency:tree -Dincludes=commons-lang:commons-lang
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli)  spark-core_2.12 ---
[INFO] org.apache.spark:spark-core_2.12🫙3.0.0-preview
[INFO] \- org.apache.hadoop:hadoop-client:jar:2.7.4:compile
[INFO]    \- org.apache.hadoop:hadoop-common:jar:2.7.4:compile
[INFO]       \- commons-lang:commons-lang:jar:2.6:compile
```

**AFTER (commons-lang:commons-lang)**
```
$ mvn dependency:tree -Dincludes=commons-lang:commons-lang
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli)  spark-core_2.12 ---
[INFO] org.apache.spark:spark-core_2.12🫙3.0.0-SNAPSHOT
[INFO] \- commons-lang:commons-lang:jar:2.6:test
```

Since we wanted to verify that this PR doesn't change `hive-1.2` profile, we merged
[SPARK-30005 Update `test-dependencies.sh` to check `hive-1.2/2.3` profile](a1706e2fa7) before this PR.

### Why are the changes needed?

- Apache Spark 2.4's `sql/core` is using `Apache ORC (nohive)` jars including shaded `hive-storage-api` to access ORC data sources.

- Apache Spark 3.0's `sql/core` is using `Apache Hive` jars directly. Previously, `-Phadoop-3.2` hid this `hive-storage-api` dependency. Now, we are using `-Phive-2.3` instead. As I mentioned [previously](https://github.com/apache/spark/pull/26619#issuecomment-556926064), this PR is required to compile `sql/core` module without `-Phive-2.3`.

- For `sql/hive` and `sql/hive-thriftserver`, it's natural that we need `-Phive-1.2` or `-Phive-2.3`.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This will pass the Jenkins (with the dependency check and unit tests).

We need to check manually with `./build/mvn -DskipTests --pl sql/core --am compile`.

This closes #26657 .

Closes #26658 from dongjoon-hyun/SPARK-30015.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-25 10:54:14 -08:00
Sean Owen 13896e4eae [SPARK-30013][SQL] For scala 2.13, omit parens in various BigDecimal value() methods
### What changes were proposed in this pull request?

Omit parens on calls like BigDecimal.longValue()

### Why are the changes needed?

For some reason, this won't compile in Scala 2.13. The calls are otherwise equivalent in 2.12.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests

Closes #26653 from srowen/SPARK-30013.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-24 18:23:34 -08:00
Dongjoon Hyun 6625b69027 [SPARK-29981][BUILD][FOLLOWUP] Change hive.version.short
### What changes were proposed in this pull request?

This is a follow-up according to liancheng 's advice.
- https://github.com/apache/spark/pull/26619#discussion_r349326090

### Why are the changes needed?

Previously, we chose the full version to be carefully. As of today, it seems that `Apache Hive 2.3` branch seems to become stable.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the compile combination on GitHub Action.
1. hadoop-2.7/hive-1.2/JDK8
2. hadoop-2.7/hive-2.3/JDK8
3. hadoop-3.2/hive-2.3/JDK8
4. hadoop-3.2/hive-2.3/JDK11

Also, pass the Jenkins with `hadoop-2.7` and `hadoop-3.2` for (1) and (4).
(2) and (3) is not ready in Jenkins.

Closes #26645 from dongjoon-hyun/SPARK-RENAME-HIVE-DIRECTORY.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-23 12:50:50 -08:00
Dongjoon Hyun c98e5eb339 [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles
### What changes were proposed in this pull request?

This PR aims the followings.
- Add two profiles, `hive-1.2` and `hive-2.3` (default)
- Validate if we keep the existing combination at least. (Hadoop-2.7 + Hive 1.2 / Hadoop-3.2 + Hive 2.3).

For now, we assumes that `hive-1.2` is explicitly used with `hadoop-2.7` and `hive-2.3` with `hadoop-3.2`. The followings are beyond the scope of this PR.

- SPARK-29988 Adjust Jenkins jobs for `hive-1.2/2.3` combination
- SPARK-29989 Update release-script for `hive-1.2/2.3` combination
- SPARK-29991 Support `hive-1.2/2.3` in PR Builder

### Why are the changes needed?

This will help to switch our dependencies to update the exposed dependencies.

### Does this PR introduce any user-facing change?

This is a dev-only change that the build profile combinations are changed.
- `-Phadoop-2.7` => `-Phadoop-2.7 -Phive-1.2`
- `-Phadoop-3.2` => `-Phadoop-3.2 -Phive-2.3`

### How was this patch tested?

Pass the Jenkins with the dependency check and tests to make it sure we don't change anything for now.

- [Jenkins (-Phadoop-2.7 -Phive-1.2)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull)
- [Jenkins (-Phadoop-3.2 -Phive-2.3)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull)

Also, from now, GitHub Action validates the following combinations.
![gha](https://user-images.githubusercontent.com/9700541/69355365-822d5e00-0c36-11ea-93f7-e00e5459e1d0.png)

Closes #26619 from dongjoon-hyun/SPARK-29981.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-23 10:02:22 -08:00
Dongjoon Hyun f77c10de38 [SPARK-29923][SQL][TESTS] Set io.netty.tryReflectionSetAccessible for Arrow on JDK9+
### What changes were proposed in this pull request?

This PR aims to add `io.netty.tryReflectionSetAccessible=true` to the testing configuration for JDK11 because this is an officially documented requirement of Apache Arrow.

Apache Arrow community documented this requirement at `0.15.0` ([ARROW-6206](https://github.com/apache/arrow/pull/5078)).
> #### For java 9 or later, should set "-Dio.netty.tryReflectionSetAccessible=true".
> This fixes `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available`. thrown by netty.

### Why are the changes needed?

After ARROW-3191, Arrow Java library requires the property `io.netty.tryReflectionSetAccessible` to be set to true for JDK >= 9. After https://github.com/apache/spark/pull/26133, JDK11 Jenkins job seem to fail.

- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/676/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/677/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/678/

```scala
Previous exception in task:
sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available&#010;
io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473)&#010;
io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)&#010;
io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)&#010;
io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)&#010;
org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222)&#010;
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with JDK11.

Closes #26552 from dongjoon-hyun/SPARK-ARROW-JDK11.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-15 23:58:15 -08:00
Bryan Cutler 65a189c7a1 [SPARK-29376][SQL][PYTHON] Upgrade Apache Arrow to version 0.15.1
### What changes were proposed in this pull request?

Upgrade Apache Arrow to version 0.15.1. This includes Java artifacts and increases the minimum required version of PyArrow also.

Version 0.12.0 to 0.15.1 includes the following selected fixes/improvements relevant to Spark users:

* ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes
* ARROW-6874 - [Python] Memory leak in Table.to_pandas() when conversion to object dtype
* ARROW-5579 - [Java] shade flatbuffer dependency
* ARROW-5843 - [Java] Improve the readability and performance of BitVectorHelper#getNullCount
* ARROW-5881 - [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits
* ARROW-5893 - [C++] Remove arrow::Column class from C++ library
* ARROW-5970 - [Java] Provide pointer to Arrow buffer
* ARROW-6070 - [Java] Avoid creating new schema before IPC sending
* ARROW-6279 - [Python] Add Table.slice method or allow slices in \_\_getitem\_\_
* ARROW-6313 - [Format] Tracking for ensuring flatbuffer serialized values are aligned in stream/files.
* ARROW-6557 - [Python] Always return pandas.Series from Array/ChunkedArray.to_pandas, propagate field names to Series from RecordBatch, Table
* ARROW-2015 - [Java] Use Java Time and Date APIs instead of JodaTime
* ARROW-1261 - [Java] Add container type for Map logical type
* ARROW-1207 - [C++] Implement Map logical type

Changelog can be seen at https://arrow.apache.org/release/0.15.0.html

### Why are the changes needed?

Upgrade to get bug fixes, improvements, and maintain compatibility with future versions of PyArrow.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests, manually tested with Python 3.7, 3.8

Closes #26133 from BryanCutler/arrow-upgrade-015-SPARK-29376.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-15 13:27:30 +09:00
Dongjoon Hyun e25cfc4bb8 [SPARK-29528][BUILD] Upgrade scala-maven-plugin to 4.3.0 for Scala 2.13.1
### What changes were proposed in this pull request?

This PR aims to upgrade `scala-maven-plugin` to `4.3.0` for Scala `2.13.1`.
We tried 4.2.4, but it's reverted due to Windows build issue. Now, `4.3.0` has a Window fix.

### Why are the changes needed?
Scala 2.13.1 seems to break the binary compatibility.

We need to upgrade `scala-maven-plugin` to bring the the following fixes for the latest Scala 2.13.1.
- https://github.com/davidB/scala-maven-plugin/issues/363
- https://github.com/sbt/zinc/issues/698

Also, `4.3.0` has the following Window fix.
- https://github.com/davidB/scala-maven-plugin/issues/370 (4.2.4 throws error on Windows)

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

- For now, we don't support Scala-2.13. This PR at least needs to pass the existing Jenkins with Maven to get prepared for Scala-2.13.
- `AppVeyor` passed. (https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/28745383)

Closes #26457 from dongjoon-hyun/SPARK-29528.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-10 08:49:05 -08:00
Liang-Chi Hsieh ef1abf2e2c [SPARK-29747][BUILD] Bump joda-time version to 2.10.5
### What changes were proposed in this pull request?

This upgrades joda-time from 2.9 to 2.10.5.

### Why are the changes needed?

Joda 2.9 is almost 4 yrs ago and there are bugs fix and tz database updates.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #26389 from viirya/upgrade-joda.

Authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-05 10:08:19 +09:00
angerszhu e524a3a223 [SPARK-29742][BUILD] Update checkstyle plugin's check dir scope
### What changes were proposed in this pull request?
Current checkstyle checking folder can't cover all folder.
Since for support multi version hive, we have some divided hive folder.
We should check it too.

### Why are the changes needed?
Fix build bug

### Does this PR introduce any user-facing change?
NO

### How was this patch tested?
NO

Closes #26385 from AngersZhuuuu/SPARK-29742.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-04 09:08:47 -08:00
Sean Owen 19b8c71436 [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+
### What changes were proposed in this pull request?

Update the version of dropwizard metrics that Spark uses for metrics to 4.1.x, from 3.2.x.

### Why are the changes needed?

This helps JDK 9+ support, per for example https://github.com/dropwizard/metrics/pull/1236

### Does this PR introduce any user-facing change?

No, although downstream users with custom metrics may be affected.

### How was this patch tested?

Existing tests.

Closes #26332 from srowen/SPARK-29674.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 15:13:06 -08:00
Dongjoon Hyun 1ac6bd9f79 [SPARK-29729][BUILD] Upgrade ASM to 7.2
### What changes were proposed in this pull request?

This PR aims to upgrade ASM to 7.2.
- https://issues.apache.org/jira/browse/XBEAN-322 (Upgrade to ASM 7.2)
- https://asm.ow2.io/versions.html

### Why are the changes needed?

This will bring the following patches.
- 317875: Infinite loop when parsing invalid method descriptor
- 317873: Add support for RET instruction in AdviceAdapter
- 317872: Throw an exception if visitFrame used incorrectly
- add support for Java 14

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing UTs.

Closes #26373 from dongjoon-hyun/SPARK-29729.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 10:42:38 -08:00
Eric Meisel be022d9aee [SPARK-29677][DSTREAMS] amazon-kinesis-client 1.12.0
### What changes were proposed in this pull request?
Upgrading the amazon-kinesis-client dependency to 1.12.0.

### Why are the changes needed?
The current amazon-kinesis-client version is 1.8.10. This version depends on the use of `describeStream`, which has a hard limit on an AWS account (10 reqs / second). Versions 1.9.0 and up leverage `listShards`, which has no such limit. For large customers, this can be a major problem.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests

Closes #26333 from etspaceman/kclUpgrade.

Authored-by: Eric Meisel <eric.steven.meisel@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-02 16:42:49 -05:00
Xingbo Jiang 8207c835b4 Revert "Prepare Spark release v3.0.0-preview-rc2"
This reverts commit 007c873ae3.
2019-10-30 17:45:44 -07:00
Xingbo Jiang 007c873ae3 Prepare Spark release v3.0.0-preview-rc2
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the sparkR version number check logic to allow jvm version like `3.0.0-preview`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A
2019-10-30 17:42:59 -07:00
Xingbo Jiang b33a58c0c6 Revert "Prepare Spark release v3.0.0-preview-rc1"
This reverts commit 5eddbb5f1d.
2019-10-28 22:32:34 -07:00
Xingbo Jiang 5eddbb5f1d Prepare Spark release v3.0.0-preview-rc1
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the PySpark version from `3.0.0.dev0` to `3.0.0`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A

Closes #26243 from jiangxb1987/3.0.0-preview-prepare.

Lead-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-10-28 22:31:29 -07:00
HyukjinKwon a8d5134981 Revert "[SPARK-29528][BUILD][TEST-MAVEN] Upgrade scala-maven-plugin to 4.2.4 for Scala 2.13.1"
This reverts commit 5fc363b307.
2019-10-28 20:46:28 +09:00
Dongjoon Hyun ba9d1610b6 [SPARK-29617][BUILD] Upgrade to ORC 1.5.7
### What changes were proposed in this pull request?

This PR aims to upgrade to Apache ORC 1.5.7.

### Why are the changes needed?

This will bring the latest bug fixes. The following is the full release note.
- https://issues.apache.org/jira/projects/ORC/versions/12345702

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #26276 from dongjoon-hyun/SPARK-29617.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-27 21:11:17 -07:00
Dongjoon Hyun a43b966f00 [SPARK-29613][BUILD][SS] Upgrade to Kafka 2.3.1
### What changes were proposed in this pull request?

This PR aims to upgrade to Kafka 2.3.1 client library for client fixes like KAFKA-8950, KAFKA-8570, and KAFKA-8635. The following is the full release note.
- https://archive.apache.org/dist/kafka/2.3.1/RELEASE_NOTES.html

### Why are the changes needed?

- [KAFKA-8950 KafkaConsumer stops fetching](https://issues.apache.org/jira/browse/KAFKA-8950)
- [KAFKA-8570 Downconversion could fail when log contains out of order message formats](https://issues.apache.org/jira/browse/KAFKA-8570)
- [KAFKA-8635 Unnecessary wait when looking up coordinator before transactional request](https://issues.apache.org/jira/browse/KAFKA-8635)

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #26271 from dongjoon-hyun/SPARK-29613.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-27 16:15:54 -07:00
Luca Canali 5867707835 [SPARK-29557][BUILD] Update dropwizard/codahale metrics library to 3.2.6
### What changes were proposed in this pull request?
This proposes to update the dropwizard/codahale metrics library version used by Spark to `3.2.6` which is the last version supporting Ganglia.

### Why are the changes needed?
Spark is currently using Dropwizard metrics version 3.1.5, a version that is no more actively developed nor maintained, according to the project's Github repo README.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests + manual tests on a YARN cluster.

Closes #26212 from LucaCanali/updateDropwizardVersion.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-23 10:45:11 -07:00
Dongjoon Hyun 5fc363b307 [SPARK-29528][BUILD][TEST-MAVEN] Upgrade scala-maven-plugin to 4.2.4 for Scala 2.13.1
### What changes were proposed in this pull request?

This PR upgrades `scala-maven-plugin` to `4.2.4` for Scala `2.13.1`.

### Why are the changes needed?
Scala 2.13.1 seems to break the binary compatibility.

We need to upgrade `scala-maven-plugin` to bring the the following fixes for the latest Scala 2.13.1.
- https://github.com/davidB/scala-maven-plugin/issues/363
- https://github.com/sbt/zinc/issues/698

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

For now, we don't support Scala-2.13. This PR at least needs to pass the existing Jenkins with Maven to get prepared for Scala-2.13.

Closes #26185 from dongjoon-hyun/SPARK-29528.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-10-21 19:05:27 +09:00
Fokko Driesprong 8eb8f7478c [SPARK-29483][BUILD] Bump Jackson to 2.10.0
### What changes were proposed in this pull request?

Release blog: https://medium.com/cowtowncoder/jackson-2-10-features-cd880674d8a2

Fixes the following CVE's:
https://www.cvedetails.com/cve/CVE-2019-16942/
https://www.cvedetails.com/cve/CVE-2019-16943/

Looking back, there were 3 major goals for this minor release:

- Resolve the growing problem of “endless CVE patches”, a stream of fixes for reported CVEs related to “Polymorphic Deserialization” problem (described in “On Jackson CVEs… ”) that resulted in security tools forcing Jackson upgrades. 2.10 now includes “Safe Default Typing” that is hoped to resolve this problem.
- Evolve 2.x API towards 3.0, based on changes that were done in master, within limits of 2.x API backwards-compatibility requirements.
- Add JDK support for versions beyond Java 8: specifically add“module-info.class” for JDK9+, defining proper module definitions for Jackson components

Full changelog: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10

Improved Scala 2.13 support: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10#scala

### Why are the changes needed?

Patches CVE's reported by the vulnerability scanner.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Ran `mvn clean install -DskipTests` locally.

Closes #26131 from Fokko/SPARK-29483.

Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-16 15:38:54 -07:00
Jeff Evans 95de93b24e [SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read
Updating univocity-parsers version to 2.8.3, which adds support for multiple character delimiters

Moving univocity-parsers version to spark-parent pom dependencyManagement section

Adding new utility method to build multi-char delimiter string, which delegates to existing one

Adding tests for multiple character delimited CSV

### What changes were proposed in this pull request?

Adds support for parsing CSV data using multiple-character delimiters.  Existing logic for converting the input delimiter string to characters was kept and invoked in a loop.  Project dependencies were updated to remove redundant declaration of `univocity-parsers` version, and also to change that version to the latest.

### Why are the changes needed?

It is quite common for people to have delimited data, where the delimiter is not a single character, but rather a sequence of characters.  Currently, it is difficult to handle such data in Spark (typically needs pre-processing).

### Does this PR introduce any user-facing change?

Yes. Specifying the "delimiter" option for the DataFrame read, and providing more than one character, will no longer result in an exception.  Instead, it will be converted as before and passed to the underlying library (Univocity), which has accepted multiple character delimiters since 2.8.0.

### How was this patch tested?

The `CSVSuite` tests were confirmed passing (including new methods), and `sbt` tests for `sql` were executed.

Closes #26027 from jeff303/SPARK-24540.

Authored-by: Jeff Evans <jeffrey.wayne.evans@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-15 15:44:51 -05:00
Dongjoon Hyun 39d53d3e74 [SPARK-29470][BUILD] Update plugins to latest versions
### What changes were proposed in this pull request?

This PR updates plugins to latest versions.

### Why are the changes needed?

This brings bug fixes like the following.
- https://issues.apache.org/jira/projects/MCOMPILER/versions/12343484 (maven-compiler-plugin)
- https://issues.apache.org/jira/projects/MJAVADOC/versions/12345060 (maven-javadoc-plugin)
- https://issues.apache.org/jira/projects/MCHECKSTYLE/versions/12342397 (maven-checkstyle-plugin)
- https://checkstyle.sourceforge.io/releasenotes.html#Release_8.25 (checkstyle)
- https://checkstyle.sourceforge.io/releasenotes.html#Release_8.24 (checkstyle)

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins building and testing with the existing code.

Closes #26117 from dongjoon-hyun/SPARK-29470.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-15 11:55:52 -07:00
Fokko Driesprong b5b1b69f79 [SPARK-29445][CORE] Bump netty-all from 4.1.39.Final to 4.1.42.Final
### What changes were proposed in this pull request?

Minor version bump of Netty to patch reported CVE.

Patches: https://www.cvedetails.com/cve/CVE-2019-16869/

### Why are the changes needed?

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Compiled locally using `mvn clean install -DskipTests`

Closes #26099 from Fokko/SPARK-29445.

Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-12 09:43:16 -05:00
Peter Toth 3a7126cea8 [SPARK-29410][BUILD] Update commons-beanutils to 1.9.4
### What changes were proposed in this pull request?
This PR updates commons-beanutils to 1.9.4.

### Why are the changes needed?
CVE fixed in 1.9.4: http://commons.apache.org/proper/commons-beanutils/javadocs/v1.9.4/RELEASE-NOTES.txt

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing UTs.

Closes #26069 from peter-toth/SPARK-29410-update-commons-beanutils-to-1.9.4.

Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-12 09:24:06 -05:00
Sean Owen 4d93fb7dfa [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
### What changes were proposed in this pull request?

Rewrite declaration of internal `ThreadUtils.parmap` method to avoid `TraversableLike`, which is removed in Scala 2.13.

### Why are the changes needed?

To compile in Scala 2.13.

### Does this PR introduce any user-facing change?

None.

### How was this patch tested?

Existing tests.

Closes #26072 from srowen/SPARK-29413.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-09 10:53:51 -07:00
Sean Owen 7aca0dd658 [SPARK-29296][BUILD][CORE] Remove use of .par to make 2.13 support easier; add scala-2.13 profile to enable pulling in par collections library separately, for the future
### What changes were proposed in this pull request?

Scala 2.13 removes the parallel collections classes to a separate library, so first, this establishes a `scala-2.13` profile to bring it back, for future use.

However the library enables use of `.par` implicit conversions via a new class that is not in 2.12, which makes cross-building hard. This implements a suggested workaround from https://github.com/scala/scala-parallel-collections/issues/22 to avoid `.par` entirely.

### Why are the changes needed?

To compile for 2.13 and later to work with 2.13.

### Does this PR introduce any user-facing change?

Should not, no.

### How was this patch tested?

Existing tests.

Closes #25980 from srowen/SPARK-29296.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-03 08:56:08 -05:00
Dongjoon Hyun 9a84fae216 [SPARK-29332][BUILD] Update zstd-jni to 1.4.3-1
### What changes were proposed in this pull request?

This PR aims to update zstd-jni library to 1.4.3-1.

### Why are the changes needed?

This will bring the latest bug fixes in zstd itself. This is independent from another on-going Spark fix.
- https://github.com/facebook/zstd/releases/tag/v1.4.3

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #26002 from dongjoon-hyun/SPARK-29332.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-02 11:37:02 -07:00
gengjiaan 1018390542 [SPARK-29252][BUILD] Upgrade zookeeper to 3.4.14 and fix vulnerabilities
### What changes were proposed in this pull request?
The current code uses org.apache.zookeeper:zookeeper:jar:3.4.6 and it will cause a security vulnerabilities. We could get some security info from https://www.tenable.com/cve/CVE-2019-0201

This reference remind to upgrate the version of `zookeeper` to 3.4.14 or later.

### Why are the changes needed?
This PR fix the security vulnerabilities.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Exists UT.

Closes #25933 from beliefer/upgrade-zookeeper.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-30 08:16:32 -05:00
Sean Owen 28b8383a6c [SPARK-29289][BUILD] Update scalatest, scalacheck, scopt, clapper, scala-parser-combinators for 2.13
### What changes were proposed in this pull request?

Update scalatest, scalacheck, scopt, clapper, scala-parser-combinators to latest maintenance release that is also cross-published for Scala 2.13.

### Why are the changes needed?

To build in the future for Scala 2.13

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests

Closes #25967 from srowen/SPARK-29289.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-30 08:13:57 -05:00
Yuming Wang 8167714cab [SPARK-27831][FOLLOW-UP][SQL][TEST] Should not use maven to add Hive test jars
### What changes were proposed in this pull request?

This PR moves Hive test jars(`hive-contrib-*.jar` and `hive-hcatalog-core-*.jar`) from maven dependency to local file.

### Why are the changes needed?
`--jars` can't be tested since `hive-contrib-*.jar` and `hive-hcatalog-core-*.jar` are already in classpath.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test

Closes #25690 from wangyum/SPARK-27831-revert.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-09-28 16:55:49 -07:00
gengjiaan eef3abbb90 [SPARK-29226][BUILD] Upgrade jackson-databind to 2.9.10 and fix vulnerabilities
### What changes were proposed in this pull request?
The current code uses com.fasterxml.jackson.core:jackson-databind:jar:2.9.9.3 and it will cause a security vulnerabilities. We could get some security info from https://www.tenable.com/cve/CVE-2019-16335 and https://www.tenable.com/cve/CVE-2019-14540

This reference remind to upgrate the version of `jackson-databind` to 2.9.10 or later.

This PR also upgrade the version of jackson to 2.9.10.

### Why are the changes needed?
This PR fix the security vulnerabilities.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Exists UT.

Closes #25912 from beliefer/upgrade-jackson.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-24 22:05:13 -07:00
Sean Owen a9ae262cf2 [SPARK-28772][BUILD][MLLIB] Update breeze to 1.0
### What changes were proposed in this pull request?

Update breeze dependency to 1.0.

### Why are the changes needed?

Breeze 1.0 supports Scala 2.13 and has a few bug fixes.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #25874 from srowen/SPARK-28772.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-20 20:31:26 -07:00
Dongjoon Hyun 3bf43fb60d [SPARK-29159][BUILD] Increase ReservedCodeCacheSize to 1G
### What changes were proposed in this pull request?

This PR aims to increase the JVM CodeCacheSize from 0.5G to 1G.

### Why are the changes needed?

After upgrading to `Scala 2.12.10`, the following is observed during building.
```
2019-09-18T20:49:23.5030586Z OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
2019-09-18T20:49:23.5032920Z OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
2019-09-18T20:49:23.5034959Z CodeCache: size=524288Kb used=521399Kb max_used=521423Kb free=2888Kb
2019-09-18T20:49:23.5035472Z  bounds [0x00007fa62c000000, 0x00007fa64c000000, 0x00007fa64c000000]
2019-09-18T20:49:23.5035781Z  total_blobs=156549 nmethods=155863 adapters=592
2019-09-18T20:49:23.5036090Z  compilation: disabled (not enough contiguous free space left)
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually check the Jenkins or GitHub Action build log (which should not have the above).

Closes #25836 from dongjoon-hyun/SPARK-CODE-CACHE-1G.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-19 00:24:15 -07:00
Yuming Wang 8c3f27ceb4 [SPARK-28683][BUILD] Upgrade Scala to 2.12.10
## What changes were proposed in this pull request?

This PR upgrade Scala to **2.12.10**.

Release notes:
- Fix regression in large string interpolations with non-String typed splices
- Revert "Generate shallower ASTs in pattern translation"
- Fix regression in classpath when JARs have 'a.b' entries beside 'a/b'

- Faster compiler: 5–10% faster since 2.12.8
- Improved compatibility with JDK 11, 12, and 13
- Experimental support for build pipelining and outline type checking

More details:
https://github.com/scala/scala/releases/tag/v2.12.10
https://github.com/scala/scala/releases/tag/v2.12.9

## How was this patch tested?

Existing tests

Closes #25404 from wangyum/SPARK-28683.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-18 13:30:36 -07:00
Owen O'Malley dfb0a8bb04 [SPARK-28208][BUILD][SQL] Upgrade to ORC 1.5.6 including closing the ORC readers
## What changes were proposed in this pull request?

It upgrades ORC from 1.5.5 to 1.5.6 and adds closes the ORC readers when they aren't used to
create RecordReaders.

## How was this patch tested?

The changed unit tests were run.

Closes #25006 from omalley/spark-28208.

Lead-authored-by: Owen O'Malley <omalley@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-18 09:32:43 -07:00
Dongjoon Hyun 8174238d55 [SPARK-29075][BUILD] Add enforcer rule to ban duplicated pom dependency
### What changes were proposed in this pull request?

This PR aims to add a new enforcer rule to ban duplicated pom dependency during build stage.

### Why are the changes needed?

This will help us by preventing the extra effort like the followings.
```
e63098b287 [SPARK-29007][MLLIB][FOLLOWUP] Remove duplicated dependency
39e044e3d8 [MINOR][BUILD] Remove duplicate test-jar:test spark-sql dependency from Hive module
d8fefab4d8 [HOTFIX][BUILD][TEST-MAVEN] Remove duplicate dependency
e9445b187e [SPARK-6866][Build] Remove duplicated dependency in launcher/pom.xml
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually.

If we have something like e63098b287, it will fail at building phase at PR like the following.
```
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.BanDuplicatePomDependencyVersions failed with message:
Found 1 duplicate dependency declaration in this project:
 - dependencies.dependency[org.apache.spark:spark-streaming_${scala.binary.version}:test-jar] ( 2 times )
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-no-duplicate-dependencies) on project spark-mllib_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
```

Closes #25784 from dongjoon-hyun/SPARK-29075.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-13 14:35:02 -07:00
Nicholas Marion 6fb5ef108e [SPARK-29011][BUILD] Update netty-all from 4.1.30-Final to 4.1.39-Final
### What changes were proposed in this pull request?
Upgrade netty-all to latest in the 4.1.x line which is 4.1.39-Final.

### Why are the changes needed?
Currency of dependencies.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing unit-tests against master branch.

Closes #25712 from n-marion/master.

Authored-by: Nicholas Marion <nmarion@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-06 17:48:53 -07:00
Jungtaek Lim (HeartSaVioR) 594c9c5a3e [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
## What changes were proposed in this pull request?

This patch does pooling for both kafka consumers as well as fetched data. The overall benefits of the patch are following:

* Both pools support eviction on idle objects, which will help closing invalid idle objects which topic or partition are no longer be assigned to any tasks.
* It also enables applying different policies on pool, which helps optimization of pooling for each pool.
* We concerned about multiple tasks pointing same topic partition as well as same group id, and existing code can't handle this hence excess seek and fetch could happen. This patch properly handles the case.
* It also makes the code always safe to leverage cache, hence no need to maintain reuseCache parameter.

Moreover, pooling kafka consumers is implemented based on Apache Commons Pool, which also gives couple of benefits:

* We can get rid of synchronization of KafkaDataConsumer object while acquiring and returning InternalKafkaConsumer.
* We can extract the feature of object pool to outside of the class, so that the behaviors of the pool can be tested easily.
* We can get various statistics for the object pool, and also be able to enable JMX for the pool.

FetchedData instances are pooled by custom implementation of pool instead of leveraging Apache Commons Pool, because they have CacheKey as first key and "desired offset" as second key which "desired offset" is changing - I haven't found any general pool implementations supporting this.

This patch brings additional dependency, Apache Commons Pool 2.6.0 into `spark-sql-kafka-0-10` module.

## How was this patch tested?

Existing unit tests as well as new tests for object pool.

Also did some experiment regarding proving concurrent access of consumers for same topic partition.

* Made change on both sides (master and patch) to log when creating Kafka consumer or fetching records from Kafka is happening.
* branches
  * master: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-master-ref-debugging
  * patch: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-debugging
* Test query (doing self-join)
  * https://gist.github.com/HeartSaVioR/d831974c3f25c02846f4b15b8d232cc2
* Ran query from spark-shell, with using `local[*]` to maximize the chance to have concurrent access
* Collected the count of fetch requests on Kafka via command: `grep "creating new Kafka consumer" logfile | wc -l`
* Collected the count of creating Kafka consumers via command: `grep "fetching data from Kafka consumer" logfile | wc -l`

Topic and data distribution is follow:

```
truck_speed_events_stream_spark_25151_v1:0:99440
truck_speed_events_stream_spark_25151_v1:1:99489
truck_speed_events_stream_spark_25151_v1:2:397759
truck_speed_events_stream_spark_25151_v1:3:198917
truck_speed_events_stream_spark_25151_v1:4:99484
truck_speed_events_stream_spark_25151_v1:5:497320
truck_speed_events_stream_spark_25151_v1:6:99430
truck_speed_events_stream_spark_25151_v1:7:397887
truck_speed_events_stream_spark_25151_v1:8:397813
truck_speed_events_stream_spark_25151_v1:9:0
```

The experiment only used smallest 4 partitions (0, 1, 4, 6) from these partitions to finish the query earlier.

The result of experiment is below:

branch | create Kafka consumer | fetch request
-- | -- | --
master | 1986 | 2837
patch | 8 | 1706

Closes #22138 from HeartSaVioR/SPARK-25151.

Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Co-authored-by: Jungtaek Lim <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-09-04 10:17:38 -07:00
Xiao Li 2856398de9 [SPARK-28961][HOT-FIX][BUILD] Upgrade Maven from 3.6.1 to 3.6.2
### What changes were proposed in this pull request?
This PR is to upgrade the maven dependence from 3.6.1 to 3.6.2.

### Why are the changes needed?
All the builds are broken because 3.6.1 is not available.  http://ftp.wayne.edu/apache//maven/maven-3/

- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/485/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10536/

![image](https://user-images.githubusercontent.com/11567269/64196667-36d69100-ce39-11e9-8f93-40eb333d595d.png)

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #25665 from gatorsmile/upgradeMVN.

Authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-03 11:06:57 -07:00
HyukjinKwon 92cabf6306 [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
### What changes were proposed in this pull request?

This PR proposes to upgrade scala-maven-plugin from 3.4.4 to 4.2.0.

Upgrade to 4.1.1 was reverted due to unexpected build failure on AppVeyor.

The root cause seems to be an issue specific to AppVeyor - loading the system library 'kernel32.dll' seems being failed.

```
Suppressed: java.lang.NoClassDefFoundError: Could not initialize class com.sun.jna.platform.win32.Kernel32
        at sbt.internal.io.WinMilli$.getHandle(Milli.scala:264)
        at sbt.internal.io.WinMilli$.getModifiedTimeNative(Milli.scala:289)
        at sbt.internal.io.WinMilli$.getModifiedTimeNative(Milli.scala:260)
        at sbt.internal.io.MilliNative.getModifiedTime(Milli.scala:61)
        at sbt.internal.io.Milli$.getModifiedTime(Milli.scala:360)
        at sbt.io.IO$.$anonfun$getModifiedTimeOrZero$1(IO.scala:1373)
        at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
        at sbt.internal.io.Retry$.liftedTree2$1(Retry.scala:38)
        at sbt.internal.io.Retry$.impl$1(Retry.scala:38)
        at sbt.internal.io.Retry$.apply(Retry.scala:52)
        at sbt.internal.io.Retry$.apply(Retry.scala:24)
        at sbt.io.IO$.getModifiedTimeOrZero(IO.scala:1373)
        at sbt.internal.inc.caching.ClasspathCache$.fromCacheOrHash$1(ClasspathCache.scala:44)
        at sbt.internal.inc.caching.ClasspathCache$.$anonfun$hashClasspath$1(ClasspathCache.scala:53)
        at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:659)
        at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
        at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
        at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
        at scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650)
        at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:170)
        ... 25 more
```

By setting `-Djna.nosys=true`, it directly loads the library from the jar instead of system's.

In this way, the build seems working fine.

### Why are the changes needed?

It upgrades the plugin to fix bugs and fixes the CI build.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

It was tested at https://github.com/apache/spark/pull/25497

Closes #25633 from HyukjinKwon/SPARK-28759.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-30 09:39:15 -07:00
Gabor Somogyi 7d72c073dd [SPARK-28760][SS][TESTS] Add Kafka delegation token end-to-end test with mini KDC
### What changes were proposed in this pull request?
At the moment no end-to-end Kafka delegation token test exists which was mainly because of missing embedded KDC. KDC is missing in general from the testing side so I've discovered what kind of possibilities are there. The most obvious choice is the MiniKDC inside the Hadoop library where Apache Kerby runs in the background. What this PR contains:
* Added MiniKDC as test dependency from Hadoop
* Added `maven-bundle-plugin` because couple of dependencies are coming in bundle format
* Added security mode to `KafkaTestUtils`. Namely start KDC -> start Zookeeper in secure mode -> start Kafka in secure mode
* Added a roundtrip test (saves and reads back data from Kafka)

### Why are the changes needed?
No such test exists + security testing with KDC is completely missing.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing + additional unit tests.
I've put the additional test into a loop and was consuming ~10 sec average.

Closes #25477 from gaborgsomogyi/SPARK-28760.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-08-29 11:52:35 -07:00
Dongjoon Hyun 7701d29af5 [SPARK-28877][PYSPARK][test-hadoop3.2][test-java11] Make jaxb-runtime compile-time dependency
### What changes were proposed in this pull request?

`mllib` has `jaxb-runtime-2.3.2` as a runtime dependency. This PR makes it as a compile-time dependency. This doesn't change our dependency manifest and `LICENSE`s. Instead, this will add the following three jars into our pre-built artifacts.

- jaxb-runtime-2.3.2.jar
- jakarta.xml.bind-api-2.3.2.jar
- istack-commons-runtime-3.0.8.jar

### Why are the changes needed?

We need to use the followings.
- JDK8: `com.sun.xml.internal.bind.v2.ContextFactory`
- JDK11: `com.sun.xml.bind.v2.ContextFactory`

`com.sun.xml.bind.v2.ContextFactory` is inside `jaxb-runtime-2.3.2`.
```
$ javap -cp jaxb-runtime-2.3.2.jar com.sun.xml.bind.v2.ContextFactory
Compiled from "ContextFactory.java"
public class com.sun.xml.bind.v2.ContextFactory {
  public static final java.lang.String USE_JAXB_PROPERTIES;
  public com.sun.xml.bind.v2.ContextFactory();
  public static javax.xml.bind.JAXBContext createContext(java.lang.Class[], java.util.Map<java.lang.String, java.lang.Object>) throws javax.xml.bind.JAXBException;
  public static com.sun.xml.bind.api.JAXBRIContext createContext(java.lang.Class[], java.util.Collection<com.sun.xml.bind.api.TypeReference>, java.util.Map<java.lang.Class, java.lang.Class>, java.lang.String, boolean, com.sun.xml.bind.v2.model.annotation.RuntimeAnnotationReader, boolean, boolean, boolean) throws javax.xml.bind.JAXBException;
  public static com.sun.xml.bind.api.JAXBRIContext createContext(java.lang.Class[], java.util.Collection<com.sun.xml.bind.api.TypeReference>, java.util.Map<java.lang.Class, java.lang.Class>, java.lang.String, boolean, com.sun.xml.bind.v2.model.annotation.RuntimeAnnotationReader, boolean, boolean, boolean, boolean) throws javax.xml.bind.JAXBException;
  public static javax.xml.bind.JAXBContext createContext(java.lang.String, java.lang.ClassLoader, java.util.Map<java.lang.String, java.lang.Object>) throws javax.xml.bind.JAXBException;
}
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with `test-java11`.

For manual testing, do the following with JDK11.
```scala
$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.3+7)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.3+7, mixed mode)

$ build/sbt -Pyarn -Phadoop-3.2 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package

$ python/run-tests.py --python-executables python --modules pyspark-ml
...
Finished test(python): pyspark.ml.recommendation (65s)
Tests passed in 174 seconds
```

Closes #25587 from dongjoon-hyun/SPARK-28877.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-26 23:23:17 -07:00
Yuming Wang 02a0cdea13 [SPARK-28723][SQL] Upgrade to Hive 2.3.6 for HiveMetastore Client and Hadoop-3.2 profile
### What changes were proposed in this pull request?

This PR upgrade the built-in Hive to 2.3.6 for `hadoop-3.2`.

Hive 2.3.6 release notes:
- [HIVE-22096](https://issues.apache.org/jira/browse/HIVE-22096): Backport [HIVE-21584](https://issues.apache.org/jira/browse/HIVE-21584) (Java 11 preparation: system class loader is not URLClassLoader)
- [HIVE-21859](https://issues.apache.org/jira/browse/HIVE-21859): Backport [HIVE-17466](https://issues.apache.org/jira/browse/HIVE-17466) (Metastore API to list unique partition-key-value combinations)
- [HIVE-21786](https://issues.apache.org/jira/browse/HIVE-21786): Update repo URLs in poms branch 2.3 version

### Why are the changes needed?
Make Spark support JDK 11.

### Does this PR introduce any user-facing change?
Yes. Please see [SPARK-28684](https://issues.apache.org/jira/browse/SPARK-28684) and [SPARK-24417](https://issues.apache.org/jira/browse/SPARK-24417) for more details.

### How was this patch tested?
Existing unit test and manual test.

Closes #25443 from wangyum/test-on-jenkins.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-23 21:34:30 -07:00
Sean Owen 9ea37b09cf [SPARK-17875][CORE][BUILD] Remove dependency on Netty 3
### What changes were proposed in this pull request?

Spark uses Netty 4 directly, but also includes Netty 3 only because transitive dependencies do. The dependencies (Hadoop HDFS, Zookeeper, Avro) don't seem to need this dependency as used in Spark. I think we can forcibly remove it to slim down the dependencies.

Previous attempts were blocked by its usage in Flume, but that dependency has gone away.
https://github.com/apache/spark/pull/15436

### Why are the changes needed?

Mostly to reduce the transitive dependency size and complexity a little bit and avoid triggering spurious security alerts on Netty 3.x usage.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests

Closes #25544 from srowen/SPARK-17875.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-21 21:27:56 -07:00
HyukjinKwon 1de4a22c52 Revert "[SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.1.1"
This reverts commit 1819a6f22e.
2019-08-19 20:32:07 +09:00
Dongjoon Hyun f7c9de9035 [SPARK-28765][BUILD] Add explict exclusions to avoid JDK11 dependency issue
### What changes were proposed in this pull request?

This PR adds explicit exclusions to avoid Maven `JDK11` dependency issues.

### Why are the changes needed?

Maven/Ivy seems to be confused during dependency generation on `JDK11` environment.
This is not only wrong, but also causes a Jenkins failure during dependency manifest check on `JDK11` environment.

**JDK8**
```
$ cd core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]       org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG]          org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]          javax.validation:validation-api:jar:2.0.1.Final:compile
```

**JDK11**
```
[DEBUG]       org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG]          org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]          javax.validation:validation-api:jar:2.0.1.Final:compile
[DEBUG]          jakarta.xml.bind:jakarta.xml.bind-api🫙2.3.2:compile
[DEBUG]             jakarta.activation:jakarta.activation-api🫙1.2.1:compile
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Do the following in both `JDK8` and `JDK11` environment. The dependency manifest should not be changed. In the current `master` branch, `JDK11` changes the dependency manifest.
```
$ dev/test-dependencies.sh --replace-manifest
```

Closes #25481 from dongjoon-hyun/SPARK-28765.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-17 10:16:22 -07:00
Dongjoon Hyun 1819a6f22e [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.1.1
### What changes were proposed in this pull request?

This PR aims to upgrade `scala-maven-plugin` to 4.1.1 to bring the improvement (including Scala 2.13.0 support, Zinc update) and bug fixes in the plugin.

### Why are the changes needed?

`4.1.1` uses the latest dependent plugins.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #25476 from dongjoon-hyun/SPARK-28759.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-16 21:23:11 -07:00
Sean Owen c9b49f3978 [SPARK-28737][CORE] Update Jersey to 2.29
## What changes were proposed in this pull request?

Update Jersey to 2.27+, ideally 2.29, for possible JDK 11 fixes.

## How was this patch tested?

Existing tests.

Closes #25455 from srowen/SPARK-28737.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-16 15:08:04 -07:00
Dongjoon Hyun 43101c7328 [SPARK-28758][BUILD][SQL] Upgrade Janino to 3.0.15
### What changes were proposed in this pull request?

This PR aims to upgrade `Janino` from `3.0.13` to `3.0.15` in order to bring the bug fixes. Please note that `3.1.0` is a major refactoring instead of bug fixes. We had better use `3.0.15` and wait for the stabler 3.1.x.

### Why are the changes needed?

This brings the following bug fixes.

**3.0.15 (2019-07-28)**

- Fix overloaded single static method import

**3.0.14 (2019-07-05)**

- Conflict in sbt-assembly
- Overloaded static on-demand imported methods cause a CompileException: Ambiguous static method import
- Handle overloaded static on-demand imports
- Major refactoring of the Java 8 and Java 9 retrofit mechanism
- Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static Initializers"
- Local variables in instance initializers don't work
- Provide an option to keep generated code files
- Added compile error handler and warning handler to ICompiler

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #25474 from dongjoon-hyun/SPARK-28758.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-16 11:33:02 -07:00
Fokko Driesprong babdba0f9e [SPARK-28728][BUILD] Bump Jackson Databind to 2.9.9.3
## What changes were proposed in this pull request?

Update Jackson databind to the latest version for some latest changes.

## How was this patch tested?

Pass the Jenkins.

Closes #25451 from Fokko/fd-bump-jackson-databind.

Lead-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-16 03:40:41 -07:00
Fokko Driesprong d8dd5719b4 [SPARK-28713][BUILD] Bump checkstyle from 8.14 to 8.23
## What changes were proposed in this pull request?

Fixes a vulnerability from the GitHub Security Advisory Database:

_Moderate severity vulnerability that affects com.puppycrawl.tools:checkstyle_
Checkstyle prior to 8.18 loads external DTDs by default, which can potentially lead to denial of service attacks or the leaking of confidential information.

https://github.com/checkstyle/checkstyle/issues/6474

Affected versions: < 8.18

## How was this patch tested?

Ran checkstyle locally.

Closes #25432 from Fokko/SPARK-28713.

Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-13 11:09:14 -07:00
Dongjoon Hyun 33e6e4703d [SPARK-28544][BUILD] Update zstd-jni to 1.4.2-1
## What changes were proposed in this pull request?

This PR aims to update `zstd-jni` library to bring the latest improvement and bug fixes in `1.4.1` and `1.4.2`.
- https://github.com/facebook/zstd/releases/tag/v1.4.1 (4.5 ~ 11.8% performance improvement from v1.4.0 and bug fixes)
- https://github.com/facebook/zstd/releases/tag/v1.4.2 (bug fixes)

## How was this patch tested?

Pass the Jenkins.

Closes #25275 from dongjoon-hyun/SPARK-28544.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-27 18:08:20 -07:00
Dongjoon Hyun fab75db99b [SPARK-28370][BUILD][TEST] Upgrade Mockito to 2.28.2
## What changes were proposed in this pull request?

This PR aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to bring the latest bug fixes and to be up-to-date for JDK9+ support before Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had better wait and see for the stability.

**RELEASE NOTE**
https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md

**NOTABLE FIXES**
- Configure the MethodVisitor for Java 11+ compatibility (2.27.5)
- When mock is called multiple times, and verify fails, the error message reports only the first invocation (2.27.4)
- Memory leak in mockito-inline calling method on mock with at least a mock as parameter (2.25.0)
- Cross-references and a single spy cause memory leak (2.25.0)
- Nested spies cause memory leaks (2.25.0)
- [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3)
- Return null instead of causing a CCE (2.24.9, 2.24.3)
- Issue with mocking type in "java.util.*", Java 12 (2.24.2)

Mainly, Maven (Hadoop-2.7/Hadoop-3.2) and SBT(Hadoop-2.7) Jenkins test passed.

## How was this patch tested?

Pass the Jenkins with the exiting UTs.

Closes #25139 from dongjoon-hyun/SPARK-28370.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-13 13:00:07 -07:00
Yuming Wang 4ad0c33be4 [SPARK-28221][BUILD] Upgrade janino to 3.0.13
## What changes were proposed in this pull request?

Mainly change logs:
### Version 3.0.13:
- Support for JDK 9/10 in Full Compiler
- The syntax elements that can have modifiers now all have sets of "is...()" methods that check for each modifier. Some also have methods "getAccess()" and/or "getAnnotations()".
- Implement "type annotations" (JLS8 9.7.4)
- Implemented parsing (but not compilation) of "modular compilation units" (JLS11 7.3).
- Replaced all "assert...Uncookable(..., Pattern messageRegex)" and "assert...Uncookable(..., String messageInfix)" method pairs with a single "assert...Uncookable(..., String messageRegex)" method.
Minor refactoring: Allowed modifiers are now checked in the Parser, not in Java.*. This saves a lot of THROWS clauses.
- Parse Type inference syntax: Type inference for generic instance creation implemented, test cases added.
- Parse MethodReference, ClassInstanceCreationReference and ArrayCreationReference

### Version 3.0.12
- Fixed: Operator "&" not defined on types "java.lang.Long" and "int"
- Major bug in JavaSourceClassLoader: When loading the second and following classes, CUs were compiled again, leading to an inconsistent class hierarchy.
- Fixed: Java 9 added "Override public final CharBuffer CharBuffer.rewind() { ..." -- leads easily to a java.lang.NoSuchMethodError
- Changed all occurences of the words "Java bytecode" to "JVM bytecode" to make clearer that the generated bytecode is for the JVMS and not suitable for, e.g. DALVIK.

http://janino-compiler.github.io/janino/changelog.html

## How was this patch tested?

Existing test

Closes #25021 from wangyum/SPARK-28221.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-06 10:02:42 -07:00
Yuming Wang 88cd6dc83d [SPARK-28248][SQL][TEST] Upgrade docker image and library for PostgreSQL integration test
## What changes were proposed in this pull request?

This pr upgrades Postgres docker image for integration tests.

## How was this patch tested?

manual tests:
```
./build/mvn install -DskipTests
./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12
```

Closes #25050 from wangyum/SPARK-28248.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-05 00:43:28 -07:00
Yuming Wang 90bd017c10 [SPARK-28233][BUILD] Upgrade maven-jar-plugin and maven-source-plugin
## What changes were proposed in this pull request?

This pr upgrade `maven-jar-plugin` to 3.1.2 and `maven-source-plugin` to 3.1.0 to avoid:
- [MJAR-259](https://issues.apache.org/jira/browse/MJAR-259) – Archiving to jar is very slow
- [MSOURCES-119](https://issues.apache.org/jira/browse/MSOURCES-119) – Archiving to jar is very slow

Release notes:
https://blogs.apache.org/maven/entry/apache-maven-source-plugin-version
https://blogs.apache.org/maven/entry/apache-maven-jar-plugin-version2
https://blogs.apache.org/maven/entry/apache-maven-jar-plugin-version1

## How was this patch tested?

N/A

Closes #25031 from wangyum/SPARK-28233.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-07-03 21:46:11 +09:00
Dongjoon Hyun a7e16199f3 [SPARK-28174][BUILD][SS] Upgrade to Kafka 2.3.0
## What changes were proposed in this pull request?

This issue updates Kafka dependency to 2.3.0 to bring the following 9 client-side patches at least. Among them, the blocker issue [KAFKA-7703](https://issues.apache.org/jira/browse/KAFKA-7703) was reported by Apache Spark community. This dependency update will help us remove the workaround later.
- https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%202.3.0%20AND%20fixVersion%20NOT%20IN%20(2.2.0%2C%202.2.1)%20AND%20component%20%3D%20clients

The following is the full release note.
- https://www.apache.org/dist/kafka/2.3.0/RELEASE_NOTES.html

## How was this patch tested?

Pass the Jenkins.

Closes #24976 from dongjoon-hyun/SPARK-28174.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-27 07:49:24 -07:00
Dongjoon Hyun ea0e119f84 [SPARK-28111][BUILD] Upgrade xbean-asm7-shaded to 4.14
## What changes were proposed in this pull request?

This PR aims to update `xbean-asm7-shaded` to bring [XBEAN-318](https://issues.apache.org/jira/browse/XBEAN-318) which is helpful to log the class definition reading failures.
- https://issues.apache.org/jira/projects/XBEAN/versions/12345220

## How was this patch tested?

Pass the Jenkins.

Closes #24914 from dongjoon-hyun/SPARK-28111.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-20 07:59:59 -07:00
Sean Owen 15462e1a8f [SPARK-28004][UI] Update jquery to 3.4.1
## What changes were proposed in this pull request?

We're using an old-ish jQuery, 1.12.4, and should probably update for Spark 3 to keep up in general, but also to keep up with CVEs. In fact, we know of at least one resolved in only 3.4.0+ (https://nvd.nist.gov/vuln/detail/CVE-2019-11358). They may not affect Spark, but, if the update isn't painful, maybe worthwhile in order to make future 3.x updates easier.

jQuery 1 -> 2 doesn't sound like a breaking change, as 2.0 is supposed to maintain compatibility with 1.9+ (https://blog.jquery.com/2013/04/18/jquery-2-0-released/)

2 -> 3 has breaking changes: https://jquery.com/upgrade-guide/3.0/. It's hard to evaluate each one, but the most likely area for problems is in ajax(). However, our usage of jQuery (and plugins) is pretty simple.

Update jquery to 3.4.1; update jquery blockUI and mustache to latest

## How was this patch tested?

Manual testing of docs build (except R docs), worker/master UI, spark application UI.
Note: this really doesn't guarantee it works, as our tests can't test javascript, and this is merely anecdotal testing, although I clicked about every link I could find. There's a risk this breaks a minor part of the UI; it does seem to work fine in the main.

Closes #24843 from srowen/SPARK-28004.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-14 22:19:20 -07:00
Dongjoon Hyun 37ab43339d [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
## What changes were proposed in this pull request?

For Apache Spark 3.0.0 release, this PR aims to update Kafka dependency to 2.2.1 to bring the following improvement and bug fixes like [KAFKA-8134](https://issues.apache.org/jira/browse/KAFKA-8134) (`'linger.ms' must be a long`).

https://issues.apache.org/jira/projects/KAFKA/versions/12345010

## How was this patch tested?

Pass the Jenkins.

Closes #24847 from dongjoon-hyun/SPARK-28013.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-12 07:34:42 -07:00
Martin Junghanns 709387d660 [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
## What changes were proposed in this pull request?

This PR introduces the necessary Maven modules for the new [Spark Graph](https://issues.apache.org/jira/browse/SPARK-25994) feature for Spark 3.0.

* `spark-graph` is a parent module that users depend on to get all graph functionalities (Cypher and Graph Algorithms)
* `spark-graph-api` defines the [Property Graph API](https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI) that is being shared between Cypher and Algorithms
* `spark-cypher` contains a Cypher query engine implementation

Both, `spark-graph-api` and `spark-cypher` depend on Spark SQL.

Note, that the Maven module for Graph Algorithms is not part of this PR and will be introduced in https://issues.apache.org/jira/browse/SPARK-27302

A PoC for a running Cypher implementation can be found in this WIP PR https://github.com/apache/spark/pull/24297

## How was this patch tested?

Pass the Jenkins with all profiles and manually build and check the followings.
```
$ ls assembly/target/scala-2.12/jars/spark-cypher*
assembly/target/scala-2.12/jars/spark-cypher_2.12-3.0.0-SNAPSHOT.jar

$ ls assembly/target/scala-2.12/jars/spark-graph* | grep -v graphx
assembly/target/scala-2.12/jars/spark-graph-api_2.12-3.0.0-SNAPSHOT.jar
assembly/target/scala-2.12/jars/spark-graph_2.12-3.0.0-SNAPSHOT.jar
```

Closes #24490 from s1ck/SPARK-27300.

Lead-authored-by: Martin Junghanns <martin.junghanns@neotechnology.com>
Co-authored-by: Max Kießling <max@kopfueber.org>
Co-authored-by: Martin Junghanns <martin.junghanns@neo4j.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-09 00:26:26 -07:00
Yuming Wang d53b61c311 [SPARK-27831][SQL][TEST] Move Hive test jars to maven dependency
## What changes were proposed in this pull request?

This pr moves Hive test jars(`hive-contrib-0.13.1.jar`, `hive-hcatalog-core-0.13.1.jar`, `hive-contrib-2.3.5.jar` and `hive-hcatalog-core-2.3.5.jar`) to maven dependency.

## How was this patch tested?

Existing test

Please note that this pr need test with `maven` and `sbt`.

Closes #24751 from wangyum/SPARK-27831.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-02 20:23:08 -07:00
Izek Greenfield c647f9011c [SPARK-27862][BUILD] Move to json4s 3.6.6
## What changes were proposed in this pull request?
Move to json4s version 3.6.6
Add scala-xml 1.2.0

## How was this patch tested?

Pass the Jenkins

Closes #24736 from igreenfield/master.

Authored-by: Izek Greenfield <igreenfield@axiomsl.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-05-30 19:42:56 -05:00
Dongjoon Hyun 955eef95b3 Revert "[SPARK-27831][SQL][TEST][test-hadoop3.2] Move Hive test jars to maven dependency"
This reverts commit 24180c00e0.
2019-05-30 10:06:55 -07:00
Fokko Driesprong bd87323003 [SPARK-27757][CORE] Bump Jackson to 2.9.9
## What changes were proposed in this pull request?

This fixes CVE-2019-12086 on Databind: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.9

## How was this patch tested?

Existing tests

Closes #24646 from Fokko/SPARK-27757.

Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-05-30 09:35:20 -05:00
Yuming Wang 24180c00e0 [SPARK-27831][SQL][TEST][test-hadoop3.2] Move Hive test jars to maven dependency
## What changes were proposed in this pull request?

This pr moves Hive test jars(`hive-contrib-0.13.1.jar`, `hive-hcatalog-core-0.13.1.jar`, `hive-contrib-2.3.5.jar` and `hive-hcatalog-core-2.3.5.jar`) to maven dependency.

## How was this patch tested?

Existing test

Closes #24695 from wangyum/SPARK-27831.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-24 10:33:34 -07:00
Sean Owen 32f310b585 [SPARK-26045][BUILD] Leave avro, avro-ipc dependendencies as compile scope even for hadoop-provided usages
## What changes were proposed in this pull request?

Leave avro, avro-ipc dependendencies as compile scope even for hadoop-provided usages, to ensure 1.8 is used. Hadoop 2.7 has Avro 1.7, and Spark won't generally work with that. Reports from the field are that this works, to include avro 1.8 with the Spark distro on Hadoop 2.7.

## How was this patch tested?

Existing tests

Closes #24680 from srowen/SPARK-26045.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-23 13:21:21 -07:00
Yuming Wang 6cd1efd0ae [SPARK-27737][SQL] Upgrade to Hive 2.3.5 for Hive Metastore Client and Hadoop-3.2 profile
## What changes were proposed in this pull request?

This PR aims to upgrade to Hive 2.3.5 for Hive Metastore Client and Hadoop-3.2 profile.

Release Notes - Hive - Version 2.3.5

- [[HIVE-21536](https://issues.apache.org/jira/browse/HIVE-21536)] - Backport HIVE-17764 to branch-2.3
- [[HIVE-21585](https://issues.apache.org/jira/browse/HIVE-21585)] - Upgrade branch-2.3 to ORC 1.3.4
- [[HIVE-21639](https://issues.apache.org/jira/browse/HIVE-21639)] - Spark test failed since HIVE-10632
- [[HIVE-21680](https://issues.apache.org/jira/browse/HIVE-21680)] - Backport HIVE-17644 to branch-2 and branch-2.3

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345394&styleName=Text&projectId=12310843

## How was this patch tested?

This PR is tested in two ways.
- Pass the Jenkins with the default configuration for `Hive Metastore Client` testing.
- Pass the Jenkins with `test-hadoop3.2` configuration for `Hadoop 3.2` testing.

Closes #24620 from wangyum/SPARK-27737.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-05-22 10:24:17 +09:00
Dongjoon Hyun 141a3bfc8d [SPARK-27755][BUILD] Update zstd-jni to 1.4.0-1
## What changes were proposed in this pull request?

This PR aims to update `zstd-jni` library to `1.4.0-1` which improves the `level 1 compression speed` performance by 6% in most scenarios. The following is the full release note.
- https://github.com/facebook/zstd/releases/tag/v1.4.0

## How was this patch tested?

Pass the Jenkins.

Closes #24632 from dongjoon-hyun/SPARK-27755.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-17 08:34:45 -07:00
Kazuaki Ishizaki 9e0d8c6ce2 [SPARK-27752][CORE] Upgrade lz4-java from 1.5.1 to 1.6.0
## What changes were proposed in this pull request?

This PR upgrades lz4-java from 1.5.1 to 1.6.0. Lz4-java is available at https://github.com/lz4/lz4-java.

Changes from 1.5.1:
- Upgraded LZ4 to 1.9.1. Updated the JNI bindings, except for the one for Linux/i386. Decompression speed is improved on amd64.
- Deprecated use of LZ4FastDecompressor of a native instance because the corresponding C API function is deprecated. See the release note of LZ4 1.9.0 for details. Updated javadoc accordingly.
- Changed the module name from org.lz4.lz4-java to org.lz4.java to avoid using - in the module name. (severn-everett, Oliver Eikemeier, Rei Odaira)
- Enabled build with Java 11. Note that the distribution is still built with Java 7. (Rei Odaira)

## How was this patch tested?

Existing tests.

Closes #24629 from kiszk/SPARK-27752.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-16 20:45:13 -07:00
Adi Muraru 8ef4da753d [SPARK-27610][YARN] Shade netty native libraries
## What changes were proposed in this pull request?

Fixed the `spark-<version>-yarn-shuffle.jar` artifact packaging to shade the native netty libraries:
- shade the `META-INF/native/libnetty_*` native libraries when packagin
the yarn shuffle service jar. This is required as netty library loader
derives that based on shaded package name.
- updated the `org/spark_project` shade package prefix to `org/sparkproject`
(i.e. removed underscore) as the former breaks the netty native lib loading.

This was causing the yarn external shuffle service to fail
when spark.shuffle.io.mode=EPOLL

## How was this patch tested?
Manual tests

Closes #24502 from amuraru/SPARK-27610_master.

Authored-by: Adi Muraru <amuraru@adobe.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-05-07 10:47:36 -07:00
Dongjoon Hyun 375cfa3d89 [SPARK-27467][BUILD] Upgrade Maven to 3.6.1
## What changes were proposed in this pull request?

This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following.
- https://maven.apache.org/docs/3.6.1/release-notes.html

This was committed and reverted due to AppVeyor failure. It turns out that the root cause is `PATH` issue. With the updated AppVeyor script, it passed.

https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/24273412

## How was this patch tested?

Pass the Jenkins and AppVoyer

Closes #24481 from dongjoon-hyun/SPARK-R.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-02 20:01:17 -07:00
Yuming Wang 875e7e1d97 [SPARK-27620][BUILD] Upgrade jetty to 9.4.18.v20190429
## What changes were proposed in this pull request?

This pr upgrade jetty to [9.4.18.v20190429](https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.18.v20190429) because of [CVE-2019-10247](https://nvd.nist.gov/vuln/detail/CVE-2019-10247).

## How was this patch tested?

Existing test.

Closes #24513 from wangyum/SPARK-27620.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-05-03 09:25:54 +09:00
Yuming Wang 3ecafb0e14 [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6
## What changes were proposed in this pull request?

[stream-lib 2.9.6](https://github.com/addthis/stream-lib/commits/v2.9.6) include several improvements:
![image](https://user-images.githubusercontent.com/5399861/56938062-7eb77580-6b32-11e9-8c36-711ab943d657.png)

## How was this patch tested?

N/A

Closes #24492 from wangyum/SPARK-27601.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-05-02 15:21:57 -05:00
Cheng Lian b73744a147 [SPARK-27611][BUILD] Exclude jakarta.activation:jakarta.activation-api from org.glassfish.jaxb:jaxb-runtime:2.3.2
PR #23890 introduced `org.glassfish.jaxb:jaxb-runtime:2.3.2` as a runtime dependency. As an unexpected side effect, `jakarta.activation:jakarta.activation-api:1.2.1` was also pulled in as a transitive dependency. As a result, for the Maven build, both of the following two jars can be found under `assembly/target/scala-2.12/jars/`:

```
activation-1.1.1.jar
jakarta.activation-api-1.2.1.jar
```

This PR exludes the Jakarta one.

Manually built Spark using Maven and checked files under `assembly/target/scala-2.12/jars/`. After this change, only `activation-1.1.1.jar` is there.

Closes #24507 from liancheng/spark-27611.

Authored-by: Cheng Lian <lian@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-01 20:12:17 -07:00
Sean Owen fcc42d4682 [SPARK-27493][BUILD][FOLLOWUP] Upgrade ASM to 7.1 in Maven plugins
## What changes were proposed in this pull request?

One more place to update ASM 7.0 -> 7.1

## How was this patch tested?

Existing tests

Closes #24508 from srowen/SPARK-27493.3.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-01 15:03:30 -07:00
Dongjoon Hyun 6eca435ac9 [SPARK-27608][BUILD][test-maven] Upgrade Surefire plugin to 3.0.0-M3
## What changes were proposed in this pull request?

This PR aims to upgrade Surefire plugin to 3.0.0-M3 to bring [SUREFIRE-1613](https://issues.apache.org/jira/browse/SUREFIRE-1613).

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #24501 from dongjoon-hyun/SPARK-27608.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-30 19:15:56 -07:00
HyukjinKwon 447d018a42 Revert "[SPARK-27467][BUILD][TEST-MAVEN] Upgrade Maven to 3.6.1"
This reverts commit 7c4a6439d6.
2019-04-28 11:03:04 +09:00
Yuming Wang fe99305101 [SPARK-27556][BUILD] Exclude com.zaxxer:HikariCP-java7 from hadoop-yarn-server-web-proxy
## What changes were proposed in this pull request?

There are two HikariCP packages in classpath when building with `-Phive -Pyarn -Phadoop-3.2`.

The HikariCP dependency tree:
```
[INFO] | +- org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.2.0:compile
[INFO] | | \- org.apache.hadoop:hadoop-yarn-server-common:jar:3.2.0:compile
[INFO] | | +- org.apache.hadoop:hadoop-yarn-registry:jar:3.2.0:compile
[INFO] | | | \- commons-daemon:commons-daemon:jar:1.0.13:compile
[INFO] | | +- org.apache.geronimo.specs:geronimo-jcache_1.0_spec🫙1.0-alpha-1:compile
[INFO] | | +- org.ehcache:ehcache:jar:3.3.1:compile
[INFO] | | +- com.zaxxer:HikariCP-java7:jar:2.4.12:compile
```

```
[INFO] +- org.apache.hive:hive-metastore:jar:2.3.4:compile
[INFO] | +- javolution:javolution:jar:5.5.1:compile
[INFO] | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] | +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
[INFO] | +- com.zaxxer:HikariCP:jar:2.5.1:compile
```

This pr exclude `com.zaxxer:HikariCP-java7` from `hadoop-yarn-server-web-proxy`.

## How was this patch tested?

manual tests

Closes #24450 from wangyum/SPARK-27556.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-04-26 12:15:39 -05:00
Yuming Wang 7cc15af156 [SPARK-27481][BUILD] Upgrade commons-logging to 1.1.3 for hadoop-3.2
## What changes were proposed in this pull request?

hadoop-2.7 gets `commons-logging` version from `hive-metastore`:
```
[INFO] +- org.spark-project.hive:hive-metastore:jar:1.2.1.spark2:compile
[INFO] |  +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
[INFO] |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  +- commons-logging:commons-logging:jar:1.1.3:compile
```
But Hive removes `commons-logging` since [HIVE-12237(Hive 2.0.0)](https://issues.apache.org/jira/browse/HIVE-12237), so hadoop-3.2 gets `commons-logging` from `commons-httpclient`:
```
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  \- commons-logging:commons-logging:jar:1.0.4:compile
```
Thus. we may hint `LogConfigurationException`:
```
bin/spark-sql --conf spark.sql.hive.metastore.version=1.2.2 --conf spark.sql.hive.metastore.jars=file:///apache/hive-1.2.2-bin/lib/*
...
Caused by: org.apache.commons.logging.LogConfigurationException: Invalid class loader hierarchy. You have more than one version of 'org.apache.commons.logging.Log' visible, which is not allowed.
at org.apache.commons.logging.impl.LogFactoryImpl.getLogConstructor(LogFactoryImpl.java:385)
... 43 more
```

This pr upgrade `commons-logging` to 1.1.3 for hadoop-3.2 to fix this issue.

## How was this patch tested?

manual tests

Closes #24388 from wangyum/SPARK-27481.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-04-23 07:08:01 -07:00
Yuming Wang 777b4502b2 [SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2
## What changes were proposed in this pull request?

When we compile and test Hadoop 3.2, we will hint the following two issues:
1. JobSummaryLevel is not a member of object org.apache.parquet.hadoop.ParquetOutputFormat. Fixed by [PARQUET-381](https://issues.apache.org/jira/browse/PARQUET-381)(Parquet 1.9.0)
2. java.lang.NoSuchFieldError: BROTLI
    at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31). Fixed by [PARQUET-1143](https://issues.apache.org/jira/browse/PARQUET-1143)(Parquet 1.10.0)

The reason is that the `parquet-hadoop-bundle-1.8.1.jar` conflicts with Parquet 1.10.1.
I think it would be safe to upgrade Hive's parquet to 1.10.1 to workaround this issue.

This is what Hive did when upgrading Parquet 1.8.1 to 1.10.0: [HIVE-17000](https://issues.apache.org/jira/browse/HIVE-17000) and [HIVE-19464](https://issues.apache.org/jira/browse/HIVE-19464). We can see that all changes are related to vectors, and vectors are disabled by default: see [HIVE-14826](https://issues.apache.org/jira/browse/HIVE-14826) and [HiveConf.java#L2723](https://github.com/apache/hive/blob/rel/release-2.3.4/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2723).

This pr removes [parquet-hadoop-bundle-1.8.1.jar](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop-bundle) , so Hive serde will use [parquet-common-1.10.1.jar, parquet-column-1.10.1.jar and parquet-hadoop-1.10.1.jar](https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2#L185-L189).

## How was this patch tested?

1. manual tests
2. [upgrade Hive Parquet to 1.10.1 annd run Hadoop 3.2 test on jenkins](https://github.com/apache/spark/pull/24044#commits-pushed-0c3f962)

Closes #24346 from wangyum/SPARK-27176.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-04-19 08:59:08 -07:00
Dongjoon Hyun f93460dae9 [SPARK-27493][BUILD] Upgrade ASM to 7.1
## What changes were proposed in this pull request?

[SPARK-25946](https://issues.apache.org/jira/browse/SPARK-25946) upgraded ASM to 7.0 to support JDK11. This PR aims to update ASM to 7.1 to bring the bug fixes.
- https://asm.ow2.io/versions.html
- https://issues.apache.org/jira/browse/XBEAN-316

## How was this patch tested?

Pass the Jenkins.

Closes #24395 from dongjoon-hyun/SPARK-27493.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-04-18 13:36:52 +09:00
Dongjoon Hyun 7c4a6439d6 [SPARK-27467][BUILD][TEST-MAVEN] Upgrade Maven to 3.6.1
## What changes were proposed in this pull request?

This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following.
- https://maven.apache.org/docs/3.6.1/release-notes.html

## How was this patch tested?

Pass the Jenkins with `[test-maven]` tag.

Closes #24377 from dongjoon-hyun/SPARK-27467.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-16 08:55:27 -07:00
Dongjoon Hyun a8f20c95ab [SPARK-27452][BUILD] Update zstd-jni to 1.3.8-9
## What changes were proposed in this pull request?

This PR aims to update `zstd-jni` from 1.3.2-2 to 1.3.8-9 to be aligned with the latest Zstd 1.3.8 in Apache Spark 3.0.0. Currently, Apache Spark is aligned with the old Zstd used in the first PR and there are many bugfix and improvement updates in `zstd-jni` until now.
- https://github.com/facebook/zstd/releases/tag/v1.3.8
- https://github.com/facebook/zstd/releases/tag/v1.3.7
- https://github.com/facebook/zstd/releases/tag/v1.3.6
- https://github.com/facebook/zstd/releases/tag/v1.3.4
- https://github.com/facebook/zstd/releases/tag/v1.3.3

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #24364 from dongjoon-hyun/SPARK-ZSTD.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-16 08:54:16 -07:00
Sean Owen a4cf1a4f4e [SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3
## What changes were proposed in this pull request?

Unify commons-beanutils deps to latest 1.9.3. This resolves the version inconsistency in Hadoop 2.7's build and also picks up security and bug fixes.

## How was this patch tested?

Existing tests.

Closes #24378 from srowen/SPARK-27469.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-15 19:18:37 -07:00
Dongjoon Hyun 0881f648cf [SPARK-27451][BUILD] Upgrade lz4-java to 1.5.1
## What changes were proposed in this pull request?

This PR upgrades `lz4-java` to 1.5.1 in order to get a patch for avoiding racing with GC.
- https://github.com/lz4/lz4-java/blob/master/CHANGES.md#151

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #24363 from dongjoon-hyun/SPARK-LZ4.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-12 19:21:43 -07:00
Yuming Wang 33f3c48cac [SPARK-27176][SQL] Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4
## What changes were proposed in this pull request?

This PR mainly contains:
1. Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4.
2. Resolve compatibility issues between Hive 1.2.1 and Hive 2.3.4 in the `sql/hive` module.

## How was this patch tested?
jenkins test hadoop-2.7
manual test hadoop-3:
```shell
build/sbt clean package -Phadoop-3.2 -Phive
export SPARK_PREPEND_CLASSES=true

# rm -rf metastore_db

cat <<EOF > test_hadoop3.scala
spark.range(10).write.saveAsTable("test_hadoop3")
spark.table("test_hadoop3").show
EOF

bin/spark-shell --conf spark.hadoop.hive.metastore.schema.verification=false --conf spark.hadoop.datanucleus.schema.autoCreateAll=true -i test_hadoop3.scala
```

Closes #23788 from wangyum/SPARK-23710-hadoop3.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-04-08 08:42:21 -07:00
LantaoJin 69dd44af19 [SPARK-27216][CORE] Upgrade RoaringBitmap to 0.7.45 to fix Kryo unsafe ser/dser issue
## What changes were proposed in this pull request?

HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer.

It's a bug of RoaringBitmap-0.5.11 and fixed in latest version.

This is an update of #24157

## How was this patch tested?

Add a UT

Closes #24264 from LantaoJin/SPARK-27216.

Lead-authored-by: LantaoJin <jinlantao@gmail.com>
Co-authored-by: Lantao Jin <jinlantao@gmail.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2019-04-03 20:09:50 -05:00
Sean Owen 2ec650d843 [SPARK-27267][CORE] Update snappy to avoid error when decompressing empty serialized data
## What changes were proposed in this pull request?

(See JIRA for problem statement)

Update snappy 1.1.7.1 -> 1.1.7.3 to pick up an empty-stream and Java 9 fix.

There appear to be no other changes of consequence:
https://github.com/xerial/snappy-java/blob/master/Milestone.md

## How was this patch tested?

Existing tests

Closes #24242 from srowen/SPARK-27267.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-30 02:41:24 -05:00
Sean Owen 8bc304f97e [SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0
## What changes were proposed in this pull request?

Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below.

## How was this patch tested?

Existing tests.

Closes #23098 from srowen/SPARK-26132.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-25 10:46:42 -05:00
Dongjoon Hyun 6ef94e0f18 [SPARK-27260][SS] Upgrade to Kafka 2.2.0
## What changes were proposed in this pull request?

This PR aims to update Kafka dependency to 2.2.0 to bring the following improvement and bug fixes.
- https://issues.apache.org/jira/projects/KAFKA/versions/12344063

Due to [KAFKA-4453](https://issues.apache.org/jira/browse/KAFKA-4453), data plane API and controller plane API are separated. Apache Spark needs the following changes.
```scala
- servers.head.apis.metadataCache
+ servers.head.dataPlaneRequestProcessor.metadataCache
```

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #24190 from dongjoon-hyun/SPARK-27260.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-24 17:39:57 -07:00
John Zhuge a15f17ce27 [SPARK-27250][TEST-MAVEN][BUILD] Scala 2.11 maven compile should target Java 1.8
## What changes were proposed in this pull request?

Fix Scala 2.11 maven build issue after merging SPARK-26946.

## How was this patch tested?

Maven Scala 2.11 and 2.12 builds with `-Phadoop-provided -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver`.

Closes #24184 from jzhuge/SPARK-26946-1.

Authored-by: John Zhuge <jzhuge@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-24 09:05:41 -05:00
Yuming Wang 6783831f68 [SPARK-27179][BUILD] Exclude javax.ws.rs:jsr311-api from hadoop-client
## What changes were proposed in this pull request?
Since [YARN-7113](https://issues.apache.org/jira/browse/YARN-7113)(Hadoop-3.1.0), `hadoop-client` add `javax.ws.rs:jsr311-api` to its dependency. This conflict with [javax.ws.rs-api-2.0.1.jar](f26a1f3d37/dev/deps/spark-deps-hadoop-3.1 (L105)).
```shell
build/sbt  "core/testOnly *.UISeleniumSuite *.HistoryServerSuite" -Phadoop-3.2
...
[info] <pre>    Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.NoSuchMethodError: javax.ws.rs.core.Application.getProperties()Ljava/util/Map;
...
```

This pr exclude `javax.ws.rs:jsr311-api` from hadoop-client.

## How was this patch tested?

manual tests:
```shell
build/sbt  "core/testOnly *.UISeleniumSuite *.HistoryServerSuite" -Phadoop-3.2
```

Closes #24114 from wangyum/SPARK-27179.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-19 13:31:40 -04:00
Yuming Wang 9c0af746e5 [SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0
## What changes were proposed in this pull request?

This PR upgrade `hadoop-3` to `3.2.0`  to workaround [HADOOP-16086](https://issues.apache.org/jira/browse/HADOOP-16086). Otherwise some test case will throw IllegalArgumentException:
```java
02:44:34.707 ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)'
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
	at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116)
	at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:109)
	at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:102)
	at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475)
	at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:369)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$runHive$1(HiveClientImpl.scala:730)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
	at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:719)
	at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:709)
	at org.apache.spark.sql.hive.StatisticsSuite.createNonPartitionedTable(StatisticsSuite.scala:719)
	at org.apache.spark.sql.hive.StatisticsSuite.$anonfun$testAlterTableProperties$2(StatisticsSuite.scala:822)
```

## How was this patch tested?

manual tests

Closes #24106 from wangyum/SPARK-27175.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-16 19:42:05 -05:00
Dongjoon Hyun f26a1f3d37 [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5
## What changes were proposed in this pull request?

This PR aims to update Apache ORC dependency to fix [SPARK-27107](https://issues.apache.org/jira/browse/SPARK-27107) .
```
[ORC-452] Support converting MAP column from JSON to ORC Improvement
[ORC-447] Change the docker scripts to keep a persistent m2 cache
[ORC-463] Add `version` command
[ORC-475] ORC reader should lazily get filesystem
[ORC-476] Make SearchAgument kryo buffer size configurable
```

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #24096 from dongjoon-hyun/SPARK-27165.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-14 20:14:31 -07:00
Yuming Wang eed3091a60 [SPARK-27120][BUILD][TEST] Upgrade scalatest version to 3.0.5
## What changes were proposed in this pull request?

**ScalaTest 3.0.5 Release Notes**

**Bug Fixes**

- Fixed the implicit view not available problem when used with compile macro.
- Fixed a stack depth problem in RefSpecLike and fixture.SpecLike under Scala 2.13.
- Changed Framework and ScalaTestFramework to set spanScaleFactor for Runner object instances for different Runners using different class loaders. This fixed a problem whereby an incorrect Runner.spanScaleFactor could be used when the tests for multiple sbt project's were run concurrently.
- Fixed a bug in endsWith regex matcher.

**Improvements**
- Removed duplicated parsing code for -C in ArgsParser.
- Improved performance in WebBrowser.
- Documentation typo rectification.
- Improve validity of Junit XML reports.
- Improved performance by replacing all .size == 0 and .length == 0 to .isEmpty.

**Enhancements**
- Added 'C' option to -P, which will tell -P to use cached thread pool.
- External Dependencies Update
- Bumped up scala-js version to 0.6.22.
- Changed to depend on mockito-core, not mockito-all.
- Bumped up jmock version to 2.8.3.
- Bumped up junit version to 4.12.
- Removed dependency to scala-parser-combinators.

More details:
http://www.scalatest.org/release_notes/3.0.5

## How was this patch tested?

manual tests on local machine:
```
nohup build/sbt clean -Djline.terminal=jline.UnsupportedTerminal -Phadoop-2.7  -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos test > run.scalatest.log &
```

Closes #24042 from wangyum/SPARK-27120.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-10 15:22:52 -07:00
Yuming Wang f732647ae4 [SPARK-27054][BUILD][SQL] Remove the Calcite dependency
## What changes were proposed in this pull request?

Calcite is only used for [runSqlHive](02bbe977ab/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (L699-L705)) when `hive.cbo.enable=true`([SemanticAnalyzer](https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java#L278-L280)).
So we can disable `hive.cbo.enable` and remove Calcite dependency.

## How was this patch tested?

Exist tests

Closes #23970 from wangyum/SPARK-27054.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <wgyumg@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-09 16:34:24 -08:00
Yuming Wang d70b6a39e1 [MINOR][BUILD] Add 2 maven properties(hive.classifier and hive.parquet.group)
## What changes were proposed in this pull request?

This pr adds 2 maven properties to help us upgrade the built-in Hive.

| Property Name | Default | In future |
| ------ | ------ | ------ |
| hive.classifier | (none) | core |
| hive.parquet.group | com.twitter | org.apache.parquet |

## How was this patch tested?

existing tests

Closes #23996 from wangyum/add_2_maven_properties.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-07 16:46:07 -06:00
Yanbo Liang 7857c6d633 [SPARK-27051][CORE] Bump Jackson version to 2.9.8
## What changes were proposed in this pull request?
Fasterxml Jackson version before 2.9.8 is affected by multiple [CVEs](https://github.com/FasterXML/jackson-databind/issues/2186), we need to fix bump the dependent Jackson to 2.9.8.

## How was this patch tested?
Existing tests and offline benchmark.
I have run ```SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmark"``` to check there is no performance degradation for this upgrade.

Closes #23965 from yanboliang/SPARK-27051.

Authored-by: Yanbo Liang <ybliang8@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-03-05 11:46:51 +09:00
Sean Owen d8754df2bf [SPARK-27029][BUILD] Update Thrift to 0.12.0
## What changes were proposed in this pull request?

Update Thrift to 0.12.0 to pick up bug and security fixes.
Changes: https://github.com/apache/thrift/blob/master/CHANGES.md
The important one is for https://issues.apache.org/jira/browse/THRIFT-4506

## How was this patch tested?

Existing tests. A quick local test suggests this works.

Closes #23935 from srowen/SPARK-27029.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-02 17:28:37 -08:00
Sean Owen 131b464d0c [SPARK-26986][ML][FOLLOWUP] Add JAXB reference impl to build for Java 9+
## What changes were proposed in this pull request?

Remove a few new JAXB dependencies that shouldn't be necessary now.
See https://github.com/apache/spark/pull/23890#issuecomment-468299922

## How was this patch tested?

Existing tests

Closes #23923 from srowen/SPARK-26986.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-01 11:23:40 -06:00
Sean Owen 9c283662c6 [SPARK-26986][ML] Add JAXB reference impl to build for Java 9+
## What changes were proposed in this pull request?

Add reference JAXB impl for Java 9+ from Glassfish. Right now it's only apparently necessary in MLlib but can be expanded later.

## How was this patch tested?

Existing tests particularly PMML-related ones, which use JAXB.
This works on Java 11.

Closes #23890 from srowen/SPARK-26986.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-26 18:26:49 -06:00
seancxmao 3d2e55abd0 [MINOR][DOCS] Remove Akka leftover
## What changes were proposed in this pull request?
Since Spark 2.0, Akka is not used anymore and Akka related stuff were removed. However there are still some leftover. This PR aims to remove these leftover.

* `/pom.xml` has a comment about Akka, which is not needed anymore.

## How was this patch tested?
Existing tests.

Closes #23885 from seancxmao/remove-akka-leftover.

Authored-by: seancxmao <seancxmao@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-26 08:31:02 -06:00
Sean Owen d2529788ed [SPARK-26966][ML] Update to JPMML 1.4.8
## What changes were proposed in this pull request?

JPMML apparently only supports Java 9 in 1.4.2+. We are seeing text failures from JPMML relating to JAXB when running on Java 11. It's shaded and not a big change, so should be safe.

## How was this patch tested?

Existing tests.

Closes #23868 from srowen/SPARK-26966.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-25 04:37:45 -06:00
Dongjoon Hyun f87153a3ac [SPARK-26916][SS] Upgrade to Kafka 2.1.1
## What changes were proposed in this pull request?

As a part of preparing the official JDK 11 support ([SPARK-24417](https://issues.apache.org/jira/browse/SPARK-24417)), Spark 3.0.0 upgraded KAFKA version to 2.1.0. This PR updates Kafka dependency to 2.1.1 to bring the following 42 bug fixes.
- https://issues.apache.org/jira/projects/KAFKA/versions/12344250

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #23831 from dongjoon-hyun/SPARK-26916.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-02-19 20:29:11 -08:00
Ryan Blue f72d217788
[SPARK-26677][BUILD] Update Parquet to 1.10.1 with notEq pushdown fix.
## What changes were proposed in this pull request?

Update to Parquet Java 1.10.1.

## How was this patch tested?

Added a test from HyukjinKwon that validates the notEq case from SPARK-26677.

Closes #23704 from rdblue/SPARK-26677-fix-noteq-parquet-bug.

Lead-authored-by: Ryan Blue <blue@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Ryan Blue <rdblue@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-02-02 09:17:52 -08:00
Bryan Cutler 16990f9299 [SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0
## What changes were proposed in this pull request?

Upgrade Apache Arrow to version 0.12.0. This includes the Java artifacts and fixes to enable usage with pyarrow 0.12.0

Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users:

* Safe cast fails from numpy float64 array with nans to integer, ARROW-4258
* Java, Reduce heap usage for variable width vectors, ARROW-4147
* Binary identity cast not implemented, ARROW-4101
* pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
* conversion to date object no longer needed, ARROW-3910
* Error reading IPC file with no record batches, ARROW-3894
* Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790
* from_pandas gives incorrect results when converting floating point to bool, ARROW-3428
* Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048
* Java update to official Flatbuffers version 1.9.0, ARROW-3175

complete list [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0)

PySpark requires the following fixes to work with PyArrow 0.12.0

* Encrypted pyspark worker fails due to ChunkedStream missing closed property
* pyarrow now converts dates as objects by default, which causes error because type is assumed datetime64
* ArrowTests fails due to difference in raised error message
* pyarrow.open_stream deprecated
* tests fail because groupby adds index column with duplicate name

## How was this patch tested?

Ran unit tests with pyarrow versions 0.8.0, 0.10.0, 0.11.1, 0.12.0

Closes #23657 from BryanCutler/arrow-upgrade-012.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-01-29 14:18:45 +08:00
Gabor Somogyi 773efede20 [SPARK-26254][CORE] Extract Hive + Kafka dependencies from Core.
## What changes were proposed in this pull request?

There are ugly provided dependencies inside core for the following:
* Hive
* Kafka

In this PR I've extracted them out. This PR contains the following:
* Token providers are now loaded with service loader
* Hive token provider moved to hive project
* Kafka token provider extracted into a new project

## How was this patch tested?

Existing + newly added unit tests.
Additionally tested on cluster.

Closes #23499 from gaborgsomogyi/SPARK-26254.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-25 10:36:00 -08:00
Dongjoon Hyun c7daa95d7f
[SPARK-22128][CORE][BUILD] Add paranamer dependency to core module
## What changes were proposed in this pull request?

With Scala-2.12 profile, Spark application fails while Spark is okay. For example, our documented `SimpleApp` Java example succeeds to compile but it fails at runtime because it doesn't use `paranamer 2.8` and hits [SPARK-22128](https://issues.apache.org/jira/browse/SPARK-22128). This PR aims to declare it explicitly for the Spark applications. Note that this doesn't introduce new dependency to Spark itself.

https://dist.apache.org/repos/dist/dev/spark/3.0.0-SNAPSHOT-2019_01_09_13_59-e853afb-docs/_site/quick-start.html

The following is the dependency tree from the Spark application.

**BEFORE**
```
$ mvn dependency:tree -Dincludes=com.thoughtworks.paranamer
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli)  simple ---
[INFO] my.test:simple:jar:1.0-SNAPSHOT
[INFO] \- org.apache.spark:spark-sql_2.12🫙3.0.0-SNAPSHOT:compile
[INFO]    \- org.apache.spark:spark-core_2.12🫙3.0.0-SNAPSHOT:compile
[INFO]       \- org.apache.avro:avro:jar:1.8.2:compile
[INFO]          \- com.thoughtworks.paranamer:paranamer:jar:2.7:compile
```

**AFTER**
```
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli)  simple ---
[INFO] my.test:simple:jar:1.0-SNAPSHOT
[INFO] \- org.apache.spark:spark-sql_2.12🫙3.0.0-SNAPSHOT:compile
[INFO]    \- org.apache.spark:spark-core_2.12🫙3.0.0-SNAPSHOT:compile
[INFO]       \- com.thoughtworks.paranamer:paranamer:jar:2.8:compile
```

## How was this patch tested?

Pass the Jenkins. And manually test with the sample app is running.

Closes #23502 from dongjoon-hyun/SPARK-26583.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-01-10 00:40:21 -08:00
Dongjoon Hyun e15a319ccd
[SPARK-26536][BUILD][TEST] Upgrade Mockito to 2.23.4
## What changes were proposed in this pull request?

This PR upgrades Mockito from 1.10.19 to 2.23.4. The following changes are required.

- Replace `org.mockito.Matchers` with `org.mockito.ArgumentMatchers`
- Replace `anyObject` with `any`
- Replace `getArgumentAt` with `getArgument` and add type annotation.
- Use `isNull` matcher in case of `null` is invoked.
```scala
     saslHandler.channelInactive(null);
-    verify(handler).channelInactive(any(TransportClient.class));
+    verify(handler).channelInactive(isNull());
```

- Make and use `doReturn` wrapper to avoid [SI-4775](https://issues.scala-lang.org/browse/SI-4775)
```scala
private def doReturn(value: Any) = org.mockito.Mockito.doReturn(value, Seq.empty: _*)
```

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #23452 from dongjoon-hyun/SPARK-26536.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-01-04 19:23:38 -08:00
shane knapp bccb8602d7
[SPARK-26537][BUILD] change git-wip-us to gitbox
## What changes were proposed in this pull request?

due to apache recently moving from git-wip-us.apache.org to gitbox.apache.org, we need to update the packaging scripts to point to the new repo location.

this will also need to be backported to 2.4, 2.3, 2.1, 2.0 and 1.6.

## How was this patch tested?

the build system will test this.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #23454 from shaneknapp/update-apache-repo.

Authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-01-04 18:27:26 -08:00
Sean Owen 36440e6447 [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite
## What changes were proposed in this pull request?

Increase test memory to avoid OOM in TimSort-related tests.

## How was this patch tested?

Existing tests.

Closes #23425 from srowen/SPARK-26306.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-04 15:35:23 -06:00
Sean Owen 4bdfda92a1 [SPARK-26507][CORE] Fix core tests for Java 11
## What changes were proposed in this pull request?

This should make tests in core modules pass for Java 11.

## How was this patch tested?

Existing tests, with modifications.

Closes #23419 from srowen/Java11.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-02 11:23:53 -06:00
Dongjoon Hyun c7bfb4cf83
[SPARK-26430][BUILD][TEST-MAVEN] Upgrade Surefire plugin to 3.0.0-M2
## What changes were proposed in this pull request?

This PR aims to upgrade Maven Surefile plugin for JDK11 support. 3.0.0-M2 is [released Dec. 9th.](https://issues.apache.org/jira/projects/SUREFIRE/versions/12344396)
```
[SUREFIRE-1568] Versions 2.21 and higher doesn't work with junit-platform for Java 9 module
[SUREFIRE-1605] NoClassDefFoundError (RunNotifier) with JDK 11
[SUREFIRE-1600] Surefire Project using surefire:2.12.4 is not fully able to work with JDK 10+ on internal build system. Therefore surefire-shadefire should go with Surefire:3.0.0-M2.
[SUREFIRE-1593] 3.0.0-M1 produces invalid code sources on Windows
[SUREFIRE-1602] Surefire fails loading class ForkedBooter when using a sub-directory pom file and a local maven repo
[SUREFIRE-1606] maven-shared-utils must not be on provider's classpath
[SUREFIRE-1531] Option to switch-off Java 9 modules
[SUREFIRE-1590] Deploy multiple versions of Report XSD
[SUREFIRE-1591] Java 1.7 feature Diamonds replaced Generics
[SUREFIRE-1594] Java 1.7 feature try-catch - multiple exceptions in one catch
[SUREFIRE-1595] Java 1.7 feature System.lineSeparator()
[SUREFIRE-1597] ModularClasspathForkConfiguration with debug logs (args file and its path on file system)
[SUREFIRE-1596] Unnecessary check JAVA_RECENT == JAVA_1_7 in unit tests
[SUREFIRE-1598] Fixed typo in assertion statement in integration test Surefire855AllowFailsafeUseArtifactFileIT
[SUREFIRE-1607] Roadmap on Project Site
```

## How was this patch tested?

Pass the Jenkins.

Closes #23370 from dongjoon-hyun/SPARK-26430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2018-12-22 00:46:36 -08:00
Dongjoon Hyun 81addaa6b7
[SPARK-26427][BUILD] Upgrade Apache ORC to 1.5.4
## What changes were proposed in this pull request?

This PR aims to update Apache ORC dependency to the latest version 1.5.4 released at Dec. 20. ([Release Notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320&version=12344187]))
```
[ORC-237] OrcFile.mergeFiles Specified block size is less than configured minimum value
[ORC-409] Changes for extending MemoryManagerImpl
[ORC-410] Fix a locale-dependent test in TestCsvReader
[ORC-416] Avoid opening data reader when there is no stripe
[ORC-417] Use dynamic Apache Maven mirror link
[ORC-419] Ensure to call `close` at RecordReaderImpl constructor exception
[ORC-432] openjdk 8 has a bug that prevents surefire from working
[ORC-435] Ability to read stripes that are greater than 2GB
[ORC-437] Make acid schema checks case insensitive
[ORC-411] Update build to work with Java 10.
[ORC-418] Fix broken docker build script
```

## How was this patch tested?

Build and pass Jenkins.

Closes #23364 from dongjoon-hyun/SPARK-26427.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2018-12-22 00:41:21 -08:00
Sean Owen 2ea9792fde [SPARK-26266][BUILD] Update to Scala 2.12.8
## What changes were proposed in this pull request?

Update to Scala 2.12.8

## How was this patch tested?

Existing tests.

Closes #23218 from srowen/SPARK-26266.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-12-08 05:59:53 -06:00