### What changes were proposed in this pull request?
This PR aims to add a new enforcer rule to ban duplicated pom dependency during build stage.
### Why are the changes needed?
This will help us by preventing the extra effort like the followings.
```
e63098b287 [SPARK-29007][MLLIB][FOLLOWUP] Remove duplicated dependency
39e044e3d8 [MINOR][BUILD] Remove duplicate test-jar:test spark-sql dependency from Hive module
d8fefab4d8 [HOTFIX][BUILD][TEST-MAVEN] Remove duplicate dependency
e9445b187e [SPARK-6866][Build] Remove duplicated dependency in launcher/pom.xml
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually.
If we have something like e63098b287, it will fail at building phase at PR like the following.
```
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.BanDuplicatePomDependencyVersions failed with message:
Found 1 duplicate dependency declaration in this project:
- dependencies.dependency[org.apache.spark:spark-streaming_${scala.binary.version}:test-jar] ( 2 times )
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-no-duplicate-dependencies) on project spark-mllib_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
```
Closes#25784 from dongjoon-hyun/SPARK-29075.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Upgrade netty-all to latest in the 4.1.x line which is 4.1.39-Final.
### Why are the changes needed?
Currency of dependencies.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing unit-tests against master branch.
Closes#25712 from n-marion/master.
Authored-by: Nicholas Marion <nmarion@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This patch does pooling for both kafka consumers as well as fetched data. The overall benefits of the patch are following:
* Both pools support eviction on idle objects, which will help closing invalid idle objects which topic or partition are no longer be assigned to any tasks.
* It also enables applying different policies on pool, which helps optimization of pooling for each pool.
* We concerned about multiple tasks pointing same topic partition as well as same group id, and existing code can't handle this hence excess seek and fetch could happen. This patch properly handles the case.
* It also makes the code always safe to leverage cache, hence no need to maintain reuseCache parameter.
Moreover, pooling kafka consumers is implemented based on Apache Commons Pool, which also gives couple of benefits:
* We can get rid of synchronization of KafkaDataConsumer object while acquiring and returning InternalKafkaConsumer.
* We can extract the feature of object pool to outside of the class, so that the behaviors of the pool can be tested easily.
* We can get various statistics for the object pool, and also be able to enable JMX for the pool.
FetchedData instances are pooled by custom implementation of pool instead of leveraging Apache Commons Pool, because they have CacheKey as first key and "desired offset" as second key which "desired offset" is changing - I haven't found any general pool implementations supporting this.
This patch brings additional dependency, Apache Commons Pool 2.6.0 into `spark-sql-kafka-0-10` module.
## How was this patch tested?
Existing unit tests as well as new tests for object pool.
Also did some experiment regarding proving concurrent access of consumers for same topic partition.
* Made change on both sides (master and patch) to log when creating Kafka consumer or fetching records from Kafka is happening.
* branches
* master: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-master-ref-debugging
* patch: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-debugging
* Test query (doing self-join)
* https://gist.github.com/HeartSaVioR/d831974c3f25c02846f4b15b8d232cc2
* Ran query from spark-shell, with using `local[*]` to maximize the chance to have concurrent access
* Collected the count of fetch requests on Kafka via command: `grep "creating new Kafka consumer" logfile | wc -l`
* Collected the count of creating Kafka consumers via command: `grep "fetching data from Kafka consumer" logfile | wc -l`
Topic and data distribution is follow:
```
truck_speed_events_stream_spark_25151_v1:0:99440
truck_speed_events_stream_spark_25151_v1:1:99489
truck_speed_events_stream_spark_25151_v1:2:397759
truck_speed_events_stream_spark_25151_v1:3:198917
truck_speed_events_stream_spark_25151_v1:4:99484
truck_speed_events_stream_spark_25151_v1:5:497320
truck_speed_events_stream_spark_25151_v1:6:99430
truck_speed_events_stream_spark_25151_v1:7:397887
truck_speed_events_stream_spark_25151_v1:8:397813
truck_speed_events_stream_spark_25151_v1:9:0
```
The experiment only used smallest 4 partitions (0, 1, 4, 6) from these partitions to finish the query earlier.
The result of experiment is below:
branch | create Kafka consumer | fetch request
-- | -- | --
master | 1986 | 2837
patch | 8 | 1706
Closes#22138 from HeartSaVioR/SPARK-25151.
Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Co-authored-by: Jungtaek Lim <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
This PR proposes to upgrade scala-maven-plugin from 3.4.4 to 4.2.0.
Upgrade to 4.1.1 was reverted due to unexpected build failure on AppVeyor.
The root cause seems to be an issue specific to AppVeyor - loading the system library 'kernel32.dll' seems being failed.
```
Suppressed: java.lang.NoClassDefFoundError: Could not initialize class com.sun.jna.platform.win32.Kernel32
at sbt.internal.io.WinMilli$.getHandle(Milli.scala:264)
at sbt.internal.io.WinMilli$.getModifiedTimeNative(Milli.scala:289)
at sbt.internal.io.WinMilli$.getModifiedTimeNative(Milli.scala:260)
at sbt.internal.io.MilliNative.getModifiedTime(Milli.scala:61)
at sbt.internal.io.Milli$.getModifiedTime(Milli.scala:360)
at sbt.io.IO$.$anonfun$getModifiedTimeOrZero$1(IO.scala:1373)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at sbt.internal.io.Retry$.liftedTree2$1(Retry.scala:38)
at sbt.internal.io.Retry$.impl$1(Retry.scala:38)
at sbt.internal.io.Retry$.apply(Retry.scala:52)
at sbt.internal.io.Retry$.apply(Retry.scala:24)
at sbt.io.IO$.getModifiedTimeOrZero(IO.scala:1373)
at sbt.internal.inc.caching.ClasspathCache$.fromCacheOrHash$1(ClasspathCache.scala:44)
at sbt.internal.inc.caching.ClasspathCache$.$anonfun$hashClasspath$1(ClasspathCache.scala:53)
at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:659)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
at scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:170)
... 25 more
```
By setting `-Djna.nosys=true`, it directly loads the library from the jar instead of system's.
In this way, the build seems working fine.
### Why are the changes needed?
It upgrades the plugin to fix bugs and fixes the CI build.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
It was tested at https://github.com/apache/spark/pull/25497Closes#25633 from HyukjinKwon/SPARK-28759.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
At the moment no end-to-end Kafka delegation token test exists which was mainly because of missing embedded KDC. KDC is missing in general from the testing side so I've discovered what kind of possibilities are there. The most obvious choice is the MiniKDC inside the Hadoop library where Apache Kerby runs in the background. What this PR contains:
* Added MiniKDC as test dependency from Hadoop
* Added `maven-bundle-plugin` because couple of dependencies are coming in bundle format
* Added security mode to `KafkaTestUtils`. Namely start KDC -> start Zookeeper in secure mode -> start Kafka in secure mode
* Added a roundtrip test (saves and reads back data from Kafka)
### Why are the changes needed?
No such test exists + security testing with KDC is completely missing.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing + additional unit tests.
I've put the additional test into a loop and was consuming ~10 sec average.
Closes#25477 from gaborgsomogyi/SPARK-28760.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
`mllib` has `jaxb-runtime-2.3.2` as a runtime dependency. This PR makes it as a compile-time dependency. This doesn't change our dependency manifest and `LICENSE`s. Instead, this will add the following three jars into our pre-built artifacts.
- jaxb-runtime-2.3.2.jar
- jakarta.xml.bind-api-2.3.2.jar
- istack-commons-runtime-3.0.8.jar
### Why are the changes needed?
We need to use the followings.
- JDK8: `com.sun.xml.internal.bind.v2.ContextFactory`
- JDK11: `com.sun.xml.bind.v2.ContextFactory`
`com.sun.xml.bind.v2.ContextFactory` is inside `jaxb-runtime-2.3.2`.
```
$ javap -cp jaxb-runtime-2.3.2.jar com.sun.xml.bind.v2.ContextFactory
Compiled from "ContextFactory.java"
public class com.sun.xml.bind.v2.ContextFactory {
public static final java.lang.String USE_JAXB_PROPERTIES;
public com.sun.xml.bind.v2.ContextFactory();
public static javax.xml.bind.JAXBContext createContext(java.lang.Class[], java.util.Map<java.lang.String, java.lang.Object>) throws javax.xml.bind.JAXBException;
public static com.sun.xml.bind.api.JAXBRIContext createContext(java.lang.Class[], java.util.Collection<com.sun.xml.bind.api.TypeReference>, java.util.Map<java.lang.Class, java.lang.Class>, java.lang.String, boolean, com.sun.xml.bind.v2.model.annotation.RuntimeAnnotationReader, boolean, boolean, boolean) throws javax.xml.bind.JAXBException;
public static com.sun.xml.bind.api.JAXBRIContext createContext(java.lang.Class[], java.util.Collection<com.sun.xml.bind.api.TypeReference>, java.util.Map<java.lang.Class, java.lang.Class>, java.lang.String, boolean, com.sun.xml.bind.v2.model.annotation.RuntimeAnnotationReader, boolean, boolean, boolean, boolean) throws javax.xml.bind.JAXBException;
public static javax.xml.bind.JAXBContext createContext(java.lang.String, java.lang.ClassLoader, java.util.Map<java.lang.String, java.lang.Object>) throws javax.xml.bind.JAXBException;
}
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins with `test-java11`.
For manual testing, do the following with JDK11.
```scala
$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.3+7)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.3+7, mixed mode)
$ build/sbt -Pyarn -Phadoop-3.2 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package
$ python/run-tests.py --python-executables python --modules pyspark-ml
...
Finished test(python): pyspark.ml.recommendation (65s)
Tests passed in 174 seconds
```
Closes#25587 from dongjoon-hyun/SPARK-28877.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Spark uses Netty 4 directly, but also includes Netty 3 only because transitive dependencies do. The dependencies (Hadoop HDFS, Zookeeper, Avro) don't seem to need this dependency as used in Spark. I think we can forcibly remove it to slim down the dependencies.
Previous attempts were blocked by its usage in Flume, but that dependency has gone away.
https://github.com/apache/spark/pull/15436
### Why are the changes needed?
Mostly to reduce the transitive dependency size and complexity a little bit and avoid triggering spurious security alerts on Netty 3.x usage.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Existing tests
Closes#25544 from srowen/SPARK-17875.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR adds explicit exclusions to avoid Maven `JDK11` dependency issues.
### Why are the changes needed?
Maven/Ivy seems to be confused during dependency generation on `JDK11` environment.
This is not only wrong, but also causes a Jenkins failure during dependency manifest check on `JDK11` environment.
**JDK8**
```
$ cd core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG] org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG] org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG] javax.validation:validation-api:jar:2.0.1.Final:compile
```
**JDK11**
```
[DEBUG] org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG] org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG] javax.validation:validation-api:jar:2.0.1.Final:compile
[DEBUG] jakarta.xml.bind:jakarta.xml.bind-api🫙2.3.2:compile
[DEBUG] jakarta.activation:jakarta.activation-api🫙1.2.1:compile
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Do the following in both `JDK8` and `JDK11` environment. The dependency manifest should not be changed. In the current `master` branch, `JDK11` changes the dependency manifest.
```
$ dev/test-dependencies.sh --replace-manifest
```
Closes#25481 from dongjoon-hyun/SPARK-28765.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade `scala-maven-plugin` to 4.1.1 to bring the improvement (including Scala 2.13.0 support, Zinc update) and bug fixes in the plugin.
### Why are the changes needed?
`4.1.1` uses the latest dependent plugins.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins.
Closes#25476 from dongjoon-hyun/SPARK-28759.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Update Jersey to 2.27+, ideally 2.29, for possible JDK 11 fixes.
## How was this patch tested?
Existing tests.
Closes#25455 from srowen/SPARK-28737.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade `Janino` from `3.0.13` to `3.0.15` in order to bring the bug fixes. Please note that `3.1.0` is a major refactoring instead of bug fixes. We had better use `3.0.15` and wait for the stabler 3.1.x.
### Why are the changes needed?
This brings the following bug fixes.
**3.0.15 (2019-07-28)**
- Fix overloaded single static method import
**3.0.14 (2019-07-05)**
- Conflict in sbt-assembly
- Overloaded static on-demand imported methods cause a CompileException: Ambiguous static method import
- Handle overloaded static on-demand imports
- Major refactoring of the Java 8 and Java 9 retrofit mechanism
- Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static Initializers"
- Local variables in instance initializers don't work
- Provide an option to keep generated code files
- Added compile error handler and warning handler to ICompiler
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins with the existing tests.
Closes#25474 from dongjoon-hyun/SPARK-28758.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Update Jackson databind to the latest version for some latest changes.
## How was this patch tested?
Pass the Jenkins.
Closes#25451 from Fokko/fd-bump-jackson-databind.
Lead-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Fixes a vulnerability from the GitHub Security Advisory Database:
_Moderate severity vulnerability that affects com.puppycrawl.tools:checkstyle_
Checkstyle prior to 8.18 loads external DTDs by default, which can potentially lead to denial of service attacks or the leaking of confidential information.
https://github.com/checkstyle/checkstyle/issues/6474
Affected versions: < 8.18
## How was this patch tested?
Ran checkstyle locally.
Closes#25432 from Fokko/SPARK-28713.
Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR aims to update `zstd-jni` library to bring the latest improvement and bug fixes in `1.4.1` and `1.4.2`.
- https://github.com/facebook/zstd/releases/tag/v1.4.1 (4.5 ~ 11.8% performance improvement from v1.4.0 and bug fixes)
- https://github.com/facebook/zstd/releases/tag/v1.4.2 (bug fixes)
## How was this patch tested?
Pass the Jenkins.
Closes#25275 from dongjoon-hyun/SPARK-28544.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to bring the latest bug fixes and to be up-to-date for JDK9+ support before Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had better wait and see for the stability.
**RELEASE NOTE**
https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md
**NOTABLE FIXES**
- Configure the MethodVisitor for Java 11+ compatibility (2.27.5)
- When mock is called multiple times, and verify fails, the error message reports only the first invocation (2.27.4)
- Memory leak in mockito-inline calling method on mock with at least a mock as parameter (2.25.0)
- Cross-references and a single spy cause memory leak (2.25.0)
- Nested spies cause memory leaks (2.25.0)
- [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3)
- Return null instead of causing a CCE (2.24.9, 2.24.3)
- Issue with mocking type in "java.util.*", Java 12 (2.24.2)
Mainly, Maven (Hadoop-2.7/Hadoop-3.2) and SBT(Hadoop-2.7) Jenkins test passed.
## How was this patch tested?
Pass the Jenkins with the exiting UTs.
Closes#25139 from dongjoon-hyun/SPARK-28370.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Mainly change logs:
### Version 3.0.13:
- Support for JDK 9/10 in Full Compiler
- The syntax elements that can have modifiers now all have sets of "is...()" methods that check for each modifier. Some also have methods "getAccess()" and/or "getAnnotations()".
- Implement "type annotations" (JLS8 9.7.4)
- Implemented parsing (but not compilation) of "modular compilation units" (JLS11 7.3).
- Replaced all "assert...Uncookable(..., Pattern messageRegex)" and "assert...Uncookable(..., String messageInfix)" method pairs with a single "assert...Uncookable(..., String messageRegex)" method.
Minor refactoring: Allowed modifiers are now checked in the Parser, not in Java.*. This saves a lot of THROWS clauses.
- Parse Type inference syntax: Type inference for generic instance creation implemented, test cases added.
- Parse MethodReference, ClassInstanceCreationReference and ArrayCreationReference
### Version 3.0.12
- Fixed: Operator "&" not defined on types "java.lang.Long" and "int"
- Major bug in JavaSourceClassLoader: When loading the second and following classes, CUs were compiled again, leading to an inconsistent class hierarchy.
- Fixed: Java 9 added "Override public final CharBuffer CharBuffer.rewind() { ..." -- leads easily to a java.lang.NoSuchMethodError
- Changed all occurences of the words "Java bytecode" to "JVM bytecode" to make clearer that the generated bytecode is for the JVMS and not suitable for, e.g. DALVIK.
http://janino-compiler.github.io/janino/changelog.html
## How was this patch tested?
Existing test
Closes#25021 from wangyum/SPARK-28221.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This pr upgrades Postgres docker image for integration tests.
## How was this patch tested?
manual tests:
```
./build/mvn install -DskipTests
./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12
```
Closes#25050 from wangyum/SPARK-28248.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR aims to update `xbean-asm7-shaded` to bring [XBEAN-318](https://issues.apache.org/jira/browse/XBEAN-318) which is helpful to log the class definition reading failures.
- https://issues.apache.org/jira/projects/XBEAN/versions/12345220
## How was this patch tested?
Pass the Jenkins.
Closes#24914 from dongjoon-hyun/SPARK-28111.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
We're using an old-ish jQuery, 1.12.4, and should probably update for Spark 3 to keep up in general, but also to keep up with CVEs. In fact, we know of at least one resolved in only 3.4.0+ (https://nvd.nist.gov/vuln/detail/CVE-2019-11358). They may not affect Spark, but, if the update isn't painful, maybe worthwhile in order to make future 3.x updates easier.
jQuery 1 -> 2 doesn't sound like a breaking change, as 2.0 is supposed to maintain compatibility with 1.9+ (https://blog.jquery.com/2013/04/18/jquery-2-0-released/)
2 -> 3 has breaking changes: https://jquery.com/upgrade-guide/3.0/. It's hard to evaluate each one, but the most likely area for problems is in ajax(). However, our usage of jQuery (and plugins) is pretty simple.
Update jquery to 3.4.1; update jquery blockUI and mustache to latest
## How was this patch tested?
Manual testing of docs build (except R docs), worker/master UI, spark application UI.
Note: this really doesn't guarantee it works, as our tests can't test javascript, and this is merely anecdotal testing, although I clicked about every link I could find. There's a risk this breaks a minor part of the UI; it does seem to work fine in the main.
Closes#24843 from srowen/SPARK-28004.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
For Apache Spark 3.0.0 release, this PR aims to update Kafka dependency to 2.2.1 to bring the following improvement and bug fixes like [KAFKA-8134](https://issues.apache.org/jira/browse/KAFKA-8134) (`'linger.ms' must be a long`).
https://issues.apache.org/jira/projects/KAFKA/versions/12345010
## How was this patch tested?
Pass the Jenkins.
Closes#24847 from dongjoon-hyun/SPARK-28013.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR introduces the necessary Maven modules for the new [Spark Graph](https://issues.apache.org/jira/browse/SPARK-25994) feature for Spark 3.0.
* `spark-graph` is a parent module that users depend on to get all graph functionalities (Cypher and Graph Algorithms)
* `spark-graph-api` defines the [Property Graph API](https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI) that is being shared between Cypher and Algorithms
* `spark-cypher` contains a Cypher query engine implementation
Both, `spark-graph-api` and `spark-cypher` depend on Spark SQL.
Note, that the Maven module for Graph Algorithms is not part of this PR and will be introduced in https://issues.apache.org/jira/browse/SPARK-27302
A PoC for a running Cypher implementation can be found in this WIP PR https://github.com/apache/spark/pull/24297
## How was this patch tested?
Pass the Jenkins with all profiles and manually build and check the followings.
```
$ ls assembly/target/scala-2.12/jars/spark-cypher*
assembly/target/scala-2.12/jars/spark-cypher_2.12-3.0.0-SNAPSHOT.jar
$ ls assembly/target/scala-2.12/jars/spark-graph* | grep -v graphx
assembly/target/scala-2.12/jars/spark-graph-api_2.12-3.0.0-SNAPSHOT.jar
assembly/target/scala-2.12/jars/spark-graph_2.12-3.0.0-SNAPSHOT.jar
```
Closes#24490 from s1ck/SPARK-27300.
Lead-authored-by: Martin Junghanns <martin.junghanns@neotechnology.com>
Co-authored-by: Max Kießling <max@kopfueber.org>
Co-authored-by: Martin Junghanns <martin.junghanns@neo4j.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This pr moves Hive test jars(`hive-contrib-0.13.1.jar`, `hive-hcatalog-core-0.13.1.jar`, `hive-contrib-2.3.5.jar` and `hive-hcatalog-core-2.3.5.jar`) to maven dependency.
## How was this patch tested?
Existing test
Please note that this pr need test with `maven` and `sbt`.
Closes#24751 from wangyum/SPARK-27831.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Move to json4s version 3.6.6
Add scala-xml 1.2.0
## How was this patch tested?
Pass the Jenkins
Closes#24736 from igreenfield/master.
Authored-by: Izek Greenfield <igreenfield@axiomsl.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This fixes CVE-2019-12086 on Databind: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.9
## How was this patch tested?
Existing tests
Closes#24646 from Fokko/SPARK-27757.
Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This pr moves Hive test jars(`hive-contrib-0.13.1.jar`, `hive-hcatalog-core-0.13.1.jar`, `hive-contrib-2.3.5.jar` and `hive-hcatalog-core-2.3.5.jar`) to maven dependency.
## How was this patch tested?
Existing test
Closes#24695 from wangyum/SPARK-27831.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Leave avro, avro-ipc dependendencies as compile scope even for hadoop-provided usages, to ensure 1.8 is used. Hadoop 2.7 has Avro 1.7, and Spark won't generally work with that. Reports from the field are that this works, to include avro 1.8 with the Spark distro on Hadoop 2.7.
## How was this patch tested?
Existing tests
Closes#24680 from srowen/SPARK-26045.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR aims to update `zstd-jni` library to `1.4.0-1` which improves the `level 1 compression speed` performance by 6% in most scenarios. The following is the full release note.
- https://github.com/facebook/zstd/releases/tag/v1.4.0
## How was this patch tested?
Pass the Jenkins.
Closes#24632 from dongjoon-hyun/SPARK-27755.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR upgrades lz4-java from 1.5.1 to 1.6.0. Lz4-java is available at https://github.com/lz4/lz4-java.
Changes from 1.5.1:
- Upgraded LZ4 to 1.9.1. Updated the JNI bindings, except for the one for Linux/i386. Decompression speed is improved on amd64.
- Deprecated use of LZ4FastDecompressor of a native instance because the corresponding C API function is deprecated. See the release note of LZ4 1.9.0 for details. Updated javadoc accordingly.
- Changed the module name from org.lz4.lz4-java to org.lz4.java to avoid using - in the module name. (severn-everett, Oliver Eikemeier, Rei Odaira)
- Enabled build with Java 11. Note that the distribution is still built with Java 7. (Rei Odaira)
## How was this patch tested?
Existing tests.
Closes#24629 from kiszk/SPARK-27752.
Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Fixed the `spark-<version>-yarn-shuffle.jar` artifact packaging to shade the native netty libraries:
- shade the `META-INF/native/libnetty_*` native libraries when packagin
the yarn shuffle service jar. This is required as netty library loader
derives that based on shaded package name.
- updated the `org/spark_project` shade package prefix to `org/sparkproject`
(i.e. removed underscore) as the former breaks the netty native lib loading.
This was causing the yarn external shuffle service to fail
when spark.shuffle.io.mode=EPOLL
## How was this patch tested?
Manual tests
Closes#24502 from amuraru/SPARK-27610_master.
Authored-by: Adi Muraru <amuraru@adobe.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following.
- https://maven.apache.org/docs/3.6.1/release-notes.html
This was committed and reverted due to AppVeyor failure. It turns out that the root cause is `PATH` issue. With the updated AppVeyor script, it passed.
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/24273412
## How was this patch tested?
Pass the Jenkins and AppVoyer
Closes#24481 from dongjoon-hyun/SPARK-R.
Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
PR #23890 introduced `org.glassfish.jaxb:jaxb-runtime:2.3.2` as a runtime dependency. As an unexpected side effect, `jakarta.activation:jakarta.activation-api:1.2.1` was also pulled in as a transitive dependency. As a result, for the Maven build, both of the following two jars can be found under `assembly/target/scala-2.12/jars/`:
```
activation-1.1.1.jar
jakarta.activation-api-1.2.1.jar
```
This PR exludes the Jakarta one.
Manually built Spark using Maven and checked files under `assembly/target/scala-2.12/jars/`. After this change, only `activation-1.1.1.jar` is there.
Closes#24507 from liancheng/spark-27611.
Authored-by: Cheng Lian <lian@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
One more place to update ASM 7.0 -> 7.1
## How was this patch tested?
Existing tests
Closes#24508 from srowen/SPARK-27493.3.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR aims to upgrade Surefire plugin to 3.0.0-M3 to bring [SUREFIRE-1613](https://issues.apache.org/jira/browse/SUREFIRE-1613).
## How was this patch tested?
Pass the Jenkins with the existing tests.
Closes#24501 from dongjoon-hyun/SPARK-27608.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
hadoop-2.7 gets `commons-logging` version from `hive-metastore`:
```
[INFO] +- org.spark-project.hive:hive-metastore:jar:1.2.1.spark2:compile
[INFO] | +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
[INFO] | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | +- commons-logging:commons-logging:jar:1.1.3:compile
```
But Hive removes `commons-logging` since [HIVE-12237(Hive 2.0.0)](https://issues.apache.org/jira/browse/HIVE-12237), so hadoop-3.2 gets `commons-logging` from `commons-httpclient`:
```
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | \- commons-logging:commons-logging:jar:1.0.4:compile
```
Thus. we may hint `LogConfigurationException`:
```
bin/spark-sql --conf spark.sql.hive.metastore.version=1.2.2 --conf spark.sql.hive.metastore.jars=file:///apache/hive-1.2.2-bin/lib/*
...
Caused by: org.apache.commons.logging.LogConfigurationException: Invalid class loader hierarchy. You have more than one version of 'org.apache.commons.logging.Log' visible, which is not allowed.
at org.apache.commons.logging.impl.LogFactoryImpl.getLogConstructor(LogFactoryImpl.java:385)
... 43 more
```
This pr upgrade `commons-logging` to 1.1.3 for hadoop-3.2 to fix this issue.
## How was this patch tested?
manual tests
Closes#24388 from wangyum/SPARK-27481.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following.
- https://maven.apache.org/docs/3.6.1/release-notes.html
## How was this patch tested?
Pass the Jenkins with `[test-maven]` tag.
Closes#24377 from dongjoon-hyun/SPARK-27467.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Unify commons-beanutils deps to latest 1.9.3. This resolves the version inconsistency in Hadoop 2.7's build and also picks up security and bug fixes.
## How was this patch tested?
Existing tests.
Closes#24378 from srowen/SPARK-27469.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>