### What changes were proposed in this pull request?
- Add missing `since` annotation.
- Don't show classes under `org.apache.spark.sql.dynamicpruning` package in API docs.
- Fix the scope of `xxxExactNumeric` to remove it from the API docs.
### Why are the changes needed?
Avoid leaking APIs unintentionally in Spark 3.0.0.
### Does this PR introduce any user-facing change?
No. All these changes are to avoid leaking APIs unintentionally in Spark 3.0.0.
### How was this patch tested?
Manually generated the API docs and verified the above issues have been fixed.
Closes#27560 from xuanyuanking/SPARK-30809.
Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR tries #26710 (comment) way to fix the test.
### Why are the changes needed?
To make the tests pass.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Jenkins will test first, and then `on spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3` will test it out.
Closes#27513 from HyukjinKwon/test-SPARK-30756.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit 8efe367a4e)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
- Fix the scope of `Logging.initializeForcefully` so that it doesn't appear in subclasses' public methods. Right now, `sc.initializeForcefully(false, false)` is allowed to called.
- Don't show classes under `org.apache.spark.internal` package in API docs.
- Add missing `since` annotation.
- Fix the scope of `ArrowUtils` to remove it from the API docs.
### Why are the changes needed?
Avoid leaking APIs unintentionally in Spark 3.0.0.
### Does this PR introduce any user-facing change?
No. All these changes are to avoid leaking APIs unintentionally in Spark 3.0.0.
### How was this patch tested?
Manually generated the API docs and verified the above issues have been fixed.
Closes#27528 from zsxwing/audit-ss-apis.
Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
### What changes were proposed in this pull request?
We should also expose it in documentation as we marked it as unstable API as of SPARK-30547
Note that, seems Javadoc -> Scaladoc doesn't work but this PR does not target to fix.
### Why are the changes needed?
To show the documentation of API.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually built the docs via `jykill serve` under `docs` directory:
![Screen Shot 2020-01-31 at 4 04 15 PM](https://user-images.githubusercontent.com/6477701/73519315-12143300-4444-11ea-9260-070c9f672dde.png)
Closes#27412 from HyukjinKwon/SPARK-30547.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Upgrade the version of Genjavadoc from 0.14 to 0.15.
### Why are the changes needed?
To enable to build for Scala 2.13.1.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
I confirmed there is no dependency error related to genjavadoc by manual build.
Also, I generated javadoc by `LANG=C build/sbt -Pkinesis-asl -Pyarn -Pkubernetes -Phive-thriftserver unidoc` for both code with/without this change and did `diff -r` target/javadoc.
Closes#27255 from sarutak/upgrade-genjavadoc.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This reverts commit 709387d660.
See https://issues.apache.org/jira/browse/SPARK-27300?focusedCommentId=16990048&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16990048 and previous mailing list discussions.
### What changes were proposed in this pull request?
Revert the addition of skeleton graph API modules for Spark 3.0.
### Why are the changes needed?
It does not appear that content will be added to the module for Spark 3, so I propose avoiding committing to the modules, which are no-ops now, in the upcoming major 3.0 release.
### Does this PR introduce any user-facing change?
No, the modules were not released.
### How was this patch tested?
Existing tests, but mostly N/A.
Closes#26928 from srowen/Revert27300.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes to exclude Unidoc checking in Hive domain. We don't publish this as a part of Spark documentation (see also https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L30) and most of them are copy of Hive thrift server so that we can officially use Hive 2.3 release.
It doesn't much make sense to check the documentation generation against another domain, and that we don't use in documentation publish.
### Why are the changes needed?
To avoid unnecessary computation.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
By Jenkins:
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments: -Phadoop-2.7 -Phive-2.3 -Phive -Pmesos -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl -Pyarn test:package streaming-kinesis-asl-assembly/assembly
...
========================================================================
Building Unidoc API Documentation
========================================================================
[info] Building Spark unidoc using SBT with these arguments: -Phadoop-2.7 -Phive-2.3 -Phive -Pmesos -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl -Pyarn unidoc
...
[info] Main Java API documentation successful.
...
[info] Main Scala API documentation successful.
```
Closes#26800 from HyukjinKwon/do-not-merge.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR tries to fix flakiness in `HiveThriftServer2ListenerSuite` by using a dedicated JVM (after we switch to Hive 2.3 by default in PR builders). Likewise in 4a73bed318, there's no explicit evidence for this fix.
See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114653/testReport/org.apache.spark.sql.hive.thriftserver.ui/HiveThriftServer2ListenerSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
```
sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.LinkageError: loader constraint violation: loader (instance of net/bytebuddy/dynamic/loading/MultipleParentClassLoader) previously initiated loading for a different type with name "org/apache/hive/service/ServiceStateChangeListener"
at org.mockito.codegen.HiveThriftServer2$MockitoMock$1974707245.<clinit>(Unknown Source)
at sun.reflect.GeneratedSerializationConstructorAccessor164.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.objenesis.instantiator.sun.SunReflectionFactoryInstantiator.newInstance(SunReflectionFactoryInstantiator.java:48)
at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73)
at org.mockito.internal.creation.instance.ObjenesisInstantiator.newInstance(ObjenesisInstantiator.java:19)
at org.mockito.internal.creation.bytebuddy.SubclassByteBuddyMockMaker.createMock(SubclassByteBuddyMockMaker.java:47)
at org.mockito.internal.creation.bytebuddy.ByteBuddyMockMaker.createMock(ByteBuddyMockMaker.java:25)
at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:35)
at org.mockito.internal.MockitoCore.mock(MockitoCore.java:62)
at org.mockito.Mockito.mock(Mockito.java:1908)
at org.mockito.Mockito.mock(Mockito.java:1880)
at org.apache.spark.sql.hive.thriftserver.ui.HiveThriftServer2ListenerSuite.createAppStatusStore(HiveThriftServer2ListenerSuite.scala:156)
at org.apache.spark.sql.hive.thriftserver.ui.HiveThriftServer2ListenerSuite.$anonfun$new$3(HiveThriftServer2ListenerSuite.scala:47)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
```
### Why are the changes needed?
To make test cases more robust.
### Does this PR introduce any user-facing change?
No (dev only).
### How was this patch tested?
Jenkins build.
Closes#26720 from shahidki31/mock.
Authored-by: shahid <shahidki31@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Currently, Apache Spark PR Builder using `hive-1.2` for `hadoop-2.7` and `hive-2.3` for `hadoop-3.2`. This PR aims to support
- `[test-hive1.2]` in PR builder
- `[test-hive2.3]` in PR builder to be consistent and independent of the default profile
- After this PR, all PR builders will use Hive 2.3 by default (because Spark uses Hive 2.3 by default as of c98e5eb339)
- Use default profile in AppVeyor build.
Note that this was reverted due to unexpected test failure at `ThriftServerPageSuite`, which was investigated in https://github.com/apache/spark/pull/26706 . This PR fixed it by letting it use their own forked JVM. There is no explicit evidence for this fix and it was just my speculation, and thankfully it fixed at least.
### Why are the changes needed?
This new tag allows us more flexibility.
### Does this PR introduce any user-facing change?
No. (This is a dev-only change.)
### How was this patch tested?
Check the Jenkins triggers in this PR.
Default:
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments: -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver -Pmesos -Pspark-ganglia-lgpl -Phadoop-cloud -Phive -Pkubernetes -Pkinesis-asl -Pyarn test:package streaming-kinesis-asl-assembly/assembly
```
`[test-hive1.2][test-hadoop3.2]`:
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments: -Phadoop-3.2 -Phive-1.2 -Phadoop-cloud -Pyarn -Pspark-ganglia-lgpl -Phive -Phive-thriftserver -Pmesos -Pkubernetes -Pkinesis-asl test:package streaming-kinesis-asl-assembly/assembly
```
`[test-maven][test-hive-2.3]`:
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments: -Phadoop-2.7 -Phive-2.3 -Pspark-ganglia-lgpl -Pyarn -Phive -Phadoop-cloud -Pkinesis-asl -Pmesos -Pkubernetes -Phive-thriftserver clean package -DskipTests
```
Closes#26710 from HyukjinKwon/SPARK-29991.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to add `io.netty.tryReflectionSetAccessible=true` to the testing configuration for JDK11 because this is an officially documented requirement of Apache Arrow.
Apache Arrow community documented this requirement at `0.15.0` ([ARROW-6206](https://github.com/apache/arrow/pull/5078)).
> #### For java 9 or later, should set "-Dio.netty.tryReflectionSetAccessible=true".
> This fixes `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available`. thrown by netty.
### Why are the changes needed?
After ARROW-3191, Arrow Java library requires the property `io.netty.tryReflectionSetAccessible` to be set to true for JDK >= 9. After https://github.com/apache/spark/pull/26133, JDK11 Jenkins job seem to fail.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/676/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/677/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/678/
```scala
Previous exception in task:
sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473)

io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)

io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)

io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)

org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222)

```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins with JDK11.
Closes#26552 from dongjoon-hyun/SPARK-ARROW-JDK11.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This patch addresses CI build issue on sbt Hadoop-3.2 Jenkins job: SparkSQLEnvSuite are failing. Looks like the reason of test failure is the test checks registered listeners from active SparkSession which could be interfered with other test suites running concurrently. If we isolate test suite the problem should be gone.
### Why are the changes needed?
CI builds for "spark-master-test-sbt-hadoop-3.2" are failing.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
I've run the single test suite with below command and it passed 3 times sequentially:
```
build/sbt "hive-thriftserver/testOnly *.SparkSQLEnvSuite" -Phadoop-3.2 -Phive-thriftserver
```
so we expect the test suite will pass if we isolate the test suite.
Closes#26342 from HeartSaVioR/SPARK-29604-FOLLOWUP.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to specify the JDK8 default configurations `-XX:+UseParallelGC -XX:-UseDynamicNumberOfGCThreads` explicitly. As we see in this PR [here](https://github.com/apache/spark/pull/25966/files#diff-12b89b7ee67c63c2254b749c8f8d0694R10), this will make the comparison between JDK8 and JDK11 easier by removing a misleading regression.
**NOTE THAT THESE JVM CONFS ARE ONLY FOR BENCHMARK COMPARISON, NOT FOR A PRODUCTION**
### Why are the changes needed?
There exists many JVM-level changes between JDK8 and JDK11. For example, the followings are notable changes and it turns out that especially (1) and (2) shows a misleading regression in our micro-benchmark environment because our microbenchmark uses small VM memory.
1. [JEP 248: Make G1 the Default Garbage Collector](https://bugs.openjdk.java.net/browse/JDK-8073273) **JDK9+**
2. [Enable UseDynamicNumberOfGCThreads by default](https://bugs.openjdk.java.net/browse/JDK-8198547) **JDK11+**
3. [Change default value of HeapSizePerGCThread](https://bugs.openjdk.java.net/browse/JDK-8200417) **JDK11+**
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This is a test-only JVM configuration change. Manually, run the benchmark.
Closes#25966 from dongjoon-hyun/SPARK-29282.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR upgrade Scala to **2.12.10**.
Release notes:
- Fix regression in large string interpolations with non-String typed splices
- Revert "Generate shallower ASTs in pattern translation"
- Fix regression in classpath when JARs have 'a.b' entries beside 'a/b'
- Faster compiler: 5–10% faster since 2.12.8
- Improved compatibility with JDK 11, 12, and 13
- Experimental support for build pipelining and outline type checking
More details:
https://github.com/scala/scala/releases/tag/v2.12.10https://github.com/scala/scala/releases/tag/v2.12.9
## How was this patch tested?
Existing tests
Closes#25404 from wangyum/SPARK-28683.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This patch tries to keep consistency whenever UTF-8 charset is needed, as using `StandardCharsets.UTF_8` instead of using "UTF-8". If the String type is needed, `StandardCharsets.UTF_8.name()` is used.
This change also brings the benefit of getting rid of `UnsupportedEncodingException`, as we're providing `Charset` instead of `String` whenever possible.
This also changes some private Catalyst helper methods to operate on encodings as `Charset` objects rather than strings.
## How was this patch tested?
Existing unit tests.
Closes#25335 from HeartSaVioR/SPARK-28601.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR proposes to make Javadoc in org.apache.spark.shuffle.api visible.
## How was this patch tested?
Manually built the doc and checked:
![Screen Shot 2019-08-01 at 4 48 23 PM](https://user-images.githubusercontent.com/6477701/62275587-400cc080-b47d-11e9-8fba-c4a0607093d1.png)
Closes#25323 from HyukjinKwon/SPARK-28568.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR introduces the necessary Maven modules for the new [Spark Graph](https://issues.apache.org/jira/browse/SPARK-25994) feature for Spark 3.0.
* `spark-graph` is a parent module that users depend on to get all graph functionalities (Cypher and Graph Algorithms)
* `spark-graph-api` defines the [Property Graph API](https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI) that is being shared between Cypher and Algorithms
* `spark-cypher` contains a Cypher query engine implementation
Both, `spark-graph-api` and `spark-cypher` depend on Spark SQL.
Note, that the Maven module for Graph Algorithms is not part of this PR and will be introduced in https://issues.apache.org/jira/browse/SPARK-27302
A PoC for a running Cypher implementation can be found in this WIP PR https://github.com/apache/spark/pull/24297
## How was this patch tested?
Pass the Jenkins with all profiles and manually build and check the followings.
```
$ ls assembly/target/scala-2.12/jars/spark-cypher*
assembly/target/scala-2.12/jars/spark-cypher_2.12-3.0.0-SNAPSHOT.jar
$ ls assembly/target/scala-2.12/jars/spark-graph* | grep -v graphx
assembly/target/scala-2.12/jars/spark-graph-api_2.12-3.0.0-SNAPSHOT.jar
assembly/target/scala-2.12/jars/spark-graph_2.12-3.0.0-SNAPSHOT.jar
```
Closes#24490 from s1ck/SPARK-27300.
Lead-authored-by: Martin Junghanns <martin.junghanns@neotechnology.com>
Co-authored-by: Max Kießling <max@kopfueber.org>
Co-authored-by: Martin Junghanns <martin.junghanns@neo4j.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Move methods that implement v2 catalog operations to CatalogV2Util so they can be used in #24768.
## How was this patch tested?
Behavior is validated by existing tests.
Closes#24813 from rdblue/SPARK-27964-add-catalog-v2-util.
Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Kind of related to https://github.com/gatorsmile/spark/pull/5 - let's update genjavadoc to see if it generates fewer spurious javadoc errors to begin with.
## How was this patch tested?
Existing docs build
Closes#24443 from srowen/genjavadoc013.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
The test time of `HiveClientVersions` is around 3.5 minutes.
This PR is to add it into the parallel test suite list. To make sure there is no colliding warehouse location, we can change the warehouse path to a temporary directory.
## How was this patch tested?
Unit test
Closes#24404 from gengliangwang/parallelTestFollowUp.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This patch modifies SparkBuild so that the largest / slowest test suites (or collections of suites) can run in their own forked JVMs, allowing them to be run in parallel with each other. This opt-in / whitelisting approach allows us to increase parallelism without having to fix a long-tail of flakiness / brittleness issues in tests which aren't performance bottlenecks.
See comments in SparkBuild.scala for information on the details, including a summary of why we sometimes opt to run entire groups of tests in a single forked JVM .
The time of full new pull request test in Jenkins is reduced by around 53%:
before changes: 4hr 40min
after changes: 2hr 13min
## How was this patch tested?
Unit test
Closes#24373 from gengliangwang/parallelTest.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
The API docs should not include the "org.apache.spark.util.kvstore" package because they are internal private APIs. See the doc link: https://spark.apache.org/docs/latest/api/java/org/apache/spark/util/kvstore/LevelDB.html
## How was this patch tested?
N/A
Closes#24386 from gatorsmile/rmDoc.
Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?
Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below.
## How was this patch tested?
Existing tests.
Closes#23098 from srowen/SPARK-26132.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Use the sbt maven plugin option`antlr4TreatWarningsAsErrors` to make sure the warnings are treated as errors while generating the parser. In the absence of it, we may inadvertently introduce problems while making grammar changes. Please refer to PR-23897 to know more about the context. We made a change in [pr-23925](https://github.com/apache/spark/pull/23925) which handled only the maven build.
In this PR, we handle the sbt build. I had submitted [PR-23](https://github.com/ihji/sbt-antlr4/pull/23) to enhance the sbt-antlr plugin to make is possible to pass the error on warning option.
## How was this patch tested?
Force an warning in the grammar file to check if the build fails. Then remove the warning to verify the build succeeds.
Closes#24060 from dilipbiswal/sbt_build_antlr.
Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
The java version here means versions of javac source, javac target, scalac target. They could be consolidated as a single version (currently 1.8)
| |javac|scalac |
|------|-----|---------|
|source|1.8 |2.12/2.11|
|target|1.8 |1.8 |
The current issues are as follows
* Maven build defines a single property (`java.version`) to specify java version while SBT build defines different properties for javac (`javacJVMVersion`) and scalac (`scalacJVMVersion`). SBT build should use a single property as Maven build does.
* Furthermore, it's better for SBT build to refer to `java.version` defined by Maven build. This is possible since we've already been using sbt-pom-reader.
## How was this patch tested?
Tested locally.
```
build/mvn clean compile
build/sbt clean compile
```
Closes#23724 from seancxmao/specify-java-version-once.
Authored-by: seancxmao <seancxmao@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Currently ANTLR v4 versions used by Maven and SBT are slightly different. Maven uses `4.7.1` while SBT uses `4.7`.
* Maven(`pom.xml`): `<antlr4.version>4.7.1</antlr4.version>`
* SBT(`project/SparkBuild.scala`): `antlr4Version in Antlr4 := "4.7"`
We should make Maven and SBT use a single version. Furthermore we'd better specify antlr4 version in one place to avoid mismatch between Maven and SBT in the future.
This PR lets SBT use antlr4 version specified in Maven POM file, rather than specify its own antlr4 version. This is in the same as how `hadoop.version` is specified in `project/SparkBuild.scala`
## How was this patch tested?
Test locally.
After run `sbt compile`, Java files generated by ANTLR are located at:
```
sql/catalyst/target/scala-2.12/src_managed/main/antlr4/org/apache/spark/sql/catalyst/parser/*.java
```
These Java files have a comment at the head. We can see now SBT uses ANTLR `4.7.1`.
```
// Generated from .../spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 by ANTLR 4.7.1
```
Closes#23713 from seancxmao/antlr4-version-consistent.
Authored-by: seancxmao <seancxmao@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?
There are ugly provided dependencies inside core for the following:
* Hive
* Kafka
In this PR I've extracted them out. This PR contains the following:
* Token providers are now loaded with service loader
* Hive token provider moved to hive project
* Kafka token provider extracted into a new project
## How was this patch tested?
Existing + newly added unit tests.
Additionally tested on cluster.
Closes#23499 from gaborgsomogyi/SPARK-26254.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Increase test memory to avoid OOM in TimSort-related tests.
## How was this patch tested?
Existing tests.
Closes#23425 from srowen/SPARK-26306.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
docker-image-tool.sh requires explicit argument to create the python
image now; do that from the sbt integration tests target too.
Closes#23172 from vanzin/SPARK-25957.followup.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
The "build context" for a docker image - basically the whole contents of the
current directory where "docker" is invoked - can be huge in a dev build,
easily breaking a couple of gigs.
Doing that copy 3 times during the build of docker images severely slows
down the process.
This patch creates a smaller build context - basically mimicking what the
make-distribution.sh script does, so that when building the docker images,
only the necessary bits are in the current directory. For PySpark and R that
is optimized further, since those images are built based on the previously
built Spark main image.
In my current local clone, the dir size is about 2G, but with this script
the "context" sent to docker is about 250M for the main image, 1M for the
pyspark image and 8M for the R image. That speeds up the image builds
considerably.
I also snuck in a fix to the k8s integration test dependencies in the sbt
build, so that the examples are properly built (without having to do it
manually).
Closes#23019 from vanzin/SPARK-26025.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
This PR makes Spark's default Scala version as 2.12, and Scala 2.11 will be the alternative version. This implies that Scala 2.12 will be used by our CI builds including pull request builds.
We'll update the Jenkins to include a new compile-only jobs for Scala 2.11 to ensure the code can be still compiled with Scala 2.11.
## How was this patch tested?
existing tests
Closes#22967 from dbtsai/scala2.12.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request?
Add `Locale.ROOT` to all internal calls to String `toLowerCase`, `toUpperCase`
## How was this patch tested?
existing tests
Closes#22975 from zhengruifeng/Tokenizer_Locale.
Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
The integration tests can now be run in sbt if the right profile
is enabled, using the "test" task under the respective project.
This avoids having to fall back to maven to run the tests, which
invalidates all your compiled stuff when you go back to sbt, making
development way slower than it should.
There's also a task to run the tests directly without refreshing
the docker images, which is helpful if you just made a change to
the submission code which should not affect the code in the images.
The sbt tasks currently are not very customizable; there's some
very minor things you can set in the sbt shell itself, but otherwise
it's hardcoded to run on minikube.
I also had to make some slight adjustments to the IT code itself,
mostly to remove assumptions about the existing harness.
Tested on sbt and maven.
Closes#22909 from vanzin/SPARK-25897.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Remove Kafka 0.8 integration
## How was this patch tested?
Existing tests, build scripts
Closes#22703 from srowen/SPARK-25705.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Removes all vestiges of Flume in the build, for Spark 3.
I don't think this needs Jenkins config changes.
## How was this patch tested?
Existing tests.
Closes#22692 from srowen/SPARK-25598.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Improve build for Scala 2.12. Current build for sbt fails on the subproject `repl`:
```
[info] Compiling 6 Scala sources to /Users/rendong/wdi/spark/repl/target/scala-2.12/classes...
[error] /Users/rendong/wdi/spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoopInterpreter.scala:80: overriding lazy value importableSymbolsWithRenames in class ImportHandler of type List[(this.intp.global.Symbol, this.intp.global.Name)];
[error] lazy value importableSymbolsWithRenames needs `override' modifier
[error] lazy val importableSymbolsWithRenames: List[(Symbol, Name)] = {
[error] ^
[warn] /Users/rendong/wdi/spark/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala:53: variable addedClasspath in class ILoop is deprecated (since 2.11.0): use reset, replay or require to update class path
[warn] if (addedClasspath != "") {
[warn] ^
[warn] /Users/rendong/wdi/spark/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala:54: variable addedClasspath in class ILoop is deprecated (since 2.11.0): use reset, replay or require to update class path
[warn] settings.classpath append addedClasspath
[warn] ^
[warn] two warnings found
[error] one error found
[error] (repl/compile:compileIncremental) Compilation failed
[error] Total time: 93 s, completed 2018-9-3 10:07:26
```
## How was this patch tested?
```
./dev/change-scala-version.sh 2.12
## For Maven
./build/mvn -Pscala-2.12 [mvn commands]
## For SBT
sbt -Dscala.version=2.12.6
```
Closes#22310 from sadhen/SPARK-25298.
Authored-by: Darcy Shen <sadhen@zoho.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
When starting spark-shell from Mac terminal (MacOS High Sirra Version 10.13.6), Getting exception
[ERROR] Failed to construct terminal; falling back to unsupported
java.lang.NumberFormatException: For input string: "0x100"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
at jline.internal.InfoCmp.parseInfoCmp(InfoCmp.java:59)
at jline.UnixTerminal.parseInfoCmp(UnixTerminal.java:242)
at jline.UnixTerminal.<init>(UnixTerminal.java:65)
at jline.UnixTerminal.<init>(UnixTerminal.java:50)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at jline.TerminalFactory.getFlavor(TerminalFactory.java:211)
This issue is due a jline defect : https://github.com/jline/jline2/issues/281, which is fixed in Jline 2.14.4, bumping up JLine version in spark to version >= Jline 2.14.4 will fix the issue
## How was this patch tested?
No new UT/automation test added, after upgrade to latest Jline version 2.14.6, manually tested spark shell features
Closes#22130 from vinodkc/br_UpgradeJLineVersion.
Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Apache Avro (https://avro.apache.org) is a popular data serialization format. It is widely used in the Spark and Hadoop ecosystem, especially for Kafka-based data pipelines. Using the external package https://github.com/databricks/spark-avro, Spark SQL can read and write the avro data. Making spark-Avro built-in can provide a better experience for first-time users of Spark SQL and structured streaming. We expect the built-in Avro data source can further improve the adoption of structured streaming.
The proposal is to inline code from spark-avro package (https://github.com/databricks/spark-avro). The target release is Spark 2.4.
[Built-in AVRO Data Source In Spark 2.4.pdf](https://github.com/apache/spark/files/2181511/Built-in.AVRO.Data.Source.In.Spark.2.4.pdf)
## How was this patch tested?
Unit test
Author: Gengliang Wang <gengliang.wang@databricks.com>
Closes#21742 from gengliangwang/export_avro.
## What changes were proposed in this pull request?
During SPARK-24418 (Upgrade Scala to 2.11.12 and 2.12.6), we upgrade `jline` version together. So, `mvn` works correctly. However, `sbt` brings old jline library and is hitting `NoSuchMethodError` in `master` branch, see https://github.com/apache/spark/pull/21495#issuecomment-401560826. This overrides jline version in SBT to make sbt build work.
## How was this patch tested?
Manually test.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#21692 from viirya/SPARK-24715.
## What changes were proposed in this pull request?
Fix java checkstyle failure of kubernetes-integration-tests
## How was this patch tested?
Checked manually on my local environment.
Author: Xingbo Jiang <xingbo.jiang@databricks.com>
Closes#21545 from jiangxb1987/k8s-checkstyle.
Apply the suggestion on the bug to fix source links. Tested with
the 2.3.1 release docs.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#21521 from vanzin/SPARK-23732.
## What changes were proposed in this pull request?
This PR proposes to check Java lint via SBT for Jenkins. It uses the SBT wrapper for checkstyle.
I manually tested. If we build the codes once, running this script takes 2 mins at maximum in my local:
Test codes:
```
Checkstyle failed at following occurrences:
[error] Checkstyle error found in /.../spark/core/src/test/java/test/org/apache/spark/JavaAPISuite.java:82: Line is longer than 100 characters (found 103).
[error] 1 issue(s) found in Checkstyle report: /.../spark/core/target/checkstyle-test-report.xml
[error] Checkstyle error found in /.../spark/sql/hive/src/test/java/org/apache/spark/sql/hive/JavaDataFrameSuite.java:84: Line is longer than 100 characters (found 115).
[error] 1 issue(s) found in Checkstyle report: /.../spark/sql/hive/target/checkstyle-test-report.xml
...
```
Main codes:
```
Checkstyle failed at following occurrences:
[error] Checkstyle error found in /.../spark/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartition.java:39: Line is longer than 100 characters (found 104).
[error] Checkstyle error found in /.../spark/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartitionReader.java:26: Line is longer than 100 characters (found 110).
[error] Checkstyle error found in /.../spark/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartitionReader.java:30: Line is longer than 100 characters (found 104).
...
```
## How was this patch tested?
Manually tested. Jenkins build should test this.
Author: hyukjinkwon <gurwls223@apache.org>
Closes#21399 from HyukjinKwon/SPARK-22269.
Prevents Scala 2.12 scaladoc from blowing up attempting to parse java comments.
## What changes were proposed in this pull request?
Adds -no-java-comments to docs/scalacOptions under Scala 2.12. Also
moves scaladoc configs out of the TestSettings and into the standard sharedSettings
section in SparkBuild.scala.
## How was this patch tested?
SBT_OPTS=-Dscala-2.12 sbt
++2.12.4
tags/publishLocal
Author: Erik LaBianca <erik.labianca@gmail.com>
Closes#20042 from easel/scaladoc-212.
## What changes were proposed in this pull request?
Moves the -Xlint:unchecked flag in the sbt build configuration from Compile to (Compile, compile) scope, allowing publish and publishLocal commands to work.
## How was this patch tested?
Successfully published the spark-launcher subproject from within sbt successfully, where it fails without this patch.
Author: Erik LaBianca <erik.labianca@gmail.com>
Closes#20040 from easel/javadoc-xlint.
## What changes were proposed in this pull request?
This is a stripped down version of the `KubernetesClusterSchedulerBackend` for Spark with the following components:
- Static Allocation of Executors
- Executor Pod Factory
- Executor Recovery Semantics
It's step 1 from the step-wise plan documented [here](https://github.com/apache-spark-on-k8s/spark/issues/441#issuecomment-330802935).
This addition is covered by the [SPIP vote](http://apache-spark-developers-list.1001551.n3.nabble.com/SPIP-Spark-on-Kubernetes-td22147.html) which passed on Aug 31 .
## How was this patch tested?
- The patch contains unit tests which are passing.
- Manual testing: `./build/mvn -Pkubernetes clean package` succeeded.
- It is a **subset** of the entire changelist hosted in http://github.com/apache-spark-on-k8s/spark which is in active use in several organizations.
- There is integration testing enabled in the fork currently [hosted by PepperData](spark-k8s-jenkins.pepperdata.org:8080) which is being moved over to RiseLAB CI.
- Detailed documentation on trying out the patch in its entirety is in: https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
cc rxin felixcheung mateiz (shepherd)
k8s-big-data SIG members & contributors: mccheah ash211 ssuchter varunkatta kimoonkim erikerlandson liyinan926 tnachen ifilonenko
Author: Yinan Li <liyinan926@gmail.com>
Author: foxish <ramanathana@google.com>
Author: mcheah <mcheah@palantir.com>
Closes#19468 from foxish/spark-kubernetes-3.
## What changes were proposed in this pull request?
Set `-ea` and `-Xss4m` consistently for tests, to fix in particular:
```
OrderingSuite:
...
- GenerateOrdering with ShortType
*** RUN ABORTED ***
java.lang.StackOverflowError:
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:370)
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
...
```
## How was this patch tested?
Existing tests. Manually verified it resolves the StackOverflowError this intends to resolve.
Author: Sean Owen <sowen@cloudera.com>
Closes#19820 from srowen/SPARK-22607.