Commit graph

109 commits

Author SHA1 Message Date
Yuming Wang c87b0085c9 [SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8
### What changes were proposed in this pull request?

Hive 2.3.8 changes:
HIVE-19662: Upgrade Avro to 1.8.2
HIVE-24324: Remove deprecated API usage from Avro
HIVE-23980: Shade Guava from hive-exec in Hive 2.3
HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
HIVE-24512: Exclude calcite in packaging.
HIVE-22708: Fix for HttpTransport to replace String.equals
HIVE-24551: Hive should include transitive dependencies from calcite after shading it
HIVE-24553: Exclude calcite from test-jar dependency of hive-exec

### Why are the changes needed?

Upgrade Avro and Parquet to latest version.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test add test try to upgrade Parquet to 1.11.1 and Avro to 1.10.1: https://github.com/apache/spark/pull/30517

Closes #30657 from wangyum/SPARK-33696.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-17 21:54:35 -08:00
Josh Soref 485145326a [MINOR] Spelling bin core docs external mllib repl
### What changes were proposed in this pull request?

This PR intends to fix typos in the sub-modules:
* `bin`
* `core`
* `docs`
* `external`
* `mllib`
* `repl`
* `pom.xml`

Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618

NOTE: The misspellings have been reported at 706a726f87 (commitcomment-44064356)

### Why are the changes needed?

Misspelled words make it harder to read / understand content.

### Does this PR introduce _any_ user-facing change?

There are various fixes to documentation, etc...

### How was this patch tested?

No testing was performed

Closes #30530 from jsoref/spelling-bin-core-docs-external-mllib-repl.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-11-30 13:59:51 +09:00
Kousuke Saruta 005999721f [SPARK-33046][DOCS] Update how to build doc for Scala 2.13 with sbt
### What changes were proposed in this pull request?

This PR fixes the description how to build Spark for Scala 2.13 with sbt.
In the current doc, how to build Spark for Scala 2.13 with sbt is described like:
![scala-2 13-build-before](https://user-images.githubusercontent.com/4736016/94816248-80c3e900-0436-11eb-9bc2-99af5786971a.png)

But build fails with this command because scala-2.13 profile is not enabled and scala-parallel-collections is absent.

```
[error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala:23: object parallel is not a member of package collection
```

The correct command should be:
```
build/sbt -Pspark-2.13 compile
```

### Why are the changes needed?

The build command is wrong.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I checked that `sbt -Pspark-2.13` is correct with the following command:
```
build/sbt -Dscala.version=2.13.3 -Phive -Phive-thriftserver -Pyarn -Pkubernetes  compile
```

I also build the modified doc and checked the generated html:
![spark-scala-2 13-build-doc-after](https://user-images.githubusercontent.com/4736016/94869259-f2745500-047f-11eb-89e5-20816f3ed24d.png)

Closes #29921 from sarutak/fix-scala-2.13-build-doc.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-10-01 18:01:23 -05:00
Yuming Wang b11e42663b
[SPARK-31381][SPARK-29245][SQL] Upgrade built-in Hive 2.3.6 to 2.3.7
### What changes were proposed in this pull request?

**Hive 2.3.7** fixed these issues:
- HIVE-21508: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
- HIVE-21980:Parsing time can be high in case of deeply nested subqueries
- HIVE-22249: Support Parquet through HCatalog

### Why are the changes needed?
Fix CCE during creating HiveMetaStoreClient in JDK11 environment: [SPARK-29245](https://issues.apache.org/jira/browse/SPARK-29245).

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

- [x] Test Jenkins with Hadoop 2.7 (https://github.com/apache/spark/pull/28148#issuecomment-616757840)
- [x] Test Jenkins with Hadoop 3.2 on JDK11 (https://github.com/apache/spark/pull/28148#issuecomment-616294353)
- [x] Manual test with remote hive metastore.

Hive side:

```
export JAVA_HOME=/usr/lib/jdk1.8.0_221
export PATH=$JAVA_HOME/bin:$PATH
cd /usr/lib/hive-2.3.6 # Start Hive metastore with Hive 2.3.6
bin/schematool -dbType derby -initSchema --verbose
bin/hive --service metastore
```

Spark side:

```
export JAVA_HOME=/usr/lib/jdk-11.0.3
export PATH=$JAVA_HOME/bin:$PATH
build/sbt clean package -Phive -Phadoop-3.2 -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083
```

Closes #28148 from wangyum/SPARK-31381.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-20 13:38:24 -07:00
zero323 298d0a5102 [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
### What changes were proposed in this pull request?

- Update `testthat` to >= 2.0.0
- Replace of `testthat:::run_tests` with `testthat:::test_package_dir`
- Add trivial assertions for tests, without any expectations, to avoid skipping.
- Update related docs.

### Why are the changes needed?

`testthat` version has been frozen by [SPARK-22817](https://issues.apache.org/jira/browse/SPARK-22817) / https://github.com/apache/spark/pull/20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

- Existing CI pipeline:
     - Windows build on AppVeyor, R 3.6.2, testthtat 2.3.1
     - Linux build on Jenkins, R 3.1.x, testthat 1.0.2

- Additional builds with thesthat 2.3.1  using [sparkr-build-sandbox](https://github.com/zero323/sparkr-build-sandbox) on c7ed64af9e697b3619779857dd820832176b3be3

   R 3.4.4  (image digest ec9032f8cf98)
   ```
   docker pull zero323/sparkr-build-sandbox:3.4.4
   docker run zero323/sparkr-build-sandbox:3.4.4 zero323 --branch SPARK-23435 --commit c7ed64af9e697b3619779857dd820832176b3be3 --public-key https://keybase.io/zero323/pgp_keys.asc
    ```
    3.5.3 (image digest 0b1759ee4d1d)

    ```
    docker pull zero323/sparkr-build-sandbox:3.5.3
    docker run zero323/sparkr-build-sandbox:3.5.3 zero323 --branch SPARK-23435 --commit
    c7ed64af9e697b3619779857dd820832176b3be3 --public-key https://keybase.io/zero323/pgp_keys.asc
    ```

   and 3.6.2 (image digest 6594c8ceb72f)
    ```
   docker pull zero323/sparkr-build-sandbox:3.6.2
   docker run zero323/sparkr-build-sandbox:3.6.2 zero323 --branch SPARK-23435 --commit c7ed64af9e697b3619779857dd820832176b3be3 --public-key https://keybase.io/zero323/pgp_keys.asc
   ````

   Corresponding [asciicast](https://asciinema.org/) are available as 10.5281/zenodo.3629431

     [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3629431.svg)](https://doi.org/10.5281/zenodo.3629431)

   (a bit to large to burden asciinema.org, but can run locally via `asciinema play`).

----------------------------

Continued from #27328

Closes #27359 from zero323/SPARK-23435.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-01-29 10:37:08 +09:00
Yuming Wang fa47b7faf7 [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
### What changes were proposed in this pull request?

This PR update document for make Hive 2.3 dependency by default.

### Why are the changes needed?

The documentation is incorrect.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes #26919 from wangyum/SPARK-30280.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-21 10:51:28 -08:00
Yuming Wang e1ee3fb72f [SPARK-30216][INFRA] Use python3 in Docker release image
### What changes were proposed in this pull request?

- Reverts commit 1f94bf4 and d6be46e
- Switches python to python3 in Docker release image.

### Why are the changes needed?
`dev/make-distribution.sh` and `python/setup.py` are use python3.
https://github.com/apache/spark/pull/26844/files#diff-ba2c046d92a1d2b5b417788bfb5cb5f8L236
https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

manual test:
```
yumwangubuntu-3513086:~/spark$ dev/create-release/do-release-docker.sh -n -d /home/yumwang/spark-release
Output directory already exists. Overwrite and continue? [y/n] y
Branch [branch-2.4]: master
Current branch version is 3.0.0-SNAPSHOT.
Release [3.0.0]: 3.0.0-preview2
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [master]:
ASF user [yumwang]:
Full name [Yuming Wang]:
GPG key [yumwangapache.org]: DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338
================
Release details:
BRANCH:     master
VERSION:    3.0.0-preview2
TAG:        v3.0.0-preview2-rc1
NEXT:       3.0.1-SNAPSHOT

ASF USER:   yumwang
GPG KEY:    DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338
FULL NAME:  Yuming Wang
E-MAIL:     yumwangapache.org
================
Is this info correct [y/n]? y
GPG passphrase:

========================
= Building spark-rm image with tag latest...
Command: docker build -t spark-rm:latest --build-arg UID=110302528 /home/yumwang/spark/dev/create-release/spark-rm
Log file: docker-build.log
Building v3.0.0-preview2-rc1; output will be at /home/yumwang/spark-release/output

gpg: directory '/home/spark-rm/.gnupg' created
gpg: keybox '/home/spark-rm/.gnupg/pubring.kbx' created
gpg: /home/spark-rm/.gnupg/trustdb.gpg: trustdb created
gpg: key 6E1B4122F6A3A338: public key "Yuming Wang <yumwangapache.org>" imported
gpg: key 6E1B4122F6A3A338: secret key imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1
========================
= Creating release tag v3.0.0-preview2-rc1...
Command: /opt/spark-rm/release-tag.sh
Log file: tag.log
It may take some time for the tag to be synchronized to github.
Press enter when you've verified that the new tag (v3.0.0-preview2-rc1) is available.
========================
= Building Spark...
Command: /opt/spark-rm/release-build.sh package
Log file: build.log
========================
= Building documentation...
Command: /opt/spark-rm/release-build.sh docs
Log file: docs.log
========================
= Publishing release
Command: /opt/spark-rm/release-build.sh publish-release
Log file: publish.log
```
Generated doc:
![image](https://user-images.githubusercontent.com/5399861/70693075-a7723100-1cf7-11ea-9f88-9356a02349a1.png)

Closes #26848 from wangyum/SPARK-30216.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-13 11:31:31 -08:00
Yuming Wang eb509968a7 [SPARK-30211][INFRA] Use python3 in make-distribution.sh
### What changes were proposed in this pull request?

This PR switches python to python3 in `make-distribution.sh`.

### Why are the changes needed?

SPARK-29672 changed this
- https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #26844 from wangyum/SPARK-30211.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-10 23:30:12 -08:00
Dongjoon Hyun 1595e46a4e [SPARK-30142][TEST-MAVEN][BUILD] Upgrade Maven to 3.6.3
### What changes were proposed in this pull request?

This PR aims to upgrade Maven from 3.6.2 to 3.6.3.

### Why are the changes needed?

This will bring bug fixes like the following.
- MNG-6759 Maven fails to use <repositories> section from dependency when resolving transitive dependencies in some cases
- MNG-6760 ExclusionArtifactFilter result invalid when wildcard exclusion is followed by other exclusions

The following is the full release note.
- https://maven.apache.org/docs/3.6.3/release-notes.html

### Does this PR introduce any user-facing change?

No. (This is a dev-environment change.)

### How was this patch tested?

Pass the Jenkins with both SBT and Maven.

Closes #26770 from dongjoon-hyun/SPARK-30142.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-06 23:41:59 +09:00
Tomoko Komiyama 8beb736a00 [SPARK-29256][DOCS] Fix typo in building document
### What changes were proposed in this pull request?
 Changed 'Phive-thriftserver' to ' -Phive-thriftserver'.

### Why are the changes needed?
 Typo

### Does this PR introduce any user-facing change?
Yes.

### How was this patch tested?
Manually tested.

Closes #25937 from TomokoKomiyama/fix-build-doc.

Authored-by: Tomoko Komiyama <btkomiyamatm@oss.nttdata.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-26 08:23:43 -05:00
Dongjoon Hyun 3bf43fb60d [SPARK-29159][BUILD] Increase ReservedCodeCacheSize to 1G
### What changes were proposed in this pull request?

This PR aims to increase the JVM CodeCacheSize from 0.5G to 1G.

### Why are the changes needed?

After upgrading to `Scala 2.12.10`, the following is observed during building.
```
2019-09-18T20:49:23.5030586Z OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
2019-09-18T20:49:23.5032920Z OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
2019-09-18T20:49:23.5034959Z CodeCache: size=524288Kb used=521399Kb max_used=521423Kb free=2888Kb
2019-09-18T20:49:23.5035472Z  bounds [0x00007fa62c000000, 0x00007fa64c000000, 0x00007fa64c000000]
2019-09-18T20:49:23.5035781Z  total_blobs=156549 nmethods=155863 adapters=592
2019-09-18T20:49:23.5036090Z  compilation: disabled (not enough contiguous free space left)
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually check the Jenkins or GitHub Action build log (which should not have the above).

Closes #25836 from dongjoon-hyun/SPARK-CODE-CACHE-1G.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-19 00:24:15 -07:00
Xiao Li 2856398de9 [SPARK-28961][HOT-FIX][BUILD] Upgrade Maven from 3.6.1 to 3.6.2
### What changes were proposed in this pull request?
This PR is to upgrade the maven dependence from 3.6.1 to 3.6.2.

### Why are the changes needed?
All the builds are broken because 3.6.1 is not available.  http://ftp.wayne.edu/apache//maven/maven-3/

- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/485/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10536/

![image](https://user-images.githubusercontent.com/11567269/64196667-36d69100-ce39-11e9-8f93-40eb333d595d.png)

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #25665 from gatorsmile/upgradeMVN.

Authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-03 11:06:57 -07:00
Yuming Wang 02a0cdea13 [SPARK-28723][SQL] Upgrade to Hive 2.3.6 for HiveMetastore Client and Hadoop-3.2 profile
### What changes were proposed in this pull request?

This PR upgrade the built-in Hive to 2.3.6 for `hadoop-3.2`.

Hive 2.3.6 release notes:
- [HIVE-22096](https://issues.apache.org/jira/browse/HIVE-22096): Backport [HIVE-21584](https://issues.apache.org/jira/browse/HIVE-21584) (Java 11 preparation: system class loader is not URLClassLoader)
- [HIVE-21859](https://issues.apache.org/jira/browse/HIVE-21859): Backport [HIVE-17466](https://issues.apache.org/jira/browse/HIVE-17466) (Metastore API to list unique partition-key-value combinations)
- [HIVE-21786](https://issues.apache.org/jira/browse/HIVE-21786): Update repo URLs in poms branch 2.3 version

### Why are the changes needed?
Make Spark support JDK 11.

### Does this PR introduce any user-facing change?
Yes. Please see [SPARK-28684](https://issues.apache.org/jira/browse/SPARK-28684) and [SPARK-24417](https://issues.apache.org/jira/browse/SPARK-24417) for more details.

### How was this patch tested?
Existing unit test and manual test.

Closes #25443 from wangyum/test-on-jenkins.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-23 21:34:30 -07:00
Dongjoon Hyun 4856c0e33a [SPARK-28609][DOC] Fix broken styles/links and make up-to-date
## What changes were proposed in this pull request?

This PR aims to fix the broken styles/links and make the doc up-to-date for Apache Spark 2.4.4 and 3.0.0 release.

- `building-spark.md`
![Screen Shot 2019-08-02 at 10 33 51 PM](https://user-images.githubusercontent.com/9700541/62407962-a248ec80-b575-11e9-8a16-532e9bc421f8.png)

- `configuration.md`
![Screen Shot 2019-08-02 at 10 34 52 PM](https://user-images.githubusercontent.com/9700541/62407969-c7d5f600-b575-11e9-9b1a-a76c6cc095c5.png)

- `sql-pyspark-pandas-with-arrow.md`
![Screen Shot 2019-08-02 at 10 36 14 PM](https://user-images.githubusercontent.com/9700541/62407979-18e5ea00-b576-11e9-99af-7ad9264656ae.png)

- `streaming-programming-guide.md`
![Screen Shot 2019-08-02 at 10 37 11 PM](https://user-images.githubusercontent.com/9700541/62407981-213e2500-b576-11e9-8bc5-a925df7e98a7.png)

- `structured-streaming-programming-guide.md` (1/2)
![Screen Shot 2019-08-02 at 10 38 20 PM](https://user-images.githubusercontent.com/9700541/62408001-49c61f00-b576-11e9-9519-f699775ceecd.png)

- `structured-streaming-programming-guide.md` (2/2)
![Screen Shot 2019-08-02 at 10 40 05 PM](https://user-images.githubusercontent.com/9700541/62408017-7f6b0800-b576-11e9-9341-52664ba6b460.png)

- `submitting-applications.md`
![Screen Shot 2019-08-02 at 10 41 13 PM](https://user-images.githubusercontent.com/9700541/62408027-b2ad9700-b576-11e9-910e-8f22173e1251.png)

## How was this patch tested?

Manual. Build the doc.
```
SKIP_API=1 jekyll build
```

Closes #25345 from dongjoon-hyun/SPARK-28609.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-04 09:42:47 -07:00
Yuming Wang 90c64ea419 [SPARK-28267][DOC] Update building-spark.md(support build with hadoop-3.2)
## What changes were proposed in this pull request?

Since [SPARK-23710](https://issues.apache.org/jira/browse/SPARK-23710), Hadoop 3.x can support Hive. This PR add _build with `hadoop-3.2`_ to building-spark.md.

## How was this patch tested?

manual tests
```
cd docs
SKIP_API=1 jekyll build
```
![image](https://user-images.githubusercontent.com/5399861/60942057-cf5a0480-a313-11e9-9534-4765520e799f.png)

Closes #25063 from wangyum/SPARK-28267.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-07-10 08:51:08 -05:00
Sean Owen eed6de1a65 [MINOR][DOCS] Tighten up some key links to the project and download pages to use HTTPS
## What changes were proposed in this pull request?

Tighten up some key links to the project and download pages to use HTTPS

## How was this patch tested?

N/A

Closes #24665 from srowen/HTTPSURLs.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-21 10:56:42 -07:00
Dongjoon Hyun 375cfa3d89 [SPARK-27467][BUILD] Upgrade Maven to 3.6.1
## What changes were proposed in this pull request?

This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following.
- https://maven.apache.org/docs/3.6.1/release-notes.html

This was committed and reverted due to AppVeyor failure. It turns out that the root cause is `PATH` issue. With the updated AppVeyor script, it passed.

https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/24273412

## How was this patch tested?

Pass the Jenkins and AppVoyer

Closes #24481 from dongjoon-hyun/SPARK-R.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-02 20:01:17 -07:00
HyukjinKwon d8db7db50b Revert "[SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc"
This reverts commit bde30bc57c.
2019-04-28 11:03:15 +09:00
Yuming Wang bde30bc57c [SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc
## What changes were proposed in this pull request?

Update the `docs/building-spark.md`. Otherwise:
```
mvn package -DskipTests=true
...
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions)  spark-parent_2.12 ---
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message:
Detected Maven Version: 3.6.0 is not in the allowed range 3.6.1.
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
[ERROR]
...
```

## How was this patch tested?
Just test `https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.zip` is avilable.

Closes #24477 from wangyum/SPARK-27467.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-27 09:09:47 -07:00
Sean Owen 754f820035 [SPARK-26918][DOCS] All .md should have ASF license header
## What changes were proposed in this pull request?

Add AL2 license to metadata of all .md files.
This seemed to be the tidiest way as it will get ignored by .md renderers and other tools. Attempts to write them as markdown comments revealed that there is no such standard thing.

## How was this patch tested?

Doc build

Closes #24243 from srowen/SPARK-26918.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-30 19:49:45 -05:00
Sean Owen 8bc304f97e [SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0
## What changes were proposed in this pull request?

Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below.

## How was this patch tested?

Existing tests.

Closes #23098 from srowen/SPARK-26132.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-25 10:46:42 -05:00
Darcy Shen 9d2a11554b [MINOR][DOC] Documentation on JVM options for SBT
## What changes were proposed in this pull request?

Documentation and .gitignore

## How was this patch tested?

Manual test that SBT honors the settings in .jvmopts if present

Closes #23615 from sadhen/impr/gitignore.

Authored-by: Darcy Shen <sadhen@zoho.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-22 18:27:24 -06:00
Kazuaki Ishizaki 1abfbda7eb [SPARK-26212][BUILD][TEST-MAVEN] Upgrade maven version to 3.6.0
## What changes were proposed in this pull request?

This PR updates maven version from 3.5.4 to 3.6.0. The release note of the 3.6.0 is [here](https://maven.apache.org/docs/3.6.0/release-notes.html).

From [the release note of the 3.6.0](https://maven.apache.org/docs/3.6.0/release-notes.html), the followings are new features:
1. There had been issues related to the project discoverytime which has been increased in previous version which influenced some of our users.
1. The output in the reactor summary has been improved.
1. There was an issue related to the classpath ordering.

## How was this patch tested?

Existing tests

Closes #23177 from kiszk/SPARK-26212.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-12-01 07:06:18 -06:00
DB Tsai ad853c5678
[SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0
## What changes were proposed in this pull request?

This PR makes Spark's default Scala version as 2.12, and Scala 2.11 will be the alternative version. This implies that Scala 2.12 will be used by our CI builds including pull request builds.

We'll update the Jenkins to include a new compile-only jobs for Scala 2.11 to ensure the code can be still compiled with Scala 2.11.

## How was this patch tested?

existing tests

Closes #22967 from dbtsai/scala2.12.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2018-11-14 16:22:23 -08:00
Dongjoon Hyun fc9ba9dcc6
[MINOR][DOC] Update the building doc to use Maven 3.5.4 and Java 8 only
## What changes were proposed in this pull request?

Since we didn't test Java 9 ~ 11 up to now in the community, fix the document to describe Java 8 only.

## How was this patch tested?
N/A (This is a document only change.)

Closes #22781 from dongjoon-hyun/SPARK-JDK-DOC.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2018-10-19 23:56:40 -07:00
Sean Owen 703e6da1ec [SPARK-25705][BUILD][STREAMING][TEST-MAVEN] Remove Kafka 0.8 integration
## What changes were proposed in this pull request?

Remove Kafka 0.8 integration

## How was this patch tested?

Existing tests, build scripts

Closes #22703 from srowen/SPARK-25705.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-16 09:10:24 -05:00
lajin 541d7e1e4b [SPARK-25685][BUILD] Allow running tests in Jenkins in enterprise Git repository
## What changes were proposed in this pull request?

Many companies have their own enterprise GitHub to manage Spark code. To build and test in those repositories with Jenkins need to modify this script.
So I suggest to add some environment variables to allow regression testing in enterprise Jenkins instead of default Spark repository in GitHub.

## How was this patch tested?

Manually test.

Closes #22678 from LantaoJin/SPARK-25685.

Lead-authored-by: lajin <lajin@ebay.com>
Co-authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-12 12:41:33 -05:00
Sean Owen a001814189 [SPARK-25598][STREAMING][BUILD][TEST-MAVEN] Remove flume connector in Spark 3
## What changes were proposed in this pull request?

Removes all vestiges of Flume in the build, for Spark 3.
I don't think this needs Jenkins config changes.

## How was this patch tested?

Existing tests.

Closes #22692 from srowen/SPARK-25598.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-11 14:28:06 -07:00
Sean Owen 80813e1980 [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
## What changes were proposed in this pull request?

Remove Hadoop 2.6 references and make 2.7 the default.
Obviously, this is for master/3.0.0 only.
After this we can also get rid of the separate test jobs for Hadoop 2.6.

## How was this patch tested?

Existing tests

Closes #22615 from srowen/SPARK-25016.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-10 12:07:53 -07:00
Yuming Wang b0ada7dce0 [SPARK-25330][BUILD][BRANCH-2.3] Revert Hadoop 2.7 to 2.7.3
## What changes were proposed in this pull request?
How to reproduce permission issue:
```sh
# build spark
./dev/make-distribution.sh --name SPARK-25330 --tgz  -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn

tar -zxf spark-2.4.0-SNAPSHOT-bin-SPARK-25330.tar && cd spark-2.4.0-SNAPSHOT-bin-SPARK-25330
export HADOOP_PROXY_USER=user_a
bin/spark-sql

export HADOOP_PROXY_USER=user_b
bin/spark-sql
```
```java
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=user_b, access=EXECUTE, inode="/tmp/hive-$%7Buser.name%7D/user_b/668748f2-f6c5-4325-a797-fd0a7ee7f4d4":user_b:hadoop:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
```

The issue occurred in this commit: feb886f209. This pr revert Hadoop 2.7 to 2.7.3 to avoid this issue.

## How was this patch tested?
unit tests and manual tests.

Closes #22327 from wangyum/SPARK-25330.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-09-06 21:41:13 -07:00
Darcy Shen 546683c21a [SPARK-25298][BUILD] Improve build definition for Scala 2.12
## What changes were proposed in this pull request?

Improve build for Scala 2.12. Current build for sbt fails on the subproject `repl`:

```
[info] Compiling 6 Scala sources to /Users/rendong/wdi/spark/repl/target/scala-2.12/classes...
[error] /Users/rendong/wdi/spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoopInterpreter.scala:80: overriding lazy value importableSymbolsWithRenames in class ImportHandler of type List[(this.intp.global.Symbol, this.intp.global.Name)];
[error]  lazy value importableSymbolsWithRenames needs `override' modifier
[error]       lazy val importableSymbolsWithRenames: List[(Symbol, Name)] = {
[error]                ^
[warn] /Users/rendong/wdi/spark/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala:53: variable addedClasspath in class ILoop is deprecated (since 2.11.0): use reset, replay or require to update class path
[warn]       if (addedClasspath != "") {
[warn]           ^
[warn] /Users/rendong/wdi/spark/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala:54: variable addedClasspath in class ILoop is deprecated (since 2.11.0): use reset, replay or require to update class path
[warn]         settings.classpath append addedClasspath
[warn]                                   ^
[warn] two warnings found
[error] one error found
[error] (repl/compile:compileIncremental) Compilation failed
[error] Total time: 93 s, completed 2018-9-3 10:07:26
```

## How was this patch tested?

```
./dev/change-scala-version.sh 2.12

##  For Maven
./build/mvn -Pscala-2.12 [mvn commands]
##  For SBT
sbt -Dscala.version=2.12.6
```

Closes #22310 from sadhen/SPARK-25298.

Authored-by: Darcy Shen <sadhen@zoho.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-09-03 07:36:04 -05:00
Maxim Gekk 3c67cb0b52 [SPARK-25273][DOC] How to install testthat 1.0.2
## What changes were proposed in this pull request?

R tests require `testthat` v1.0.2. In the PR, I described how to install the version in the section http://spark.apache.org/docs/latest/building-spark.html#running-r-tests.

Closes #22272 from MaxGekk/r-testthat-doc.

Authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
2018-08-30 20:25:26 +08:00
Sean Owen 35f7f5ce83 [DOCS][MINOR] Fix a few broken links and typos, and, nit, use HTTPS more consistently
## What changes were proposed in this pull request?

Fix a few broken links and typos, and, nit, use HTTPS more consistently esp. on scripts and Apache links

## How was this patch tested?

Doc build

Closes #22172 from srowen/DocTypo.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
2018-08-22 01:02:17 +08:00
Sean Owen 5f9633dc97 [SPARK-25015][BUILD] Update Hadoop 2.7 to 2.7.7
## What changes were proposed in this pull request?

Update Hadoop 2.7 to 2.7.7 to pull in bug and security fixes.

## How was this patch tested?

Existing tests.

Author: Sean Owen <srowen@gmail.com>

Closes #21987 from srowen/SPARK-25015.
2018-08-04 14:59:13 -05:00
Bruce Robbins 4c059ebc60 [SPARK-23776][DOC] Update instructions for running PySpark after building with SBT
## What changes were proposed in this pull request?

This update tells the reader how to build Spark with SBT such that pyspark-sql tests will succeed.

If you follow the current instructions for building Spark with SBT, pyspark/sql/udf.py fails with:
<pre>
AnalysisException: u'Can not load class test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath;'
</pre>

## How was this patch tested?

I ran the doc build command (SKIP_API=1 jekyll build) and eyeballed the result.

Author: Bruce Robbins <bersprockets@gmail.com>

Closes #21628 from bersprockets/SPARK-23776_doc.
2018-06-26 09:48:15 +08:00
Daniel Sakuma 6ade5cbb49 [MINOR][DOC] Fix some typos and grammar issues
## What changes were proposed in this pull request?

Easy fix in the documentation.

## How was this patch tested?

N/A

Closes #20948

Author: Daniel Sakuma <dsakuma@gmail.com>

Closes #20928 from dsakuma/fix_typo_configuration_docs.
2018-04-06 13:37:08 +08:00
foxish 7ab165b706 [SPARK-22648][K8S] Spark on Kubernetes - Documentation
What changes were proposed in this pull request?

This PR contains documentation on the usage of Kubernetes scheduler in Spark 2.3, and a shell script to make it easier to build docker images required to use the integration. The changes detailed here are covered by https://github.com/apache/spark/pull/19717 and https://github.com/apache/spark/pull/19468 which have merged already.

How was this patch tested?
The script has been in use for releases on our fork. Rest is documentation.

cc rxin mateiz (shepherd)
k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko
reviewers: vanzin felixcheung jiangxb1987 mridulm

TODO:
- [x] Add dockerfiles directory to built distribution. (https://github.com/apache/spark/pull/20007)
- [x] Change references to docker to instead say "container" (https://github.com/apache/spark/pull/19995)
- [x] Update configuration table.
- [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of int (#20032)

Author: foxish <ramanathana@google.com>

Closes #19946 from foxish/update-k8s-docs.
2017-12-21 17:21:11 -08:00
Sean Owen 0c03297bf0 [SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile, take 2
## What changes were proposed in this pull request?

Move flume behind a profile, take 2. See https://github.com/apache/spark/pull/19365 for most of the back-story.

This change should fix the problem by removing the examples module dependency and moving Flume examples to the module itself. It also adds deprecation messages, per a discussion on dev about deprecating for 2.3.0.

## How was this patch tested?

Existing tests, which still enable flume integration.

Author: Sean Owen <sowen@cloudera.com>

Closes #19412 from srowen/SPARK-22142.2.
2017-10-06 15:08:28 +01:00
gatorsmile 472864014c Revert "[SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile"
This reverts commit a2516f41ae.
2017-09-29 11:45:58 -07:00
Sean Owen a2516f41ae [SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile
## What changes were proposed in this pull request?

Add 'flume' profile to enable Flume-related integration modules

## How was this patch tested?

Existing tests; no functional change

Author: Sean Owen <sowen@cloudera.com>

Closes #19365 from srowen/SPARK-22142.
2017-09-29 08:26:53 +01:00
Sean Owen 4fbf748bf8 [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behind a profile
## What changes were proposed in this pull request?

Put Kafka 0.8 support behind a kafka-0-8 profile.

## How was this patch tested?

Existing tests, but, until PR builder and Jenkins configs are updated the effect here is to not build or test Kafka 0.8 support at all.

Author: Sean Owen <sowen@cloudera.com>

Closes #19134 from srowen/SPARK-21893.
2017-09-13 10:10:40 +01:00
Kousuke Saruta 957558235b [DOCS] Fix unreachable links in the document
## What changes were proposed in this pull request?

Recently, I found two unreachable links in the document and fixed them.
Because of small changes related to the document, I don't file this issue in JIRA but please suggest I should do it if you think it's needed.

## How was this patch tested?

Tested manually.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #19195 from sarutak/fix-unreachable-link.
2017-09-12 15:07:04 +01:00
Sean Owen 425c4ada4c [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10
## What changes were proposed in this pull request?

- Remove Scala 2.10 build profiles and support
- Replace some 2.10 support in scripts with commented placeholders for 2.12 later
- Remove deprecated API calls from 2.10 support
- Remove usages of deprecated context bounds where possible
- Remove Scala 2.10 workarounds like ScalaReflectionLock
- Other minor Scala warning fixes

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #17150 from srowen/SPARK-19810.
2017-07-13 17:06:24 +08:00
liuzhaokun 24367f23f7 [SPARK-21382] The note about Scala 2.10 in building-spark.md is wrong.
[https://issues.apache.org/jira/browse/SPARK-21382](https://issues.apache.org/jira/browse/SPARK-21382)
There should be "Note that support for Scala 2.10 is deprecated as of Spark 2.1.0 and may be removed in Spark 2.3.0",right?

Author: liuzhaokun <liu.zhaokun@zte.com.cn>

Closes #18606 from liu-zhaokun/new07120923.
2017-07-11 23:02:20 -07:00
Yuming Wang 45824fb608 [MINOR][DOCS] Improve Running R Tests docs
## What changes were proposed in this pull request?

Update Running R Tests dependence packages to:
```bash
R -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 'e1071', 'survival'), repos='http://cran.us.r-project.org')"
```

## How was this patch tested?
manual tests

Author: Yuming Wang <wgyumg@gmail.com>

Closes #18271 from wangyum/building-spark.
2017-06-16 11:03:54 +01:00
Armin Braun c8f1219510 [SPARK-20455][DOCS] Fix Broken Docker IT Docs
## What changes were proposed in this pull request?

Just added the Maven `test`goal.

## How was this patch tested?

No test needed, just a trivial documentation fix.

Author: Armin Braun <me@obrown.io>

Closes #17756 from original-brownbear/SPARK-20455.
2017-04-25 09:13:50 +01:00
hyukjinkwon 364b0db753 [MINOR][DOCS] Replace non-breaking space to normal spaces that breaks rendering markdown
# What changes were proposed in this pull request?

It seems there are several non-breaking spaces were inserted into several `.md`s and they look breaking rendering markdown files.

These are different. For example, this can be checked via `python` as below:

```python
>>> " "
'\xc2\xa0'
>>> " "
' '
```

_Note that it seems this PR description automatically replaces non-breaking spaces into normal spaces. Please open a `vi` and copy and paste it into `python` to verify this (do not copy the characters here)._

I checked the output below in  Sapari and Chrome on Mac OS and, Internal Explorer on Windows 10.

**Before**

![2017-04-03 12 37 17](https://cloud.githubusercontent.com/assets/6477701/24594655/50aaba02-186a-11e7-80bb-d34b17a3398a.png)
![2017-04-03 12 36 57](https://cloud.githubusercontent.com/assets/6477701/24594654/50a855e6-186a-11e7-94e2-661e56544b0f.png)

**After**

![2017-04-03 12 36 46](https://cloud.githubusercontent.com/assets/6477701/24594657/53c2545c-186a-11e7-9a73-00529afbfd75.png)
![2017-04-03 12 36 31](https://cloud.githubusercontent.com/assets/6477701/24594658/53c286c0-186a-11e7-99c9-e66b1f510fe7.png)

## How was this patch tested?

Manually checking.

These instances were found via

```
grep --include=*.scala --include=*.python --include=*.java --include=*.r --include=*.R --include=*.md --include=*.r -r -I " " .
```

in Mac OS.

It seems there are several instances more as below:

```
./docs/sql-programming-guide.md:        │   ├── ...
./docs/sql-programming-guide.md:        │   │
./docs/sql-programming-guide.md:        │   ├── country=US
./docs/sql-programming-guide.md:        │   │   └── data.parquet
./docs/sql-programming-guide.md:        │   ├── country=CN
./docs/sql-programming-guide.md:        │   │   └── data.parquet
./docs/sql-programming-guide.md:        │   └── ...
./docs/sql-programming-guide.md:            ├── ...
./docs/sql-programming-guide.md:            │
./docs/sql-programming-guide.md:            ├── country=US
./docs/sql-programming-guide.md:            │   └── data.parquet
./docs/sql-programming-guide.md:            ├── country=CN
./docs/sql-programming-guide.md:            │   └── data.parquet
./docs/sql-programming-guide.md:            └── ...
./sql/core/src/test/README.md:│   ├── *.avdl                  # Testing Avro IDL(s)
./sql/core/src/test/README.md:│   └── *.avpr                  # !! NO TOUCH !! Protocol files generated from Avro IDL(s)
./sql/core/src/test/README.md:│   ├── gen-avro.sh             # Script used to generate Java code for Avro
./sql/core/src/test/README.md:│   └── gen-thrift.sh           # Script used to generate Java code for Thrift
```

These seems generated via `tree` command which inserts non-breaking spaces. They do not look causing any problem for rendering within code blocks and I did not fix it to reduce the overhead to manually replace it when it is overwritten via `tree` command in the future.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17517 from HyukjinKwon/non-breaking-space.
2017-04-03 10:09:11 +01:00
Kay Ousterhout f87a6a59af [SPARK-19684][DOCS] Remove developer info from docs.
This commit moves developer-specific information from the release-
specific documentation in this repo to the developer tools page on
the main Spark website. This commit relies on this PR on the
Spark website: https://github.com/apache/spark-website/pull/33.

srowen

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #17018 from kayousterhout/SPARK-19684.
2017-02-23 13:27:47 -08:00
Sean Owen 0e2405490f
[SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support
- Move external/java8-tests tests into core, streaming, sql and remove
- Remove MaxPermGen and related options
- Fix some reflection / TODOs around Java 8+ methods
- Update doc references to 1.7/1.8 differences
- Remove Java 7/8 related build profiles
- Update some plugins for better Java 8 compatibility
- Fix a few Java-related warnings

For the future:

- Update Java 8 examples to fully use Java 8
- Update Java tests to use lambdas for simplicity
- Update Java internal implementations to use lambdas

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16871 from srowen/SPARK-19493.
2017-02-16 12:32:45 +00:00
Sean Owen e8d3fca450
[SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier
## What changes were proposed in this pull request?

- Remove support for Hadoop 2.5 and earlier
- Remove reflection and code constructs only needed to support multiple versions at once
- Update docs to reflect newer versions
- Remove older versions' builds and profiles.

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16810 from srowen/SPARK-19464.
2017-02-08 12:20:07 +00:00