708 commits

Author SHA1 Message Date
Dongjoon Hyun 33e6e4703d [SPARK-28544][BUILD] Update zstd-jni to 1.4.2-1
## What changes were proposed in this pull request?

This PR aims to update the `zstd-jni` library to bring in the latest improvements and bug fixes in `1.4.1` and `1.4.2`.
- https://github.com/facebook/zstd/releases/tag/v1.4.1 (4.5 ~ 11.8% performance improvement from v1.4.0 and bug fixes)
- https://github.com/facebook/zstd/releases/tag/v1.4.2 (bug fixes)

## How was this patch tested?

Pass the Jenkins.

Closes #25275 from dongjoon-hyun/SPARK-28544.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-27 18:08:20 -07:00
Dongjoon Hyun dbd0a2aa37 [SPARK-28511][INFRA] Get REV from RELEASE_VERSION instead of VERSION
## What changes were proposed in this pull request?

Unlike other versions, `x.x.0-SNAPSHOT` yields `x.x.-1` as the default release version. Although this will not happen for tags (they have no `SNAPSHOT` suffix), we had better fix it.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n
Output directory already exists. Overwrite and continue? [y/n] y
Branch [branch-2.4]: master
Current branch version is 3.0.0-SNAPSHOT.
Release [3.0.-1]:
```

Since we already have `RELEASE_VERSION`, which is obtained by removing `SNAPSHOT`, this PR uses `RELEASE_VERSION` instead of `VERSION`.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n
Branch [branch-2.4]: master
Current branch version is 3.0.0-SNAPSHOT.
Release [3.0.0]:
```
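
One plausible way to picture the difference is the short Python sketch below. It is not the release script itself (that is a shell script); the helper names `strip_snapshot` and `revision_of` are hypothetical, and the sketch only assumes that the revision is read as the third dot-separated field of the version string.

```python
def strip_snapshot(version: str) -> str:
    """Return the version without a trailing -SNAPSHOT suffix (hypothetical helper)."""
    suffix = "-SNAPSHOT"
    return version[: -len(suffix)] if version.endswith(suffix) else version


def revision_of(version: str) -> str:
    """Return the third dot-separated field of a version string (hypothetical helper)."""
    return version.split(".")[2]


version = "3.0.0-SNAPSHOT"
release_version = strip_snapshot(version)

# Read from VERSION, the revision field still carries the suffix,
# so a comparison such as `rev == "0"` would not recognize an x.x.0 release.
rev_from_version = revision_of(version)

# Read from RELEASE_VERSION, the revision field is a plain number.
rev_from_release_version = revision_of(release_version)

print(rev_from_version)          # 0-SNAPSHOT
print(rev_from_release_version)  # 0
```

The point of the sketch is only that the suffix must be removed before the revision field is interpreted.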

## How was this patch tested?

Manually do `dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n` and see the default value of `Release`.

Closes #25254 from dongjoon-hyun/SPARK-28511.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-25 10:54:24 -07:00
Dongjoon Hyun cfca26e973 [SPARK-28496][INFRA] Use branch name instead of tag during dry-run
## What changes were proposed in this pull request?

There are two cases when we use `dry run`.

First, when the tag already exists, we can ask for `confirmation` of the existing tag name.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
Output directory already exists. Overwrite and continue? [y/n] y
Branch [branch-2.4]:
Current branch version is 2.4.4-SNAPSHOT.
Release [2.4.4]: 2.4.3
RC # [1]:
v2.4.3-rc1 already exists. Continue anyway [y/n]? y
This is a dry run. Please confirm the ref that will be built for testing.
Ref [v2.4.3-rc1]:
```

Second, when the tag doesn't exist, we had better ask for `confirmation` of the branch name. If the default value is left unchanged, the build will eventually fail.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
Branch [branch-2.4]:
Current branch version is 2.4.4-SNAPSHOT.
Release [2.4.4]:
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [v2.4.4-rc1]:
```

This PR improves the second case by providing the branch name instead. This helps the release testing before tagging.
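
The rule described above can be sketched in Python as follows. This is an illustration only, not the release script's code: `default_ref` and the `existing` set of tags are hypothetical names introduced here.

```python
def default_ref(tag: str, branch: str, existing_tags: set) -> str:
    """Pick the default ref to offer during a dry run (hypothetical helper)."""
    # Offer the tag when it already exists; otherwise fall back to the branch,
    # because a tag that has not been created yet cannot be built.
    return tag if tag in existing_tags else branch


existing = {"v2.4.3-rc1"}  # assumed set of tags already present in the repository
print(default_ref("v2.4.3-rc1", "branch-2.4", existing))  # v2.4.3-rc1
print(default_ref("v2.4.4-rc1", "branch-2.4", existing))  # branch-2.4
```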

## How was this patch tested?

Manually do the following and check the default value of `Ref` field.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
Branch [branch-2.4]:
Current branch version is 2.4.4-SNAPSHOT.
Release [2.4.4]:
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [branch-2.4]:
...
```

Closes #25240 from dongjoon-hyun/SPARK-28496.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-07-24 14:20:25 -07:00
Liang-Chi Hsieh 591de42351 [SPARK-28381][PYSPARK] Upgraded version of Pyrolite to 4.30
## What changes were proposed in this pull request?

This upgrades Pyrolite to a newer version. Most updates [1] in the newer version are for dotnet. For Java, it includes a bug fix to Unpickler regarding cleaning up the Unpickler memo, and support for protocol 5.

After upgrading, we can remove the fix at SPARK-27629 for the bug in Unpickler.

[1] https://github.com/irmen/Pyrolite/compare/pyrolite-4.23...master

## How was this patch tested?

Manually ran the existing tests locally on Python 3.6.

Closes #25143 from viirya/upgrade-pyrolite.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-07-15 12:29:58 +09:00
Dongjoon Hyun 13ae9ebb38 [SPARK-28354][INFRA] Use JIRA user name instead of JIRA user key
## What changes were proposed in this pull request?

The `dev/merge_spark_pr.py` script always fails for some users because their `name` and `key` differ.

- https://issues.apache.org/jira/rest/api/2/user?username=yumwang

JIRA Client expects `name`, but we are using `key`. This PR fixes it.
```python
# This is JIRA client code `/usr/local/lib/python2.7/site-packages/jira/client.py`
def assign_issue(self, issue, assignee):
    """Assign an issue to a user. None will set it to unassigned. -1 will set it to Automatic.

    :param issue: the issue ID or key to assign
    :param assignee: the user to assign the issue to

    :type issue: int or str
    :type assignee: str

    :rtype: bool
    """
    url = self._options['server'] + \
        '/rest/api/latest/issue/' + str(issue) + '/assignee'
    payload = {'name': assignee}
    r = self._session.put(
        url, data=json.dumps(payload))
    raise_on_error(r)
    return True
```

## How was this patch tested?

Manual with the committer ID/password.

```python
import jira.client
asf_jira = jira.client.JIRA({'server': 'https://issues.apache.org/jira'}, basic_auth=('yourid', 'passwd'))
asf_jira.assign_issue("SPARK-28354", "q79969786")   # This will raise exception.
asf_jira.assign_issue("SPARK-28354", "yumwang")     # This works.
```

Closes #25120 from dongjoon-hyun/SPARK-28354.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-07-12 18:44:29 +09:00
Yuming Wang 4ad0c33be4 [SPARK-28221][BUILD] Upgrade janino to 3.0.13
## What changes were proposed in this pull request?

Mainly change logs:
### Version 3.0.13:
- Support for JDK 9/10 in Full Compiler
- The syntax elements that can have modifiers now all have sets of "is...()" methods that check for each modifier. Some also have methods "getAccess()" and/or "getAnnotations()".
- Implement "type annotations" (JLS8 9.7.4)
- Implemented parsing (but not compilation) of "modular compilation units" (JLS11 7.3).
- Replaced all "assert...Uncookable(..., Pattern messageRegex)" and "assert...Uncookable(..., String messageInfix)" method pairs with a single "assert...Uncookable(..., String messageRegex)" method.
- Minor refactoring: Allowed modifiers are now checked in the Parser, not in Java.*. This saves a lot of THROWS clauses.
- Parse Type inference syntax: Type inference for generic instance creation implemented, test cases added.
- Parse MethodReference, ClassInstanceCreationReference and ArrayCreationReference

### Version 3.0.12
- Fixed: Operator "&" not defined on types "java.lang.Long" and "int"
- Major bug in JavaSourceClassLoader: When loading the second and following classes, CUs were compiled again, leading to an inconsistent class hierarchy.
- Fixed: Java 9 added "Override public final CharBuffer CharBuffer.rewind() { ..." -- leads easily to a java.lang.NoSuchMethodError
- Changed all occurrences of the words "Java bytecode" to "JVM bytecode" to make clearer that the generated bytecode is for the JVMS and not suitable for, e.g. DALVIK.

http://janino-compiler.github.io/janino/changelog.html

## How was this patch tested?

Existing test

Closes #25021 from wangyum/SPARK-28221.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-06 10:02:42 -07:00
Marcelo Vanzin 11e21cc17a [SPARK-28187][BUILD] Add support for hadoop-cloud to the PR builder.
Closes #24987 from vanzin/SPARK-28187.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-06-27 15:59:05 -07:00
Hyukjin Kwon 1d36b892ab [SPARK-7721][INFRA][FOLLOW-UP] Remove cloned coverage repo after posting HTMLs
## What changes were proposed in this pull request?

This PR proposes to remove cloned `pyspark-coverage-site` repo.

It doesn't look like a problem in the PR builder, but somehow it is problematic in `spark-master-test-sbt-hadoop-2.7`.

## How was this patch tested?

Jenkins.

Closes #23729 from HyukjinKwon/followup-coverage.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-06-25 09:18:32 +09:00
Sean Owen 67042e90e7 [MINOR][BUILD] Exclude pyspark-coverage-site/ dir from RAT
## What changes were proposed in this pull request?

Looks like a directory `pyspark-coverage-site/` is now (?) generated and fails RAT checks. It should just be excluded. See: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/6029/console

## How was this patch tested?

N/A

Closes #24950 from srowen/pysparkcoveragesite.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-06-24 14:07:41 -05:00
Dongjoon Hyun ea0e119f84 [SPARK-28111][BUILD] Upgrade xbean-asm7-shaded to 4.14
## What changes were proposed in this pull request?

This PR aims to update `xbean-asm7-shaded` to bring [XBEAN-318](https://issues.apache.org/jira/browse/XBEAN-318) which is helpful to log the class definition reading failures.
- https://issues.apache.org/jira/projects/XBEAN/versions/12345220

## How was this patch tested?

Pass the Jenkins.

Closes #24914 from dongjoon-hyun/SPARK-28111.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-20 07:59:59 -07:00
Sean Owen 15462e1a8f [SPARK-28004][UI] Update jquery to 3.4.1
## What changes were proposed in this pull request?

We're using an old-ish jQuery, 1.12.4, and should probably update for Spark 3 to keep up in general, but also to keep up with CVEs. In fact, we know of at least one resolved in only 3.4.0+ (https://nvd.nist.gov/vuln/detail/CVE-2019-11358). They may not affect Spark, but, if the update isn't painful, maybe worthwhile in order to make future 3.x updates easier.

jQuery 1 -> 2 doesn't sound like a breaking change, as 2.0 is supposed to maintain compatibility with 1.9+ (https://blog.jquery.com/2013/04/18/jquery-2-0-released/)

2 -> 3 has breaking changes: https://jquery.com/upgrade-guide/3.0/. It's hard to evaluate each one, but the most likely area for problems is in ajax(). However, our usage of jQuery (and plugins) is pretty simple.

Update jquery to 3.4.1; update jquery blockUI and mustache to latest

## How was this patch tested?

Manual testing of docs build (except R docs), worker/master UI, spark application UI.
Note: this really doesn't guarantee it works, as our tests can't test javascript, and this is merely anecdotal testing, although I clicked about every link I could find. There's a risk this breaks a minor part of the UI; it does seem to work fine in the main.

Closes #24843 from srowen/SPARK-28004.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-14 22:19:20 -07:00
Dongjoon Hyun fd8240d10c [SPARK-28051][INFRA] Exposing JIRA issue component types at GitHub PRs
## What changes were proposed in this pull request?

This PR aims to expose JIRA issue component types at GitHub PRs.

## How was this patch tested?

Manual.
```
$ export GITHUB_OAUTH_KEY=...
$ export JIRA_PASSWORD=...
$ export GITHUB_API_BASE='https://api.github.com/repos/your-id/spark'
$ dev/github_jira_sync.py
```

Please note that the existing script will raise the following exception if your repo has fewer than 100 PRs. This will be handled in #24874.
```
Traceback (most recent call last):
  File "dev/github_jira_sync.py", line 139, in <module>
    jira_prs = get_jira_prs()
  File "dev/github_jira_sync.py", line 83, in get_jira_prs
    link_header = filter(lambda k: k.startswith("Link"), page.info().headers)[0]
IndexError: list index out of range
```
That is beyond the scope of this PR.

Closes #24871 from dongjoon-hyun/SPARK-28051.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-14 20:36:45 -07:00
Dongjoon Hyun 7533cccc5d [SPARK-28053][INFRA] Handle a corner case where there is no Link header
## What changes were proposed in this pull request?

Currently, `github_jira_sync.py` assumes that a `Link` header is always present. However, it fails when the number of open PRs is less than 100 (the default page size). This will not happen in Apache Spark, but we had better fix it because it happens during the review process for the `github_jira_sync.py` script.
```
Traceback (most recent call last):
  File "dev/github_jira_sync.py", line 139, in <module>
    jira_prs = get_jira_prs()
  File "dev/github_jira_sync.py", line 83, in get_jira_prs
    link_header = filter(lambda k: k.startswith("Link"), page.info().headers)[0]
IndexError: list index out of range
```
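
A minimal Python 3 sketch of a lookup that tolerates the missing header is shown below. It is not the change made in this PR; `find_link_header` and the sample header lines are hypothetical and only illustrate replacing the `[0]` indexing with a lookup that has a default.

```python
from typing import Iterable, Optional


def find_link_header(headers: Iterable[str]) -> Optional[str]:
    """Return the first raw header line starting with "Link", or None if absent."""
    # next() with a default avoids the IndexError raised by indexing an empty list.
    return next((h for h in headers if h.startswith("Link")), None)


# Hypothetical raw header lines for a paged and an unpaged response.
paged = [
    "Content-Type: application/json\r\n",
    'Link: <https://example.invalid/page2>; rel="next"\r\n',
]
unpaged = ["Content-Type: application/json\r\n"]

print(find_link_header(paged) is not None)  # True
print(find_link_header(unpaged))            # None
```

A caller can then treat `None` as "there is only one page" instead of crashing.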

## How was this patch tested?

Manually checked with another repo that has a small number of open PRs (< 100).
```
$ export JIRA_PASSWORD=...
$ export GITHUB_API_BASE='https://api.github.com/repos/your-id/spark'
$ dev/github_jira_sync.py
```

Closes #24874 from dongjoon-hyun/SPARK-28053.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-06-14 16:33:34 +09:00
Dongjoon Hyun e5d95117e4 [SPARK-27979][BUILD][test-maven] Remove deprecated --force option in build/mvn and run-tests.py
## What changes were proposed in this pull request?

This is a second try of #24824.

Since Apache Spark 2.0.0, SPARK-14867 deprecated `--force` option and made it ignored. This PR cleans up the related code completely at 3.0.0.

**BEFORE (Jenkins)**
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests
WARNING: '--force' is deprecated and ignored.
...
========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.7 -Phive-thriftserver -Phive -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end
WARNING: '--force' is deprecated and ignored.
```

**AFTER (Jenkins)**
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests
...
========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end
```

## How was this patch tested?

Manually check the Jenkins logs.

Closes #24833 from dongjoon-hyun/SPARK-FORCE-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-10 18:40:46 -07:00
Dongjoon Hyun 742f805177 Revert "[SPARK-27979][BUILD][test-maven] Remove deprecated --force option in build/mvn and run-tests.py"
This reverts commit 354ec254c5.
2019-06-09 08:33:21 -07:00
Martin Junghanns 709387d660 [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
## What changes were proposed in this pull request?

This PR introduces the necessary Maven modules for the new [Spark Graph](https://issues.apache.org/jira/browse/SPARK-25994) feature for Spark 3.0.

* `spark-graph` is a parent module that users depend on to get all graph functionalities (Cypher and Graph Algorithms)
* `spark-graph-api` defines the [Property Graph API](https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI) that is being shared between Cypher and Algorithms
* `spark-cypher` contains a Cypher query engine implementation

Both `spark-graph-api` and `spark-cypher` depend on Spark SQL.

Note that the Maven module for Graph Algorithms is not part of this PR and will be introduced in https://issues.apache.org/jira/browse/SPARK-27302

A PoC for a running Cypher implementation can be found in this WIP PR https://github.com/apache/spark/pull/24297

## How was this patch tested?

Pass the Jenkins with all profiles, then manually build and check the following.
```
$ ls assembly/target/scala-2.12/jars/spark-cypher*
assembly/target/scala-2.12/jars/spark-cypher_2.12-3.0.0-SNAPSHOT.jar

$ ls assembly/target/scala-2.12/jars/spark-graph* | grep -v graphx
assembly/target/scala-2.12/jars/spark-graph-api_2.12-3.0.0-SNAPSHOT.jar
assembly/target/scala-2.12/jars/spark-graph_2.12-3.0.0-SNAPSHOT.jar
```

Closes #24490 from s1ck/SPARK-27300.

Lead-authored-by: Martin Junghanns <martin.junghanns@neotechnology.com>
Co-authored-by: Max Kießling <max@kopfueber.org>
Co-authored-by: Martin Junghanns <martin.junghanns@neo4j.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-09 00:26:26 -07:00
Dongjoon Hyun 354ec254c5 [SPARK-27979][BUILD][test-maven] Remove deprecated --force option in build/mvn and run-tests.py
## What changes were proposed in this pull request?

Since Apache Spark 2.0.0, SPARK-14867 deprecated `--force` option and made it ignored. This PR cleans up the related code completely at 3.0.0.

**BEFORE (Jenkins)**
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests
WARNING: '--force' is deprecated and ignored.
...
========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.7 -Phive-thriftserver -Phive -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end
WARNING: '--force' is deprecated and ignored.
```

**AFTER (Jenkins)**
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests
...
========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end
```

## How was this patch tested?

Manually check the Jenkins logs.

Closes #24824 from dongjoon-hyun/SPARK-27979.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-08 08:17:12 -07:00
Yuming Wang 3f102a8229 [SPARK-27749][SQL] hadoop-3.2 support hive-thriftserver
## What changes were proposed in this pull request?

This PR mainly makes the following changes to make `hadoop-3.2` support `sql/hive-thriftserver`:
1. Upgrade [`TCLIService.thrift`](https://github.com/apache/hive/blob/rel/release-2.3.5/service-rpc/if/TCLIService.thrift) and related code to Hive 2.3.5 because of [HIVE-12442](https://issues.apache.org/jira/browse/HIVE-12442)(Note that we only migrate code without adding features, such as [HIVE-4924](https://issues.apache.org/jira/browse/HIVE-4924) and [HIVE-15473](https://issues.apache.org/jira/browse/HIVE-15473)).
2. Use slf4j as logging facade because of [HIVE-12237](https://issues.apache.org/jira/browse/HIVE-12237).
3. Port [HIVE-13169](https://issues.apache.org/jira/browse/HIVE-13169) to be compatible with Hive 2.3.

## How was this patch tested?

Existing tests

Closes #24628 from wangyum/SPARK-27749.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-06-05 08:40:05 -07:00
Izek Greenfield c647f9011c [SPARK-27862][BUILD] Move to json4s 3.6.6
## What changes were proposed in this pull request?
Move to json4s version 3.6.6
Add scala-xml 1.2.0

## How was this patch tested?

Pass the Jenkins

Closes #24736 from igreenfield/master.

Authored-by: Izek Greenfield <igreenfield@axiomsl.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-05-30 19:42:56 -05:00
Fokko Driesprong bd87323003 [SPARK-27757][CORE] Bump Jackson to 2.9.9
## What changes were proposed in this pull request?

This fixes CVE-2019-12086 on Databind: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.9

## How was this patch tested?

Existing tests

Closes #24646 from Fokko/SPARK-27757.

Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-05-30 09:35:20 -05:00
HyukjinKwon 90b6cda9af [SPARK-25944][R][BUILD] AppVeyor change to latest R version (3.6.0)
## What changes were proposed in this pull request?

R 3.6.0 was released on 2019-04-26. This PR changes the R version from 3.5.1 to 3.6.0 in AppVeyor.

This PR sets `R_REMOTES_NO_ERRORS_FROM_WARNINGS` to `true` to avoid the errors below, which are converted from warnings:

```
Error in strptime(xx, f, tz = tz) :
  (converted from warning) unable to identify current timezone 'C':
please set environment variable 'TZ'
Error in i.p(...) :
  (converted from warning) installation of package 'praise' had non-zero exit status
Calls: <Anonymous> ... with_rprofile_user -> with_envvar -> force -> force -> i.p
Execution halted
```

## How was this patch tested?

AppVeyor

Closes #24716 from HyukjinKwon/SPARK-27848.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-05-28 14:42:03 +09:00
Sean Owen 6c5827c723 [SPARK-27794][R][DOCS] Use https URL for CRAN repo
## What changes were proposed in this pull request?

Use https URL for CRAN repo (and for a Scala download in a Dockerfile)

## How was this patch tested?

Existing tests.

Closes #24664 from srowen/SPARK-27794.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-22 14:28:21 -07:00
Sean Owen eed6de1a65 [MINOR][DOCS] Tighten up some key links to the project and download pages to use HTTPS
## What changes were proposed in this pull request?

Tighten up some key links to the project and download pages to use HTTPS

## How was this patch tested?

N/A

Closes #24665 from srowen/HTTPSURLs.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-21 10:56:42 -07:00
HyukjinKwon b7bf4fd123 [SPARK-27402][INFRA][FOLLOW-UP] Exclude 'hive-thriftserver' in modules to test for hadoop3.2 for now
## What changes were proposed in this pull request?

This PR also excludes 'hive-thriftserver' from the modules to test for hadoop3.2 for now.

## How was this patch tested?

Manually tested via `run-tests.py`

Closes #24644 from HyukjinKwon/SPARK-27402.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-20 07:53:19 -07:00
Dongjoon Hyun 141a3bfc8d [SPARK-27755][BUILD] Update zstd-jni to 1.4.0-1
## What changes were proposed in this pull request?

This PR aims to update `zstd-jni` library to `1.4.0-1` which improves the `level 1 compression speed` performance by 6% in most scenarios. The following is the full release note.
- https://github.com/facebook/zstd/releases/tag/v1.4.0

## How was this patch tested?

Pass the Jenkins.

Closes #24632 from dongjoon-hyun/SPARK-27755.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-17 08:34:45 -07:00
Kazuaki Ishizaki 9e0d8c6ce2 [SPARK-27752][CORE] Upgrade lz4-java from 1.5.1 to 1.6.0
## What changes were proposed in this pull request?

This PR upgrades lz4-java from 1.5.1 to 1.6.0. Lz4-java is available at https://github.com/lz4/lz4-java.

Changes from 1.5.1:
- Upgraded LZ4 to 1.9.1. Updated the JNI bindings, except for the one for Linux/i386. Decompression speed is improved on amd64.
- Deprecated use of LZ4FastDecompressor of a native instance because the corresponding C API function is deprecated. See the release note of LZ4 1.9.0 for details. Updated javadoc accordingly.
- Changed the module name from org.lz4.lz4-java to org.lz4.java to avoid using - in the module name. (severn-everett, Oliver Eikemeier, Rei Odaira)
- Enabled build with Java 11. Note that the distribution is still built with Java 7. (Rei Odaira)

## How was this patch tested?

Existing tests.

Closes #24629 from kiszk/SPARK-27752.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-16 20:45:13 -07:00
Yuming Wang f3ddd6f9da [SPARK-27402][SQL][TEST-HADOOP3.2][TEST-MAVEN] Fix hadoop-3.2 test issue(except the hive-thriftserver module)
## What changes were proposed in this pull request?

This PR fixes the hadoop-3.2 test issues (except in the `hive-thriftserver` module):
1. Add `hive.metastore.schema.verification` and `datanucleus.schema.autoCreateAll` to HiveConf.
2. Make hadoop-3.2 support accessing Hive metastore versions from 0.12 to 2.2.

After [SPARK-27176](https://issues.apache.org/jira/browse/SPARK-27176) and this PR, we upgraded the built-in Hive to 2.3 when enabling the Hadoop 3.2+ profile. This upgrade fixes the following issues:
- [HIVE-6727](https://issues.apache.org/jira/browse/HIVE-6727): Table level stats for external tables are set incorrectly.
- [HIVE-15653](https://issues.apache.org/jira/browse/HIVE-15653): Some ALTER TABLE commands drop table stats.
- [SPARK-12014](https://issues.apache.org/jira/browse/SPARK-12014): Spark SQL query containing semicolon is broken in Beeline.
- [SPARK-25193](https://issues.apache.org/jira/browse/SPARK-25193): insert overwrite doesn't throw exception when drop old data fails.
- [SPARK-25919](https://issues.apache.org/jira/browse/SPARK-25919): Date value corrupts when tables are "ParquetHiveSerDe" formatted and target table is Partitioned.
- [SPARK-26332](https://issues.apache.org/jira/browse/SPARK-26332): Spark sql write orc table on viewFS throws exception.
- [SPARK-26437](https://issues.apache.org/jira/browse/SPARK-26437): Decimal data becomes bigint to query, unable to query.

## How was this patch tested?
This PR tests Spark's Hadoop 3.2 profile on Jenkins, and #24591 tests Spark's Hadoop 2.7 profile on Jenkins.

This PR closes #24591.

Closes #24391 from wangyum/SPARK-27402.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-05-13 10:35:26 -07:00
Dongjoon Hyun 375cfa3d89 [SPARK-27467][BUILD] Upgrade Maven to 3.6.1
## What changes were proposed in this pull request?

This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following.
- https://maven.apache.org/docs/3.6.1/release-notes.html

This was committed and then reverted due to an AppVeyor failure. It turns out that the root cause is a `PATH` issue. With the updated AppVeyor script, it passed.

https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/24273412

## How was this patch tested?

Pass the Jenkins and AppVeyor

Closes #24481 from dongjoon-hyun/SPARK-R.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-02 20:01:17 -07:00
Yuming Wang 875e7e1d97 [SPARK-27620][BUILD] Upgrade jetty to 9.4.18.v20190429
## What changes were proposed in this pull request?

This PR upgrades Jetty to [9.4.18.v20190429](https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.18.v20190429) because of [CVE-2019-10247](https://nvd.nist.gov/vuln/detail/CVE-2019-10247).

## How was this patch tested?

Existing test.

Closes #24513 from wangyum/SPARK-27620.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-05-03 09:25:54 +09:00
Yuming Wang 3ecafb0e14 [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6
## What changes were proposed in this pull request?

[stream-lib 2.9.6](https://github.com/addthis/stream-lib/commits/v2.9.6) includes several improvements:
![image](https://user-images.githubusercontent.com/5399861/56938062-7eb77580-6b32-11e9-8c36-711ab943d657.png)

## How was this patch tested?

N/A

Closes #24492 from wangyum/SPARK-27601.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-05-02 15:21:57 -05:00
Cheng Lian b73744a147 [SPARK-27611][BUILD] Exclude jakarta.activation:jakarta.activation-api from org.glassfish.jaxb:jaxb-runtime:2.3.2
PR #23890 introduced `org.glassfish.jaxb:jaxb-runtime:2.3.2` as a runtime dependency. As an unexpected side effect, `jakarta.activation:jakarta.activation-api:1.2.1` was also pulled in as a transitive dependency. As a result, for the Maven build, both of the following two jars can be found under `assembly/target/scala-2.12/jars/`:

```
activation-1.1.1.jar
jakarta.activation-api-1.2.1.jar
```

This PR excludes the Jakarta one.

Manually built Spark using Maven and checked files under `assembly/target/scala-2.12/jars/`. After this change, only `activation-1.1.1.jar` is there.

Closes #24507 from liancheng/spark-27611.

Authored-by: Cheng Lian <lian@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-01 20:12:17 -07:00
HyukjinKwon d8db7db50b Revert "[SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc"
This reverts commit bde30bc57c.
2019-04-28 11:03:15 +09:00
Yuming Wang bde30bc57c [SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc
## What changes were proposed in this pull request?

Update the `docs/building-spark.md`. Otherwise:
```
mvn package -DskipTests=true
...
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions)  spark-parent_2.12 ---
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message:
Detected Maven Version: 3.6.0 is not in the allowed range 3.6.1.
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
[ERROR]
...
```
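
The enforcer failure above amounts to a minimum-version check. As a rough sketch, assuming the bare requirement `3.6.1` is treated as "at least 3.6.1" and that versions are purely numeric, the comparison looks like this (the helper names are hypothetical and this is not the enforcer plugin's code):

```python
def parse_version(text: str) -> tuple:
    """Convert a dotted numeric version such as "3.6.1" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(detected: str, required: str) -> bool:
    """Return True when `detected` is at least `required` (hypothetical helper)."""
    # Tuples compare element by element, so (3, 6, 0) < (3, 6, 1).
    return parse_version(detected) >= parse_version(required)


print(satisfies_minimum("3.6.0", "3.6.1"))  # False
print(satisfies_minimum("3.6.1", "3.6.1"))  # True
```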

## How was this patch tested?
Just tested that `https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.zip` is available.

Closes #24477 from wangyum/SPARK-27467.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-27 09:09:47 -07:00
Yuming Wang fe99305101 [SPARK-27556][BUILD] Exclude com.zaxxer:HikariCP-java7 from hadoop-yarn-server-web-proxy
## What changes were proposed in this pull request?

There are two HikariCP packages in classpath when building with `-Phive -Pyarn -Phadoop-3.2`.

The HikariCP dependency tree:
```
[INFO] | +- org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.2.0:compile
[INFO] | | \- org.apache.hadoop:hadoop-yarn-server-common:jar:3.2.0:compile
[INFO] | | +- org.apache.hadoop:hadoop-yarn-registry:jar:3.2.0:compile
[INFO] | | | \- commons-daemon:commons-daemon:jar:1.0.13:compile
[INFO] | | +- org.apache.geronimo.specs:geronimo-jcache_1.0_spec:jar:1.0-alpha-1:compile
[INFO] | | +- org.ehcache:ehcache:jar:3.3.1:compile
[INFO] | | +- com.zaxxer:HikariCP-java7:jar:2.4.12:compile
```

```
[INFO] +- org.apache.hive:hive-metastore:jar:2.3.4:compile
[INFO] | +- javolution:javolution:jar:5.5.1:compile
[INFO] | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] | +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
[INFO] | +- com.zaxxer:HikariCP:jar:2.5.1:compile
```

This PR excludes `com.zaxxer:HikariCP-java7` from `hadoop-yarn-server-web-proxy`.
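
As a rough illustration of how such a duplicate could be spotted, the sketch below scans `mvn dependency:tree` output for every artifact under one group ID. The function name `artifacts_by_group` and the regular expression are hypothetical helpers made up for this example; they are not part of Spark or Maven.

```python
import re
from collections import defaultdict

# Matches coordinates of the form group:artifact:jar:version:scope
# as they appear in `mvn dependency:tree` output.
COORD = re.compile(r"([\w.\-]+):([\w.\-]+):jar:([\w.\-]+):(\w+)")


def artifacts_by_group(tree_lines, group_id):
    """Return {artifactId: [versions]} for all jars under `group_id`."""
    found = defaultdict(set)
    for line in tree_lines:
        match = COORD.search(line)
        if match and match.group(1) == group_id:
            found[match.group(2)].add(match.group(3))
    return {artifact: sorted(versions) for artifact, versions in found.items()}
```

Feeding it the two tree excerpts above with `group_id="com.zaxxer"` would report both `HikariCP-java7` and `HikariCP`, which is the conflict this change removes.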

## How was this patch tested?

manual tests

Closes #24450 from wangyum/SPARK-27556.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-04-26 12:15:39 -05:00
Yuming Wang f82ed5e8e0 [MINOR][TEST] Remove out-dated hive version in run-tests.py
## What changes were proposed in this pull request?

```
========================================================================
Building Spark
========================================================================
[info] Building Spark (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-3.2 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly
```

`(w/Hive 1.2.1)` is incorrect when testing hadoop-3.2; it should be `(w/Hive 2.3.4)`.
This PR removes `(w/Hive 1.2.1)` from run-tests.py.

## How was this patch tested?

N/A

Closes #24451 from wangyum/run-tests-invalid-info.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-24 21:22:15 -07:00
Yuming Wang 777b4502b2 [SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2
## What changes were proposed in this pull request?

When we compile and test with Hadoop 3.2, we hit the following two issues:
1. JobSummaryLevel is not a member of object org.apache.parquet.hadoop.ParquetOutputFormat. Fixed by [PARQUET-381](https://issues.apache.org/jira/browse/PARQUET-381)(Parquet 1.9.0)
2. java.lang.NoSuchFieldError: BROTLI
    at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31). Fixed by [PARQUET-1143](https://issues.apache.org/jira/browse/PARQUET-1143)(Parquet 1.10.0)

The reason is that the `parquet-hadoop-bundle-1.8.1.jar` conflicts with Parquet 1.10.1.
I think it would be safe to upgrade Hive's Parquet to 1.10.1 to work around this issue.

This is what Hive did when upgrading Parquet 1.8.1 to 1.10.0: [HIVE-17000](https://issues.apache.org/jira/browse/HIVE-17000) and [HIVE-19464](https://issues.apache.org/jira/browse/HIVE-19464). We can see that all changes are related to vectors, and vectors are disabled by default: see [HIVE-14826](https://issues.apache.org/jira/browse/HIVE-14826) and [HiveConf.java#L2723](https://github.com/apache/hive/blob/rel/release-2.3.4/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2723).

This pr removes [parquet-hadoop-bundle-1.8.1.jar](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop-bundle) , so Hive serde will use [parquet-common-1.10.1.jar, parquet-column-1.10.1.jar and parquet-hadoop-1.10.1.jar](https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2#L185-L189).

## How was this patch tested?

1. manual tests
2. [upgrade Hive Parquet to 1.10.1 and run Hadoop 3.2 test on jenkins](https://github.com/apache/spark/pull/24044#commits-pushed-0c3f962)

Closes #24346 from wangyum/SPARK-27176.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-04-19 08:59:08 -07:00
shane knapp e1ece6a319 [SPARK-25079][PYTHON] update python3 executable to 3.6.x
## What changes were proposed in this pull request?

have jenkins test against python3.6 (instead of 3.4).

## How was this patch tested?

extensive testing on both the centos and ubuntu jenkins workers.

NOTE:  this will need to be backported to all active branches.

Closes #24266 from shaneknapp/updating-python3-executable.

Authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-04-19 10:03:50 +09:00
Dongjoon Hyun f93460dae9 [SPARK-27493][BUILD] Upgrade ASM to 7.1
## What changes were proposed in this pull request?

[SPARK-25946](https://issues.apache.org/jira/browse/SPARK-25946) upgraded ASM to 7.0 to support JDK11. This PR aims to update ASM to 7.1 to bring the bug fixes.
- https://asm.ow2.io/versions.html
- https://issues.apache.org/jira/browse/XBEAN-316

## How was this patch tested?

Pass the Jenkins.

Closes #24395 from dongjoon-hyun/SPARK-27493.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-04-18 13:36:52 +09:00
Dongjoon Hyun a8f20c95ab [SPARK-27452][BUILD] Update zstd-jni to 1.3.8-9
## What changes were proposed in this pull request?

This PR aims to update `zstd-jni` from 1.3.2-2 to 1.3.8-9 to be aligned with the latest Zstd 1.3.8 in Apache Spark 3.0.0. Currently, Apache Spark is aligned with the old Zstd used in the first PR, and `zstd-jni` has received many bug fixes and improvements since then.
- https://github.com/facebook/zstd/releases/tag/v1.3.8
- https://github.com/facebook/zstd/releases/tag/v1.3.7
- https://github.com/facebook/zstd/releases/tag/v1.3.6
- https://github.com/facebook/zstd/releases/tag/v1.3.4
- https://github.com/facebook/zstd/releases/tag/v1.3.3

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #24364 from dongjoon-hyun/SPARK-ZSTD.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-16 08:54:16 -07:00
Sean Owen 8718367e2e [SPARK-27470][PYSPARK] Update pyrolite to 4.23
## What changes were proposed in this pull request?

 Update pyrolite to 4.23 to pick up bug and security fixes.

## How was this patch tested?

Existing tests.

Closes #24381 from srowen/SPARK-27470.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-04-16 19:41:40 +09:00
Sean Owen a4cf1a4f4e [SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3
## What changes were proposed in this pull request?

Unify commons-beanutils deps to latest 1.9.3. This resolves the version inconsistency in Hadoop 2.7's build and also picks up security and bug fixes.

## How was this patch tested?

Existing tests.

Closes #24378 from srowen/SPARK-27469.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-15 19:18:37 -07:00
Dongjoon Hyun 0881f648cf [SPARK-27451][BUILD] Upgrade lz4-java to 1.5.1
## What changes were proposed in this pull request?

This PR upgrades `lz4-java` to 1.5.1 in order to get a patch for avoiding racing with GC.
- https://github.com/lz4/lz4-java/blob/master/CHANGES.md#151

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #24363 from dongjoon-hyun/SPARK-LZ4.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-12 19:21:43 -07:00
Yuming Wang 33f3c48cac [SPARK-27176][SQL] Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4
## What changes were proposed in this pull request?

This PR mainly contains:
1. Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4.
2. Resolve compatibility issues between Hive 1.2.1 and Hive 2.3.4 in the `sql/hive` module.

## How was this patch tested?
jenkins test hadoop-2.7
manual test hadoop-3:
```shell
build/sbt clean package -Phadoop-3.2 -Phive
export SPARK_PREPEND_CLASSES=true

# rm -rf metastore_db

cat <<EOF > test_hadoop3.scala
spark.range(10).write.saveAsTable("test_hadoop3")
spark.table("test_hadoop3").show
EOF

bin/spark-shell --conf spark.hadoop.hive.metastore.schema.verification=false --conf spark.hadoop.datanucleus.schema.autoCreateAll=true -i test_hadoop3.scala
```

Closes #23788 from wangyum/SPARK-23710-hadoop3.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-04-08 08:42:21 -07:00
Sean Owen 23bde44797 [SPARK-27358][UI] Update jquery to 1.12.x to pick up security fixes
## What changes were proposed in this pull request?

Update jquery -> 1.12.4, datatables -> 1.10.18, mustache -> 2.3.12.
Add missing mustache license

## How was this patch tested?

I manually tested the UI locally with the JavaScript console open and didn't observe any problems or JS errors. The only 'risky' change seems to be mustache, but on reading its release notes, I don't think the changes from 0.8.1 to 2.x would affect Spark's simple usage.

Closes #24288 from srowen/SPARK-27358.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-04-05 12:54:01 -05:00
LantaoJin 69dd44af19 [SPARK-27216][CORE] Upgrade RoaringBitmap to 0.7.45 to fix Kryo unsafe ser/dser issue
## What changes were proposed in this pull request?

HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks, but RoaringBitmap cannot be serialized/deserialized with the unsafe KryoSerializer.

This is a bug in RoaringBitmap 0.5.11 that is fixed in the latest version.

This is an update of #24157

## How was this patch tested?

Add a UT

Closes #24264 from LantaoJin/SPARK-27216.

Lead-authored-by: LantaoJin <jinlantao@gmail.com>
Co-authored-by: Lantao Jin <jinlantao@gmail.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2019-04-03 20:09:50 -05:00
Yuming Wang 13c5c1fb4b [SPARK-27180][BUILD][YARN] Fix testing issues with yarn module in Hadoop-3
## What changes were proposed in this pull request?

Fix testing issues with `yarn` module in Hadoop-3:

1. Upgrade jersey-1 to `1.19` to fix ```Cause: java.lang.NoClassDefFoundError: com/sun/jersey/spi/container/servlet/ServletContainer```.
2. Copy `ServerSocketUtil` from hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/ServerSocketUtil.java to fix ```java.lang.NoClassDefFoundError: org/apache/hadoop/net/ServerSocketUtil```.
3. Adapt `SessionHandler` from jetty-9.3.25.v20180904/jetty-server/src/main/java/org/eclipse/jetty/server/session/SessionHandler.java  to fix ```java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.getSessionManager()Lorg/eclipse/jetty/server/SessionManager```.

## How was this patch tested?

manual tests:
```shell
build/sbt yarn/test -Pyarn
build/sbt yarn/test -Phadoop-3.2 -Pyarn

build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn
build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn -Phadoop-3.2
```

Closes #24115 from wangyum/hadoop3-yarn.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-04-02 15:38:26 -05:00
Yuming Wang f799e34962 [MINOR][BUILD] Upgrade apache-rat to 0.13
## What changes were proposed in this pull request?

This PR upgrades `apache-rat` to 0.13. Issues fixed by 0.13:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20RAT%20AND%20fixVersion%20%3D%200.13

## How was this patch tested?

manual tests

Closes #24262 from wangyum/apache-rat.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-04-01 16:44:42 +09:00
Sean Owen 754f820035 [SPARK-26918][DOCS] All .md should have ASF license header
## What changes were proposed in this pull request?

Add AL2 license to metadata of all .md files.
This seemed to be the tidiest way as it will get ignored by .md renderers and other tools. Attempts to write them as markdown comments revealed that there is no such standard thing.

## How was this patch tested?

Doc build

Closes #24243 from srowen/SPARK-26918.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-30 19:49:45 -05:00
Sean Owen 2ec650d843 [SPARK-27267][CORE] Update snappy to avoid error when decompressing empty serialized data
## What changes were proposed in this pull request?

(See JIRA for problem statement)

Update snappy 1.1.7.1 -> 1.1.7.3 to pick up an empty-stream and Java 9 fix.

There appear to be no other changes of consequence:
https://github.com/xerial/snappy-java/blob/master/Milestone.md

## How was this patch tested?

Existing tests

Closes #24242 from srowen/SPARK-27267.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-30 02:41:24 -05:00
Hyukjin Kwon 0e16a6f5b0 [SPARK-27277][INFRA] Recover from setting fix version failure in merge script
## What changes were proposed in this pull request?

I have hit this case a few times before:

```
Enter comma-separated fix version(s) [3.0.0]: 3.0,0
Restoring head pointer to master
git checkout master
Already on 'master'
git branch
Traceback (most recent call last):
  File "./dev/merge_spark_pr_jira.py", line 537, in <module>
    main()
  File "./dev/merge_spark_pr_jira.py", line 523, in main
    resolve_jira_issues(title, merged_refs, jira_comment)
  File "./dev/merge_spark_pr_jira.py", line 359, in resolve_jira_issues
    resolve_jira_issue(merge_branches, comment, jira_id)
  File "./dev/merge_spark_pr_jira.py", line 302, in resolve_jira_issue
    jira_fix_versions = map(lambda v: get_version_json(v), fix_versions)
  File "./dev/merge_spark_pr_jira.py", line 302, in <lambda>
    jira_fix_versions = map(lambda v: get_version_json(v), fix_versions)
  File "./dev/merge_spark_pr_jira.py", line 300, in get_version_json
    return filter(lambda v: v.name == version_str, versions)[0].raw
IndexError: list index out of range
```

I mistyped the fix version (there is a comma in `3.0,0`), which ended the loop in the merge script. Not a big deal, but it has bugged me a few times. I hit it again today and decided to fix it.

This PR proposes to recover from wrongly set fix versions.
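
A minimal sketch of the recovery idea, assuming the set of valid versions is already known. The names `parse_fix_versions`, `prompt_fix_versions`, and the `read_input` parameter are hypothetical and are not taken from the actual merge script:

```python
def parse_fix_versions(raw, default_versions):
    """Split a comma-separated answer; a blank answer falls back to the defaults."""
    raw = raw.strip()
    if not raw:
        return list(default_versions)
    return [v.strip() for v in raw.split(",") if v.strip()]


def prompt_fix_versions(available, default_versions, read_input=input):
    """Keep asking until every requested version exists in `available`."""
    available = set(available)
    prompt = "Enter comma-separated fix version(s) [%s]: " % ",".join(default_versions)
    while True:
        versions = parse_fix_versions(read_input(prompt), default_versions)
        missing = [v for v in versions if v not in available]
        if not missing:
            return versions
        # Re-prompt instead of crashing with IndexError on an unknown version.
        print("Specified version(s) [%s] not found in the available versions, "
              "try again (or leave blank and fix manually)." % ", ".join(versions))
```

The point of the design is that a typo such as `3.0,0` leads to another prompt rather than a traceback; `read_input` is injectable only so the loop can be exercised without a terminal.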

## How was this patch tested?

I manually copied the relevant code and tested it separately in both Python 2 and Python 3.

**Positive cases:**

```
Enter comma-separated fix version(s) [3.0.0]:  # blank test (to use default)
['3.0.0']
```

```
Enter comma-separated fix version(s) [3.0.0,2.4.2]:  # multiple default versions
['3.0.0', '2.4.2']
```

```
Enter comma-separated fix version(s) [3.0.0]: 2.4.1  # valid version
['2.4.1']
```

```
Enter comma-separated fix version(s) [3.0.0]: 3.0.0,2.4.2  # multiple valid versions
['3.0.0', '2.4.2']
```

**Keyboard interrupt (Ctrl + C):**

```
Enter comma-separated fix version(s) [3.0.0]: ^CTraceback (most recent call last):  # keyboard interrupt
  File "test_merge_script.py", line 45, in <module>
    test()
  File "test_merge_script.py", line 26, in test
    fix_versions = input("Enter comma-separated fix version(s) [%s]: " % default_fix_versions)
KeyboardInterrupt
```

**Wrongly typed versions (recovered):**

```
Enter comma-separated fix version(s) [3.0.0]: 3.1
Specified version(s) [3.1] not found in the available versions, try again (or leave blank and fix manually).
Enter comma-separated fix version(s) [3.0.0]: 123
Specified version(s) [123] not found in the available versions, try again (or leave blank and fix manually).
Enter comma-separated fix version(s) [3.0.0]: 3.0,0
Specified version(s) [3.0, 0] not found in the available versions, try again (or leave blank and fix manually).
Enter comma-separated fix version(s) [3.0.0]: damn
Specified version(s) [damn] not found in the available versions, try again (or leave blank and fix manually).
Enter comma-separated fix version(s) [3.0.0]: 3.0.0,2.5.2  # one invalid versions in multiple versions
Specified version(s) [3.0.0, 2.5.2] not found in the available versions, try again (or leave blank and fix manually).
```

**Arbitrary exceptions in fix version parsing (recovered)**

```
Enter comma-separated fix version(s) [3.0.0]:
Traceback (most recent call last):
  File "tmp.py", line 11, in <module>
    raise Exception("arbitrary exception")
Exception: arbitrary exception
Error setting fix version(s), try again (or leave blank and fix manually)
Enter comma-separated fix version(s) [3.0.0]:
Traceback (most recent call last):
  File "tmp.py", line 10, in <module>
    raise Exception("arbitrary exception")
Exception: arbitrary exception
Error setting fix version(s), try again (or leave blank and fix manually)
Enter comma-separated fix version(s) [3.0.0]:
```

Closes #24213 from HyukjinKwon/merge_script_fix_version.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-03-26 21:14:07 +09:00