ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Sean Owen	9ea37b09cf	[SPARK-17875][CORE][BUILD] Remove dependency on Netty 3 ### What changes were proposed in this pull request? Spark uses Netty 4 directly, but also includes Netty 3 only because transitive dependencies do. The dependencies (Hadoop HDFS, Zookeeper, Avro) don't seem to need this dependency as used in Spark. I think we can forcibly remove it to slim down the dependencies. Previous attempts were blocked by its usage in Flume, but that dependency has gone away. https://github.com/apache/spark/pull/15436 ### Why are the changes needed? Mostly to reduce the transitive dependency size and complexity a little bit and avoid triggering spurious security alerts on Netty 3.x usage. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #25544 from srowen/SPARK-17875. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-21 21:27:56 -07:00
Sean Owen	c9b49f3978	[SPARK-28737][CORE] Update Jersey to 2.29 ## What changes were proposed in this pull request? Update Jersey to 2.27+, ideally 2.29, for possible JDK 11 fixes. ## How was this patch tested? Existing tests. Closes #25455 from srowen/SPARK-28737. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-16 15:08:04 -07:00
Dongjoon Hyun	43101c7328	[SPARK-28758][BUILD][SQL] Upgrade Janino to 3.0.15 ### What changes were proposed in this pull request? This PR aims to upgrade `Janino` from `3.0.13` to `3.0.15` in order to bring the bug fixes. Please note that `3.1.0` is a major refactoring instead of bug fixes. We had better use `3.0.15` and wait for the stabler 3.1.x. ### Why are the changes needed? This brings the following bug fixes. 3.0.15 (2019-07-28) - Fix overloaded single static method import 3.0.14 (2019-07-05) - Conflict in sbt-assembly - Overloaded static on-demand imported methods cause a CompileException: Ambiguous static method import - Handle overloaded static on-demand imports - Major refactoring of the Java 8 and Java 9 retrofit mechanism - Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static Initializers" - Local variables in instance initializers don't work - Provide an option to keep generated code files - Added compile error handler and warning handler to ICompiler ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #25474 from dongjoon-hyun/SPARK-28758. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-16 11:33:02 -07:00
Fokko Driesprong	babdba0f9e	[SPARK-28728][BUILD] Bump Jackson Databind to 2.9.9.3 ## What changes were proposed in this pull request? Update Jackson databind to the latest version for some latest changes. ## How was this patch tested? Pass the Jenkins. Closes #25451 from Fokko/fd-bump-jackson-databind. Lead-authored-by: Fokko Driesprong <fokko@apache.org> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-16 03:40:41 -07:00
Dongjoon Hyun	f1d6b19de5	[SPARK-28720][BUILD][R] Update AppVeyor R version to 3.6.1 ## What changes were proposed in this pull request? R version 3.6.1 (Action of the Toes) was released on 2019-07-05. This PR aims to upgrade R installation for AppVeyor CI environment. ## How was this patch tested? Pass the AppVeyor CI. Closes #25441 from dongjoon-hyun/SPARK-28720. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2019-08-13 22:56:53 +00:00
WeichenXu	f21bc1874a	[SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 ## What changes were proposed in this pull request? I audited and updated all dev scripts to support Python 3 (except `merge_spark_pr.py`, which was already updated). ## How was this patch tested? Manual. Closes #25289 from WeichenXu123/dev_py3. Authored-by: WeichenXu <weichen.xu@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-08-09 18:55:48 +09:00
Dongjoon Hyun	ae08387b4c	[SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots ## What changes were proposed in this pull request? This PR aims to improve the `merge-spark-pr` script in the following two ways. 1. `[WIP]` is useful to show that a PR is not ready for merge. Apache Spark allows merging `WIP` PRs. However, we sometimes forget to clean up the title of a completed PR. We had better warn once more during the merging stage and get a confirmation from the committer. 2. We have two kinds of PR titles in terms of the ending period. This PR aims to remove the trailing `dot` since shorter is better in a commit title. Also, PR titles without the trailing `dot` are dominant in the Apache Spark commit logs. ``` $ git log --oneline | grep '[.]$' | wc -l 4090 $ git log --oneline | grep '[^.]$' | wc -l 20747 ``` ## How was this patch tested? Manual. ``` $ dev/merge_spark_pr.py git rev-parse --abbrev-ref HEAD Which pull request would you like to merge? (e.g. 34): 25157 The PR title has `[WIP]`: [WIP][SPARK-28396][SQL] Add PathCatalog for data source V2 Continue? (y/n): ``` ``` $ dev/merge_spark_pr.py git rev-parse --abbrev-ref HEAD Which pull request would you like to merge? (e.g. 34): 25304 I've re-written the title as follows to match the standard format: Original: [SPARK-28570][CORE][SHUFFLE] Make UnsafeShuffleWriter use the new API. Modified: [SPARK-28570][CORE][SHUFFLE] Make UnsafeShuffleWriter use the new API Would you like to use the modified title? (y/n): ``` Closes #25356 from dongjoon-hyun/SPARK-28616. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-04 21:23:54 -07:00
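The two title checks described in SPARK-28616 can be sketched in Python as follows. This is a minimal illustration and not the code in `dev/merge_spark_pr.py`; the function names `has_wip_marker` and `strip_trailing_dots` are hypothetical.

```python
def has_wip_marker(title):
    """Return True if the PR title still carries a [WIP] tag."""
    return "[WIP]" in title


def strip_trailing_dots(title):
    """Remove any trailing periods from a PR title (assumed behaviour)."""
    return title.rstrip(".")


if __name__ == "__main__":
    title = "[SPARK-28570][CORE][SHUFFLE] Make UnsafeShuffleWriter use the new API."
    if has_wip_marker(title):
        print("The PR title has `[WIP]`: " + title)
    print(strip_trailing_dots(title))
```

A merge script would typically ask the committer for confirmation after either check fires rather than silently changing or rejecting the title.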
Dongjoon Hyun	0c6874fb37	[SPARK-28606][INFRA] Update CRAN key to recover docker image generation ## What changes were proposed in this pull request? CRAN repo changed the key and it causes our release script failure. This is a release blocker for Apache Spark 2.4.4 and 3.0.0. - https://cran.r-project.org/bin/linux/ubuntu/README.html ``` Err:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9 ... W: GPG error: https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9 E: The repository 'https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease' is not signed. ``` Note that they are reusing `cran35` for R 3.6 although they changed the key. ``` Even though R has moved to version 3.6, for compatibility the sources.list entry still uses the cran3.5 designation. ``` This PR aims to recover the docker image generation first. We will verify the R doc generation in a separate JIRA and PR. ## How was this patch tested? Manual. After `docker-build.log`, it should continue to the next stage, `Building v3.0.0-rc1`. ``` $ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n -s docs ... Log file: docker-build.log Building v3.0.0-rc1; output will be at /tmp/spark-3.0.0/output ``` Closes #25339 from dongjoon-hyun/SPARK-28606. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2019-08-02 23:41:00 +00:00
HyukjinKwon	24c1bc2483	[SPARK-28586][INFRA] Make merge-spark-pr script compatible with Python 3 ## What changes were proposed in this pull request? This PR proposes to make `merge_spark_pr.py` script Python 3 compatible. ## How was this patch tested? Manually tested against my forked remote with the PR and JIRA below: https://github.com/apache/spark/pull/25321 https://github.com/apache/spark/pull/25286 https://issues.apache.org/jira/browse/SPARK-28153 Closes #25322 from HyukjinKwon/merge-script. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-01 10:17:17 -07:00
Wing Yew Poon	80ab19b9fd	[SPARK-26329][CORE] Faster polling of executor memory metrics. ## What changes were proposed in this pull request? Prior to this change, in an executor, on each heartbeat, memory metrics are polled and sent in the heartbeat. The heartbeat interval is 10s by default. With this change, in an executor, memory metrics can optionally be polled in a separate poller at a shorter interval. For each executor, we use a map of (stageId, stageAttemptId) to (count of running tasks, executor metric peaks) to track what stages are active as well as the per-stage memory metric peaks. When polling the executor memory metrics, we attribute the memory to the active stage(s), and update the peaks. In a heartbeat, we send the per-stage peaks (for stages active at that time), and then reset the peaks. The semantics would be that the per-stage peaks sent in each heartbeat are the peaks since the last heartbeat. We also keep a map of taskId to memory metric peaks. This tracks the metric peaks during the lifetime of the task. The polling thread updates this as well. At end of a task, we send the peak metric values in the task result. In case of task failure, we send the peak metric values in the `TaskFailedReason`. We continue to do the stage-level aggregation in the EventLoggingListener. For the driver, we still only poll on heartbeats. What the driver sends will be the current values of the metrics in the driver at the time of the heartbeat. This is semantically the same as before. ## How was this patch tested? Unit tests. Manually tested applications on an actual system and checked the event logs; the metrics appear in the SparkListenerTaskEnd and SparkListenerStageExecutorMetrics events. Closes #23767 from wypoon/wypoon_SPARK-26329. Authored-by: Wing Yew Poon <wypoon@cloudera.com> Signed-off-by: Imran Rashid <irashid@cloudera.com>	2019-08-01 09:09:46 -05:00
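The per-stage bookkeeping described in SPARK-26329 can be illustrated with a small Python sketch. Spark's executor code is written in Scala, so this is only a model of the idea under simplifying assumptions: the class `StagePeakTracker`, its method names, and the metric name `heap` are all hypothetical, and thread safety is ignored.

```python
class StagePeakTracker:
    """Toy model: (stage_id, attempt_id) -> [running task count, metric peaks]."""

    def __init__(self):
        self._stages = {}

    def task_started(self, stage_key):
        entry = self._stages.setdefault(stage_key, [0, {}])
        entry[0] += 1

    def task_finished(self, stage_key):
        entry = self._stages.get(stage_key)
        if entry is not None and entry[0] > 0:
            entry[0] -= 1

    def poll(self, metrics):
        """Attribute the polled metric values to every active stage."""
        for entry in self._stages.values():
            if entry[0] > 0:
                peaks = entry[1]
                for name, value in metrics.items():
                    peaks[name] = max(peaks.get(name, 0), value)

    def heartbeat(self):
        """Return the peaks since the last heartbeat, then reset them."""
        report = {}
        for key, entry in self._stages.items():
            if entry[0] > 0:
                report[key] = dict(entry[1])
                entry[1] = {}
        return report


if __name__ == "__main__":
    tracker = StagePeakTracker()
    tracker.task_started((1, 0))
    tracker.poll({"heap": 10})
    tracker.poll({"heap": 7})
    print(tracker.heartbeat())
```

Resetting the peaks in `heartbeat` is what gives the "peaks since the last heartbeat" semantics mentioned in the commit message.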
Dongjoon Hyun	a428f40669	[SPARK-28549][BUILD][CORE][SQL] Use `text.StringEscapeUtils` instead `lang3.StringEscapeUtils` ## What changes were proposed in this pull request? `org.apache.commons.lang3.StringEscapeUtils` was deprecated over two years ago at [LANG-1316](https://issues.apache.org/jira/browse/LANG-1316). There have been no bug fixes since then. ```java /** * <p>Escapes and unescapes {@code String}s for * Java, Java Script, HTML and XML.</p> * * <p>#ThreadSafe#</p> * @since 2.0 * @deprecated as of 3.6, use commons-text * <a href="https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html"> * StringEscapeUtils</a> instead */ @Deprecated public class StringEscapeUtils { ``` This PR aims to use the latest one from the `commons-text` module, which has more bug fixes like [TEXT-100](https://issues.apache.org/jira/browse/TEXT-100), [TEXT-118](https://issues.apache.org/jira/browse/TEXT-118) and [TEXT-120](https://issues.apache.org/jira/browse/TEXT-120), by the following replacement. ```scala -import org.apache.commons.lang3.StringEscapeUtils +import org.apache.commons.text.StringEscapeUtils ``` This will add a new dependency to the `hadoop-2.7` profile distribution. In the `hadoop-3.2` profile, we already have it. ``` +commons-text-1.6.jar ``` ## How was this patch tested? Pass the Jenkins with the existing tests. - [Hadoop 2.7](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108281) - [Hadoop 3.2](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108282) Closes #25281 from dongjoon-hyun/SPARK-28549. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-07-29 11:45:29 +09:00
Dongjoon Hyun	33e6e4703d	[SPARK-28544][BUILD] Update zstd-jni to 1.4.2-1 ## What changes were proposed in this pull request? This PR aims to update `zstd-jni` library to bring the latest improvement and bug fixes in `1.4.1` and `1.4.2`. - https://github.com/facebook/zstd/releases/tag/v1.4.1 (4.5 ~ 11.8% performance improvement from v1.4.0 and bug fixes) - https://github.com/facebook/zstd/releases/tag/v1.4.2 (bug fixes) ## How was this patch tested? Pass the Jenkins. Closes #25275 from dongjoon-hyun/SPARK-28544. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-07-27 18:08:20 -07:00
Dongjoon Hyun	dbd0a2aa37	[SPARK-28511][INFRA] Get REV from RELEASE_VERSION instead of VERSION ## What changes were proposed in this pull request? Unlike the other versions, `x.x.0-SNAPSHOT` causes `x.x.-1`. Although this will not happen in the tags (there is no `SNAPSHOT` postfix), we had better fix this. ``` $ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n Output directory already exists. Overwrite and continue? [y/n] y Branch [branch-2.4]: master Current branch version is 3.0.0-SNAPSHOT. Release [3.0.-1]: ``` Since we already have `RELEASE_VERSION`, which is `VERSION` with `SNAPSHOT` removed, this PR uses `RELEASE_VERSION` instead of `VERSION`. ``` $ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n Branch [branch-2.4]: master Current branch version is 3.0.0-SNAPSHOT. Release [3.0.0]: ``` ## How was this patch tested? Manually run `dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n` and check the default value of `Release`. Closes #25254 from dongjoon-hyun/SPARK-28511. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-07-25 10:54:24 -07:00
Dongjoon Hyun	cfca26e973	[SPARK-28496][INFRA] Use branch name instead of tag during dry-run ## What changes were proposed in this pull request? There are two cases when we use `dry run`. First, when the tag already exists, we can ask `confirmation` on the existing tag name. ``` $ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs Output directory already exists. Overwrite and continue? [y/n] y Branch [branch-2.4]: Current branch version is 2.4.4-SNAPSHOT. Release [2.4.4]: 2.4.3 RC # [1]: v2.4.3-rc1 already exists. Continue anyway [y/n]? y This is a dry run. Please confirm the ref that will be built for testing. Ref [v2.4.3-rc1]: ``` Second, when the tag doesn't exist, we had better ask `confirmation` on the branch name. If we do not change the default value, it will fail eventually. ``` $ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs Branch [branch-2.4]: Current branch version is 2.4.4-SNAPSHOT. Release [2.4.4]: RC # [1]: This is a dry run. Please confirm the ref that will be built for testing. Ref [v2.4.4-rc1]: ``` This PR improves the second case by providing the branch name instead. This helps the release testing before tagging. ## How was this patch tested? Manually do the following and check the default value of `Ref` field. ``` $ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs Branch [branch-2.4]: Current branch version is 2.4.4-SNAPSHOT. Release [2.4.4]: RC # [1]: This is a dry run. Please confirm the ref that will be built for testing. Ref [branch-2.4]: ... ``` Closes #25240 from dongjoon-hyun/SPARK-28496. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-07-24 14:20:25 -07:00
Liang-Chi Hsieh	591de42351	[SPARK-28381][PYSPARK] Upgraded version of Pyrolite to 4.30 ## What changes were proposed in this pull request? This upgrades Pyrolite to a newer version. Most updates [1] in the newer version are for dotnet. For Java, it includes a bug fix to Unpickler regarding cleaning up the Unpickler memo, and support for protocol 5. After upgrading, we can remove the fix at SPARK-27629 for the bug in Unpickler. [1] https://github.com/irmen/Pyrolite/compare/pyrolite-4.23...master ## How was this patch tested? Manually ran the existing tests locally on Python 3.6. Closes #25143 from viirya/upgrade-pyrolite. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-07-15 12:29:58 +09:00
Dongjoon Hyun	13ae9ebb38	[SPARK-28354][INFRA] Use JIRA user name instead of JIRA user key ## What changes were proposed in this pull request? `dev/merge_spark_pr.py` script always fail for some users because they have different `name` and `key`. - https://issues.apache.org/jira/rest/api/2/user?username=yumwang JIRA Client expects `name`, but we are using `key`. This PR fixes it. ```python # This is JIRA client code `/usr/local/lib/python2.7/site-packages/jira/client.py` def assign_issue(self, issue, assignee): """Assign an issue to a user. None will set it to unassigned. -1 will set it to Automatic. :param issue: the issue ID or key to assign :param assignee: the user to assign the issue to :type issue: int or str :type assignee: str :rtype: bool """ url = self._options['server'] + \ '/rest/api/latest/issue/' + str(issue) + '/assignee' payload = {'name': assignee} r = self._session.put( url, data=json.dumps(payload)) raise_on_error(r) return True ``` ## How was this patch tested? Manual with the committer ID/password. ```python import jira.client asf_jira = jira.client.JIRA({'server': 'https://issues.apache.org/jira'}, basic_auth=('yourid', 'passwd')) asf_jira.assign_issue("SPARK-28354", "q79969786") # This will raise exception. asf_jira.assign_issue("SPARK-28354", "yumwang") # This works. ``` Closes #25120 from dongjoon-hyun/SPARK-28354. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-07-12 18:44:29 +09:00
Yuming Wang	4ad0c33be4	[SPARK-28221][BUILD] Upgrade janino to 3.0.13 ## What changes were proposed in this pull request? Mainly change logs: ### Version 3.0.13: - Support for JDK 9/10 in Full Compiler - The syntax elements that can have modifiers now all have sets of "is...()" methods that check for each modifier. Some also have methods "getAccess()" and/or "getAnnotations()". - Implement "type annotations" (JLS8 9.7.4) - Implemented parsing (but not compilation) of "modular compilation units" (JLS11 7.3). - Replaced all "assert...Uncookable(..., Pattern messageRegex)" and "assert...Uncookable(..., String messageInfix)" method pairs with a single "assert...Uncookable(..., String messageRegex)" method. Minor refactoring: Allowed modifiers are now checked in the Parser, not in Java.*. This saves a lot of THROWS clauses. - Parse Type inference syntax: Type inference for generic instance creation implemented, test cases added. - Parse MethodReference, ClassInstanceCreationReference and ArrayCreationReference ### Version 3.0.12 - Fixed: Operator "&" not defined on types "java.lang.Long" and "int" - Major bug in JavaSourceClassLoader: When loading the second and following classes, CUs were compiled again, leading to an inconsistent class hierarchy. - Fixed: Java 9 added "@Override public final CharBuffer CharBuffer.rewind() { ..." -- leads easily to a java.lang.NoSuchMethodError - Changed all occurrences of the words "Java bytecode" to "JVM bytecode" to make clearer that the generated bytecode is for the JVMS and not suitable for, e.g. DALVIK. http://janino-compiler.github.io/janino/changelog.html ## How was this patch tested? Existing tests. Closes #25021 from wangyum/SPARK-28221. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-07-06 10:02:42 -07:00
Marcelo Vanzin	11e21cc17a	[SPARK-28187][BUILD] Add support for hadoop-cloud to the PR builder. Closes #24987 from vanzin/SPARK-28187. Authored-by: Marcelo Vanzin <vanzin@cloudera.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-06-27 15:59:05 -07:00
Hyukjin Kwon	1d36b892ab	[SPARK-7721][INFRA][FOLLOW-UP] Remove cloned coverage repo after posting HTMLs ## What changes were proposed in this pull request? This PR proposes to remove the cloned `pyspark-coverage-site` repo. It doesn't look like a problem in the PR builder, but somehow it's problematic in `spark-master-test-sbt-hadoop-2.7`. ## How was this patch tested? Jenkins. Closes #23729 from HyukjinKwon/followup-coverage. Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-06-25 09:18:32 +09:00
Sean Owen	67042e90e7	[MINOR][BUILD] Exclude pyspark-coverage-site/ dir from RAT ## What changes were proposed in this pull request? Looks like a directory `pyspark-coverage-site/` is now (?) generated and fails RAT checks. It should just be excluded. See: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/6029/console ## How was this patch tested? N/A Closes #24950 from srowen/pysparkcoveragesite. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-06-24 14:07:41 -05:00
Dongjoon Hyun	ea0e119f84	[SPARK-28111][BUILD] Upgrade `xbean-asm7-shaded` to 4.14 ## What changes were proposed in this pull request? This PR aims to update `xbean-asm7-shaded` to bring [XBEAN-318](https://issues.apache.org/jira/browse/XBEAN-318) which is helpful to log the class definition reading failures. - https://issues.apache.org/jira/projects/XBEAN/versions/12345220 ## How was this patch tested? Pass the Jenkins. Closes #24914 from dongjoon-hyun/SPARK-28111. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-06-20 07:59:59 -07:00
Sean Owen	15462e1a8f	[SPARK-28004][UI] Update jquery to 3.4.1 ## What changes were proposed in this pull request? We're using an old-ish jQuery, 1.12.4, and should probably update for Spark 3 to keep up in general, but also to keep up with CVEs. In fact, we know of at least one resolved in only 3.4.0+ (https://nvd.nist.gov/vuln/detail/CVE-2019-11358). They may not affect Spark, but, if the update isn't painful, maybe worthwhile in order to make future 3.x updates easier. jQuery 1 -> 2 doesn't sound like a breaking change, as 2.0 is supposed to maintain compatibility with 1.9+ (https://blog.jquery.com/2013/04/18/jquery-2-0-released/) 2 -> 3 has breaking changes: https://jquery.com/upgrade-guide/3.0/. It's hard to evaluate each one, but the most likely area for problems is in ajax(). However, our usage of jQuery (and plugins) is pretty simple. Update jquery to 3.4.1; update jquery blockUI and mustache to latest ## How was this patch tested? Manual testing of docs build (except R docs), worker/master UI, spark application UI. Note: this really doesn't guarantee it works, as our tests can't test javascript, and this is merely anecdotal testing, although I clicked about every link I could find. There's a risk this breaks a minor part of the UI; it does seem to work fine in the main. Closes #24843 from srowen/SPARK-28004. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-06-14 22:19:20 -07:00
Dongjoon Hyun	fd8240d10c	[SPARK-28051][INFRA] Exposing JIRA issue component types at GitHub PRs ## What changes were proposed in this pull request? This PR aims to expose JIRA issue component types at GitHub PRs. ## How was this patch tested? Manual. ``` $ export GITHUB_OAUTH_KEY=... $ export JIRA_PASSWORD=... $ export GITHUB_API_BASE='https://api.github.com/repos/your-id/spark' $ dev/github_jira_sync.py ``` Please note that the existing script will raise the following exceptions if your repo has less than 100 PRs. This will be handled at #24874 . ``` Traceback (most recent call last): File "dev/github_jira_sync.py", line 139, in <module> jira_prs = get_jira_prs() File "dev/github_jira_sync.py", line 83, in get_jira_prs link_header = filter(lambda k: k.startswith("Link"), page.info().headers)[0] IndexError: list index out of range ``` That is beyond the scope of this PR. Closes #24871 from dongjoon-hyun/SPARK-28051. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-06-14 20:36:45 -07:00
Dongjoon Hyun	7533cccc5d	[SPARK-28053][INFRA] Handle a corner case where there is no `Link` header ## What changes were proposed in this pull request? Currently, `github_jira_sync.py` assumes that a `Link` header is always present. However, it fails when the number of open PRs is less than 100 (the default page size). This will not happen in Apache Spark, but we had better fix it because it happens during the review process for the `github_jira_sync.py` script. ``` Traceback (most recent call last): File "dev/github_jira_sync.py", line 139, in <module> jira_prs = get_jira_prs() File "dev/github_jira_sync.py", line 83, in get_jira_prs link_header = filter(lambda k: k.startswith("Link"), page.info().headers)[0] IndexError: list index out of range ``` ## How was this patch tested? Manually check with another repo which has a small number of open PRs (< 100). ``` $ export JIRA_PASSWORD=... $ export GITHUB_API_BASE='https://api.github.com/repos/your-id/spark' $ dev/github_jira_sync.py ``` Closes #24874 from dongjoon-hyun/SPARK-28053. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-06-14 16:33:34 +09:00
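The corner case in SPARK-28053 comes from indexing `[0]` into a possibly empty list of headers. One way to guard against it is sketched below; this is not the actual patch to `dev/github_jira_sync.py`, the helper names `find_link_header` and `has_next_page` are hypothetical, and the sketch assumes the headers arrive as a list of raw header lines, as in the traceback above.

```python
def find_link_header(headers):
    """Return the first header line starting with "Link", or None if absent."""
    return next((h for h in headers if h.startswith("Link")), None)


def has_next_page(headers):
    """Treat a missing Link header as 'there is only one page'."""
    link = find_link_header(headers)
    return link is not None and 'rel="next"' in link


if __name__ == "__main__":
    # A single-page response: no Link header at all.
    print(has_next_page(["Content-Type: application/json"]))
```

Returning `None` instead of raising lets the caller stop paging cleanly when a repository has fewer open PRs than one page holds.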
Dongjoon Hyun	e5d95117e4	[SPARK-27979][BUILD][test-maven] Remove deprecated `--force` option in `build/mvn` and `run-tests.py` ## What changes were proposed in this pull request? This is a second try of #24824. Since Apache Spark 2.0.0, SPARK-14867 deprecated `--force` option and made it ignored. This PR cleans up the related code completely at 3.0.0. BEFORE (Jenkins) ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark using Maven with these arguments: -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests WARNING: '--force' is deprecated and ignored. ... ======================================================================== Running Spark unit tests ======================================================================== [info] Running Spark tests using Maven with these arguments: -Phadoop-2.7 -Phive-thriftserver -Phive -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end WARNING: '--force' is deprecated and ignored. ``` AFTER (Jenkins) ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark using Maven with these arguments: -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests ... 
======================================================================== Running Spark unit tests ======================================================================== [info] Running Spark tests using Maven with these arguments: -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end ``` ## How was this patch tested? Manually check the Jenkins logs. Closes #24833 from dongjoon-hyun/SPARK-FORCE-2. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-06-10 18:40:46 -07:00
Dongjoon Hyun	742f805177	Revert "[SPARK-27979][BUILD][test-maven] Remove deprecated `--force` option in `build/mvn` and `run-tests.py`" This reverts commit `354ec254c5`.	2019-06-09 08:33:21 -07:00
Martin Junghanns	709387d660	[SPARK-27300][GRAPH] Add Spark Graph modules and dependencies ## What changes were proposed in this pull request? This PR introduces the necessary Maven modules for the new [Spark Graph](https://issues.apache.org/jira/browse/SPARK-25994) feature for Spark 3.0. * `spark-graph` is a parent module that users depend on to get all graph functionalities (Cypher and Graph Algorithms) * `spark-graph-api` defines the [Property Graph API](https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI) that is being shared between Cypher and Algorithms * `spark-cypher` contains a Cypher query engine implementation Both, `spark-graph-api` and `spark-cypher` depend on Spark SQL. Note, that the Maven module for Graph Algorithms is not part of this PR and will be introduced in https://issues.apache.org/jira/browse/SPARK-27302 A PoC for a running Cypher implementation can be found in this WIP PR https://github.com/apache/spark/pull/24297 ## How was this patch tested? Pass the Jenkins with all profiles and manually build and check the followings. ``` $ ls assembly/target/scala-2.12/jars/spark-cypher* assembly/target/scala-2.12/jars/spark-cypher_2.12-3.0.0-SNAPSHOT.jar $ ls assembly/target/scala-2.12/jars/spark-graph* \| grep -v graphx assembly/target/scala-2.12/jars/spark-graph-api_2.12-3.0.0-SNAPSHOT.jar assembly/target/scala-2.12/jars/spark-graph_2.12-3.0.0-SNAPSHOT.jar ``` Closes #24490 from s1ck/SPARK-27300. Lead-authored-by: Martin Junghanns <martin.junghanns@neotechnology.com> Co-authored-by: Max Kießling <max@kopfueber.org> Co-authored-by: Martin Junghanns <martin.junghanns@neo4j.com> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-06-09 00:26:26 -07:00
Dongjoon Hyun	354ec254c5	[SPARK-27979][BUILD][test-maven] Remove deprecated `--force` option in `build/mvn` and `run-tests.py` ## What changes were proposed in this pull request? Since Apache Spark 2.0.0, SPARK-14867 deprecated `--force` option and made it ignored. This PR cleans up the related code completely at 3.0.0. BEFORE (Jenkins) ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark using Maven with these arguments: -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests WARNING: '--force' is deprecated and ignored. ... ======================================================================== Running Spark unit tests ======================================================================== [info] Running Spark tests using Maven with these arguments: -Phadoop-2.7 -Phive-thriftserver -Phive -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end WARNING: '--force' is deprecated and ignored. ``` AFTER (Jenkins) ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark using Maven with these arguments: -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests ... 
======================================================================== Running Spark unit tests ======================================================================== [info] Running Spark tests using Maven with these arguments: -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end ``` ## How was this patch tested? Manually check the Jenkins logs. Closes #24824 from dongjoon-hyun/SPARK-27979. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-06-08 08:17:12 -07:00
Yuming Wang	3f102a8229	[SPARK-27749][SQL] hadoop-3.2 support hive-thriftserver ## What changes were proposed in this pull request? This PR mainly makes the following changes to make `hadoop-3.2` support `sql/hive-thriftserver`: 1. Upgrade [`TCLIService.thrift`](https://github.com/apache/hive/blob/rel/release-2.3.5/service-rpc/if/TCLIService.thrift) and related code to Hive 2.3.5 because of [HIVE-12442](https://issues.apache.org/jira/browse/HIVE-12442) (note that we only migrate code without adding features, such as [HIVE-4924](https://issues.apache.org/jira/browse/HIVE-4924) and [HIVE-15473](https://issues.apache.org/jira/browse/HIVE-15473)). 2. Use slf4j as the logging facade because of [HIVE-12237](https://issues.apache.org/jira/browse/HIVE-12237). 3. Port [HIVE-13169](https://issues.apache.org/jira/browse/HIVE-13169) to be compatible with Hive 2.3. ## How was this patch tested? Existing tests. Closes #24628 from wangyum/SPARK-27749. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-06-05 08:40:05 -07:00
Izek Greenfield	c647f9011c	[SPARK-27862][BUILD] Move to json4s 3.6.6 ## What changes were proposed in this pull request? Move to json4s version 3.6.6 Add scala-xml 1.2.0 ## How was this patch tested? Pass the Jenkins Closes #24736 from igreenfield/master. Authored-by: Izek Greenfield <igreenfield@axiomsl.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-05-30 19:42:56 -05:00
Fokko Driesprong	bd87323003	[SPARK-27757][CORE] Bump Jackson to 2.9.9 ## What changes were proposed in this pull request? This fixes CVE-2019-12086 on Databind: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.9 ## How was this patch tested? Existing tests Closes #24646 from Fokko/SPARK-27757. Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-05-30 09:35:20 -05:00
HyukjinKwon	90b6cda9af	[SPARK-25944][R][BUILD] AppVeyor change to latest R version (3.6.0) ## What changes were proposed in this pull request? R 3.6.0 is released 2019-04-26. This PR targets to change R version from 3.5.1 to 3.6.0 in AppVeyor. This PR sets `R_REMOTES_NO_ERRORS_FROM_WARNINGS` to `true` to avoid the warnings below: ``` Error in strptime(xx, f, tz = tz) : (converted from warning) unable to identify current timezone 'C': please set environment variable 'TZ' Error in i.p(...) : (converted from warning) installation of package 'praise' had non-zero exit status Calls: <Anonymous> ... with_rprofile_user -> with_envvar -> force -> force -> i.p Execution halted ``` ## How was this patch tested? AppVeyor Closes #24716 from HyukjinKwon/SPARK-27848. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-05-28 14:42:03 +09:00
Sean Owen	6c5827c723	[SPARK-27794][R][DOCS] Use https URL for CRAN repo ## What changes were proposed in this pull request? Use https URL for CRAN repo (and for a Scala download in a Dockerfile) ## How was this patch tested? Existing tests. Closes #24664 from srowen/SPARK-27794. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-22 14:28:21 -07:00
Sean Owen	eed6de1a65	[MINOR][DOCS] Tighten up some key links to the project and download pages to use HTTPS ## What changes were proposed in this pull request? Tighten up some key links to the project and download pages to use HTTPS ## How was this patch tested? N/A Closes #24665 from srowen/HTTPSURLs. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-21 10:56:42 -07:00
HyukjinKwon	b7bf4fd123	[SPARK-27402][INFRA][FOLLOW-UP] Exclude 'hive-thriftserver' in modules to test for hadoop3.2 for now ## What changes were proposed in this pull request? This PR excludes 'hive-thriftserver' in modules to test for hadoop3.2 for now as well ## How was this patch tested? Manually tested via `run-tests.py` Closes #24644 from HyukjinKwon/SPARK-27402. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-20 07:53:19 -07:00
Dongjoon Hyun	141a3bfc8d	[SPARK-27755][BUILD] Update zstd-jni to 1.4.0-1 ## What changes were proposed in this pull request? This PR aims to update `zstd-jni` library to `1.4.0-1` which improves the `level 1 compression speed` performance by 6% in most scenarios. The following is the full release note. - https://github.com/facebook/zstd/releases/tag/v1.4.0 ## How was this patch tested? Pass the Jenkins. Closes #24632 from dongjoon-hyun/SPARK-27755. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-17 08:34:45 -07:00
Kazuaki Ishizaki	9e0d8c6ce2	[SPARK-27752][CORE] Upgrade lz4-java from 1.5.1 to 1.6.0 ## What changes were proposed in this pull request? This PR upgrades lz4-java from 1.5.1 to 1.6.0. Lz4-java is available at https://github.com/lz4/lz4-java. Changes from 1.5.1: - Upgraded LZ4 to 1.9.1. Updated the JNI bindings, except for the one for Linux/i386. Decompression speed is improved on amd64. - Deprecated use of LZ4FastDecompressor of a native instance because the corresponding C API function is deprecated. See the release note of LZ4 1.9.0 for details. Updated javadoc accordingly. - Changed the module name from org.lz4.lz4-java to org.lz4.java to avoid using - in the module name. (severn-everett, Oliver Eikemeier, Rei Odaira) - Enabled build with Java 11. Note that the distribution is still built with Java 7. (Rei Odaira) ## How was this patch tested? Existing tests. Closes #24629 from kiszk/SPARK-27752. Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-16 20:45:13 -07:00
Yuming Wang	f3ddd6f9da	[SPARK-27402][SQL][TEST-HADOOP3.2][TEST-MAVEN] Fix hadoop-3.2 test issue(except the hive-thriftserver module) ## What changes were proposed in this pull request? This PR fixes hadoop-3.2 test issues (except the `hive-thriftserver` module): 1. Add `hive.metastore.schema.verification` and `datanucleus.schema.autoCreateAll` to HiveConf. 2. Support accessing the Hive metastore from 0.12 to 2.2 with hadoop-3.2. After [SPARK-27176](https://issues.apache.org/jira/browse/SPARK-27176) and this PR, we upgraded the built-in Hive to 2.3 when enabling the Hadoop 3.2+ profile. This upgrade fixes the following issues: - [HIVE-6727](https://issues.apache.org/jira/browse/HIVE-6727): Table level stats for external tables are set incorrectly. - [HIVE-15653](https://issues.apache.org/jira/browse/HIVE-15653): Some ALTER TABLE commands drop table stats. - [SPARK-12014](https://issues.apache.org/jira/browse/SPARK-12014): Spark SQL query containing semicolon is broken in Beeline. - [SPARK-25193](https://issues.apache.org/jira/browse/SPARK-25193): insert overwrite doesn't throw exception when drop old data fails. - [SPARK-25919](https://issues.apache.org/jira/browse/SPARK-25919): Date value corrupts when tables are "ParquetHiveSerDe" formatted and target table is Partitioned. - [SPARK-26332](https://issues.apache.org/jira/browse/SPARK-26332): Spark sql write orc table on viewFS throws exception. - [SPARK-26437](https://issues.apache.org/jira/browse/SPARK-26437): Decimal data becomes bigint to query, unable to query. ## How was this patch tested? This PR tests Spark’s Hadoop 3.2 profile on Jenkins, and #24591 tests Spark’s Hadoop 2.7 profile on Jenkins. This PR closes #24591. Closes #24391 from wangyum/SPARK-27402. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-05-13 10:35:26 -07:00
Dongjoon Hyun	375cfa3d89	[SPARK-27467][BUILD] Upgrade Maven to 3.6.1 ## What changes were proposed in this pull request? This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following. - https://maven.apache.org/docs/3.6.1/release-notes.html This was committed and reverted due to an AppVeyor failure. It turns out that the root cause was a `PATH` issue. With the updated AppVeyor script, it passed. https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/24273412 ## How was this patch tested? Pass the Jenkins and AppVeyor Closes #24481 from dongjoon-hyun/SPARK-R. Lead-authored-by: Dongjoon Hyun <dhyun@apple.com> Co-authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-02 20:01:17 -07:00
Yuming Wang	875e7e1d97	[SPARK-27620][BUILD] Upgrade jetty to 9.4.18.v20190429 ## What changes were proposed in this pull request? This PR upgrades Jetty to [9.4.18.v20190429](https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.18.v20190429) because of [CVE-2019-10247](https://nvd.nist.gov/vuln/detail/CVE-2019-10247). ## How was this patch tested? Existing test. Closes #24513 from wangyum/SPARK-27620. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-05-03 09:25:54 +09:00
Yuming Wang	3ecafb0e14	[SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6 ## What changes were proposed in this pull request? [stream-lib 2.9.6](https://github.com/addthis/stream-lib/commits/v2.9.6) includes several improvements: ![image](https://user-images.githubusercontent.com/5399861/56938062-7eb77580-6b32-11e9-8c36-711ab943d657.png) ## How was this patch tested? N/A Closes #24492 from wangyum/SPARK-27601. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-05-02 15:21:57 -05:00
Cheng Lian	b73744a147	[SPARK-27611][BUILD] Exclude jakarta.activation:jakarta.activation-api from org.glassfish.jaxb:jaxb-runtime:2.3.2 PR #23890 introduced `org.glassfish.jaxb:jaxb-runtime:2.3.2` as a runtime dependency. As an unexpected side effect, `jakarta.activation:jakarta.activation-api:1.2.1` was also pulled in as a transitive dependency. As a result, for the Maven build, both of the following two jars can be found under `assembly/target/scala-2.12/jars/`: ``` activation-1.1.1.jar jakarta.activation-api-1.2.1.jar ``` This PR excludes the Jakarta one. Manually built Spark using Maven and checked files under `assembly/target/scala-2.12/jars/`. After this change, only `activation-1.1.1.jar` is there. Closes #24507 from liancheng/spark-27611. Authored-by: Cheng Lian <lian@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-01 20:12:17 -07:00
HyukjinKwon	d8db7db50b	Revert "[SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc" This reverts commit `bde30bc57c`.	2019-04-28 11:03:15 +09:00
Yuming Wang	bde30bc57c	[SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc ## What changes were proposed in this pull request? Update the `docs/building-spark.md`. Otherwise: ``` mvn package -DskipTests=true ... [INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) spark-parent_2.12 --- [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.6.0 is not in the allowed range 3.6.1. ... [ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1] [ERROR] ... ``` ## How was this patch tested? Just tested that `https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.zip` is available. Closes #24477 from wangyum/SPARK-27467. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-27 09:09:47 -07:00
Yuming Wang	fe99305101	[SPARK-27556][BUILD] Exclude com.zaxxer:HikariCP-java7 from hadoop-yarn-server-web-proxy ## What changes were proposed in this pull request? There are two HikariCP packages in classpath when building with `-Phive -Pyarn -Phadoop-3.2`. The HikariCP dependency tree: ``` [INFO] \| +- org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.2.0:compile [INFO] \| \| \- org.apache.hadoop:hadoop-yarn-server-common:jar:3.2.0:compile [INFO] \| \| +- org.apache.hadoop:hadoop-yarn-registry:jar:3.2.0:compile [INFO] \| \| \| \- commons-daemon:commons-daemon:jar:1.0.13:compile [INFO] \| \| +- org.apache.geronimo.specs:geronimo-jcache_1.0_spec:jar:1.0-alpha-1:compile [INFO] \| \| +- org.ehcache:ehcache:jar:3.3.1:compile [INFO] \| \| +- com.zaxxer:HikariCP-java7:jar:2.4.12:compile ``` ``` [INFO] +- org.apache.hive:hive-metastore:jar:2.3.4:compile [INFO] \| +- javolution:javolution:jar:5.5.1:compile [INFO] \| +- com.google.protobuf:protobuf-java:jar:2.5.0:compile [INFO] \| +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile [INFO] \| +- com.zaxxer:HikariCP:jar:2.5.1:compile ``` This PR excludes `com.zaxxer:HikariCP-java7` from `hadoop-yarn-server-web-proxy`. ## How was this patch tested? manual tests Closes #24450 from wangyum/SPARK-27556. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-04-26 12:15:39 -05:00
Yuming Wang	f82ed5e8e0	[MINOR][TEST] Remove out-dated hive version in run-tests.py ## What changes were proposed in this pull request? ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark (w/Hive 1.2.1) using SBT with these arguments: -Phadoop-3.2 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly ``` `(w/Hive 1.2.1)` is incorrect when testing hadoop-3.2; it should be `(w/Hive 2.3.4)`. This PR removes `(w/Hive 1.2.1)` in run-tests.py. ## How was this patch tested? N/A Closes #24451 from wangyum/run-tests-invalid-info. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-24 21:22:15 -07:00
Yuming Wang	777b4502b2	[SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2 ## What changes were proposed in this pull request? When we compile and test Hadoop 3.2, we hit the following two issues: 1. JobSummaryLevel is not a member of object org.apache.parquet.hadoop.ParquetOutputFormat. Fixed by [PARQUET-381](https://issues.apache.org/jira/browse/PARQUET-381)(Parquet 1.9.0) 2. java.lang.NoSuchFieldError: BROTLI at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31). Fixed by [PARQUET-1143](https://issues.apache.org/jira/browse/PARQUET-1143)(Parquet 1.10.0) The reason is that the `parquet-hadoop-bundle-1.8.1.jar` conflicts with Parquet 1.10.1. I think it would be safe to upgrade Hive's parquet to 1.10.1 to work around this issue. This is what Hive did when upgrading Parquet 1.8.1 to 1.10.0: [HIVE-17000](https://issues.apache.org/jira/browse/HIVE-17000) and [HIVE-19464](https://issues.apache.org/jira/browse/HIVE-19464). We can see that all changes are related to vectors, and vectors are disabled by default: see [HIVE-14826](https://issues.apache.org/jira/browse/HIVE-14826) and [HiveConf.java#L2723](https://github.com/apache/hive/blob/rel/release-2.3.4/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2723). This PR removes [parquet-hadoop-bundle-1.8.1.jar](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop-bundle), so Hive serde will use [parquet-common-1.10.1.jar, parquet-column-1.10.1.jar and parquet-hadoop-1.10.1.jar](https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2#L185-L189). ## How was this patch tested? 1. manual tests 2. [upgrade Hive Parquet to 1.10.1 and run Hadoop 3.2 test on jenkins](https://github.com/apache/spark/pull/24044#commits-pushed-0c3f962) Closes #24346 from wangyum/SPARK-27176. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-04-19 08:59:08 -07:00
shane knapp	e1ece6a319	[SPARK-25079][PYTHON] update python3 executable to 3.6.x ## What changes were proposed in this pull request? have jenkins test against python3.6 (instead of 3.4). ## How was this patch tested? extensive testing on both the centos and ubuntu jenkins workers. NOTE: this will need to be backported to all active branches. Closes #24266 from shaneknapp/updating-python3-executable. Authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-04-19 10:03:50 +09:00
Dongjoon Hyun	f93460dae9	[SPARK-27493][BUILD] Upgrade ASM to 7.1 ## What changes were proposed in this pull request? [SPARK-25946](https://issues.apache.org/jira/browse/SPARK-25946) upgraded ASM to 7.0 to support JDK11. This PR aims to update ASM to 7.1 to bring the bug fixes. - https://asm.ow2.io/versions.html - https://issues.apache.org/jira/browse/XBEAN-316 ## How was this patch tested? Pass the Jenkins. Closes #24395 from dongjoon-hyun/SPARK-27493. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-04-18 13:36:52 +09:00
Dongjoon Hyun	a8f20c95ab	[SPARK-27452][BUILD] Update zstd-jni to 1.3.8-9 ## What changes were proposed in this pull request? This PR aims to update `zstd-jni` from 1.3.2-2 to 1.3.8-9 to be aligned with the latest Zstd 1.3.8 in Apache Spark 3.0.0. Currently, Apache Spark is aligned with the old Zstd used in the first PR and there are many bugfix and improvement updates in `zstd-jni` until now. - https://github.com/facebook/zstd/releases/tag/v1.3.8 - https://github.com/facebook/zstd/releases/tag/v1.3.7 - https://github.com/facebook/zstd/releases/tag/v1.3.6 - https://github.com/facebook/zstd/releases/tag/v1.3.4 - https://github.com/facebook/zstd/releases/tag/v1.3.3 ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #24364 from dongjoon-hyun/SPARK-ZSTD. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-16 08:54:16 -07:00
Sean Owen	8718367e2e	[SPARK-27470][PYSPARK] Update pyrolite to 4.23 ## What changes were proposed in this pull request? Update pyrolite to 4.23 to pick up bug and security fixes. ## How was this patch tested? Existing tests. Closes #24381 from srowen/SPARK-27470. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-04-16 19:41:40 +09:00
Sean Owen	a4cf1a4f4e	[SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3 ## What changes were proposed in this pull request? Unify commons-beanutils deps to latest 1.9.3. This resolves the version inconsistency in Hadoop 2.7's build and also picks up security and bug fixes. ## How was this patch tested? Existing tests. Closes #24378 from srowen/SPARK-27469. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-15 19:18:37 -07:00
Dongjoon Hyun	0881f648cf	[SPARK-27451][BUILD] Upgrade lz4-java to 1.5.1 ## What changes were proposed in this pull request? This PR upgrades `lz4-java` to 1.5.1 in order to get a patch for avoiding racing with GC. - https://github.com/lz4/lz4-java/blob/master/CHANGES.md#151 ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #24363 from dongjoon-hyun/SPARK-LZ4. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-12 19:21:43 -07:00
Yuming Wang	33f3c48cac	[SPARK-27176][SQL] Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4 ## What changes were proposed in this pull request? This PR mainly contains: 1. Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4. 2. Resolve compatibility issues between Hive 1.2.1 and Hive 2.3.4 in the `sql/hive` module. ## How was this patch tested? jenkins test hadoop-2.7 manual test hadoop-3: ```shell build/sbt clean package -Phadoop-3.2 -Phive export SPARK_PREPEND_CLASSES=true # rm -rf metastore_db cat <<EOF > test_hadoop3.scala spark.range(10).write.saveAsTable("test_hadoop3") spark.table("test_hadoop3").show EOF bin/spark-shell --conf spark.hadoop.hive.metastore.schema.verification=false --conf spark.hadoop.datanucleus.schema.autoCreateAll=true -i test_hadoop3.scala ``` Closes #23788 from wangyum/SPARK-23710-hadoop3. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-04-08 08:42:21 -07:00
Sean Owen	23bde44797	[SPARK-27358][UI] Update jquery to 1.12.x to pick up security fixes ## What changes were proposed in this pull request? Update jquery -> 1.12.4, datatables -> 1.10.18, mustache -> 2.3.12. Add missing mustache license ## How was this patch tested? I manually tested the UI locally with the javascript console open and didn't observe any problems or JS errors. The only 'risky' change seems to be mustache, but on reading its release notes, don't think the changes from 0.8.1 to 2.x would affect Spark's simple usage. Closes #24288 from srowen/SPARK-27358. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-04-05 12:54:01 -05:00
LantaoJin	69dd44af19	[SPARK-27216][CORE] Upgrade RoaringBitmap to 0.7.45 to fix Kryo unsafe ser/dser issue ## What changes were proposed in this pull request? HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer. It's a bug of RoaringBitmap-0.5.11 and fixed in latest version. This is an update of #24157 ## How was this patch tested? Add a UT Closes #24264 from LantaoJin/SPARK-27216. Lead-authored-by: LantaoJin <jinlantao@gmail.com> Co-authored-by: Lantao Jin <jinlantao@gmail.com> Signed-off-by: Imran Rashid <irashid@cloudera.com>	2019-04-03 20:09:50 -05:00
Yuming Wang	13c5c1fb4b	[SPARK-27180][BUILD][YARN] Fix testing issues with yarn module in Hadoop-3 ## What changes were proposed in this pull request? Fix testing issues with `yarn` module in Hadoop-3: 1. Upgrade jersey-1 to `1.19` to fix ```Cause: java.lang.NoClassDefFoundError: com/sun/jersey/spi/container/servlet/ServletContainer```. 2. Copy `ServerSocketUtil` from hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/ServerSocketUtil.java to fix ```java.lang.NoClassDefFoundError: org/apache/hadoop/net/ServerSocketUtil```. 3. Adapt `SessionHandler` from jetty-9.3.25.v20180904/jetty-server/src/main/java/org/eclipse/jetty/server/session/SessionHandler.java to fix ```java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.getSessionManager()Lorg/eclipse/jetty/server/SessionManager```. ## How was this patch tested? manual tests: ```shell build/sbt yarn/test -Pyarn build/sbt yarn/test -Phadoop-3.2 -Pyarn build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn -Phadoop-3.2 ``` Closes #24115 from wangyum/hadoop3-yarn. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-04-02 15:38:26 -05:00
Yuming Wang	f799e34962	[MINOR][BUILD] Upgrade apache-rat to 0.13 ## What changes were proposed in this pull request? This PR upgrades `apache-rat` to 0.13. Issues fixed by 0.13: https://issues.apache.org/jira/issues/?jql=project%20%3D%20RAT%20AND%20fixVersion%20%3D%200.13 ## How was this patch tested? manual tests Closes #24262 from wangyum/apache-rat. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-04-01 16:44:42 +09:00
Sean Owen	754f820035	[SPARK-26918][DOCS] All .md should have ASF license header ## What changes were proposed in this pull request? Add AL2 license to metadata of all .md files. This seemed to be the tidiest way as it will get ignored by .md renderers and other tools. Attempts to write them as markdown comments revealed that there is no such standard thing. ## How was this patch tested? Doc build Closes #24243 from srowen/SPARK-26918. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-30 19:49:45 -05:00
Sean Owen	2ec650d843	[SPARK-27267][CORE] Update snappy to avoid error when decompressing empty serialized data ## What changes were proposed in this pull request? (See JIRA for problem statement) Update snappy 1.1.7.1 -> 1.1.7.3 to pick up an empty-stream and Java 9 fix. There appear to be no other changes of consequence: https://github.com/xerial/snappy-java/blob/master/Milestone.md ## How was this patch tested? Existing tests Closes #24242 from srowen/SPARK-27267. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-30 02:41:24 -05:00
Hyukjin Kwon	0e16a6f5b0	[SPARK-27277][INFRA] Recover from setting fix version failure in merge script ## What changes were proposed in this pull request? I happened to meet this case few times before: ``` Enter comma-separated fix version(s) [3.0.0]: 3.0,0 Restoring head pointer to master git checkout master Already on 'master' git branch Traceback (most recent call last): File "./dev/merge_spark_pr_jira.py", line 537, in <module> main() File "./dev/merge_spark_pr_jira.py", line 523, in main resolve_jira_issues(title, merged_refs, jira_comment) File "./dev/merge_spark_pr_jira.py", line 359, in resolve_jira_issues resolve_jira_issue(merge_branches, comment, jira_id) File "./dev/merge_spark_pr_jira.py", line 302, in resolve_jira_issue jira_fix_versions = map(lambda v: get_version_json(v), fix_versions) File "./dev/merge_spark_pr_jira.py", line 302, in <lambda> jira_fix_versions = map(lambda v: get_version_json(v), fix_versions) File "./dev/merge_spark_pr_jira.py", line 300, in get_version_json return filter(lambda v: v.name == version_str, versions)[0].raw IndexError: list index out of range ``` I typed the fix version wrongly (there's comma in `3.0,0`) and it ended the loop in the merge script. Not a big deal but it bugged me few times. Finally I met this today again, and decided to fix. This PR proposes to recover from wrongly set fix versions. ## How was this patch tested? I manually copied and pasted the specific codes and tested separately in both Python 2 and Python 3. 
Positive cases: ``` Enter comma-separated fix version(s) [3.0.0]: # blank test (to use default) ['3.0.0'] ``` ``` Enter comma-separated fix version(s) [3.0.0,2.4.2]: # multiple default versions ['3.0.0', '2.4.2'] ``` ``` Enter comma-separated fix version(s) [3.0.0]: 2.4.1 # valid version ['2.4.1'] ``` ``` Enter comma-separated fix version(s) [3.0.0]: 3.0.0,2.4.2 # multiple valid versions ['3.0.0', '2.4.2'] ``` Keyboard interrupt(Ctrl + c): ``` Enter comma-separated fix version(s) [3.0.0]: ^CTraceback (most recent call last): # keyboard interrupt File "test_merge_script.py", line 45, in <module> test() File "test_merge_script.py", line 26, in test fix_versions = input("Enter comma-separated fix version(s) [%s]: " % default_fix_versions) KeyboardInterrupt ``` Wrongly typed versions (recovered): ``` Enter comma-separated fix version(s) [3.0.0]: 3.1 Specified version(s) [3.1] not found in the available versions, try again (or leave blank and fix manually). Enter comma-separated fix version(s) [3.0.0]: 123 Specified version(s) [123] not found in the available versions, try again (or leave blank and fix manually). Enter comma-separated fix version(s) [3.0.0]: 3.0,0 Specified version(s) [3.0, 0] not found in the available versions, try again (or leave blank and fix manually). Enter comma-separated fix version(s) [3.0.0]: damn Specified version(s) [damn] not found in the available versions, try again (or leave blank and fix manually). Enter comma-separated fix version(s) [3.0.0]: 3.0.0,2.5.2 # one invalid versions in multiple versions Specified version(s) [3.0.0, 2.5.2] not found in the available versions, try again (or leave blank and fix manually). 
``` Arbitrary exceptions in fix version parsing (recovered) ``` Enter comma-separated fix version(s) [3.0.0]: Traceback (most recent call last): File "tmp.py", line 11, in <module> raise Exception("arbitrary exception") Exception: arbitrary exception Error setting fix version(s), try again (or leave blank and fix manually) Enter comma-separated fix version(s) [3.0.0]: Traceback (most recent call last): File "tmp.py", line 10, in <module> raise Exception("arbitrary exception") Exception: arbitrary exception Error setting fix version(s), try again (or leave blank and fix manually) Enter comma-separated fix version(s) [3.0.0]: ``` Closes #24213 from HyukjinKwon/merge_script_fix_version. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-26 21:14:07 +09:00
Sean Owen	8bc304f97e	[SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0 ## What changes were proposed in this pull request? Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below. ## How was this patch tested? Existing tests. Closes #23098 from srowen/SPARK-26132. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-25 10:46:42 -05:00
Yuming Wang	9c0af746e5	[SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0 ## What changes were proposed in this pull request? This PR upgrade `hadoop-3` to `3.2.0` to workaround [HADOOP-16086](https://issues.apache.org/jira/browse/HADOOP-16086). Otherwise some test case will throw IllegalArgumentException: ```java 02:44:34.707 ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)' java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:109) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:102) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475) at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:369) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$runHive$1(HiveClientImpl.scala:730) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221) at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266) at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:719) at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:709) at org.apache.spark.sql.hive.StatisticsSuite.createNonPartitionedTable(StatisticsSuite.scala:719) at org.apache.spark.sql.hive.StatisticsSuite.$anonfun$testAlterTableProperties$2(StatisticsSuite.scala:822) ``` ## How was this patch tested? manual tests Closes #24106 from wangyum/SPARK-27175. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-16 19:42:05 -05:00
Dongjoon Hyun	f26a1f3d37	[SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 ## What changes were proposed in this pull request? This PR aims to update Apache ORC dependency to fix [SPARK-27107](https://issues.apache.org/jira/browse/SPARK-27107) . ``` [ORC-452] Support converting MAP column from JSON to ORC Improvement [ORC-447] Change the docker scripts to keep a persistent m2 cache [ORC-463] Add `version` command [ORC-475] ORC reader should lazily get filesystem [ORC-476] Make SearchAgument kryo buffer size configurable ``` ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #24096 from dongjoon-hyun/SPARK-27165. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-14 20:14:31 -07:00
Yuming Wang	f0b6245ea4	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles ## What changes were proposed in this pull request? `dev/mima` and `dev/scalastyle` now read their profiles dynamically from `modules.py`. ## How was this patch tested? manual tests Closes #24089 from wangyum/SPARK-27158. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-15 08:20:42 +09:00
Jiaxin Shan	2d0b7cfe44	[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2 ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/23814 was reverted because of Jenkins integration tests failure. After the minikube upgrade, Kubernetes client SDK v4.1.2 works with Kubernetes v1.13. We can bring this change back. Reference: [Bump Kubernetes Client Version to 4.1.2](https://issues.apache.org/jira/browse/SPARK-26742) [Original PR against master](https://github.com/apache/spark/pull/23814) [Kubernetes client upgrade for Spark 2.4](https://github.com/apache/spark/pull/23993) ## How was this patch tested? Unit Tests: ``` All tests passed. [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: [INFO] [INFO] Spark Project Parent POM ........................... SUCCESS [ 2.343 s] [INFO] Spark Project Tags ................................. SUCCESS [ 2.039 s] [INFO] Spark Project Sketch ............................... SUCCESS [ 12.714 s] [INFO] Spark Project Local DB ............................. SUCCESS [ 2.185 s] [INFO] Spark Project Networking ........................... SUCCESS [ 38.154 s] [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 7.989 s] [INFO] Spark Project Unsafe ............................... SUCCESS [ 2.297 s] [INFO] Spark Project Launcher ............................. SUCCESS [ 2.813 s] [INFO] Spark Project Core ................................. SUCCESS [38:03 min] [INFO] Spark Project ML Local Library ..................... SUCCESS [ 3.848 s] [INFO] Spark Project GraphX ............................... SUCCESS [ 56.084 s] [INFO] Spark Project Streaming ............................ 
SUCCESS [04:58 min] [INFO] Spark Project Catalyst ............................. SUCCESS [06:39 min] [INFO] Spark Project SQL .................................. SUCCESS [37:12 min] [INFO] Spark Project ML Library ........................... SUCCESS [18:59 min] [INFO] Spark Project Tools ................................ SUCCESS [ 0.767 s] [INFO] Spark Project Hive ................................. SUCCESS [33:45 min] [INFO] Spark Project REPL ................................. SUCCESS [01:14 min] [INFO] Spark Project Assembly ............................. SUCCESS [ 1.444 s] [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:12 min] [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 6.719 s] [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [07:00 min] [INFO] Spark Project Examples ............................. SUCCESS [ 21.805 s] [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 0.906 s] [INFO] Spark Avro ......................................... SUCCESS [ 50.486 s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 02:32 h [INFO] Finished at: 2019-03-07T08:39:34Z [INFO] ------------------------------------------------------------------------ ``` Please review http://spark.apache.org/contributing.html before opening a pull request. Closes #24002 from Jeffwan/update_k8s_sdk_master. Authored-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-03-13 15:04:27 -07:00
DB Tsai	2b9ad2516e	[MINOR][BUILD] Add Scala 2.12 profile back for branch-2.4 build Closes #24074 from dbtsai/scala-2.12. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-12 20:08:52 -07:00
Yuming Wang	dccf6615c3	[SPARK-27130][BUILD] Automatically select profile when executing sbt-checkstyle ## What changes were proposed in this pull request? This PR makes it automatically select profile when executing `sbt-checkstyle`. The reason for this is that `hadoop-2.7` and `hadoop-3.1` may have different `hive-thriftserver` module in the future. ## How was this patch tested? manual tests: ``` Update AbstractService.java file. export HADOOP_PROFILE=hadoop2.7 ./dev/run-tests ``` The result: ![image](https://user-images.githubusercontent.com/5399861/54197992-5337e780-4500-11e9-930c-722982cdcd45.png) Closes #24065 from wangyum/SPARK-27130. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-13 08:03:46 +09:00
Yuming Wang	8ab13065f6	[SPARK-23807][FOLLOW-UP][BUILD][TEST-HADOOP3.1] Add test-hadoop3.1 phrase ## What changes were proposed in this pull request? Add `test-hadoop3.1` phrase to test Spark against Spark’s Hadoop 3.1 profile. ## How was this patch tested? Tested on jenkins. This is output: ``` [info] Using build tool sbt with Hadoop profile hadoop3.1 under environment amplab_jenkins ... [info] Building Spark (w/Hive 1.2.1) using SBT with these arguments: -Phadoop-3.1 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly ``` https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103282/console Closes #24045 from wangyum/SPARK-23807. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-11 11:15:58 +09:00
Gabor Somogyi	3729efb4d0	[SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs ## What changes were proposed in this pull request? Avro has been a built-in but external data source module since Spark 2.4, but the `from_avro` and `to_avro` APIs are not yet supported in PySpark. This PR makes them available from PySpark. ## How was this patch tested? Please see the Python API examples that I've added. cd docs/ SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll build Manual webpage check. Closes #23797 from gaborgsomogyi/SPARK-26856. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-11 10:15:07 +09:00
Yuming Wang	eed3091a60	[SPARK-27120][BUILD][TEST] Upgrade scalatest version to 3.0.5 ## What changes were proposed in this pull request? ScalaTest 3.0.5 Release Notes Bug Fixes - Fixed the implicit view not available problem when used with compile macro. - Fixed a stack depth problem in RefSpecLike and fixture.SpecLike under Scala 2.13. - Changed Framework and ScalaTestFramework to set spanScaleFactor for Runner object instances for different Runners using different class loaders. This fixed a problem whereby an incorrect Runner.spanScaleFactor could be used when the tests for multiple sbt project's were run concurrently. - Fixed a bug in endsWith regex matcher. Improvements - Removed duplicated parsing code for -C in ArgsParser. - Improved performance in WebBrowser. - Documentation typo rectification. - Improve validity of Junit XML reports. - Improved performance by replacing all .size == 0 and .length == 0 to .isEmpty. Enhancements - Added 'C' option to -P, which will tell -P to use cached thread pool. - External Dependencies Update - Bumped up scala-js version to 0.6.22. - Changed to depend on mockito-core, not mockito-all. - Bumped up jmock version to 2.8.3. - Bumped up junit version to 4.12. - Removed dependency to scala-parser-combinators. More details: http://www.scalatest.org/release_notes/3.0.5 ## How was this patch tested? manual tests on local machine: ``` nohup build/sbt clean -Djline.terminal=jline.UnsupportedTerminal -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos test > run.scalatest.log & ``` Closes #24042 from wangyum/SPARK-27120. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-10 15:22:52 -07:00
Yuming Wang	f732647ae4	[SPARK-27054][BUILD][SQL] Remove the Calcite dependency ## What changes were proposed in this pull request? Calcite is only used for [runSqlHive](`02bbe977ab/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (L699-L705)`) when `hive.cbo.enable=true`([SemanticAnalyzer](https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java#L278-L280)). So we can disable `hive.cbo.enable` and remove the Calcite dependency. ## How was this patch tested? Existing tests Closes #23970 from wangyum/SPARK-27054. Lead-authored-by: Yuming Wang <yumwang@ebay.com> Co-authored-by: Yuming Wang <wgyumg@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-09 16:34:24 -08:00
DB Tsai	b6375097bc	[SPARK-27026][BUILD] Upgrade Docker image for release build to Ubuntu 18.04 LTS ## What changes were proposed in this pull request? Upgrade Docker image for release build to Ubuntu 18.04LTS ## How was this patch tested? Manually tested. Closes #23932 from dbtsai/ubuntu18.04. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-06 13:58:21 -08:00
Yanbo Liang	7857c6d633	[SPARK-27051][CORE] Bump Jackson version to 2.9.8 ## What changes were proposed in this pull request? FasterXML Jackson versions before 2.9.8 are affected by multiple [CVEs](https://github.com/FasterXML/jackson-databind/issues/2186), so we need to bump the Jackson dependency to 2.9.8. ## How was this patch tested? Existing tests and offline benchmark. I have run ```SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmark"``` to check there is no performance degradation for this upgrade. Closes #23965 from yanboliang/SPARK-27051. Authored-by: Yanbo Liang <ybliang8@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-05 11:46:51 +09:00
LantaoJin	e5c502c596	[SPARK-25865][CORE] Add GC information to ExecutorMetrics ## What changes were proposed in this pull request? Memory usage alone, without GC information, does not help us determine the proper memory settings. We need GC metrics on the frequency of major and minor GC. For example, consider two executors that are both configured with 10GB of memory and both use close to 10GB. Should we increase or decrease their configured memory? These metrics may help: we can increase the configured memory for the first if it has very frequent major GCs, and decrease it for the second if it has only some minor GCs and no major GC. GC metrics are only useful over the entire lifetime of an executor, not for separate stages. ## How was this patch tested? Added a UT. Closes #22874 from LantaoJin/SPARK-25865. Authored-by: LantaoJin <jinlantao@gmail.com> Signed-off-by: Imran Rashid <irashid@cloudera.com>	2019-03-04 14:26:02 -06:00
Sean Owen	d8754df2bf	[SPARK-27029][BUILD] Update Thrift to 0.12.0 ## What changes were proposed in this pull request? Update Thrift to 0.12.0 to pick up bug and security fixes. Changes: https://github.com/apache/thrift/blob/master/CHANGES.md The important one is for https://issues.apache.org/jira/browse/THRIFT-4506 ## How was this patch tested? Existing tests. A quick local test suggests this works. Closes #23935 from srowen/SPARK-27029. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-02 17:28:37 -08:00
Marcelo Vanzin	d00eca75b3	[SPARK-26048][BUILD] Enable flume profile when creating 2.x releases. Closes #23931 from vanzin/SPARK-26048. Authored-by: Marcelo Vanzin <vanzin@cloudera.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-02 08:14:06 -08:00
Sean Owen	131b464d0c	[SPARK-26986][ML][FOLLOWUP] Add JAXB reference impl to build for Java 9+ ## What changes were proposed in this pull request? Remove a few new JAXB dependencies that shouldn't be necessary now. See https://github.com/apache/spark/pull/23890#issuecomment-468299922 ## How was this patch tested? Existing tests Closes #23923 from srowen/SPARK-26986.2. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-01 11:23:40 -06:00
Sean Owen	9c283662c6	[SPARK-26986][ML] Add JAXB reference impl to build for Java 9+ ## What changes were proposed in this pull request? Add reference JAXB impl for Java 9+ from Glassfish. Right now it's only apparently necessary in MLlib but can be expanded later. ## How was this patch tested? Existing tests particularly PMML-related ones, which use JAXB. This works on Java 11. Closes #23890 from srowen/SPARK-26986. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-02-26 18:26:49 -06:00
Marcelo Vanzin	afbff6446f	Revert "[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2" This reverts commit `a3192d966a`.	2019-02-26 13:42:07 -08:00
Jungtaek Lim (HeartSaVioR)	c5de804093	[MINOR][BUILD] Update all checkstyle dtd to use "https://checkstyle.org" ## What changes were proposed in this pull request? The build below failed the Java checkstyle test, but instead of a violation it shows FileNotFound on the dtd file. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102751/ It looks like the dtd file link `http://www.puppycrawl.com/dtds/configuration_1_3.dtd` is dead. This patch updates the dtd link to "https://checkstyle.org/dtds/", given that the checkstyle repository also updated the URL path. https://github.com/checkstyle/checkstyle/issues/5601 ## How was this patch tested? Checked the new links. Closes #23887 from HeartSaVioR/java-checkstyle-dtd-change-url. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-02-25 11:25:53 -08:00
Jiaxin Shan	a3192d966a	[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2 ## What changes were proposed in this pull request? Changed the `kubernetes-client` version to 4.1.2. Latest version fix error with exec credentials (used by aws eks) and this will be used to talk with kubernetes API server. Users can submit spark job to EKS api endpoint now with this patch. ## How was this patch tested? unit tests and manual tests. Closes #23814 from Jeffwan/update_k8s_sdk. Authored-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-02-25 04:56:04 -06:00
Holden Karau	6b3c832dac	[SPARK-26882] Check the Kubernetes integration tests scalastyle ## What changes were proposed in this pull request? Add the Kubernetes integration tests to the scalastyle profiles. ## How was this patch tested? Run ./dev/scalastyle with a bad change manually ## Follow on work See SPARK-26898 to add scalastyle for k8s integration to the CI Closes #23792 from holdenk/SPARK-26882-check-k8s-integration-tests-when-linting. Authored-by: Holden Karau <holden@pigscanfly.ca> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-02-19 13:49:47 -08:00
cchung100m	dc46fb77ba	[SPARK-26822] Upgrade the deprecated module 'optparse' Follow the [official document](https://docs.python.org/2/library/argparse.html#upgrading-optparse-code) to upgrade the deprecated module 'optparse' to 'argparse'. ## What changes were proposed in this pull request? This PR proposes to replace 'optparse' module with 'argparse' module. ## How was this patch tested? Follow the [previous testing](`7e3eb3cd20`), manually tested and negative tests were also done. My [test results](https://gist.github.com/cchung100m/1661e7df6e8b66940a6e52a20861f61d) Closes #23730 from cchung100m/solve_deprecated_module_optparse. Authored-by: cchung100m <cchung100m@cs.ccu.edu.tw> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-02-10 00:36:22 -06:00
Ryan Blue	f72d217788	[SPARK-26677][BUILD] Update Parquet to 1.10.1 with notEq pushdown fix. ## What changes were proposed in this pull request? Update to Parquet Java 1.10.1. ## How was this patch tested? Added a test from HyukjinKwon that validates the notEq case from SPARK-26677. Closes #23704 from rdblue/SPARK-26677-fix-noteq-parquet-bug. Lead-authored-by: Ryan Blue <blue@apache.org> Co-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: Ryan Blue <rdblue@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-02-02 09:17:52 -08:00
Hyukjin Kwon	cdd694c52b	[SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins ## What changes were proposed in this pull request? ### Background For the current status, the test script that generates coverage information was merged into Spark, https://github.com/apache/spark/pull/20204 So, we can generate the coverage report and site by, for example: ``` run-tests-with-coverage --python-executables=python3 --modules=pyspark-sql ``` like `run-tests` script in `./python`. ### Proposed change The next step is to host this coverage report via `github.io` automatically by Jenkins (see https://spark-test.github.io/pyspark-coverage-site/). This uses my testing account for Spark, spark-test, which is shared to Felix and Shivaram a long time ago for testing purpose including AppVeyor. To cut this short, this PR targets to run the coverage in [spark-master-test-sbt-hadoop-2.7](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/) In the specific job, it will clone the page, and rebase the up-to-date PySpark test coverage from the latest commit. For instance as below: ```bash # Clone PySpark coverage site. git clone https://github.com/spark-test/pyspark-coverage-site.git # Remove existing HTMLs. rm -fr pyspark-coverage-site/* # Copy generated coverage HTMLs. cp -r .../python/test_coverage/htmlcov/* pyspark-coverage-site/ # Check out to a temporary branch. git symbolic-ref HEAD refs/heads/latest_branch # Add all the files. git add -A # Commit current HTMLs. git commit -am "Coverage report at latest commit in Apache Spark" # Delete the old branch. git branch -D gh-pages # Rename the temporary branch to master. git branch -m gh-pages # Finally, force update to our repository. git push -f origin gh-pages ``` So, it is a one single up-to-date coverage can be shown in the `github-io` page. The commands above were manually tested. 
### TODOs - [x] Write a draft HyukjinKwon - [x] `pip install coverage` to all python implementations (pypy, python2, python3) in Jenkins workers - shaneknapp - [x] Set hidden `SPARK_TEST_KEY` for spark-test's password in Jenkins via Jenkins's feature This should be set in both PR builder and `spark-master-test-sbt-hadoop-2.7` so that later other PRs can test and fix the bugs - shaneknapp - [x] Set an environment variable that indicates `spark-master-test-sbt-hadoop-2.7` so that that specific build can report and update the coverage site - shaneknapp - [x] Make PR builder's test passed HyukjinKwon - [x] Fix flaky test related with coverage HyukjinKwon - 6 consecutive passes out of 7 runs This PR will be co-authored with me and shaneknapp ## How was this patch tested? It will be tested via Jenkins. Closes #23117 from HyukjinKwon/SPARK-7721. Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: hyukjinkwon <gurwls223@apache.org> Co-authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-02-01 10:18:08 +08:00
Bryan Cutler	16990f9299	[SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0 ## What changes were proposed in this pull request? Upgrade Apache Arrow to version 0.12.0. This includes the Java artifacts and fixes to enable usage with pyarrow 0.12.0 Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users: * Safe cast fails from numpy float64 array with nans to integer, ARROW-4258 * Java, Reduce heap usage for variable width vectors, ARROW-4147 * Binary identity cast not implemented, ARROW-4101 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098 * conversion to date object no longer needed, ARROW-3910 * Error reading IPC file with no record batches, ARROW-3894 * Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790 * from_pandas gives incorrect results when converting floating point to bool, ARROW-3428 * Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048 * Java update to official Flatbuffers version 1.9.0, ARROW-3175 complete list [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0) PySpark requires the following fixes to work with PyArrow 0.12.0 * Encrypted pyspark worker fails due to ChunkedStream missing closed property * pyarrow now converts dates as objects by default, which causes error because type is assumed datetime64 * ArrowTests fails due to difference in raised error message * pyarrow.open_stream deprecated * tests fail because groupby adds index column with duplicate name ## How was this patch tested? Ran unit tests with pyarrow versions 0.8.0, 0.10.0, 0.11.1, 0.12.0 Closes #23657 from BryanCutler/arrow-upgrade-012. Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-29 14:18:45 +08:00
Sean Owen	c2d0d700b5	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis ## What changes were proposed in this pull request? Misc code cleanup from lgtm.com analysis. See comments below for details. ## How was this patch tested? Existing tests. Closes #23571 from srowen/SPARK-26640. Lead-authored-by: Sean Owen <sean.owen@databricks.com> Co-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-17 19:40:39 -06:00
wright	4915cb3adf	[MINOR][BUILD] ensure call to translate_component has correct number of arguments ## What changes were proposed in this pull request? The call to `translate_component` only supplied 2 out of the 3 required arguments. I added a default empty list for the missing argument to avoid a run-time error. I work for Semmle, and noticed the bug with our LGTM code analyzer: `0655f1624f/files/dev/create-release/releaseutils.py`?sort=name&dir=ASC&mode=heatmap#x1434915b6576fb40:1 ## How was this patch tested? I checked that `./dev/run-tests` pass OK. Closes #23567 from ipwright/wrong-number-of-arguments-fix. Authored-by: wright <wright@semmle.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-16 21:00:58 -06:00
Takeshi Yamamuro	abc937b247	[MINOR][BUILD] Remove binary license/notice files in a source release for branch-2.4+ only ## What changes were proposed in this pull request? To skip the steps that remove binary license/notice files from a source release for branch-2.3 (these files only exist in master/branch-2.4 now), this PR checks the Spark release version in `dev/create-release/release-build.sh`. ## How was this patch tested? Manually checked. Closes #23538 from maropu/FixReleaseScript. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-14 19:17:39 -06:00
Dongjoon Hyun	6f35ede31c	[SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER ## What changes were proposed in this pull request? This PR uses GitHub repository instead of GitBox because GitHub repo returns HTTP header status correctly. ## How was this patch tested? Manual. ``` $ ./do-release-docker.sh -d /tmp/test -n Branch [branch-2.4]: Current branch version is 2.4.1-SNAPSHOT. Release [2.4.1]: RC # [1]: This is a dry run. Please confirm the ref that will be built for testing. Ref [v2.4.1-rc1]: ``` Closes #23482 from dongjoon-hyun/SPARK-26554-2. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-07 17:54:05 -08:00
Dongjoon Hyun	468d25ec74	[MINOR][BUILD] Fix script name in `release-tag.sh` usage message ## What changes were proposed in this pull request? This PR fixes the old script name in `release-tag.sh`. $ ./release-tag.sh --help \| head -n1 usage: tag-release.sh ## How was this patch tested? Manual. $ ./release-tag.sh --help \| head -n1 usage: release-tag.sh Closes #23477 from dongjoon-hyun/SPARK-RELEASE-TAG. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-06 22:45:18 -08:00
Dongjoon Hyun	fe039faddf	[SPARK-26554][BUILD] Update `release-util.sh` to avoid GitBox fake 200 headers ## What changes were proposed in this pull request? Unlike the previous Apache Git repository, new GitBox repository returns a fake HTTP 200 header instead of `404 Not Found` header. This makes release scripts out of order. This PR aims to fix it to handle the html body message instead of the fake HTTP headers. This is a release blocker. ```bash $ curl -s --head --fail "https://gitbox.apache.org/repos/asf?p=spark.git;a=commit;h=v3.0.0" HTTP/1.1 200 OK Date: Sun, 06 Jan 2019 22:42:39 GMT Server: Apache/2.4.18 (Ubuntu) Vary: Accept-Encoding Access-Control-Allow-Origin: * Access-Control-Allow-Methods: POST, GET, OPTIONS Access-Control-Allow-Headers: X-PINGOTHER Access-Control-Max-Age: 1728000 Content-Type: text/html; charset=utf-8 ``` BEFORE ```bash $ ./do-release-docker.sh -d /tmp/test -n Branch [branch-2.4]: Current branch version is 2.4.1-SNAPSHOT. Release [2.4.1]: RC # [1]: v2.4.1-rc1 already exists. Continue anyway [y/n]? ``` AFTER ```bash $ ./do-release-docker.sh -d /tmp/test -n Branch [branch-2.4]: Current branch version is 2.4.1-SNAPSHOT. Release [2.4.1]: RC # [1]: This is a dry run. Please confirm the ref that will be built for testing. Ref [v2.4.1-rc1]: ``` ## How was this patch tested? Manual. Closes #23476 from dongjoon-hyun/SPARK-26554. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-06 19:59:31 -08:00
Dongjoon Hyun	5969b8a2ed	[SPARK-26541][BUILD] Add `-Pdocker-integration-tests` to `dev/scalastyle` ## What changes were proposed in this pull request? This PR makes `scalastyle` check the `docker-integration-tests` module as well and fixes one error. ## How was this patch tested? Pass the Jenkins with the updated Scalastyle. ``` ======================================================================== Running Scala style checks ======================================================================== Scalastyle checks passed. ``` Closes #23459 from dongjoon-hyun/SPARK-26541. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-05 00:55:17 -08:00
shane knapp	bccb8602d7	[SPARK-26537][BUILD] change git-wip-us to gitbox ## What changes were proposed in this pull request? due to apache recently moving from git-wip-us.apache.org to gitbox.apache.org, we need to update the packaging scripts to point to the new repo location. this will also need to be backported to 2.4, 2.3, 2.1, 2.0 and 1.6. ## How was this patch tested? the build system will test this. Please review http://spark.apache.org/contributing.html before opening a pull request. Closes #23454 from shaneknapp/update-apache-repo. Authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-04 18:27:26 -08:00
Dongjoon Hyun	81addaa6b7	[SPARK-26427][BUILD] Upgrade Apache ORC to 1.5.4 ## What changes were proposed in this pull request? This PR aims to update the Apache ORC dependency to the latest version, 1.5.4, released on Dec. 20. ([Release Notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320&version=12344187)) ``` [ORC-237] OrcFile.mergeFiles Specified block size is less than configured minimum value [ORC-409] Changes for extending MemoryManagerImpl [ORC-410] Fix a locale-dependent test in TestCsvReader [ORC-416] Avoid opening data reader when there is no stripe [ORC-417] Use dynamic Apache Maven mirror link [ORC-419] Ensure to call `close` at RecordReaderImpl constructor exception [ORC-432] openjdk 8 has a bug that prevents surefire from working [ORC-435] Ability to read stripes that are greater than 2GB [ORC-437] Make acid schema checks case insensitive [ORC-411] Update build to work with Java 10. [ORC-418] Fix broken docker build script ``` ## How was this patch tested? Build and pass Jenkins. Closes #23364 from dongjoon-hyun/SPARK-26427. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-12-22 00:41:21 -08:00
Reza Safi	90c77ea313	[SPARK-24958][CORE] Add memory from procfs to executor metrics. This adds the entire memory used by spark’s executor (as measured by procfs) to the executor metrics. The memory usage is collected from the entire process tree under the executor. The metrics are subdivided into memory used by java, by python, and by other processes, to aid users in diagnosing the source of high memory usage. The additional metrics are sent to the driver in heartbeats, using the mechanism introduced by SPARK-23429. This also slightly extends that approach to allow one ExecutorMetricType to collect multiple metrics. Added unit tests and also tested on a live cluster. Closes #22612 from rezasafi/ptreememory2. Authored-by: Reza Safi <rezasafi@cloudera.com> Signed-off-by: Imran Rashid <irashid@cloudera.com>	2018-12-10 11:14:11 -06:00
Sean Owen	2ea9792fde	[SPARK-26266][BUILD] Update to Scala 2.12.8 ## What changes were proposed in this pull request? Update to Scala 2.12.8 ## How was this patch tested? Existing tests. Closes #23218 from srowen/SPARK-26266. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-12-08 05:59:53 -06:00
Dongjoon Hyun	4772265203	[SPARK-26298][BUILD] Upgrade Janino to 3.0.11 ## What changes were proposed in this pull request? This PR aims to upgrade the Janino compiler to the latest version, 3.0.11. The following are the changes from the [release note](http://janino-compiler.github.io/janino/changelog.html). - Script with many "helper" variables. - Java 9+ compatibility - Compilation Error Messages Generated by JDK. - Added experimental support for the "StackMapFrame" attribute; not active yet. - Make Unparser more flexible. - Fixed NPEs in various "toString()" methods. - Optimize static method invocation with rvalue target expression. - Added all missing "ClassFile.getConstant*Info()" methods, removing the necessity for many type casts. ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #23250 from dongjoon-hyun/SPARK-26298. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-12-06 20:50:57 -08:00
cody koeninger	5e5b9f2ee0	[SPARK-26177] Config change followup to [] Automated formatting for Scala code Let's keep this open for a while to see if other configuration tweaks are suggested ## What changes were proposed in this pull request? Formatting configuration changes following up https://github.com/apache/spark/pull/23148 ## How was this patch tested? ./dev/scalafmt Closes #23182 from koeninger/scalafmt-config. Authored-by: cody koeninger <cody@koeninger.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-12-03 10:03:51 -06:00

769 commits