Commit graph

143 commits

Author SHA1 Message Date
Hyukjin Kwon 1d36b892ab [SPARK-7721][INFRA][FOLLOW-UP] Remove cloned coverage repo after posting HTMLs
## What changes were proposed in this pull request?

This PR proposes to remove cloned `pyspark-coverage-site` repo.

it doesn't looks a problem in PR builder but somehow it's problematic in `spark-master-test-sbt-hadoop-2.7`.

## How was this patch tested?

Jenkins.

Closes #23729 from HyukjinKwon/followup-coverage.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-06-25 09:18:32 +09:00
Dongjoon Hyun e5d95117e4 [SPARK-27979][BUILD][test-maven] Remove deprecated --force option in build/mvn and run-tests.py
## What changes were proposed in this pull request?

This is a second try of #24824.

Since Apache Spark 2.0.0, SPARK-14867 deprecated `--force` option and made it ignored. This PR cleans up the related code completely at 3.0.0.

**BEFORE (Jenkins)**
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests
WARNING: '--force' is deprecated and ignored.
...
========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.7 -Phive-thriftserver -Phive -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end
WARNING: '--force' is deprecated and ignored.
```

**AFTER (Jenkins)**
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests
...
========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end
```

## How was this patch tested?

Manually check the Jenkins logs.

Closes #24833 from dongjoon-hyun/SPARK-FORCE-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-10 18:40:46 -07:00
Dongjoon Hyun 742f805177 Revert "[SPARK-27979][BUILD][test-maven] Remove deprecated --force option in build/mvn and run-tests.py"
This reverts commit 354ec254c5.
2019-06-09 08:33:21 -07:00
Dongjoon Hyun 354ec254c5 [SPARK-27979][BUILD][test-maven] Remove deprecated --force option in build/mvn and run-tests.py
## What changes were proposed in this pull request?

Since Apache Spark 2.0.0, SPARK-14867 deprecated `--force` option and made it ignored. This PR cleans up the related code completely at 3.0.0.

**BEFORE (Jenkins)**
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests
WARNING: '--force' is deprecated and ignored.
...
========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.7 -Phive-thriftserver -Phive -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end
WARNING: '--force' is deprecated and ignored.
```

**AFTER (Jenkins)**
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean package -DskipTests
...
========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end
```

## How was this patch tested?

Manually check the Jenkins logs.

Closes #24824 from dongjoon-hyun/SPARK-27979.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-08 08:17:12 -07:00
Yuming Wang 3f102a8229 [SPARK-27749][SQL] hadoop-3.2 support hive-thriftserver
## What changes were proposed in this pull request?

This PR mainly makes the following changes to make `hadoop-3.2` support `sql/hive-thriftserver`:
1. Upgrade [`TCLIService.thrift`](https://github.com/apache/hive/blob/rel/release-2.3.5/service-rpc/if/TCLIService.thrift) and related code to Hive 2.3.5 because of [HIVE-12442](https://issues.apache.org/jira/browse/HIVE-12442)(Note that we only migrate code without adding features, such as [HIVE-4924](https://issues.apache.org/jira/browse/HIVE-4924) and [HIVE-15473](https://issues.apache.org/jira/browse/HIVE-15473)).
2. Use slf4j as logging facade because of [HIVE-12237](https://issues.apache.org/jira/browse/HIVE-12237).
3. Port [HIVE-13169](https://issues.apache.org/jira/browse/HIVE-13169) to compatible with Hive 2.3.

## How was this patch tested?

Exiting test

Closes #24628 from wangyum/SPARK-27749.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-06-05 08:40:05 -07:00
HyukjinKwon b7bf4fd123 [SPARK-27402][INFRA][FOLLOW-UP] Exclude 'hive-thriftserver' in modules to test for hadoop3.2 for now
## What changes were proposed in this pull request?

This PR excludes  'hive-thriftserver' in modules to test for hadoop3.2 for now as well

## How was this patch tested?

Manually tested via `run-tests.py`

Closes #24644 from HyukjinKwon/SPARK-27402.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-20 07:53:19 -07:00
Yuming Wang f82ed5e8e0 [MINOR][TEST] Remove out-dated hive version in run-tests.py
## What changes were proposed in this pull request?

```
========================================================================
Building Spark
========================================================================
[info] Building Spark (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-3.2 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly
```

`(w/Hive 1.2.1)` is incorrect when testing hadoop-3.2, It's should be (w/Hive 2.3.4).
This pr removes `(w/Hive 1.2.1)` in run-tests.py.

## How was this patch tested?

N/A

Closes #24451 from wangyum/run-tests-invalid-info.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-24 21:22:15 -07:00
shane knapp e1ece6a319 [SPARK-25079][PYTHON] update python3 executable to 3.6.x
## What changes were proposed in this pull request?

have jenkins test against python3.6 (instead of 3.4).

## How was this patch tested?

extensive testing on both the centos and ubuntu jenkins workers.

NOTE:  this will need to be backported to all active branches.

Closes #24266 from shaneknapp/updating-python3-executable.

Authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-04-19 10:03:50 +09:00
Yuming Wang 9c0af746e5 [SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0
## What changes were proposed in this pull request?

This PR upgrade `hadoop-3` to `3.2.0`  to workaround [HADOOP-16086](https://issues.apache.org/jira/browse/HADOOP-16086). Otherwise some test case will throw IllegalArgumentException:
```java
02:44:34.707 ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)'
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
	at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116)
	at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:109)
	at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:102)
	at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475)
	at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:369)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$runHive$1(HiveClientImpl.scala:730)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
	at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:719)
	at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:709)
	at org.apache.spark.sql.hive.StatisticsSuite.createNonPartitionedTable(StatisticsSuite.scala:719)
	at org.apache.spark.sql.hive.StatisticsSuite.$anonfun$testAlterTableProperties$2(StatisticsSuite.scala:822)
```

## How was this patch tested?

manual tests

Closes #24106 from wangyum/SPARK-27175.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-16 19:42:05 -05:00
Yuming Wang f0b6245ea4 [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles
## What changes were proposed in this pull request?

`dev/mima` and `dev/scalastyle` support dynamic reading profiles from `modules.py`.

## How was this patch tested?

manual tests

Closes #24089 from wangyum/SPARK-27158.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-03-15 08:20:42 +09:00
Yuming Wang dccf6615c3 [SPARK-27130][BUILD] Automatically select profile when executing sbt-checkstyle
## What changes were proposed in this pull request?

This PR makes it automatically select profile when executing `sbt-checkstyle`. The reason for this is that `hadoop-2.7` and `hadoop-3.1` may have different `hive-thriftserver` module in the future.

## How was this patch tested?

manual tests:
```
Update AbstractService.java file.
export HADOOP_PROFILE=hadoop2.7
./dev/run-tests
```
The result:
![image](https://user-images.githubusercontent.com/5399861/54197992-5337e780-4500-11e9-930c-722982cdcd45.png)

Closes #24065 from wangyum/SPARK-27130.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-03-13 08:03:46 +09:00
Yuming Wang 8ab13065f6 [SPARK-23807][FOLLOW-UP][BUILD][TEST-HADOOP3.1] Add test-hadoop3.1 phrase
## What changes were proposed in this pull request?

Add `test-hadoop3.1` phrase to test Spark against Spark’s Hadoop 3.1 profile.

## How was this patch tested?
Tested on jenkins. This is output:
```
[info] Using build tool sbt with Hadoop profile hadoop3.1 under environment amplab_jenkins
...
[info] Building Spark (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-3.1 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly
```
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103282/console

Closes #24045 from wangyum/SPARK-23807.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-03-11 11:15:58 +09:00
cchung100m dc46fb77ba [SPARK-26822] Upgrade the deprecated module 'optparse'
Follow the [official document](https://docs.python.org/2/library/argparse.html#upgrading-optparse-code)  to upgrade the deprecated module 'optparse' to  'argparse'.

## What changes were proposed in this pull request?

This PR proposes to replace 'optparse' module with 'argparse' module.

## How was this patch tested?

Follow the [previous testing](7e3eb3cd20), manually tested and negative tests were also done. My [test results](https://gist.github.com/cchung100m/1661e7df6e8b66940a6e52a20861f61d)

Closes #23730 from cchung100m/solve_deprecated_module_optparse.

Authored-by: cchung100m <cchung100m@cs.ccu.edu.tw>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-10 00:36:22 -06:00
Hyukjin Kwon cdd694c52b [SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
## What changes were proposed in this pull request?

### Background

For the current status, the test script that generates coverage information was merged
into Spark, https://github.com/apache/spark/pull/20204

So, we can generate the coverage report and site by, for example:

```
run-tests-with-coverage --python-executables=python3 --modules=pyspark-sql
```

like `run-tests` script in `./python`.

### Proposed change

The next step is to host this coverage report via `github.io` automatically
by Jenkins (see https://spark-test.github.io/pyspark-coverage-site/).

This uses my testing account for Spark, spark-test, which is shared to Felix and Shivaram a long time ago for testing purpose including AppVeyor.

To cut this short, this PR targets to run the coverage in
[spark-master-test-sbt-hadoop-2.7](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/)

In the specific job, it will clone the page, and rebase the up-to-date PySpark test coverage from the latest commit. For instance as below:

```bash
# Clone PySpark coverage site.
git clone https://github.com/spark-test/pyspark-coverage-site.git

# Remove existing HTMLs.
rm -fr pyspark-coverage-site/*

# Copy generated coverage HTMLs.
cp -r .../python/test_coverage/htmlcov/* pyspark-coverage-site/

# Check out to a temporary branch.
git symbolic-ref HEAD refs/heads/latest_branch

# Add all the files.
git add -A

# Commit current HTMLs.
git commit -am "Coverage report at latest commit in Apache Spark"

# Delete the old branch.
git branch -D gh-pages

# Rename the temporary branch to master.
git branch -m gh-pages

# Finally, force update to our repository.
git push -f origin gh-pages
```

So, it is a one single up-to-date coverage can be shown in the `github-io` page. The commands above were manually tested.

### TODOs

- [x] Write a draft HyukjinKwon
- [x] `pip install coverage` to all python implementations (pypy, python2, python3) in Jenkins workers  - shaneknapp
- [x] Set hidden `SPARK_TEST_KEY` for spark-test's password in Jenkins via Jenkins's feature
 This should be set in both PR builder and `spark-master-test-sbt-hadoop-2.7` so that later other PRs can test and fix the bugs - shaneknapp
- [x] Set an environment variable that indicates `spark-master-test-sbt-hadoop-2.7` so that that specific build can report and update the coverage site - shaneknapp
- [x] Make PR builder's test passed HyukjinKwon
- [x] Fix flaky test related with coverage HyukjinKwon
  -  6 consecutive passes out of 7 runs

This PR will be co-authored with me and shaneknapp

## How was this patch tested?

It will be tested via Jenkins.

Closes #23117 from HyukjinKwon/SPARK-7721.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: hyukjinkwon <gurwls223@apache.org>
Co-authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-02-01 10:18:08 +08:00
Sean Owen c2d0d700b5 [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis
## What changes were proposed in this pull request?

Misc code cleanup from lgtm.com analysis. See comments below for details.

## How was this patch tested?

Existing tests.

Closes #23571 from srowen/SPARK-26640.

Lead-authored-by: Sean Owen <sean.owen@databricks.com>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-17 19:40:39 -06:00
hyukjinkwon 41d5aaec84 [SPARK-26148][PYTHON][TESTS] Increases default parallelism in PySpark tests to speed up
## What changes were proposed in this pull request?

This PR proposes to increase parallelism in PySpark tests to speed up from 4 to 8.

It decreases the elapsed time from

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99163/consoleFull
Tests passed in 1770 seconds

to

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99186/testReport/
Tests passed in 1027 seconds

## How was this patch tested?

Jenkins tests

Closes #23111 from HyukjinKwon/parallelism.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
2018-11-26 00:26:24 +09:00
Wenchen Fan 130121711c fix security issue of zinc(update run-tests.py) 2018-10-20 00:23:16 +08:00
Sean Owen 703e6da1ec [SPARK-25705][BUILD][STREAMING][TEST-MAVEN] Remove Kafka 0.8 integration
## What changes were proposed in this pull request?

Remove Kafka 0.8 integration

## How was this patch tested?

Existing tests, build scripts

Closes #22703 from srowen/SPARK-25705.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-16 09:10:24 -05:00
Sean Owen a001814189 [SPARK-25598][STREAMING][BUILD][TEST-MAVEN] Remove flume connector in Spark 3
## What changes were proposed in this pull request?

Removes all vestiges of Flume in the build, for Spark 3.
I don't think this needs Jenkins config changes.

## How was this patch tested?

Existing tests.

Closes #22692 from srowen/SPARK-25598.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-11 14:28:06 -07:00
Sean Owen 80813e1980 [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
## What changes were proposed in this pull request?

Remove Hadoop 2.6 references and make 2.7 the default.
Obviously, this is for master/3.0.0 only.
After this we can also get rid of the separate test jobs for Hadoop 2.6.

## How was this patch tested?

Existing tests

Closes #22615 from srowen/SPARK-25016.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-10 12:07:53 -07:00
Sean Owen 08c76b5d39 [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4
(This change is a subset of the changes needed for the JIRA; see https://github.com/apache/spark/pull/22231)

## What changes were proposed in this pull request?

Use raw strings and simpler regex syntax consistently in Python, which also avoids warnings from pycodestyle about accidentally relying Python's non-escaping of non-reserved chars in normal strings. Also, fix a few long lines.

## How was this patch tested?

Existing tests, and some manual double-checking of the behavior of regexes in Python 2/3 to be sure.

Closes #22400 from srowen/SPARK-25238.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
2018-09-13 11:19:43 +08:00
Gengliang Wang 395860a986 [SPARK-24768][SQL] Have a built-in AVRO data source implementation
## What changes were proposed in this pull request?

Apache Avro (https://avro.apache.org) is a popular data serialization format. It is widely used in the Spark and Hadoop ecosystem, especially for Kafka-based data pipelines.  Using the external package https://github.com/databricks/spark-avro, Spark SQL can read and write the avro data. Making spark-Avro built-in can provide a better experience for first-time users of Spark SQL and structured streaming. We expect the built-in Avro data source can further improve the adoption of structured streaming.
The proposal is to inline code from spark-avro package (https://github.com/databricks/spark-avro). The target release is Spark 2.4.

[Built-in AVRO Data Source In Spark 2.4.pdf](https://github.com/apache/spark/files/2181511/Built-in.AVRO.Data.Source.In.Spark.2.4.pdf)

## How was this patch tested?

Unit test

Author: Gengliang Wang <gengliang.wang@databricks.com>

Closes #21742 from gengliangwang/export_avro.
2018-07-12 13:55:25 -07:00
hyukjinkwon b0a9352559 [SPARK-24573][INFRA] Runs SBT checkstyle after the build to work around a side-effect
## What changes were proposed in this pull request?

Seems checkstyle affects the build in the PR builder in Jenkins. I can't reproduce in my local and seems it can only be reproduced in the PR builder.

I was checking the places it goes through and this is just a speculation that checkstyle's compilation in SBT has a side effect to the assembly build.

This PR proposes to run the SBT checkstyle after the build.

## How was this patch tested?

Jenkins tests.

Author: hyukjinkwon <gurwls223@apache.org>

Closes #21579 from HyukjinKwon/investigate-javastyle.
2018-06-18 15:32:34 +08:00
hyukjinkwon 4a14dc0aff [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
## What changes were proposed in this pull request?

This PR proposes to check Java lint via SBT for Jenkins. It uses the SBT wrapper for checkstyle.

I manually tested. If we build the codes once, running this script takes 2 mins at maximum in my local:

Test codes:

```
Checkstyle failed at following occurrences:
[error] Checkstyle error found in /.../spark/core/src/test/java/test/org/apache/spark/JavaAPISuite.java:82: Line is longer than 100 characters (found 103).
[error] 1 issue(s) found in Checkstyle report: /.../spark/core/target/checkstyle-test-report.xml
[error] Checkstyle error found in /.../spark/sql/hive/src/test/java/org/apache/spark/sql/hive/JavaDataFrameSuite.java:84: Line is longer than 100 characters (found 115).
[error] 1 issue(s) found in Checkstyle report: /.../spark/sql/hive/target/checkstyle-test-report.xml
...
```

Main codes:

```
Checkstyle failed at following occurrences:
[error] Checkstyle error found in /.../spark/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartition.java:39: Line is longer than 100 characters (found 104).
[error] Checkstyle error found in /.../spark/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartitionReader.java:26: Line is longer than 100 characters (found 110).
[error] Checkstyle error found in /.../spark/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartitionReader.java:30: Line is longer than 100 characters (found 104).
...
```

## How was this patch tested?

Manually tested. Jenkins build should test this.

Author: hyukjinkwon <gurwls223@apache.org>

Closes #21399 from HyukjinKwon/SPARK-22269.
2018-05-24 14:19:32 +08:00
Benjamin Peterson 7013eea11c [SPARK-23522][PYTHON] always use sys.exit over builtin exit
The exit() builtin is only for interactive use. applications should use sys.exit().

## What changes were proposed in this pull request?

All usage of the builtin `exit()` function is replaced by `sys.exit()`.

## How was this patch tested?

I ran `python/run-tests`.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Benjamin Peterson <benjamin@python.org>

Closes #20682 from benjaminp/sys-exit.
2018-03-08 20:38:34 +09:00
Rekha Joshi 7af1a325da [SPARK-23174][BUILD][PYTHON] python code style checker update
## What changes were proposed in this pull request?
Referencing latest python code style checking from PyPi/pycodestyle
Removed pending TODO
For now, in tox.ini excluded the additional style error discovered on existing python due to latest style checker (will fallback on review comment to finalize exclusion or fix py)
Any further code styling requirement needs to be part of pycodestyle, not in SPARK.

## How was this patch tested?
./dev/run-tests

Author: Rekha Joshi <rekhajoshm@gmail.com>
Author: rjoshi2 <rekhajoshm@gmail.com>

Closes #20338 from rekhajoshm/SPARK-11222.
2018-01-24 21:13:47 +09:00
hyukjinkwon 12faae295e [SPARK-23169][INFRA][R] Run lintr on the changes of lint-r script and .lintr configuration
## What changes were proposed in this pull request?

When running the `run-tests` script, seems we don't run lintr on the changes of `lint-r` script and `.lintr` configuration.

## How was this patch tested?

Jenkins builds

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #20339 from HyukjinKwon/check-r-changed.
2018-01-22 09:45:27 +09:00
Kazuaki Ishizaki 3a07eff5af [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-tests.py
## What changes were proposed in this pull request?

In [the environment where `/usr/sbin/lsof` does not exist](https://github.com/apache/spark/pull/19695#issuecomment-342865001), `./dev/run-tests.py` for `maven` causes the following error. This is because the current `./dev/run-tests.py` checks existence of only `/usr/sbin/lsof` and aborts immediately if it does not exist.

This PR changes to check whether `lsof` or `/usr/sbin/lsof` exists.

```
/bin/sh: 1: /usr/sbin/lsof: not found

Usage:
 kill [options] <pid> [...]

Options:
 <pid> [...]            send signal to every <pid> listed
 -<signal>, -s, --signal <signal>
                        specify the <signal> to be sent
 -l, --list=[<signal>]  list all signal names, or convert one to a name
 -L, --table            list all signal names in a nice table

 -h, --help     display this help and exit
 -V, --version  output version information and exit

For more details see kill(1).
Traceback (most recent call last):
  File "./dev/run-tests.py", line 626, in <module>
    main()
  File "./dev/run-tests.py", line 597, in main
    build_apache_spark(build_tool, hadoop_version)
  File "./dev/run-tests.py", line 389, in build_apache_spark
    build_spark_maven(hadoop_version)
  File "./dev/run-tests.py", line 329, in build_spark_maven
    exec_maven(profiles_and_goals)
  File "./dev/run-tests.py", line 270, in exec_maven
    kill_zinc_on_port(zinc_port)
  File "./dev/run-tests.py", line 258, in kill_zinc_on_port
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/usr/sbin/lsof -P |grep 3156 | grep LISTEN | awk '{ print $2; }' | xargs kill' returned non-zero exit status 123
```

## How was this patch tested?

manually tested

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19998 from kiszk/SPARK-22813.
2017-12-19 07:35:03 +09:00
hyukjinkwon 160a540610 [SPARK-22376][TESTS] Makes dev/run-tests.py script compatible with Python 3
## What changes were proposed in this pull request?

This PR proposes to fix `dev/run-tests.py` script to support Python 3.

Here are some backgrounds. Up to my knowledge,

In Python 2,
- `unicode` is NOT `str` in Python 2 (`type("foo") != type(u"foo")`).
- `str` has an alias, `bytes` in Python 2 (`type("foo") == type(b"foo")`).

In Python 3,
- `unicode` was (roughly) replaced by `str` in Python 3 (`type("foo") == type(u"foo")`).
- `str` is NOT `bytes` in Python 3 (`type("foo") != type(b"foo")`).

So, this PR fixes:

  1. Use `b''` instead of `''` so that both `str` in Python 2 and `bytes` in Python 3 can be hanlded. `sbt_proc.stdout.readline()` returns `str` (which has an alias, `bytes`) in Python 2 and `bytes` in Python 3

  2. Similarily, use `b''` instead of `''` so that both `str` in Python 2 and `bytes` in Python 3 can be hanlded. `re.compile` with `str` pattern does not seem supporting to match `bytes` in Python 3:

Actually, this change is recommended up to my knowledge - https://docs.python.org/3/howto/pyporting.html#text-versus-binary-data:

> Mark all binary literals with a b prefix, textual literals with a u prefix

## How was this patch tested?

I manually tested this via Python 3 with few additional changes to reduce the elapsed time.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #19665 from HyukjinKwon/SPARK-22376.
2017-11-07 19:45:34 +09:00
Wenchen Fan 864d94fe87 [SPARK-20974][BUILD] we should run REPL tests if SQL module has code changes
## What changes were proposed in this pull request?

REPL module depends on SQL module, so we should run REPL tests if SQL module has code changes.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #18191 from cloud-fan/test.
2017-06-02 21:59:52 -07:00
hyukjinkwon 35378766ad [SPARK-20343][BUILD] Avoid Unidoc build only if Hadoop 2.6 is explicitly set in SBT build
## What changes were proposed in this pull request?

This PR proposes two things as below:

- Avoid Unidoc build only if Hadoop 2.6 is explicitly set in SBT build

  Due to a different dependency resolution in SBT & Unidoc by an unknown reason, the documentation build fails on a specific machine & environment in Jenkins but it was unable to reproduce.

  So, this PR just checks an environment variable `AMPLAB_JENKINS_BUILD_PROFILE` that is set in Hadoop 2.6 SBT build against branches on Jenkins, and then disables Unidoc build. **Note that PR builder will still build it with Hadoop 2.6 & SBT.**

  ```
  ========================================================================
  Building Unidoc API Documentation
  ========================================================================
  [info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive unidoc
  Using /usr/java/jdk1.8.0_60 as default JAVA_HOME.
  ...
  ```

  I checked the environment variables from the logs (first bit) as below:

  - **spark-master-test-sbt-hadoop-2.6** (this one is being failed) - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/lastBuild/consoleFull

  ```
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  SPARK_BRANCH=master
  AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.6   <- I use this variable
  AMPLAB_JENKINS="true"
  ```
  - spark-master-test-sbt-hadoop-2.7 - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/lastBuild/consoleFull

  ```
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  SPARK_BRANCH=master
  AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.7
  AMPLAB_JENKINS="true"
  ```

  - spark-master-test-maven-hadoop-2.6 - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/lastBuild/consoleFull

  ```
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  HADOOP_PROFILE=hadoop-2.6
  HADOOP_VERSION=
  SPARK_BRANCH=master
  AMPLAB_JENKINS="true"
  ```

  - spark-master-test-maven-hadoop-2.7 - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/lastBuild/consoleFull

  ```
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  HADOOP_PROFILE=hadoop-2.7
  HADOOP_VERSION=
  SPARK_BRANCH=master
  AMPLAB_JENKINS="true"
  ```

  - PR builder - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75843/consoleFull

  ```
  JENKINS_MASTER_HOSTNAME=amp-jenkins-master
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  ```

  Assuming from other logs in branch-2.1

    - SBT & Hadoop 2.6 against branch-2.1 https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.1-test-sbt-hadoop-2.6/lastBuild/consoleFull

      ```
      JAVA_HOME=/usr/java/jdk1.8.0_60
      JAVA_7_HOME=/usr/java/jdk1.7.0_79
      SPARK_BRANCH=branch-2.1
      AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.6
      AMPLAB_JENKINS="true"
      ```

    - Maven & Hadoop 2.6 against branch-2.1 https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.1-test-maven-hadoop-2.6/lastBuild/consoleFull

      ```
      JAVA_HOME=/usr/java/jdk1.8.0_60
      JAVA_7_HOME=/usr/java/jdk1.7.0_79
      HADOOP_PROFILE=hadoop-2.6
      HADOOP_VERSION=
      SPARK_BRANCH=branch-2.1
      AMPLAB_JENKINS="true"
      ```

  We have been using the same convention for those variables. These are actually being used in `run-tests.py` script - here https://github.com/apache/spark/blob/master/dev/run-tests.py#L519-L520

- Revert the previous try

  After https://github.com/apache/spark/pull/17651, it seems the build still fails on SBT Hadoop 2.6 master.

  I am unable to reproduce this - https://github.com/apache/spark/pull/17477#issuecomment-294094092 and the reviewer was too. So, this got merged as it looks the only way to verify this is to merge it currently (as no one seems able to reproduce this).

## How was this patch tested?

I only checked `is_hadoop_version_2_6 = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6"` is working fine as expected as below:

```python
>>> import collections
>>> os = collections.namedtuple('os', 'environ')(environ={"AMPLAB_JENKINS_BUILD_PROFILE": "hadoop2.6"})
>>> print(not os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6")
False
>>> os = collections.namedtuple('os', 'environ')(environ={"AMPLAB_JENKINS_BUILD_PROFILE": "hadoop2.7"})
>>> print(not os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6")
True
>>> os = collections.namedtuple('os', 'environ')(environ={})
>>> print(not os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6")
True
```

I tried many ways but I was unable to reproduce this in my local. Sean also tried the way I did but he was also unable to reproduce this.

Please refer the comments in https://github.com/apache/spark/pull/17477#issuecomment-294094092

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17669 from HyukjinKwon/revert-SPARK-20343.
2017-04-19 12:18:54 +01:00
hyukjinkwon ceaf77ae43 [SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins
## What changes were proposed in this pull request?

This PR proposes to run Spark unidoc to test Javadoc 8 build as Javadoc 8 is easily re-breakable.

There are several problems with it:

- It introduces little extra bit of time to run the tests. In my case, it took 1.5 mins more (`Elapsed :[94.8746569157]`). How it was tested is described in "How was this patch tested?".

- > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up.

  (see  joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627))

To complete this automated build, It also suggests to fix existing Javadoc breaks / ones introduced by test codes as described above.

There fixes are similar instances that previously fixed. Please refer https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013

Note that this only fixes **errors** not **warnings**. Please see my observation https://github.com/apache/spark/pull/17389#issuecomment-288438704 for spurious errors by warnings.

## How was this patch tested?

Manually via `jekyll build` for building tests. Also, tested via running `./dev/run-tests`.

This was tested via manually adding `time.time()` as below:

```diff
     profiles_and_goals = build_profiles + sbt_goals

     print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
           " ".join(profiles_and_goals))

+    import time
+    st = time.time()
     exec_sbt(profiles_and_goals)
+    print("Elapsed :[%s]" % str(time.time() - st))
```

produces

```
...
========================================================================
Building Unidoc API Documentation
========================================================================
...
[info] Main Java API documentation successful.
...
Elapsed :[94.8746569157]
...

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17477 from HyukjinKwon/SPARK-18692.
2017-04-12 12:38:48 +01:00
Sean Owen 0e2405490f
[SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support
- Move external/java8-tests tests into core, streaming, sql and remove
- Remove MaxPermGen and related options
- Fix some reflection / TODOs around Java 8+ methods
- Update doc references to 1.7/1.8 differences
- Remove Java 7/8 related build profiles
- Update some plugins for better Java 8 compatibility
- Fix a few Java-related warnings

For the future:

- Update Java 8 examples to fully use Java 8
- Update Java tests to use lambdas for simplicity
- Update Java internal implementations to use lambdas

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16871 from srowen/SPARK-19493.
2017-02-16 12:32:45 +00:00
Dongjoon Hyun c618ccdbe9
[SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop2.6
## What changes were proposed in this pull request?

After SPARK-19464, **SparkPullRequestBuilder** fails because it still tries to use hadoop2.3.

**BEFORE**
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
```
========================================================================
Building Spark
========================================================================
[error] Could not find hadoop2.3 in the list. Valid options  are ['hadoop2.6', 'hadoop2.7']
Attempting to post to Github...
 > Post successful.
```

**AFTER**
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
```
========================================================================
Building Spark
========================================================================
[info] Building Spark (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
Using /usr/java/jdk1.8.0_60 as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
```

## How was this patch tested?

Pass the existing test.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16858 from dongjoon-hyun/hotfix_run-tests.
2017-02-08 21:28:04 +00:00
Sean Owen e8d3fca450
[SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier
## What changes were proposed in this pull request?

- Remove support for Hadoop 2.5 and earlier
- Remove reflection and code constructs only needed to support multiple versions at once
- Update docs to reflect newer versions
- Remove older versions' builds and profiles.

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16810 from srowen/SPARK-19464.
2017-02-08 12:20:07 +00:00
Holden Karau a36a76ac43 [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed
## What changes were proposed in this pull request?

This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).

Done:
- pip installable on conda [manual tested]
- setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
- Automated testing of this (virtualenv)
- packaging and signing with release-build*

Possible follow up work:
- release-build update to publish to PyPI (SPARK-18128)
- figure out who owns the pyspark package name on prod PyPI (is it someone with in the project or should we ask PyPI or should we choose a different name to publish with like ApachePySpark?)
- Windows support and or testing ( SPARK-18136 )
- investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
- consider how we want to number our dev/snapshot versions

Explicitly out of scope:
- Using pip installed PySpark to start a standalone cluster
- Using pip installed PySpark for non-Python Spark programs

*I've done some work to test release-build locally but as a non-committer I've just done local testing.
## How was this patch tested?

Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.

release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)

Author: Holden Karau <holden@us.ibm.com>
Author: Juliet Hougland <juliet@cloudera.com>
Author: Juliet Hougland <not@myemail.com>

Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
2016-11-16 14:22:15 -08:00
Shixiong Zhu 9293734d35 [SPARK-17346][SQL] Add Kafka source for Structured Streaming
## What changes were proposed in this pull request?

This PR adds a new project ` external/kafka-0-10-sql` for Structured Streaming Kafka source.

It's based on the design doc: https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing

tdas did most of work and part of them was inspired by koeninger's work.

### Introduction

The Kafka source is a structured streaming data source to poll data from Kafka. The schema of reading data is as follows:

Column | Type
---- | ----
key | binary
value | binary
topic | string
partition | int
offset | long
timestamp | long
timestampType | int

The source can deal with deleting topics. However, the user should make sure there is no Spark job processing the data when deleting a topic.

### Configuration

The user can use `DataStreamReader.option` to set the following configurations.

Kafka Source's options | value | default | meaning
------ | ------- | ------ | -----
startingOffset | ["earliest", "latest"] | "latest" | The start point when a query is started, either "earliest" which is from the earliest offset, or "latest" which is just from the latest offset. Note: This only applies when a new Streaming query is started, and that resuming will always pick up from where the query left off.
failOnDataLost | [true, false] | true | Whether to fail the query when it's possible that data is lost (e.g., topics are deleted, or offsets are out of range). This may be a false alarm. You can disable it when it doesn't work as you expected.
subscribe | A comma-separated list of topics | (none) | The topic list to subscribe. Only one of "subscribe" and "subscribeParttern" options can be specified for Kafka source.
subscribePattern | Java regex string | (none) | The pattern used to subscribe the topic. Only one of "subscribe" and "subscribeParttern" options can be specified for Kafka source.
kafka.consumer.poll.timeoutMs | long | 512 | The timeout in milliseconds to poll data from Kafka in executors
fetchOffset.numRetries | int | 3 | Number of times to retry before giving up fatch Kafka latest offsets.
fetchOffset.retryIntervalMs | long | 10 | milliseconds to wait before retrying to fetch Kafka offsets

Kafka's own configurations can be set via `DataStreamReader.option` with `kafka.` prefix, e.g, `stream.option("kafka.bootstrap.servers", "host:port")`

### Usage

* Subscribe to 1 topic
```Scala
spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:port")
  .option("subscribe", "topic1")
  .load()
```

* Subscribe to multiple topics
```Scala
spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:port")
  .option("subscribe", "topic1,topic2")
  .load()
```

* Subscribe to a pattern
```Scala
spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:port")
  .option("subscribePattern", "topic.*")
  .load()
```

## How was this patch tested?

The new unit tests.

Author: Shixiong Zhu <shixiong@databricks.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Shixiong Zhu <zsxwing@gmail.com>
Author: cody koeninger <cody@koeninger.org>

Closes #15102 from zsxwing/kafka-source.
2016-10-05 16:45:45 -07:00
Sean Owen 536fa911c1 [SPARK-17329][BUILD] Don't build PRs with -Pyarn unless YARN code changed
## What changes were proposed in this pull request?

Only build PRs with -Pyarn if YARN code was modified.

## How was this patch tested?

Jenkins tests (will look to verify whether -Pyarn was included in the PR builder for this one.)

Author: Sean Owen <sowen@cloudera.com>

Closes #14892 from srowen/SPARK-17329.
2016-09-01 09:10:01 +01:00
Josh Rosen acef843f67 [SPARK-15975] Fix improper Popen retcode code handling in dev/run-tests
In the `dev/run-tests.py` script we check a `Popen.retcode` for success using `retcode > 0`, but this is subtlety wrong because Popen's return code will be negative if the child process was terminated by a signal: https://docs.python.org/2/library/subprocess.html#subprocess.Popen.returncode

In order to properly handle signals, we should change this to check `retcode != 0` instead.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #13692 from JoshRosen/dev-run-tests-return-code-handling.
2016-06-16 14:18:58 -07:00
Shixiong Zhu 0ee9fd9e52 [SPARK-15935][PYSPARK] Fix a wrong format tag in the error message
## What changes were proposed in this pull request?

A follow up PR for #13655 to fix a wrong format tag.

## How was this patch tested?

Jenkins unit tests.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #13665 from zsxwing/fix.
2016-06-14 19:45:11 -07:00
Reynold Xin 45b7557e61 [SPARK-15424][SPARK-15437][SPARK-14807][SQL] Revert Create a hivecontext-compatibility module
## What changes were proposed in this pull request?
I initially asked to create a hivecontext-compatibility module to put the HiveContext there. But we are so close to Spark 2.0 release and there is only a single class in it. It seems overkill to have an entire package, which makes it more inconvenient, for a single class.

## How was this patch tested?
Tests were moved.

Author: Reynold Xin <rxin@databricks.com>

Closes #13207 from rxin/SPARK-15424.
2016-05-20 22:01:55 -07:00
cody koeninger 89e67d6667 [SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact
## What changes were proposed in this pull request?
Renaming the streaming-kafka artifact to include kafka version, in anticipation of needing a different artifact for later kafka versions

## How was this patch tested?
Unit tests

Author: cody koeninger <cody@koeninger.org>

Closes #12946 from koeninger/SPARK-15085.
2016-05-11 12:15:41 -07:00
Yin Huai 7dde1da949 [SPARK-14807] Create a compatibility module
## What changes were proposed in this pull request?

This PR creates a compatibility module in sql (called `hive-1-x-compatibility`), which will host HiveContext in Spark 2.0 (moving HiveContext to here will be done separately). This module is not included in assembly because only users who still want to access HiveContext need it.

## How was this patch tested?
I manually tested `sbt/sbt -Phive package` and `mvn -Phive package -DskipTests`.

Author: Yin Huai <yhuai@databricks.com>

Closes #12580 from yhuai/compatibility.
2016-04-22 17:50:24 -07:00
Marcelo Vanzin 24d7d2e453 [SPARK-13579][BUILD] Stop building the main Spark assembly.
This change modifies the "assembly/" module to just copy needed
dependencies to its build directory, and modifies the packaging
script to pick those up (and remove duplicate jars packages in the
examples module).

I also made some minor adjustments to dependencies to remove some
test jars from the final packaging, and remove jars that conflict with each
other when packaged separately (e.g. servlet api).

Also note that this change restores guava in applications' classpaths, even
though it's still shaded inside Spark. This is now needed for the Hadoop
libraries that are packaged with Spark, which now are not processed by
the shade plugin.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11796 from vanzin/SPARK-13579.
2016-04-04 16:52:22 -07:00
Shixiong Zhu 24587ce433 [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to Spark
## What changes were proposed in this pull request?

This PR moves flume back to Spark as per the discussion in the dev mail-list.

## How was this patch tested?

Existing Jenkins tests.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11895 from zsxwing/move-flume-back.
2016-03-25 17:37:16 -07:00
Shixiong Zhu 06dec37455 [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages
## What changes were proposed in this pull request?

Currently there are a few sub-projects, each for integrating with different external sources for Streaming.  Now that we have better ability to include external libraries (spark packages) and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages

- streaming-flume
- streaming-akka
- streaming-mqtt
- streaming-zeromq
- streaming-twitter

They are just some ancillary packages and considering the overhead of maintenance, running tests and PR failures, it's better to maintain them out of Spark. In addition, these projects can have their different release cycles and we can release them faster.

I have already copied these projects to https://github.com/spark-packages

## How was this patch tested?

Jenkins tests

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11672 from zsxwing/remove-external-pkg.
2016-03-14 16:56:04 -07:00
Cheng Lian 6d37e1eb90 [SPARK-13817][BUILD][SQL] Re-enable MiMA and removes object DataFrame
## What changes were proposed in this pull request?

PR #11443 temporarily disabled MiMA check, this PR re-enables it.

One extra change is that `object DataFrame` is also removed. The only purpose of introducing `object DataFrame` was to use it as an internal factory for creating `Dataset[Row]`. By replacing this internal factory with `Dataset.newDataFrame`, both `DataFrame` and `DataFrame$` are entirely removed from the API, so that we can simply put a `MissingClassProblem` filter in `MimaExcludes.scala` for most DataFrame API  changes.

## How was this patch tested?

Tested by MiMA check triggered by Jenkins.

Author: Cheng Lian <lian@databricks.com>

Closes #11656 from liancheng/re-enable-mima.
2016-03-11 22:17:50 +08:00
Josh Rosen 6ca990fb36 [SPARK-13294][PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly
This patch removes the need to build a full Spark assembly before running the `dev/mima` script.

- I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
   - This required me to delete two classes full of dead code that we don't use anymore
- `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
- `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #11178 from JoshRosen/remove-assembly-in-run-tests.
2016-03-10 23:28:34 -08:00
Cheng Lian 1d542785b9 [SPARK-13244][SQL] Migrates DataFrame to Dataset
## What changes were proposed in this pull request?

This PR unifies DataFrame and Dataset by migrating existing DataFrame operations to Dataset and make `DataFrame` a type alias of `Dataset[Row]`.

Most Scala code changes are source compatible, but Java API is broken as Java knows nothing about Scala type alias (mostly replacing `DataFrame` with `Dataset<Row>`).

There are several noticeable API changes related to those returning arrays:

1.  `collect`/`take`

    -   Old APIs in class `DataFrame`:

        ```scala
        def collect(): Array[Row]
        def take(n: Int): Array[Row]
        ```

    -   New APIs in class `Dataset[T]`:

        ```scala
        def collect(): Array[T]
        def take(n: Int): Array[T]

        def collectRows(): Array[Row]
        def takeRows(n: Int): Array[Row]
        ```

    Two specialized methods `collectRows` and `takeRows` are added because Java doesn't support returning generic arrays. Thus, for example, `DataFrame.collect(): Array[T]` actually returns `Object` instead of `Array<T>` from Java side.

    Normally, Java users may fall back to `collectAsList` and `takeAsList`.  The two new specialized versions are added to avoid performance regression in ML related code (but maybe I'm wrong and they are not necessary here).

1.  `randomSplit`

    -   Old APIs in class `DataFrame`:

        ```scala
        def randomSplit(weights: Array[Double], seed: Long): Array[DataFrame]
        def randomSplit(weights: Array[Double]): Array[DataFrame]
        ```

    -   New APIs in class `Dataset[T]`:

        ```scala
        def randomSplit(weights: Array[Double], seed: Long): Array[Dataset[T]]
        def randomSplit(weights: Array[Double]): Array[Dataset[T]]
        ```

    Similar problem as above, but hasn't been addressed for Java API yet.  We can probably add `randomSplitAsList` to fix this one.

1.  `groupBy`

    Some original `DataFrame.groupBy` methods have conflicting signature with original `Dataset.groupBy` methods.  To distinguish these two, typed `Dataset.groupBy` methods are renamed to `groupByKey`.

Other noticeable changes:

1.  Dataset always do eager analysis now

    We used to support disabling DataFrame eager analysis to help reporting partially analyzed malformed logical plan on analysis failure.  However, Dataset encoders requires eager analysi during Dataset construction.  To preserve the error reporting feature, `AnalysisException` now takes an extra `Option[LogicalPlan]` argument to hold the partially analyzed plan, so that we can check the plan tree when reporting test failures.  This plan is passed by `QueryExecution.assertAnalyzed`.

## How was this patch tested?

Existing tests do the work.

## TODO

- [ ] Fix all tests
- [ ] Re-enable MiMA check
- [ ] Update ScalaDoc (`since`, `group`, and example code)

Author: Cheng Lian <lian@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Author: Wenchen Fan <wenchen@databricks.com>
Author: Cheng Lian <liancheng@users.noreply.github.com>

Closes #11443 from liancheng/ds-to-df.
2016-03-10 17:00:17 -08:00
Dongjoon Hyun 941b270b70 [MINOR] Fix typos in comments and testcase name of code
## What changes were proposed in this pull request?

This PR fixes typos in comments and testcase name of code.

## How was this patch tested?

manual.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.
2016-03-03 22:42:12 +00:00
Wojciech Jurczyk 75e618def1 Fix run-tests.py typos
## What changes were proposed in this pull request?

The PR fixes typos in an error message in dev/run-tests.py.

Author: Wojciech Jurczyk <wojciech.jurczyk@codilime.com>

Closes #11467 from wjur/wjur/typos_run_tests.
2016-03-02 15:32:32 +00:00
Josh Rosen 41f0c85f9b [SPARK-13023][PROJECT INFRA] Fix handling of root module in modules_to_test()
There's a minor bug in how we handle the `root` module in the `modules_to_test()` function in `dev/run-tests.py`: since `root` now depends on `build` (since every test needs to run on any build test), we now need to check for the presence of root in `modules_to_test` instead of `changed_modules`.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10933 from JoshRosen/build-module-fix.
2016-01-27 08:32:13 -08:00
Josh Rosen ee74498de3 [SPARK-8725][PROJECT-INFRA] Test modules in topologically-sorted order in dev/run-tests
This patch improves our `dev/run-tests` script to test modules in a topologically-sorted order based on modules' dependencies.  This will help to ensure that bugs in upstream projects are not misattributed to downstream projects because those projects' tests were the first ones to exhibit the failure

Topological sorting is also useful for shortening the feedback loop when testing pull requests: if I make a change in SQL then the SQL tests should run before MLlib, not after.

In addition, this patch also updates our test module definitions to split `sql` into `catalyst`, `sql`, and `hive` in order to allow more tests to be skipped when changing only `hive/` files.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10885 from JoshRosen/SPARK-8725.
2016-01-26 14:20:11 -08:00
Shixiong Zhu 4bcea1b859 Revert "[SPARK-12829] Turn Java style checker on"
This reverts commit 591c88c9e2. `lint-java` doesn't work on a machine with a clean Maven cache.
2016-01-18 16:26:52 -08:00
Josh Rosen 8dbbf3e75e [SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile
This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version.

/cc rxin srowen

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10775 from JoshRosen/add-hadoop-2.7-profile.
2016-01-15 17:07:24 -08:00
Reynold Xin 591c88c9e2 [SPARK-12829] Turn Java style checker on
It was previously turned off because there was a problem with a pull request. We should turn it on now.

Author: Reynold Xin <rxin@databricks.com>

Closes #10763 from rxin/SPARK-12829.
2016-01-14 21:02:18 -08:00
Kousuke Saruta bcc7373f67 [SPARK-12821][BUILD] Style checker should run when some configuration files for style are modified but any source files are not.
When running the `run-tests` script, style checkers run only when any source files are modified but they should run when configuration files related to style are modified.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #10754 from sarutak/SPARK-12821.
2016-01-14 10:43:39 -08:00
felixcheung cc4d5229c9 [SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API
rxin davies shivaram
Took save mode from my PR #10480, and move everything to writer methods. This is related to PR #10559

- [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into some more tonight. (fixed)

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10584 from felixcheung/rremovedeprecated.
2016-01-04 22:32:07 -08:00
Reynold Xin 77ab49b857 [SPARK-12600][SQL] Remove deprecated methods in Spark SQL
Author: Reynold Xin <rxin@databricks.com>

Closes #10559 from rxin/remove-deprecated-sql.
2016-01-04 18:02:38 -08:00
Josh Rosen 9fd7a2f024 [SPARK-10359][PROJECT-INFRA] Use more random number in dev/test-dependencies.sh; fix version switching
This patch aims to fix another potential source of flakiness in the `dev/test-dependencies.sh` script.

pwendell's original patch and my version used `$(date +%s | tail -c6)` to generate a suffix to use when installing temporary Spark versions into the local Maven cache, but this value only changes once per second and thus is highly collision-prone when concurrent builds launch on AMPLab Jenkins. In order to reduce the potential for conflicts, this patch updates the script to call Python's random number generator instead.

I also fixed a bug in how we captured the original project version; the bug was causing the exit handler code to fail.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10558 from JoshRosen/build-dep-tests-round-3.
2016-01-04 01:04:29 -08:00
Josh Rosen 0d165ec205 [SPARK-12612][PROJECT-INFRA] Add missing Hadoop profiles to dev/run-tests-*.py scripts and dev/deps
There are a couple of places in the `dev/run-tests-*.py` scripts which deal with Hadoop profiles, but the set of profiles that they handle does not include all Hadoop profiles defined in our POM. Similarly, the `hadoop-2.2` and `hadoop-2.6` profiles were missing from `dev/deps`.

This patch updates these scripts to include all four Hadoop profiles defined in our POM.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10565 from JoshRosen/add-missing-hadoop-profiles-in-test-scripts.
2016-01-03 22:05:02 -08:00
Reynold Xin 6c20b3c087 Disable test-dependencies.sh. 2016-01-01 13:31:25 -08:00
Josh Rosen 27a42c7108 [SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests
This patch adds a new build check which enumerates Spark's resolved runtime classpath and saves it to a file, then diffs against that file to detect whether pull requests have introduced dependency changes. The aim of this check is to make it simpler to reason about whether pull request which modify the build have introduced new dependencies or changed transitive dependencies in a way that affects the final classpath.

This supplants the checks added in SPARK-4123 / #5093, which are currently disabled due to bugs.

This patch is based on pwendell's work in #8531.

Closes #8531.

Author: Josh Rosen <joshrosen@databricks.com>
Author: Patrick Wendell <patrick@databricks.com>

Closes #10461 from JoshRosen/SPARK-10359.
2015-12-30 12:47:42 -08:00
Kazuaki Ishizaki 9e85bb71ad [SPARK-12502][BUILD][PYTHON] Script /dev/run-tests fails when IBM Java is used
fix an exception with IBM JDK by removing update field from a JavaVersion tuple. This is because IBM JDK does not have information on update '_xx'

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #10463 from kiszk/SPARK-12502.
2015-12-24 21:27:55 +09:00
Reynold Xin 0a38637d05 [SPARK-11807] Remove support for Hadoop < 2.2
i.e. Hadoop 1 and Hadoop 2.0

Author: Reynold Xin <rxin@databricks.com>

Closes #10404 from rxin/SPARK-11807.
2015-12-21 22:15:52 -08:00
Reynold Xin 0c4d6ad873 HOTFIX for the previous hot fix. 2015-12-19 16:55:25 -08:00
Reynold Xin 6ad31e79bf HOTFIX: Disable Java style test. 2015-12-19 15:30:31 -08:00
Dmitry Erastov d0d8222778 [SPARK-6990][BUILD] Add Java linting script; fix minor warnings
This replaces https://github.com/apache/spark/pull/9696

Invoke Checkstyle and print any errors to the console, failing the step.
Use Google's style rules modified according to
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
Some important checks are disabled (see TODOs in `checkstyle.xml`) due to
multiple violations being present in the codebase.

Suggest fixing those TODOs in a separate PR(s).

More on Checkstyle can be found on the [official website](http://checkstyle.sourceforge.net/).

Sample output (from [build 46345](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46345/consoleFull)) (duplicated because I run the build twice with different profiles):

> Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
> [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
> [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
> [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
> [error] running /home/jenkins/workspace/SparkPullRequestBuilder2/dev/lint-java ; received return code 1

Also fix some of the minor violations that didn't require sweeping changes.

Apologies for the previous botched PRs - I finally figured out the issue.

cr: JoshRosen, pwendell

> I state that the contribution is my original work, and I license the work to the project under the project's open source license.

Author: Dmitry Erastov <derastov@gmail.com>

Closes #9867 from dskrvk/master.
2015-12-04 12:03:45 -08:00
Jeff Zhang 729f983e66 [SPARK-11342][TESTS] Allow to set hadoop profile when running dev/ru…
…n_tests

Author: Jeff Zhang <zjffdu@apache.org>

Closes #9295 from zjffdu/SPARK-11342.
2015-10-30 18:50:12 +00:00
Brennon York d3180c25d8 [SPARK-7018][BUILD] Refactor dev/run-tests-jenkins into Python
This commit refactors the `run-tests-jenkins` script into Python. This refactoring was done by brennonyork in #7401; this PR contains a few minor edits from joshrosen in order to bring it up to date with other recent changes.

From the original PR description (by brennonyork):

Currently a few things are left out that, could and I think should, be smaller JIRA's after this.

1. There are still a few areas where we use environment variables where we don't need to (like `CURRENT_BLOCK`). I might get around to fixing this one in lieu of everything else, but wanted to point that out.
2. The PR tests are still written in bash. I opted to not change those and just rewrite the runner into Python. This is a great follow-on JIRA IMO.
3. All of the linting scripts are still in bash as well and would likely do to just add those in as follow-on JIRA's as well.

Closes #7401.

Author: Brennon York <brennon.york@capitalone.com>

Closes #9161 from JoshRosen/run-tests-jenkins-refactoring.
2015-10-18 22:45:27 -07:00
Jakob Odersky 08698ee1d6 [SPARK-11094] Strip extra strings from Java version in test runner
Removes any extra strings from the Java version, fixing subsequent integer parsing.
This is required since some OpenJDK versions (specifically in Debian testing), append an extra "-internal" string to the version field.

Author: Jakob Odersky <jodersky@gmail.com>

Closes #9111 from jodersky/fixtestrunner.
2015-10-16 14:26:34 +01:00
Marcelo Vanzin 94fc57afdf [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8775 from vanzin/SPARK-10300.
2015-10-07 14:11:21 -07:00
Marcelo Vanzin b42059d2ef Revert "[SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py."
This reverts commit 8abef21dac.
2015-09-15 13:03:38 -07:00
Marcelo Vanzin 8abef21dac [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py.
This change does two things:

- tag a few tests and adds the mechanism in the build to be able to disable those tags,
  both in maven and sbt, for both junit and scalatest suites.
- add some logic to run-tests.py to disable some tags depending on what files have
  changed; that's used to disable expensive tests when a module hasn't explicitly
  been changed, to speed up testing for changes that don't directly affect those
  modules.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8437 from vanzin/test-tags.
2015-09-15 10:45:02 -07:00
Patrick Wendell 35e896a79b SPARK-9545, SPARK-9547: Use Maven in PRB if title contains "[test-maven]"
This is just some small glue code to actually make use of the
AMPLAB_JENKINS_BUILD_TOOL switch. As far as I can tell, we actually
don't currently use the Maven support in the tool even though it exists.
This patch switches to Maven when the PR title contains "test-maven".

There are a few small other pieces of cleanup in the patch as well.

Author: Patrick Wendell <patrick@databricks.com>

Closes #7878 from pwendell/maven-tests.
2015-08-30 21:39:16 -07:00
Yu ISHIKAWA 1f90c5e219 [SPARK-8505] [SPARKR] Add settings to kick lint-r from ./dev/run-test.py
JoshRosen we'd like to check the SparkR source code with the `dev/lint-r` script on the Jenkins. I tried to incorporate the script into `dev/run-test.py`. Could you review it when you have time?

shivaram I modified `dev/lint-r` and `dev/lint-r.R` to install lintr package into a local directory(`R/lib/`) and to exit with a lint status. Could you review it?

- [[SPARK-8505] Add settings to kick `lint-r` from `./dev/run-test.py` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8505)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #7883 from yu-iskw/SPARK-8505.
2015-08-27 19:38:53 -07:00
Prabeesh K 853809e948 [SPARK-5155] [PYSPARK] [STREAMING] Mqtt streaming support in Python
This PR is based on #4229, thanks prabeesh.

Closes #4229

Author: Prabeesh K <prabsmails@gmail.com>
Author: zsxwing <zsxwing@gmail.com>
Author: prabs <prabsmails@gmail.com>
Author: Prabeesh K <prabeesh.k@namshi.com>

Closes #7833 from zsxwing/pr4229 and squashes the following commits:

9570bec [zsxwing] Fix the variable name and check null in finally
4a9c79e [zsxwing] Fix pom.xml indentation
abf5f18 [zsxwing] Merge branch 'master' into pr4229
935615c [zsxwing] Fix the flaky MQTT tests
47278c5 [zsxwing] Include the project class files
478f844 [zsxwing] Add unpack
5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests
734db99 [zsxwing] Merge branch 'master' into pr4229
126608a [Prabeesh K] address the comments
b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229
d07f454 [zsxwing] Register StreamingListerner before starting StreamingContext; Revert unncessary changes; fix the python unit test
a6747cb [Prabeesh K] wait for starting the receiver before publishing data
87fc677 [Prabeesh K] address the comments:
97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt
80474d1 [Prabeesh K] fix
1f0cfe9 [Prabeesh K] python style fix
e1ee016 [Prabeesh K] scala style fix
a5a8f9f [Prabeesh K] added Python test
9767d82 [Prabeesh K] implemented Python-friendly class
a11968b [Prabeesh K] fixed python style
795ec27 [Prabeesh K] address comments
ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly
3f4df12 [Prabeesh K] updated version
b34c3c1 [prabs] adress comments
3aa7fff [prabs] Added Python streaming mqtt word count example
b7d42ff [prabs] Mqtt streaming support in Python
2015-08-10 16:33:23 -07:00
Steve Loughran a2409d1c8e [SPARK-8064] [SQL] Build against Hive 1.2.1
Cherry picked the parts of the initial SPARK-8064 WiP branch needed to get sql/hive to compile against hive 1.2.1. That's the ASF release packaged under org.apache.hive, not any fork.

Tests not run yet: that's what the machines are for

Author: Steve Loughran <stevel@hortonworks.com>
Author: Cheng Lian <lian@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Author: Patrick Wendell <patrick@databricks.com>

Closes #7191 from steveloughran/stevel/feature/SPARK-8064-hive-1.2-002 and squashes the following commits:

7556d85 [Cheng Lian] Updates .q files and corresponding golden files
ef4af62 [Steve Loughran] Merge commit '6a92bb09f46a04d6cd8c41bdba3ecb727ebb9030' into stevel/feature/SPARK-8064-hive-1.2-002
6a92bb0 [Cheng Lian] Overrides HiveConf time vars
dcbb391 [Cheng Lian] Adds com.twitter:parquet-hadoop-bundle:1.6.0 for Hive Parquet SerDe
0bbe475 [Steve Loughran] SPARK-8064 scalastyle rejects the standard Hadoop ASF license header...
fdf759b [Steve Loughran] SPARK-8064 classpath dependency suite to be in sync with shading in final (?) hive-exec spark
7a6c727 [Steve Loughran] SPARK-8064 switch to second staging repo of the spark-hive artifacts. This one has the protobuf-shaded hive-exec jar
376c003 [Steve Loughran] SPARK-8064 purge duplicate protobuf declaration
2c74697 [Steve Loughran] SPARK-8064 switch to the protobuf shaded hive-exec jar with tests to chase it down
cc44020 [Steve Loughran] SPARK-8064 remove hadoop.version from runtest.py, as profile will fix that automatically.
6901fa9 [Steve Loughran] SPARK-8064 explicit protobuf import
da310dc [Michael Armbrust] Fixes for Hive tests.
a775a75 [Steve Loughran] SPARK-8064 cherry-pick-incomplete
7404f34 [Patrick Wendell] Add spark-hive staging repo
832c164 [Steve Loughran] SPARK-8064 try to supress compiler warnings on Complex.java pasted-thrift-code
312c0d4 [Steve Loughran] SPARK-8064  maven/ivy dependency purge; calcite declaration needed
fa5ae7b [Steve Loughran] HIVE-8064 fix up hive-thriftserver dependencies and cut back on evicted references in the hive- packages; this keeps mvn and ivy resolution compatible, as the reconciliation policy is "by hand"
c188048 [Steve Loughran] SPARK-8064 manage the Hive depencencies to that -things that aren't needed are excluded -sql/hive built with ivy is in sync with the maven reconciliation policy, rather than latest-first
4c8be8d [Cheng Lian] WIP: Partial fix for Thrift server and CLI tests
314eb3c [Steve Loughran] SPARK-8064 deprecation warning  noise in one of the tests
17b0341 [Steve Loughran] SPARK-8064 IDE-hinted cleanups of Complex.java to reduce compiler warnings. It's all autogenerated code, so still ugly.
d029b92 [Steve Loughran] SPARK-8064 rely on unescaping to have already taken place, so go straight to map of serde options
23eca7e [Steve Loughran] HIVE-8064 handle raw and escaped property tokens
54d9b06 [Steve Loughran] SPARK-8064 fix compilation regression surfacing from rebase
0b12d5f [Steve Loughran] HIVE-8064 use subset of hive complex type whose types deserialize
fce73b6 [Steve Loughran] SPARK-8064 poms rely implicitly on the version of kryo chill provides
fd3aa5d [Steve Loughran] SPARK-8064 version of hive to d/l from ivy is 1.2.1
dc73ece [Steve Loughran] SPARK-8064 revert to master's determinstic pushdown strategy
d3c1e4a [Steve Loughran] SPARK-8064 purge UnionType
051cc21 [Steve Loughran] SPARK-8064 switch to an unshaded version of hive-exec-core, which must have been built with Kryo 2.21. This currently looks for a (locally built) version 1.2.1.spark
6684c60 [Steve Loughran] SPARK-8064 ignore RTE raised in blocking process.exitValue() call
e6121e5 [Steve Loughran] SPARK-8064 address review comments
aa43dc6 [Steve Loughran] SPARK-8064  more robust teardown on JavaMetastoreDatasourcesSuite
f2bff01 [Steve Loughran] SPARK-8064 better takeup of asynchronously caught error text
8b1ef38 [Steve Loughran] SPARK-8064: on failures executing spark-submit in HiveSparkSubmitSuite, print command line and all logged output.
5a9ce6b [Steve Loughran] SPARK-8064 add explicit reason for kv split failure, rather than array OOB. *does not address the issue*
642b63a [Steve Loughran] SPARK-8064 reinstate something cut briefly during rebasing
97194dc [Steve Loughran] SPARK-8064 add extra logging to the YarnClusterSuite classpath test. There should be no reason why this is failing on jenkins, but as it is (and presumably its CP-related), improve the logging including any exception raised.
335357f [Steve Loughran] SPARK-8064 fail fast on thrive process spawning tests on exit codes and/or error string patterns seen in log.
3ed872f [Steve Loughran] SPARK-8064 rename field double to  dbl
bca55e5 [Steve Loughran] SPARK-8064 missed one of the `date` escapes
41d6479 [Steve Loughran] SPARK-8064 wrap tests with withTable() calls to avoid table-exists exceptions
2bc29a4 [Steve Loughran] SPARK-8064 ParquetSuites to escape `date` field name
1ab9bc4 [Steve Loughran] SPARK-8064 TestHive to use sered2.thrift.test.Complex
bf3a249 [Steve Loughran] SPARK-8064: more resubmit than fix; tighten startup timeout to 60s. Still no obvious reason why jersey server code in spark-assembly isn't being picked up -it hasn't been shaded
c829b8f [Steve Loughran] SPARK-8064: reinstate yarn-rm-server dependencies to hive-exec to ensure that jersey server is on classpath on hadoop versions < 2.6
0b0f738 [Steve Loughran] SPARK-8064: thrift server startup to fail fast on any exception in the main thread
13abaf1 [Steve Loughran] SPARK-8064 Hive compatibilty tests sin sync with explain/show output from Hive 1.2.1
d14d5ea [Steve Loughran] SPARK-8064: DATE is now a predicate; you can't use it as a field in select ops
26eef1c [Steve Loughran] SPARK-8064: HIVE-9039 renamed TOK_UNION => TOK_UNIONALL while adding TOK_UNIONDISTINCT
3d64523 [Steve Loughran] SPARK-8064 improve diagns on uknown token; fix scalastyle failure
d0360f6 [Steve Loughran] SPARK-8064: delicate merge in of the branch vanzin/hive-1.1
1126e5a [Steve Loughran] SPARK-8064: name of unrecognized file format wasn't appearing in error text
8cb09c4 [Steve Loughran] SPARK-8064: test resilience/assertion improvements. Independent of the rest of the work; can be backported to earlier versions
dec12cb [Steve Loughran] SPARK-8064: when a CLI suite test fails include the full output text in the raised exception; this ensures that the stdout/stderr is included in jenkins reports, so it becomes possible to diagnose the cause.
463a670 [Steve Loughran] SPARK-8064 run-tests.py adds a hadoop-2.6 profile, and changes info messages to say "w/Hive 1.2.1" in console output
2531099 [Steve Loughran] SPARK-8064 successful attempt to get rid of pentaho as a transitive dependency of hive-exec
1d59100 [Steve Loughran] SPARK-8064 (unsuccessful) attempt to get rid of pentaho as a transitive dependency of hive-exec
75733fc [Steve Loughran] SPARK-8064 change thrift binary startup message to "Starting ThriftBinaryCLIService on port"
3ebc279 [Steve Loughran] SPARK-8064 move strings used to check for http/bin thrift services up into constants
c80979d [Steve Loughran] SPARK-8064: SparkSQLCLIDriver drops remote mode support. CLISuite Tests pass instead of timing out: undetected regression?
27e8370 [Steve Loughran] SPARK-8064 fix some style & IDE warnings
00e50d6 [Steve Loughran] SPARK-8064 stop excluding hive shims from dependency (commented out , for now)
cb4f142 [Steve Loughran] SPARK-8054 cut pentaho dependency from calcite
f7aa9cb [Steve Loughran] SPARK-8064 everything compiles with some commenting and moving of classes into a hive package
6c310b4 [Steve Loughran] SPARK-8064 subclass  Hive ServerOptionsProcessor to make it public again
f61a675 [Steve Loughran] SPARK-8064 thrift server switched to Hive 1.2.1, though it doesn't compile everywhere
4890b9d [Steve Loughran] SPARK-8064, build against Hive 1.2.1
2015-08-03 15:24:42 -07:00
zsxwing 3afc1de89c [SPARK-8564] [STREAMING] Add the Python API for Kinesis
This PR adds the Python API for Kinesis, including a Python example and a simple unit test.

Author: zsxwing <zsxwing@gmail.com>

Closes #6955 from zsxwing/kinesis-python and squashes the following commits:

e42e471 [zsxwing] Merge branch 'master' into kinesis-python
455f7ea [zsxwing] Remove streaming_kinesis_asl_assembly module and simply add the source folder to streaming_kinesis_asl module
32e6451 [zsxwing] Merge remote-tracking branch 'origin/master' into kinesis-python
5082d28 [zsxwing] Fix the syntax error for Python 2.6
fca416b [zsxwing] Fix wrong comparison
96670ff [zsxwing] Fix the compilation error after merging master
756a128 [zsxwing] Merge branch 'master' into kinesis-python
6c37395 [zsxwing] Print stack trace for debug
7c5cfb0 [zsxwing] RUN_KINESIS_TESTS -> ENABLE_KINESIS_TESTS
cc9d071 [zsxwing] Fix the python test errors
466b425 [zsxwing] Add python tests for Kinesis
e33d505 [zsxwing] Merge remote-tracking branch 'origin/master' into kinesis-python
3da2601 [zsxwing] Fix the kinesis folder
687446b [zsxwing] Fix the error message and the maven output path
add2beb [zsxwing] Merge branch 'master' into kinesis-python
4957c0b [zsxwing] Add the Python API for Kinesis
2015-07-31 12:09:48 -07:00
zsxwing 76f2e393a5 [SPARK-9335] [TESTS] Enable Kinesis tests only when files in extras/kinesis-asl are changed
Author: zsxwing <zsxwing@gmail.com>

Closes #7711 from zsxwing/SPARK-9335-test and squashes the following commits:

c13ec2f [zsxwing] environs -> environ
69c2865 [zsxwing] Merge remote-tracking branch 'origin/master' into SPARK-9335-test
ef84a08 [zsxwing] Revert "Modify the Kinesis project to trigger ENABLE_KINESIS_TESTS"
f691028 [zsxwing] Modify the Kinesis project to trigger ENABLE_KINESIS_TESTS
7618205 [zsxwing] Enable Kinesis tests only when files in extras/kinesis-asl are changed
2015-07-30 00:46:36 -07:00
Yin Huai dafe8d857d [SPARK-9385] [PYSPARK] Enable PEP8 but disable installing pylint.
Instead of disabling all python style check, we should enable PEP8. So, this PR just comments out the part installing pylint.

Author: Yin Huai <yhuai@databricks.com>

Closes #7704 from yhuai/SPARK-9385 and squashes the following commits:

0056359 [Yin Huai] Enable PEP8 but disable installing pylint.
2015-07-27 15:49:42 -07:00
Yin Huai 2104931d7d [SPARK-9385] [HOT-FIX] [PYSPARK] Comment out Python style check
https://issues.apache.org/jira/browse/SPARK-9385

Comment out Python style check because of error shown in https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3088/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/console

Author: Yin Huai <yhuai@databricks.com>

Closes #7702 from yhuai/SPARK-9385 and squashes the following commits:

146e6ef [Yin Huai] Comment out Python style check because of error shown in https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3088/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/console
2015-07-27 15:18:48 -07:00
zsxwing 75b9fe4c5f [SPARK-8378] [STREAMING] Add the Python API for Flume
Author: zsxwing <zsxwing@gmail.com>

Closes #6830 from zsxwing/flume-python and squashes the following commits:

78dfdac [zsxwing] Fix the compile error in the test code
f1bf3c0 [zsxwing] Address TD's comments
0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly
e93736b [zsxwing] Fix the test case for determine_modules_to_test
9d5821e [zsxwing] Fix pyspark_core dependencies
f9ee681 [zsxwing] Merge branch 'master' into flume-python
7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py
b96b0de [zsxwing] Merge branch 'master' into flume-python
ce85e83 [zsxwing] Fix incompatible issues for Python 3
01cbb3d [zsxwing] Add import sys
152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3
14ba0ff [zsxwing] Add flume-assembly for sbt building
b8d5551 [zsxwing] Merge branch 'master' into flume-python
4762c34 [zsxwing] Fix the doc
0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API
9f33873 [zsxwing] Add the Python API for Flume
2015-07-01 11:59:24 -07:00
Josh Rosen 7bbbe380c5 [SPARK-5161] Parallelize Python test execution
This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times.  Parallelism is now configurable by passing the `-p` or `--parallelism` flags to either `dev/run-tests` or `python/run-tests` (the default parallelism is 4, but I've successfully tested with higher parallelism).

To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #7031 from JoshRosen/parallelize-python-tests and squashes the following commits:

feb3763 [Josh Rosen] Re-enable other tests
f87ea81 [Josh Rosen] Only log output from failed tests
d4ded73 [Josh Rosen] Logging improvements
a2717e1 [Josh Rosen] Make parallelism configurable via dev/run-tests
1bacf1b [Josh Rosen] Merge remote-tracking branch 'origin/master' into parallelize-python-tests
110cd9d [Josh Rosen] Fix universal_newlines for Python 3
cd13db8 [Josh Rosen] Also log python_implementation
9e31127 [Josh Rosen] Log Python --version output for each executable.
a2b9094 [Josh Rosen] Bump up parallelism.
5552380 [Josh Rosen] Python 3 fix
866b5b9 [Josh Rosen] Fix lazy logging warnings in Prospector checks
87cb988 [Josh Rosen] Skip MLLib tests for PyPy
8309bfe [Josh Rosen] Temporarily disable parallelism to debug a failure
9129027 [Josh Rosen] Disable Spark UI in Python tests
037b686 [Josh Rosen] Temporarily disable JVM tests so we can test Python speedup in Jenkins.
af4cef4 [Josh Rosen] Initial attempt at parallelizing Python test execution
2015-06-29 21:32:40 -07:00
Brennon York 5c796d576e [SPARK-8693] [PROJECT INFRA] profiles and goals are not printed in a nice way
Hotfix to correct formatting errors of print statements within the dev and jenkins builds. Error looks like:

```
-Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive-thriftserver[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  package[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  assembly/assembly[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  streaming-kafka-assembly/assembly
```

Author: Brennon York <brennon.york@capitalone.com>

Closes #7085 from brennonyork/SPARK-8693 and squashes the following commits:

c5575f1 [Brennon York] added commas to end of print statements for proper printing
2015-06-29 08:55:06 -07:00
Cheng Lian 00a9d22bd6 [SPARK-7845] [BUILD] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1
PR #5694 reverted PR #6384 while refactoring `dev/run-tests` to `dev/run-tests.py`. Also, PR #6384 didn't bump Hadoop 1 version defined in POM.

Author: Cheng Lian <lian@databricks.com>

Closes #7062 from liancheng/spark-7845 and squashes the following commits:

c088b72 [Cheng Lian] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1
2015-06-28 19:34:59 -07:00
Josh Rosen 42db3a1c2f [HOTFIX] Fix pull request builder bug in #6967 2015-06-27 23:07:20 -07:00
Josh Rosen 40648c56cd [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system
This patch refactors the `python/run-tests` script:

- It's now written in Python instead of Bash.
- The descriptions of the tests to run are now stored in `dev/run-tests`'s modules.  This allows the pull request builder to skip Python tests suites that were not affected by the pull request's changes.  For example, we can now skip the PySpark Streaming test cases when only SQL files are changed.
- `python/run-tests` now supports command-line flags to make it easier to run individual test suites (this addresses SPARK-5482):

  ```
Usage: run-tests [options]

Options:
  -h, --help            show this help message and exit
  --python-executables=PYTHON_EXECUTABLES
                        A comma-separated list of Python executables to test
                        against (default: python2.6,python3.4,pypy)
  --modules=MODULES     A comma-separated list of Python modules to test
                        (default: pyspark-core,pyspark-ml,pyspark-mllib
                        ,pyspark-sql,pyspark-streaming)
   ```
- `dev/run-tests` has been split into multiple files: the module definitions and test utility functions are now stored inside of a `dev/sparktestsupport` Python module, allowing them to be re-used from the Python test runner script.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6967 from JoshRosen/run-tests-python-modules and squashes the following commits:

f578d6d [Josh Rosen] Fix print for Python 2.x
8233d61 [Josh Rosen] Add python/run-tests.py to Python lint checks
34c98d2 [Josh Rosen] Fix universal_newlines for Python 3
8f65ed0 [Josh Rosen] Fix handling of  module in python/run-tests
37aff00 [Josh Rosen] Python 3 fix
27a389f [Josh Rosen] Skip MLLib tests for PyPy
c364ccf [Josh Rosen] Use which() to convert PYSPARK_PYTHON to an absolute path before shelling out to run tests
568a3fd [Josh Rosen] Fix hashbang
3b852ae [Josh Rosen] Fall back to PYSPARK_PYTHON when sys.executable is None (fixes a test)
f53db55 [Josh Rosen] Remove python2 flag, since the test runner script also works fine under Python 3
9c80469 [Josh Rosen] Fix passing of PYSPARK_PYTHON
d33e525 [Josh Rosen] Merge remote-tracking branch 'origin/master' into run-tests-python-modules
4f8902c [Josh Rosen] Python lint fixes.
8f3244c [Josh Rosen] Use universal_newlines to fix dev/run-tests doctest failures on Python 3.
f542ac5 [Josh Rosen] Fix lint check for Python 3
fff4d09 [Josh Rosen] Add dev/sparktestsupport to pep8 checks
2efd594 [Josh Rosen] Update dev/run-tests to use new Python test runner flags
b2ab027 [Josh Rosen] Add command-line options for running individual suites in python/run-tests
caeb040 [Josh Rosen] Fixes to PySpark test module definitions
d6a77d3 [Josh Rosen] Fix the tests of dev/run-tests
def2d8a [Josh Rosen] Two minor fixes
aec0b8f [Josh Rosen] Actually get the Kafka stuff to run properly
04015b9 [Josh Rosen] First attempt at getting PySpark Kafka test to work in new runner script
4c97136 [Josh Rosen] PYTHONPATH fixes
dcc9c09 [Josh Rosen] Fix time division
32660fc [Josh Rosen] Initial cut at Python test runner refactoring
311c6a9 [Josh Rosen] Move shell utility functions to own module.
1bdeb87 [Josh Rosen] Move module definitions to separate file.
2015-06-27 20:24:34 -07:00
fe2s dca21a83ac [SPARK-8558] [BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set
Author: fe2s <aka.fe2s@gmail.com>
Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com>

Closes #6956 from fe2s/fix-run-tests and squashes the following commits:

31b6edc [fe2s] str is a built-in function, so using it as a variable name will lead to spurious warnings in some Python linters
7d781a0 [fe2s] fixing for openjdk/IBM, seems like they have slightly different wording, but all have 'version' word. Surrounding with spaces for the case if version word appears in _JAVA_OPTIONS
cd455ef [fe2s] address comment, looking for java version string rather than expecting to have on a certain line number
ad577d7 [Oleksiy Dyagilev] [SPARK-8558][BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set
2015-06-24 15:12:23 -07:00
Andrew Or 1dfb0f7b2a [HOTFIX] [TESTS] Typo mqqt -> mqtt
This was introduced in #6866.
2015-06-22 16:16:26 -07:00
Josh Rosen 7a3c424ecf [SPARK-8422] [BUILD] [PROJECT INFRA] Add a module abstraction to dev/run-tests
This patch builds upon #5694 to add a 'module' abstraction to the `dev/run-tests` script which groups together the per-module test logic, including the mapping from file paths to modules, the mapping from modules to test goals and build profiles, and the dependencies / relationships between modules.

This refactoring makes it much easier to increase the granularity of test modules, which will let us skip even more tests.  It's also a prerequisite for other changes that will reduce test time, such as running subsets of the Python tests based on which files / modules have changed.

This patch also adds doctests for the new graph traversal / change mapping code.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6866 from JoshRosen/more-dev-run-tests-refactoring and squashes the following commits:

75de450 [Josh Rosen] Use module system to determine which build profiles to enable.
4224da5 [Josh Rosen] Add documentation to Module.
a86a953 [Josh Rosen] Clean up modules; add new modules for streaming external projects
e46539f [Josh Rosen] Fix camel-cased endswith()
35a3052 [Josh Rosen] Enable Hive tests when running all tests
df10e23 [Josh Rosen] update to reflect fact that no module depends on root
3670d50 [Josh Rosen] mllib should depend on streaming
dc6f1c6 [Josh Rosen] Use changed files' extensions to decide whether to run style checks
7092d3e [Josh Rosen] Skip SBT tests if no test goals are specified
43a0ced [Josh Rosen] Minor fixes
3371441 [Josh Rosen] Test everything if nothing has changed (needed for non-PRB builds)
37f3fb3 [Josh Rosen] Remove doc profiles option, since it's not actually needed (see #6865)
f53864b [Josh Rosen] Finish integrating module changes
f0249bd [Josh Rosen] WIP
2015-06-20 16:05:54 -07:00
Josh Rosen 165f52f2f9 [HOTFIX] [PROJECT-INFRA] Fix bug in dev/run-tests for MLlib-only PRs 2015-06-17 19:03:41 -07:00
Brennon York 50a0496a43 [SPARK-7017] [BUILD] [PROJECT INFRA] Refactor dev/run-tests into Python
All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a great base to various downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.). Would love comments / suggestions for this initial first step!

/cc srowen pwendell nchammas

Author: Brennon York <brennon.york@capitalone.com>

Closes #5694 from brennonyork/SPARK-7017 and squashes the following commits:

154ed73 [Brennon York] updated finding java binary if JAVA_HOME not set
3922a85 [Brennon York] removed necessary passed in variable
f9fbe54 [Brennon York] reverted doc test change
8135518 [Brennon York] removed the test check for documentation changes until jenkins can get updated
05d435b [Brennon York] added check for jekyll install
22edb78 [Brennon York] add check if jekyll isn't installed on the path
2dff136 [Brennon York] fixed pep8 whitespace errors
767a668 [Brennon York] fixed path joining issues, ensured docs actually build on doc changes
c42cf9a [Brennon York] unpack set operations with splat (*)
fb85a41 [Brennon York] fixed minor set bug
0379833 [Brennon York] minor doc addition to print the changed modules
aa03d9e [Brennon York] added documentation builds as a top level test component, altered high level project changes to properly execute core tests only when necessary, changed variable names for simplicity
ec1ae78 [Brennon York] minor name changes, bug fixes
b7c72b9 [Brennon York] reverting streaming context
03fdd7b [Brennon York] fixed the tuple () wraps around example lambda
705d12e [Brennon York] changed example to comply with pep3113 supporting python3
60b3d51 [Brennon York] prepend rather than append onto PATH
7d2f5e2 [Brennon York] updated python tests to remove unused variable
2898717 [Brennon York] added a change to streaming test to check if it only runs streaming tests
eb684b6 [Brennon York] fixed sbt_test_goals reference error
db7ae6f [Brennon York] reverted SPARK_HOME from start of command
1ecca26 [Brennon York] fixed merge conflicts
2fcdfc0 [Brennon York] testing targte branch dump on jenkins
1f607b1 [Brennon York] finalizing revisions to modular tests
8afbe93 [Brennon York] made error codes a global
0629de8 [Brennon York] updated to refactor and remove various small bugs, removed pep8 complaints
d90ab2d [Brennon York] fixed merge conflicts, ensured that for regular builds both core and sql tests always run
b1248dc [Brennon York] exec python rather than running python and exiting with return code
f9deba1 [Brennon York] python to python2 and removed newline
6d0a052 [Brennon York] incorporated merge conflicts with SPARK-7249
f950010 [Brennon York] removed building hive-0.12.0 per SPARK-6908
703f095 [Brennon York] fixed merge conflicts
b1ca593 [Brennon York] reverted the sparkR test
afeb093 [Brennon York] updated to make sparkR test fail
1dada6b [Brennon York] reverted pyspark test failure
9a592ec [Brennon York] reverted mima exclude issue, added pyspark test failure
d825aa4 [Brennon York] revert build break, add mima break
f041d8a [Brennon York] added space from commented import to now test build breaking
983f2a2 [Brennon York] comment out import to fail build test
2386785 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-7017
76335fb [Brennon York] reverted rat license issue for sparkconf
e4a96cc [Brennon York] removed the import error and added license error, fixed the way run-tests and run-tests.py report their error codes
56d3cb9 [Brennon York] changed test back and commented out import to break compile
b37328c [Brennon York] fixed typo and added default return is no error block was found in the environment
7613558 [Brennon York] updated to return the proper env variable for return codes
a5bd445 [Brennon York] reverted license, changed test in shuffle to fail
803143a [Brennon York] removed license file for SparkContext
b0b2604 [Brennon York] comment out import to see if build fails and returns properly
83e80ef [Brennon York] attempt at better python output when called from bash
c095fa6 [Brennon York] removed another wait() call
26e18e8 [Brennon York] removed unnecessary wait()
07210a9 [Brennon York] minor doc string change for java version with namedtuple update
ec03bf3 [Brennon York] added namedtuple for java version to add readability
2cb413b [Brennon York] upcased global variables, changes various calling methods from check_output to check_call
639f1e9 [Brennon York] updated with pep8 rules, fixed minor bugs, added run-tests file in bash to call the run-tests.py script
3c53a1a [Brennon York] uncomment the scala tests :)
6126c4f [Brennon York] refactored run-tests into python
2015-06-17 12:00:34 -07:00