Commit graph

101 commits

Author SHA1 Message Date
Prashant Sharma e16a8e7db5 SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
...

Tested! TBH, it isn't a great idea to have a directory with spaces in its name, because Emacs doesn't like it, Hadoop doesn't like it, and so on...
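
To illustrate why the quoting matters, here is a minimal Python sketch. It is not taken from the Spark scripts, and the install path is hypothetical. It uses `shlex` to mimic how a POSIX shell splits an unquoted versus a quoted path that contains a space:

```python
import shlex

# Hypothetical install directory containing a space (an assumption for illustration).
install_dir = "/opt/my spark"

# Unquoted: a POSIX shell would split the path into two separate words.
unquoted = "ls " + install_dir + "/bin"
# Quoted: shlex.quote wraps the path so it survives word splitting intact.
quoted = "ls " + shlex.quote(install_dir + "/bin")

unquoted_words = shlex.split(unquoted)
quoted_words = shlex.split(quoted)
print(unquoted_words)
print(quoted_words)
```

The unquoted command ends up with two broken path fragments, while the quoted one keeps the directory as a single argument.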

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:

d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
2014-09-08 10:24:15 -07:00
Nicholas Chammas 9422c4ee0e [SPARK-3361] Expand PEP 8 checks to include EC2 script and Python examples
This PR resolves [SPARK-3361](https://issues.apache.org/jira/browse/SPARK-3361) by expanding the PEP 8 checks to cover the remaining Python code base:
* The EC2 script
* All Python / PySpark examples

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #2297 from nchammas/pep8-rulez and squashes the following commits:

1e5ac9a [Nicholas Chammas] PEP 8 fixes to Python examples
c3dbeff [Nicholas Chammas] PEP 8 fixes to EC2 script
65ef6e8 [Nicholas Chammas] expand PEP 8 checks
2014-09-05 23:08:54 -07:00
Nicholas Chammas 19f61c1659 [Build] suppress curl/wget progress bars
In the Jenkins console output, `curl` gives us mountains of `#` symbols as it tries to show its download progress.

![noise from curl in Jenkins output](http://i.imgur.com/P2E7yUw.png)

I don't think this is useful so I've changed things to suppress these progress bars. If there is actually some use to this, feel free to reject this proposal.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #2279 from nchammas/trim-test-output and squashes the following commits:

14a720c [Nicholas Chammas] suppress curl/wget progress bars
2014-09-05 21:46:45 -07:00
Kousuke Saruta dc1ba9e9fc [SPARK-3378] [DOCS] Replace the word "SparkSQL" with right word "Spark SQL"
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2251 from sarutak/SPARK-3378 and squashes the following commits:

0bfe234 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3378
bb5938f [Kousuke Saruta] Replaced rest of "SparkSQL" with "Spark SQL"
6df66de [Kousuke Saruta] Replaced "SparkSQL" with "Spark SQL"
2014-09-04 15:06:08 -07:00
Sean Owen 32ec0a8cd4 SPARK-3331 [BUILD] PEP8 tests fail because they check unzipped py4j code
PEP8 tests run on files under "./python", but unzipped py4j code is found at "./python/build/py4j". Py4J code fails style checks and can fail ./dev/run-tests if this code is present locally.

Author: Sean Owen <sowen@cloudera.com>

Closes #2222 from srowen/SPARK-3331 and squashes the following commits:

34711ec [Sean Owen] Restrict lint check to pyspark/, since the local directory can contain unzipped py4j code in build/py4j
2014-09-02 10:30:26 -07:00
Nicholas Chammas c567a68a59 [Spark QA] only check code files for new classes
Look only at code files (`.py`, `.java`, and `.scala`) for new classes.

Should get rid of false alarms like [the one reported here](https://github.com/apache/spark/pull/2014#issuecomment-52912040).
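
The filtering idea can be sketched in Python as follows. The real check lives in a Jenkins shell script, so this is only an illustration; the function name and the sample file list are assumptions:

```python
# Extensions named in the commit message; every other file is ignored.
CODE_EXTENSIONS = (".py", ".java", ".scala")


def code_files(changed_paths):
    """Return only the paths that look like source code files."""
    return [p for p in changed_paths if p.endswith(CODE_EXTENSIONS)]


# Hypothetical list of files touched by a pull request.
changed = [
    "docs/sql-programming-guide.md",
    "python/pyspark/sql.py",
    "core/src/main/scala/Foo.scala",
    "README.md",
]
print(code_files(changed))
```

Documentation files that merely mention the word "class" are dropped before the search for new classes runs.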

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #2184 from nchammas/jenkins-ignore-noncode and squashes the following commits:

33786ac [Nicholas Chammas] break up long line
3f91a14 [Nicholas Chammas] rename array of source files
8b82a26 [Nicholas Chammas] [Spark QA] only check code files for new classes
2014-08-30 21:11:48 -07:00
Patrick Wendell a004a8d879 BUILD: Adding back CDH4 as per user requests 2014-08-29 22:24:35 -07:00
nchammas 3c517a812e [Spark QA] Link to console output on test time out
When tests time out we should link to the Jenkins console output for easy review. We already do this for when tests start or complete normally.

Here's [a recent example](https://github.com/apache/spark/pull/2109#issuecomment-53374032) of where this would be helpful.

Author: nchammas <nicholas.chammas@gmail.com>

Closes #2140 from nchammas/patch-1 and squashes the following commits:

3b26c8d [nchammas] [Spark QA] Link to console output on test time out
2014-08-28 18:08:28 -07:00
Matthew Farrellee 64d8ecbbe9 Add line continuation for script to work w/ py2.7.5
Error was -

$ SPARK_HOME=$PWD/dist ./dev/create-release/generate-changelist.py
  File "./dev/create-release/generate-changelist.py", line 128
    if day < SPARK_REPO_CHANGE_DATE1 or
                                      ^
SyntaxError: invalid syntax
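
For reference, a condition that is split after `or` needs either an explicit backslash or enclosing parentheses. The sketch below shows both forms; the constants and function names are hypothetical stand-ins, not the script's actual values:

```python
from datetime import date

# Hypothetical cutoff dates; the real script's constants are not shown here.
CHANGE_DATE_1 = date(2014, 2, 1)
CHANGE_DATE_2 = date(2014, 6, 1)


def outside_window_backslash(day):
    # A trailing backslash tells the parser that the statement continues.
    if day < CHANGE_DATE_1 or \
       day > CHANGE_DATE_2:
        return True
    return False


def outside_window_parens(day):
    # Parentheses give implicit continuation, so no backslash is needed.
    return (day < CHANGE_DATE_1 or
            day > CHANGE_DATE_2)
```

Both functions behave identically; the parenthesized form is usually considered the more robust style because a stray space after a backslash breaks the continuation.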

Author: Matthew Farrellee <matt@redhat.com>

Closes #2139 from mattf/master-fix-generate-changelist.py-0 and squashes the following commits:

6b3a900 [Matthew Farrellee] Add line continuation for script to work w/ py2.7.5
2014-08-27 15:50:30 -07:00
Patrick Wendell 8712653f11 HOTFIX: Don't build with YARN support for Mapr3 2014-08-27 15:41:09 -07:00
Cheng Lian cf46e72581 [SPARK-3126][SPARK-3127][SQL] Fixed HiveThriftServer2Suite
This PR fixes two issues:

1. Fixes wrongly quoted command line option in `HiveThriftServer2Suite` that makes test cases hang until timeout.
1. Asks `dev/run-test` to run Spark SQL tests when `bin/spark-sql` and/or `sbin/start-thriftserver.sh` are modified.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #2036 from liancheng/fix-thriftserver-test and squashes the following commits:

f38c4eb [Cheng Lian] Fixed the same quotation issue in CliSuite
26b82a0 [Cheng Lian] Run SQL tests when diff contains bin/spark-sql and/or sbin/start-thriftserver.sh
a87f83d [Cheng Lian] Extended timeout
e5aa31a [Cheng Lian] Fixed metastore JDBC URI quotation
2014-08-20 12:57:39 -07:00
Patrick Wendell ceb19830b8 BUILD: Bump Hadoop versions in the release build.
Also, minor modifications to the MapR profile.
2014-08-20 12:19:19 -07:00
Patrick Wendell f2f26c2a1d SPARK-3092 [SQL]: Always include the thriftserver when -Phive is enabled.
Currently we have a separate profile called hive-thriftserver. I originally suggested this in case users did not want to bundle the thriftserver, but it has ultimately led to a lot of confusion. Since the thriftserver is only a few classes, I don't see a really good reason to isolate it from the rest of Hive. So let's go ahead and just include it in the same profile to simplify things.

This has been suggested in the past by liancheng.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #2006 from pwendell/hiveserver and squashes the following commits:

742ea40 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into hiveserver
034ad47 [Patrick Wendell] SPARK-3092: Always include the thriftserver when -Phive is enabled.
2014-08-20 12:13:31 -07:00
Josh Rosen 1f1819b20f [SPARK-3114] [PySpark] Fix Python UDFs in Spark SQL.
This fixes SPARK-3114, an issue where we inadvertently broke Python UDFs in Spark SQL.

This PR modifies the test runner script to always run the PySpark SQL tests, irrespective of whether Spark SQL itself has been modified. It also includes Davies' fix for the bug.

Closes #2026.

Author: Josh Rosen <joshrosen@apache.org>
Author: Davies Liu <davies.liu@gmail.com>

Closes #2027 from JoshRosen/pyspark-sql-fix and squashes the following commits:

9af2708 [Davies Liu] bugfix: disable compression of command
0d8d3a4 [Josh Rosen] Always run Python Spark SQL tests.
2014-08-18 20:42:19 -07:00
Patrick Wendell 5173f3c40f SPARK-2884: Create binary builds in parallel with release script. 2014-08-17 22:31:04 -07:00
Nicholas Chammas 4bdfaa16fc [SPARK-3076] [Jenkins] catch & report test timeouts
* Remove unused code to get jq
* Set timeout on tests and report gracefully on them
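
As a rough illustration of catching a timeout and reporting on it, here is a Python sketch. The Jenkins script is a shell script and may use a different mechanism; the function name and the return convention are assumptions:

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; return its exit code, or the string 'timeout' if it runs too long."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired once the limit passes.
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return "timeout"


# Stand-in for a test run that hangs: sleep far longer than the limit.
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(slow, 1))
```

Returning a distinct value for the timeout case lets the caller post a specific message instead of treating the run as an ordinary failure.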

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #1974 from nchammas/master and squashes the following commits:

d1f1b6b [Nicholas Chammas] set timeout to realistic number
8b1ea41 [Nicholas Chammas] fix formatting
279526e [Nicholas Chammas] [SPARK-3076] catch & report test timeouts
2014-08-16 12:43:36 -07:00
Nicholas Chammas 500f84e49d [SPARK-2912] [Spark QA] Include commit hash in Spark QA messages
You can find the [discussion that motivated this PR here](http://mail-archives.apache.org/mod_mbox/spark-dev/201408.mbox/%3CCABPQxssy0ri2QAz=cc9Tx+EXYWARm7pNcVm8apqCwc-esLbO4Qmail.gmail.com%3E).

As described in [SPARK-2912](https://issues.apache.org/jira/browse/SPARK-2912), the goal of this PR (and related ones to come) is to include useful detail in Spark QA's messages that are intended to make a committer's job easier to do.

Since this work depends on Jenkins, I cannot test this locally. Hence, I will be iterating via this PR.

Notes:
* This is a duplicate of a [previous PR](https://github.com/apache/spark/pull/1811), without the extraneous commits.
* This PR also resolves an issue targeted by [another open PR](https://github.com/apache/spark/pull/1809).

Closes #1809.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Author: nchammas <nicholas.chammas@gmail.com>

Closes #1816 from nchammas/master and squashes the following commits:

c1be644 [Nicholas Chammas] [SPARK-2912] include commit hash in messages
8f641ac [nchammas] Merge pull request #7 from apache/master
2014-08-14 22:05:40 -07:00
Reynold Xin fa5a08e67d Make dev/mima runnable on Mac OS X.
Mac OS X ships the BSD variant of find, which doesn't have the -printf option.

Author: Reynold Xin <rxin@apache.org>

Closes #1953 from rxin/mima and squashes the following commits:

e284afe [Reynold Xin] Make dev/mima runnable on Mac OS X.
2014-08-14 16:27:11 -07:00
Kousuke Saruta 4f4a9884d9 [SPARK-2894] spark-shell doesn't accept flags
As sryza reported, spark-shell doesn't accept any flags.
The root cause is incorrect usage of spark-submit in spark-shell, which came to the surface with #1801.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1715, Closes #1864, and Closes #1861

Closes #1825 from sarutak/SPARK-2894 and squashes the following commits:

47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894
2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py
98287ed [Kousuke Saruta] Removed useless code from java_gateway.py
513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces
28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments
5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway
e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
2014-08-09 21:11:00 -07:00
Patrick Wendell a263a7e9f0 HOTFIX: Support custom Java 7 location 2014-08-06 18:45:19 -07:00
Nicholas Chammas d614967b0b [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically
As described in [SPARK-2627](https://issues.apache.org/jira/browse/SPARK-2627), we'd like Python code to automatically be checked for PEP 8 compliance by Jenkins. This pull request aims to do that.

Notes:
* We may need to install [`pep8`](https://pypi.python.org/pypi/pep8) on the build server.
* I'm expecting tests to fail now that PEP 8 compliance is being checked as part of the build. I'm fine with cleaning up any remaining PEP 8 violations as part of this pull request.
* I did not understand why the RAT and scalastyle reports are saved to text files. I did the same for the PEP 8 check, but only so that the console output style can match those for the RAT and scalastyle checks. The PEP 8 report is removed right after the check is complete.
* Updates to the ["Contributing to Spark"](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) guide will be submitted elsewhere, as I don't believe that text is part of the Spark repo.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Author: nchammas <nicholas.chammas@gmail.com>

Closes #1744 from nchammas/master and squashes the following commits:

274b238 [Nicholas Chammas] [SPARK-2627] [PySpark] minor indentation changes
983d963 [nchammas] Merge pull request #5 from apache/master
1db5314 [nchammas] Merge pull request #4 from apache/master
0e0245f [Nicholas Chammas] [SPARK-2627] undo erroneous whitespace fixes
bf30942 [Nicholas Chammas] [SPARK-2627] PEP8: comment spacing
6db9a44 [nchammas] Merge pull request #3 from apache/master
7b4750e [Nicholas Chammas] merge upstream changes
91b7584 [Nicholas Chammas] [SPARK-2627] undo unnecessary line breaks
44e3e56 [Nicholas Chammas] [SPARK-2627] use tox.ini to exclude files
b09fae2 [Nicholas Chammas] don't wrap comments unnecessarily
bfb9f9f [Nicholas Chammas] [SPARK-2627] keep up with the PEP 8 fixes
9da347f [nchammas] Merge pull request #2 from apache/master
aa5b4b5 [Nicholas Chammas] [SPARK-2627] follow Spark bash style for if blocks
d0a83b9 [Nicholas Chammas] [SPARK-2627] check that pep8 downloaded fine
dffb5dd [Nicholas Chammas] [SPARK-2627] download pep8 at runtime
a1ce7ae [Nicholas Chammas] [SPARK-2627] space out test report sections
21da538 [Nicholas Chammas] [SPARK-2627] it's PEP 8, not PEP8
6f4900b [Nicholas Chammas] [SPARK-2627] more misc PEP 8 fixes
fe57ed0 [Nicholas Chammas] removing merge conflict backups
9c01d4c [nchammas] Merge pull request #1 from apache/master
9a66cb0 [Nicholas Chammas] resolving merge conflicts
a31ccc4 [Nicholas Chammas] [SPARK-2627] miscellaneous PEP 8 fixes
beaa9ac [Nicholas Chammas] [SPARK-2627] fail check on non-zero status
723ed39 [Nicholas Chammas] always delete the report file
0541ebb [Nicholas Chammas] [SPARK-2627] call Python linter from run-tests
12440fa [Nicholas Chammas] [SPARK-2627] add Scala linter
61c07b9 [Nicholas Chammas] [SPARK-2627] add Python linter
75ad552 [Nicholas Chammas] make check output style consistent
2014-08-06 12:58:24 -07:00
Michael Armbrust 236dfac676 [SPARK-2784][SQL] Deprecate hql() method in favor of a config option, 'spark.sql.dialect'
Many users have reported being confused by the distinction between the `sql` and `hql` methods. Specifically, many users think that `sql(...)` cannot be used to read Hive tables. In this PR I introduce a new configuration option `spark.sql.dialect` that picks which dialect will be used for parsing. For `SQLContext` this must be set to `sql`. In `HiveContext` it defaults to `hiveql` but can also be set to `sql`.

The `hql` and `hiveql` methods continue to act the same but are now marked as deprecated.

**This is a possibly breaking change for some users unless they set the dialect manually, though this is unlikely.**

For example: `hiveContext.sql("SELECT 1")` will now throw a parsing exception by default.
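
The dialect rules stated above can be summarized with a toy model in Python. This is not Spark's API: the function, the dictionaries, and the error type are all assumptions made purely to restate the rules in executable form.

```python
# Toy model of the rules described in this commit message; this is NOT Spark's API.
DEFAULT_DIALECT = {"SQLContext": "sql", "HiveContext": "hiveql"}
ALLOWED_DIALECTS = {"SQLContext": {"sql"}, "HiveContext": {"sql", "hiveql"}}


def resolve_dialect(context, conf):
    """Pick the parser dialect for a context from a config dict."""
    dialect = conf.get("spark.sql.dialect", DEFAULT_DIALECT[context])
    if dialect not in ALLOWED_DIALECTS[context]:
        raise ValueError("%s does not support dialect %r" % (context, dialect))
    return dialect


print(resolve_dialect("HiveContext", {}))
print(resolve_dialect("HiveContext", {"spark.sql.dialect": "sql"}))
```

The model only captures which dialect is selected; it says nothing about how either parser treats a given query.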

Author: Michael Armbrust <michael@databricks.com>

Closes #1746 from marmbrus/sqlLanguageConf and squashes the following commits:

ad375cc [Michael Armbrust] Merge remote-tracking branch 'apache/master' into sqlLanguageConf
20c43f8 [Michael Armbrust] override function instead of just setting the value
7e4ae93 [Michael Armbrust] Deprecate hql() method in favor of a config option, 'spark.sql.dialect'
2014-08-03 12:28:29 -07:00
Michael Armbrust 1a8043739d [SPARK-2739][SQL] Rename registerAsTable to registerTempTable
There have been user complaints that the difference between `registerAsTable` and `saveAsTable` is too subtle.  This PR addresses this by renaming `registerAsTable` to `registerTempTable`, which more clearly reflects what is happening.  `registerAsTable` remains, but will cause a deprecation warning.

Author: Michael Armbrust <michael@databricks.com>

Closes #1743 from marmbrus/registerTempTable and squashes the following commits:

d031348 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into registerTempTable
4dff086 [Michael Armbrust] Fix .java files too
89a2f12 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into registerTempTable
0b7b71e [Michael Armbrust] Rename registerAsTable to registerTempTable
2014-08-02 18:27:04 -07:00
Chris Fregly 91f9504e60 [SPARK-1981] Add AWS Kinesis streaming support
Author: Chris Fregly <chris@fregly.com>

Closes #1434 from cfregly/master and squashes the following commits:

4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method
0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl
691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams
0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
74e5c7c [Chris Fregly] updated per TD's feedback.  simplified examples, updated docs
e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
bf614e9 [Chris Fregly] per matei's feedback:  moved the kinesis examples into the examples/ dir
d17ca6d [Chris Fregly] per TD's feedback:  updated docs, simplified the KinesisUtils api
912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail class
db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client
338997e [Chris Fregly] improve build docs for kinesis
828f8ae [Chris Fregly] more cleanup
e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
cd68c0d [Chris Fregly] fixed typos and backward compatibility
d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support
2014-08-02 13:35:35 -07:00
Josh Rosen e02136214a Improvements to merge_spark_pr.py
This commit fixes a couple of issues in the merge_spark_pr.py developer script:

- Allow recovery from failed cherry-picks.
- Fix detection of pull requests that have already been merged.

Both of these fixes are useful when backporting changes.

Author: Josh Rosen <joshrosen@apache.org>

Closes #1668 from JoshRosen/pr-script-improvements and squashes the following commits:

ff4f33a [Josh Rosen] Default SPARK_HOME to cwd(); detect missing JIRA credentials.
ed5bc57 [Josh Rosen] Improvements for backporting using merge_spark_pr:
2014-07-31 14:35:09 -07:00
Michael Armbrust 72cfb13987 [SPARK-2397][SQL] Deprecate LocalHiveContext
LocalHiveContext is redundant with HiveContext.  The only difference is it creates `./metastore` instead of `./metastore_db`.

Author: Michael Armbrust <michael@databricks.com>

Closes #1641 from marmbrus/localHiveContext and squashes the following commits:

e5ec497 [Michael Armbrust] Add deprecation version
626e056 [Michael Armbrust] Don't remove from imports yet
905cc5f [Michael Armbrust] Merge remote-tracking branch 'apache/master' into localHiveContext
1c2727e [Michael Armbrust] Deprecate LocalHiveContext
2014-07-31 11:26:43 -07:00
Brock Noland 2ac37db7ac SPARK-2741 - Publish version of spark assembly which does not contain Hive
Provide a version of the Spark tarball which does not package Hive. This is meant for Hive + Spark users.

Author: Brock Noland <brock@apache.org>

Closes #1667 from brockn/master and squashes the following commits:

5beafb2 [Brock Noland] SPARK-2741 - Publish version of spark assembly which does not contain Hive
2014-07-30 17:04:30 -07:00
Reynold Xin 2f4b17056f Properly pass SBT_MAVEN_PROFILES into sbt. 2014-07-30 14:31:20 -07:00
Reynold Xin 1097327538 Set AMPLAB_JENKINS_BUILD_PROFILE. 2014-07-30 14:08:24 -07:00
Reynold Xin 7c7ce54522 Wrap JAR_DL in dev/check-license. 2014-07-30 13:43:17 -07:00
Reynold Xin 437dc8c5b5 dev/check-license wrap folders in quotes. 2014-07-30 13:17:49 -07:00
Reynold Xin 0feb349ea0 More wrapping FWDIR in quotes. 2014-07-30 13:04:20 -07:00
Reynold Xin 95cf203936 Wrap FWDIR in quotes in dev/check-license. 2014-07-30 12:33:42 -07:00
Reynold Xin f2eb84fe73 Wrap FWDIR in quotes. 2014-07-30 12:24:35 -07:00
Reynold Xin ff511bacf2 [SPARK-2746] Set SBT_MAVEN_PROFILES only when it is not set explicitly by the user.
Author: Reynold Xin <rxin@apache.org>

Closes #1655 from rxin/SBT_MAVEN_PROFILES and squashes the following commits:

b268c4b [Reynold Xin] [SPARK-2746] Set SBT_MAVEN_PROFILES only when it is not set explicitly by the user.
2014-07-30 11:45:24 -07:00
Reynold Xin 3bc3f1801e [SPARK-2747] git diff --dirstat can miss sql changes and not run Hive tests
dev/run-tests uses "git diff --dirstat master" to check whether sql has changed. However, --dirstat won't show sql if the change to sql is negligible compared with the rest of the diff (e.g. a 1k loc change in core and only a 1 loc change in hive).

We should use "git diff --name-only master" instead.
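
A minimal Python sketch of the `--name-only` approach follows. The actual check is done in the `dev/run-tests` shell script, so the function names and the `sql/` prefix test here are assumptions:

```python
import subprocess


def changed_files(base="master"):
    """List every file that differs from base, one path per entry."""
    out = subprocess.check_output(["git", "diff", "--name-only", base])
    return out.decode("utf-8").splitlines()


def touches_sql(paths):
    """Return True if any changed path lives under the sql/ directory."""
    return any(p.startswith("sql/") for p in paths)
```

Because `--name-only` lists every changed file regardless of how small the change is, a one-line edit under `sql/` is still detected. A caller would combine the two as `touches_sql(changed_files())` inside a git checkout.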

Author: Reynold Xin <rxin@apache.org>

Closes #1656 from rxin/hiveTest and squashes the following commits:

f5eab9f [Reynold Xin] [SPARK-2747] git diff --dirstat can miss sql changes and not run Hive tests.
2014-07-30 09:28:53 -07:00
Cheng Lian a7a9d14479 [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix)
JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)

Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module was defined outside the `hive-thriftserver` profile. Thus every pull request, even one that doesn't touch SQL code, also executed the test suites defined in `hive-thriftserver`, but the tests failed because the related .class files were not included in the assembly jar.

In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:

629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
2014-07-28 12:07:30 -07:00
Patrick Wendell e5bbce9a60 Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
This reverts commit f6ff2a61d0.
2014-07-27 18:46:58 -07:00
Cheng Lian f6ff2a61d0 [SPARK-2410][SQL] Merging Hive Thrift/JDBC server
(This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
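
The commit picks a random port for the server in the test suite. A related technique, shown here as a generic Python sketch rather than the Scala code used in Spark, is to let the operating system assign an unused port by binding to port 0; the function name is an assumption:

```python
import socket


def free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", 0))
        # getsockname() returns (host, port); the port was chosen by the OS.
        return sock.getsockname()[1]
    finally:
        sock.close()


print(free_port())
```

Either way there is a small window in which another process can grab the port before the server binds to it, so parallel builds make collisions unlikely rather than impossible.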

JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)

Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).

Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1600 from liancheng/jdbc and squashes the following commits:

ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
1083e9d [Cheng Lian] Fixed failed test suites
7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
9cc0f06 [Cheng Lian] Starts beeline with spark-submit
cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
061880f [Cheng Lian] Addressed all comments by @pwendell
7755062 [Cheng Lian] Adapts test suites to spark-submit settings
40bafef [Cheng Lian] Fixed more license header issues
e214aab [Cheng Lian] Added missing license headers
b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
2014-07-27 13:03:38 -07:00
Michael Armbrust afd757a241 Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
This reverts commit 06dc0d2c6b.

#1399 is making Jenkins fail. We should investigate and put this back after it's passing tests.

Author: Michael Armbrust <michael@databricks.com>

Closes #1594 from marmbrus/revertJDBC and squashes the following commits:

59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
2014-07-25 15:36:57 -07:00
Cheng Lian 06dc0d2c6b [SPARK-2410][SQL] Merging Hive Thrift/JDBC server
JIRA issue:

- Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
- Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)

Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).

(Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)

TODO

- [x] Use `spark-submit` to launch the server, the CLI and beeline
- [x] Migration guideline draft for Shark users

----

Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:

```bash
$ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
```

This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
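
One common way to avoid this kind of option stealing is to stop parsing launcher options at the first positional argument. The Python sketch below illustrates that idea only; it is not the fix that went into Spark, and the option sets and function name are hypothetical:

```python
# Hypothetical launcher options that take a value.
LAUNCHER_OPTIONS_WITH_VALUE = {"--class", "--master", "--name"}
# Hypothetical launcher flags that take no value.
LAUNCHER_FLAGS = {"--help", "--verbose"}


def split_args(argv):
    """Split argv into (launcher_args, primary_resource, app_args).

    Parsing of launcher options stops at the first positional argument,
    so everything after it is passed to the application untouched.
    """
    launcher = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in LAUNCHER_OPTIONS_WITH_VALUE:
            launcher.extend(argv[i:i + 2])
            i += 2
        elif arg in LAUNCHER_FLAGS:
            launcher.append(arg)
            i += 1
        else:
            return launcher, arg, argv[i + 1:]
    return launcher, None, []


print(split_args(["--class", "org.apache.hive.beeline.BeeLine", "spark-internal", "--help"]))
```

With this split, the trailing `--help` belongs to the application rather than to the launcher.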

~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~

**UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert the changes for this bug since it involves more subtle considerations and is worth a separate PR.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1399 from liancheng/thriftserver and squashes the following commits:

090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
1083e9d [Cheng Lian] Fixed failed test suites
7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
9cc0f06 [Cheng Lian] Starts beeline with spark-submit
cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
061880f [Cheng Lian] Addressed all comments by @pwendell
7755062 [Cheng Lian] Adapts test suites to spark-submit settings
40bafef [Cheng Lian] Fixed more license header issues
e214aab [Cheng Lian] Added missing license headers
b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
2014-07-25 12:20:49 -07:00
Patrick Wendell d39e3b9673 SPARK-2596 HOTFIX: Deal with non-existent JIRAs.
A small bug that was found in our JIRA sync script.
2014-07-19 20:06:28 -07:00
Patrick Wendell 49e4727449 SPARK-2596 A tool for mirroring github pull requests on JIRA.
For a bunch of reasons we should automatically populate a JIRA with information about new pull requests when they arrive. I've written a small python script to do this that we can run from Jenkins every 5 or 10 minutes to keep things in sync.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #1496 from pwendell/github-integration and squashes the following commits:

55ad226 [Patrick Wendell] Small fix
afda547 [Patrick Wendell] Use sequence instead of dictionary for JIRA's
3e18cc1 [Patrick Wendell] Small edits
84c5606 [Patrick Wendell] SPARK-2596 A tool for mirroring github pull requests on JIRA.
2014-07-19 18:19:08 -07:00
Patrick Wendell d0ea496877 SPARK-2526: Simplify options in make-distribution.sh
Right now we have a bunch of parallel logic in make-distribution.sh
that's just extra work to maintain. We should just pass through
Maven profiles in this case and keep the script simple. See
the JIRA for more details.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #1445 from pwendell/make-distribution.sh and squashes the following commits:

f1294ea [Patrick Wendell] Simplify options in make-distribution.sh.
2014-07-17 01:02:35 -07:00
witgo 9dd635eb5d SPARK-2480: Resolve sbt warnings "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag"
Author: witgo <witgo@qq.com>

Closes #1404 from witgo/run-tests and squashes the following commits:

f703aee [witgo] fix Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
2944f51 [witgo] Remove "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag"
ef59c70 [witgo] fix Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
6cefee5 [witgo] Remove "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag"
2014-07-15 10:46:17 -07:00
Prashant Sharma 628932b8d0 [SPARK-1776] Have Spark's SBT build read dependencies from Maven.
This patch introduces the new way of working while retaining the existing ways of doing things.

For example build instruction for yarn in maven is
`mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
in sbt it can become
`MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
Also supports
`sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
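
The mapping between the comma-separated `MAVEN_PROFILES` value and the `-P` flags can be sketched as follows. The helper name is hypothetical and the real translation happens inside the sbt build, so treat this only as an illustration of the correspondence:

```python
def profiles_to_flags(maven_profiles):
    """Turn a comma-separated profile list into Maven-style -P flags."""
    names = [p.strip() for p in maven_profiles.split(",")]
    # Skip empty entries such as the one produced by a trailing comma.
    return ["-P" + name for name in names if name]


print(profiles_to_flags("yarn, hadoop-2.2"))
```

The whitespace after the comma in the example value is stripped, so both spellings of the list yield the same flags.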

Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>

Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:

a8ac951 [Prashant Sharma] Updated sbt version.
62b09bb [Prashant Sharma] Improvements.
fa6221d [Prashant Sharma] Excluding sql from mima
4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
72651ca [Prashant Sharma] Addresses code review comments.
acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
ac4312c [Prashant Sharma] Revert "minor fix"
6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
65cf06c [Prashant Sharma] Servlet API jars mess up with the other servlet jars on the class path.
446768e [Prashant Sharma] minor fix
89b9777 [Prashant Sharma] Merge conflicts
d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
dccc8ac [Prashant Sharma] updated mima to check against 1.0
a49c61b [Prashant Sharma] Fix for tools jar
a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
cf88758 [Prashant Sharma] cleanup
9439ea3 [Prashant Sharma] Small fix to run-examples script.
96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
4973dbd [Patrick Wendell] Example build using pom reader.
2014-07-10 11:03:37 -07:00
Patrick Wendell 2e0a037dff SPARK-2416: Allow richer reporting of unit test results
The built-in Jenkins integration is pretty bad. It's very confusing to users whether tests have passed or failed and we can't easily customize the message.

With some small scripting around the Github API we can do much better than this.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #1340 from pwendell/better-qa-messages and squashes the following commits:

fd6077d [Patrick Wendell] Better automation for unit tests.
2014-07-09 19:26:16 -07:00
Patrick Wendell fc71658938 HOTFIX: Clean before building docs during release.
If the docs are built after a Maven build has finished the intermediate
state somehow causes a compiler bug during sbt compilation. This just
does a clean before attempting to build the docs.
2014-07-04 10:01:19 -07:00
Patrick Wendell f1f7385a50 Strip '@' symbols when merging pull requests.
Currently all of the commits with 'X' in them cause person X to
receive e-mails every time someone makes a public fork of Spark.
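
A minimal sketch of the stripping step, assuming a plain string replacement; the function name and the sample message are hypothetical, and the merge script may restrict the replacement to particular parts of the message:

```python
def strip_mentions(message):
    """Remove every '@' so that a user mention no longer triggers a notification."""
    return message.replace("@", "")


print(strip_mentions("Thanks @alice for the review"))
```

Note that a blanket replacement like this also alters any e-mail addresses in the text it is applied to.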

marmbrus who requested this.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #1239 from pwendell/strip and squashes the following commits:

22e5a97 [Patrick Wendell] Strip '@' symbols when merging pull requests.
2014-06-26 17:09:24 -07:00
Patrick Wendell 58b32f3470 SPARK-2231: dev/run-tests should include YARN and use a recent Hadoop version
...rsion

Author: Patrick Wendell <pwendell@gmail.com>

Closes #1175 from pwendell/test-hadoop-version and squashes the following commits:

9210ef4 [Patrick Wendell] SPARK-2231: dev/run-tests should include YARN and use a recent Hadoop version
2014-06-22 00:55:27 -07:00