Commit graph

225 commits

Author SHA1 Message Date
Dongjoon Hyun a2c6adcc5d
[SPARK-18857][SQL] Don't use Iterator.duplicate for incrementalCollect in Thrift Server
## What changes were proposed in this pull request?

To support `FETCH_FIRST`, SPARK-16563 used Scala `Iterator.duplicate`. However,
Scala `Iterator.duplicate` uses a **queue to buffer all items between both iterators**,
this causes GC and hangs for queries with large number of rows. We should not use this,
especially for `spark.sql.thriftServer.incrementalCollect`.

https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/Iterator.scala#L1262-L1300

## How was this patch tested?

Pass the existing tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16440 from dongjoon-hyun/SPARK-18857.
2017-01-10 13:27:55 +00:00
Niranjan Padmanabhan a1e40b1f5d
[MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo
## What changes were proposed in this pull request?
There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words.

## How was this patch tested?
N/A since only docs or comments were updated.

Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com>

Closes #16455 from neurons/np.structure_streaming_doc.
2017-01-04 15:07:29 +00:00
gatorsmile 5ac62043cf [SPARK-18992][SQL] Move spark.sql.hive.thriftServer.singleSession to SQLConf
### What changes were proposed in this pull request?

Since `spark.sql.hive.thriftServer.singleSession` is a configuration of SQL component, this conf can be moved from `SparkConf` to `StaticSQLConf`.

When we introduced `spark.sql.hive.thriftServer.singleSession`, all the SQL configuration are session specific. They can be modified in different sessions.

In Spark 2.1, static SQL configuration is added. It is a perfect fit for `spark.sql.hive.thriftServer.singleSession`. Previously, we did the same move for `spark.sql.warehouse.dir` from `SparkConf` to `StaticSQLConf`

### How was this patch tested?
Added test cases in HiveThriftServer2Suites.scala

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16392 from gatorsmile/hiveThriftServerSingleSession.
2016-12-28 10:16:22 +08:00
Ryan Williams afd9bc1d8a [SPARK-17807][CORE] split test-tags into test-JAR
Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.

Alternative to #16303.

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #16311 from ryan-williams/tt.
2016-12-21 16:37:20 -08:00
Reynold Xin c7c7265950 [SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT
## What changes were proposed in this pull request?
This patch bumps master branch version to 2.2.0-SNAPSHOT.

## How was this patch tested?
N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #16126 from rxin/SPARK-18695.
2016-12-02 21:09:37 -08:00
wm624@hotmail.com 22a9d064e9
[SPARK-14914][CORE] Fix Resource not closed after using, for unit tests and example
## What changes were proposed in this pull request?

This is a follow-up work of #15618.

Close file source;
For any newly created streaming context outside the withContext, explicitly close the context.

## How was this patch tested?

Existing unit tests.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #15818 from wangmiao1981/rtest.
2016-11-10 10:54:36 +00:00
Ryan Blue 9b0593d5e9 [SPARK-18086] Add support for Hive session vars.
## What changes were proposed in this pull request?

This adds support for Hive variables:

* Makes values set via `spark-sql --hivevar name=value` accessible
* Adds `getHiveVar` and `setHiveVar` to the `HiveClient` interface
* Adds a SessionVariables trait for sessions like Hive that support variables (including Hive vars)
* Adds SessionVariables support to variable substitution
* Adds SessionVariables support to the SET command

## How was this patch tested?

* Adds a test to all supported Hive versions for accessing Hive variables
* Adds HiveVariableSubstitutionSuite

Author: Ryan Blue <blue@apache.org>

Closes #15738 from rdblue/SPARK-18086-add-hivevar-support.
2016-11-07 17:36:15 -08:00
Josh Rosen 6e6298154a [SPARK-17350][SQL] Disable default use of KryoSerializer in Thrift Server
In SPARK-4761 / #3621 (December 2014) we enabled Kryo serialization by default in the Spark Thrift Server. However, I don't think that the original rationale for doing this still holds now that most Spark SQL serialization is now performed via encoders and our UnsafeRow format.

In addition, the use of Kryo as the default serializer can introduce performance problems because the creation of new KryoSerializer instances is expensive and we haven't performed instance-reuse optimizations in several code paths (including DirectTaskResult deserialization).

Given all of this, I propose to revert back to using JavaSerializer as the default serializer in the Thrift Server.

/cc liancheng

Author: Josh Rosen <joshrosen@databricks.com>

Closes #14906 from JoshRosen/disable-kryo-in-thriftserver.
2016-11-01 16:23:47 -07:00
Dongjoon Hyun 59e3eb5af8 [SPARK-17819][SQL] Support default database in connection URIs for Spark Thrift Server
## What changes were proposed in this pull request?

Currently, Spark Thrift Server ignores the default database in URI. This PR supports that like the following.

```sql
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "create database testdb"
$ bin/beeline -u jdbc:hive2://localhost:10000/testdb -e "create table t(a int)"
$ bin/beeline -u jdbc:hive2://localhost:10000/testdb -e "show tables"
...
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| t          | false        |
+------------+--------------+--+
1 row selected (0.347 seconds)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "show tables"
...
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
+------------+--------------+--+
No rows selected (0.098 seconds)
```

## How was this patch tested?

Manual.

Note: I tried to add a test case for this, but I cannot found a suitable testsuite for this. I'll add the testcase if some advice is given.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #15399 from dongjoon-hyun/SPARK-17819.
2016-10-16 20:15:32 -07:00
Sean Owen cff5607552 [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished
## What changes were proposed in this pull request?

This expands calls to Jetty's simple `ServerConnector` constructor to explicitly specify a `ScheduledExecutorScheduler` that makes daemon threads. It should otherwise result in exactly the same configuration, because the other args are copied from the constructor that is currently called.

(I'm not sure we should change the Hive Thriftserver impl, but I did anyway.)

This also adds `sc.stop()` to the quick start guide example.

## How was this patch tested?

Existing tests; _pending_ at least manual verification of the fix.

Author: Sean Owen <sowen@cloudera.com>

Closes #15381 from srowen/SPARK-17707.
2016-10-07 10:31:41 -07:00
Dongjoon Hyun c571cfb2d0 [SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException in Thriftserver
## What changes were proposed in this pull request?

Currently, Spark Thrift Server raises `IllegalArgumentException` for queries whose column types are `NullType`, e.g., `SELECT null` or `SELECT if(true,null,null)`. This PR fixes that by returning `void` like Hive 1.2.

**Before**
```sql
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
Closing: 0: jdbc:hive2://localhost:10000

$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
Closing: 0: jdbc:hive2://localhost:10000
```

**After**
```sql
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-------+--+
| NULL  |
+-------+--+
| NULL  |
+-------+--+
1 row selected (3.242 seconds)
Beeline version 1.2.1.spark2 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000

$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-------------------------+--+
| (IF(true, NULL, NULL))  |
+-------------------------+--+
| NULL                    |
+-------------------------+--+
1 row selected (0.201 seconds)
Beeline version 1.2.1.spark2 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000
```

## How was this patch tested?

* Pass the Jenkins test with a new testsuite.
* Also, Manually, after starting Spark Thrift Server, run the following command.
```sql
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
```

**Hive 1.2**
```sql
hive> create table null_table as select null;
hive> desc null_table;
OK
_c0                     void
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #15325 from dongjoon-hyun/SPARK-17112.
2016-10-03 21:28:16 -07:00
gatorsmile 4d0706d616 [SPARK-17190][SQL] Removal of HiveSharedState
### What changes were proposed in this pull request?
Since `HiveClient` is used to interact with the Hive metastore, it should be hidden in `HiveExternalCatalog`. After moving `HiveClient` into `HiveExternalCatalog`, `HiveSharedState` becomes a wrapper of `HiveExternalCatalog`. Thus, removal of `HiveSharedState` becomes straightforward. After removal of `HiveSharedState`, the reflection logic is directly applied on the choice of `ExternalCatalog` types, based on the configuration of `CATALOG_IMPLEMENTATION`.

~~`HiveClient` is also used/invoked by the other entities besides HiveExternalCatalog, we defines the following two APIs: getClient and getNewClient~~

### How was this patch tested?
The existing test cases

Author: gatorsmile <gatorsmile@gmail.com>

Closes #14757 from gatorsmile/removeHiveClient.
2016-08-25 12:50:03 +08:00
huangzhaowei a45fefd17e [SPARK-16941] Use concurrentHashMap instead of scala Map in SparkSQLOperationManager.
## What changes were proposed in this pull request?
ThriftServer will have some thread-safe problem in **SparkSQLOperationManager**.
Add a SynchronizedMap trait for the maps in it to avoid this problem.

Details in [SPARK-16941](https://issues.apache.org/jira/browse/SPARK-16941)

## How was this patch tested?
NA

Author: huangzhaowei <carlmartinmax@gmail.com>

Closes #14534 from SaintBacchus/SPARK-16941.
2016-08-11 11:28:28 +01:00
Alice e17a76efdb [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug
## What changes were proposed in this pull request?

Add a constant iterator which point to head of result. The header will be used to reset iterator when fetch result from first row repeatedly.
JIRA ticket https://issues.apache.org/jira/browse/SPARK-16563

## How was this patch tested?

This bug was found when using Cloudera HUE connecting to spark sql thrift server, currently SQL statement result can be only fetched for once. The fix was tested manually with Cloudera HUE, With this fix, HUE can fetch spark SQL results repeatedly through thrift server.

Author: Alice <alice.gugu@gmail.com>
Author: Alice <guhq@garena.com>

Closes #14218 from alicegugu/SparkSQLFetchResultsBug.
2016-08-08 18:00:04 -07:00
Xin Ren 21a6dd2aef [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition and inherited from the parent
https://issues.apache.org/jira/browse/SPARK-16535

## What changes were proposed in this pull request?

When I scan through the pom.xml of sub projects, I found this warning as below and attached screenshot
```
Definition of groupId is redundant, because it's inherited from the parent
```
![screen shot 2016-07-13 at 3 13 11 pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png)

I've tried to remove some of the lines with groupId definition, and the build on my local machine is still ok.
```
<groupId>org.apache.spark</groupId>
```
As I just find now `<maven.version>3.3.9</maven.version>` is being used in Spark 2.x, and Maven-3 supports versionless parent elements: Maven 3 will remove the need to specify the parent version in sub modules. THIS is great (in Maven 3.1).

ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762

## How was this patch tested?

I've tested by re-building the project, and build succeeded.

Author: Xin Ren <iamshrek@126.com>

Closes #14189 from keypointt/SPARK-16535.
2016-07-19 11:59:46 +01:00
Reynold Xin ffcb6e055a [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
## What changes were proposed in this pull request?
After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.

## How was this patch tested?
N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #14130 from rxin/SPARK-16477.
2016-07-11 09:42:56 -07:00
Cheng Hao 920cb5fe4e [SPARK-15730][SQL] Respect the --hiveconf in the spark-sql command line
## What changes were proposed in this pull request?
This PR makes spark-sql (backed by SparkSQLCLIDriver) respects confs set by hiveconf, which is what we do in previous versions. The change is that when we start SparkSQLCLIDriver, we explicitly set confs set through --hiveconf to SQLContext's conf (basically treating those confs as a SparkSQL conf).

## How was this patch tested?
A new test in CliSuite.

Closes #13542

Author: Cheng Hao <hao.cheng@intel.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #14058 from yhuai/hiveConfThriftServer.
2016-07-05 16:42:43 -07:00
Sean Owen 158af162ea [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3
## What changes were proposed in this pull request?

Replace use of `commons-lang` in favor of `commons-lang3` and forbid the former via scalastyle; remove `NotImplementedException` from `comons-lang` in favor of JDK `UnsupportedOperationException`

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #13843 from srowen/SPARK-16129.
2016-06-24 10:35:54 +01:00
Egor Pakhomov 049e639fc2 [SPARK-15934] [SQL] Return binary mode in ThriftServer
Returning binary mode to ThriftServer for backward compatibility.

Tested with Squirrel and Tableau.

Author: Egor Pakhomov <egor@anchorfree.com>

Closes #13667 from epahomov/SPARK-15095-2.0.
2016-06-15 14:29:32 -07:00
Jeff Zhang 53bb030847 doc fix of HiveThriftServer
## What changes were proposed in this pull request?

Just minor doc fix.

\cc yhuai

Author: Jeff Zhang <zjffdu@apache.org>

Closes #13659 from zjffdu/doc_fix.
2016-06-14 14:28:40 +01:00
Xin Wu 019afd9c78 [SPARK-15431][SQL][BRANCH-2.0-TEST] rework the clisuite test cases
## What changes were proposed in this pull request?
This PR reworks on the CliSuite test cases for `LIST FILES/JARS` commands.

CC yhuai Thanks!

Author: Xin Wu <xinwu@us.ibm.com>

Closes #13361 from xwu0226/SPARK-15431-clisuite-new.
2016-05-27 14:07:12 -07:00
Xin Wu 6f95c6c030 [SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now
## What changes were proposed in this pull request?
The test cases for  `list` command added in `CliSuite` by PR #13212 can not run in some jenkins jobs after being merged.
However, some jenkins jobs can pass:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.2/
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.3/

Others failed on this test case. But the failures on those jobs are at slightly different checkpoints among different jobs too. So it seems that CliSuite's output capture is flaky for list commands to check for expected output. There are test cases already in `HiveQuerySuite` and `SparkContextSuite` to cover the cases. So I am ignoring 2 test cases added by PR #13212 .

Author: Xin Wu <xinwu@us.ibm.com>

Closes #13276 from xwu0226/SPARK-15431-clisuite.
2016-05-27 08:54:14 -07:00
Zheng RuiFeng 6b1a6180e7 [MINOR] Fix Typos 'a -> an'
## What changes were proposed in this pull request?

`a` -> `an`

I use regex to generate potential error lines:
`grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml/*/*scala`
and review them line by line.

## How was this patch tested?

local build
`lint-java` checking

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #13317 from zhengruifeng/a_an.
2016-05-26 22:39:14 -07:00
Reynold Xin 0f61d6efb4 [SPARK-15552][SQL] Remove unnecessary private[sql] methods in SparkSession
## What changes were proposed in this pull request?
SparkSession has a list of unnecessary private[sql] methods. These methods cause some trouble because private[sql] doesn't apply in Java. In the cases that they are easy to remove, we can simply remove them. This patch does that.

As part of this pull request, I also replaced a bunch of protected[sql] with private[sql], to tighten up visibility.

## How was this patch tested?
Updated test cases to reflect the changes.

Author: Reynold Xin <rxin@databricks.com>

Closes #13319 from rxin/SPARK-15552.
2016-05-26 13:03:07 -07:00
lfzCarlosC 02c8072eea [MINOR][MLLIB][STREAMING][SQL] Fix typos
fixed typos for source code for components [mllib] [streaming] and [SQL]

None and obvious.

Author: lfzCarlosC <lfz.carlos@gmail.com>

Closes #13298 from lfzCarlosC/master.
2016-05-25 10:53:57 -07:00
Xin Wu 01659bc50c [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s) command natively
## What changes were proposed in this pull request?
Currently command `ADD FILE|JAR <filepath | jarpath>` is supported natively in SparkSQL. However, when this command is run, the file/jar is added to the resources that can not be looked up by `LIST FILE(s)|JAR(s)` command because the `LIST` command is passed to Hive command processor in Spark-SQL or simply not supported in Spark-shell. There is no way users can find out what files/jars are added to the spark context.
Refer to [Hive commands](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli)

This PR is to support following commands:
`LIST (FILE[s] [filepath ...] | JAR[s] [jarfile ...])`

### For example:
##### LIST FILE(s)
```
scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
res1: org.apache.spark.sql.DataFrame = []
scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt").show(false)
+----------------------------------------------+
|result                                        |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
+----------------------------------------------+

scala> spark.sql("list files").show(false)
+----------------------------------------------+
|result                                        |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt |
+----------------------------------------------+
```

##### LIST JAR(s)
```
scala> spark.sql("add jar /Users/xinwu/spark/core/src/test/resources/TestUDTF.jar")
res9: org.apache.spark.sql.DataFrame = [result: int]

scala> spark.sql("list jar TestUDTF.jar").show(false)
+---------------------------------------------+
|result                                       |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+

scala> spark.sql("list jars").show(false)
+---------------------------------------------+
|result                                       |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+
```
## How was this patch tested?
New test cases are added for Spark-SQL, Spark-Shell and SparkContext API code path.

Author: Xin Wu <xinwu@us.ibm.com>
Author: xin Wu <xinwu@us.ibm.com>

Closes #13212 from xwu0226/list_command.
2016-05-23 17:32:01 -07:00
gatorsmile 8f0a3d5bcb [SPARK-15330][SQL] Implement Reset Command
#### What changes were proposed in this pull request?
Like `Set` Command in Hive, `Reset` is also supported by Hive. See the link: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli

Below is the related Hive JIRA: https://issues.apache.org/jira/browse/HIVE-3202

This PR is to implement such a command for resetting the SQL-related configuration to the default values. One of the use case shown in HIVE-3202 is listed below:

> For the purpose of optimization we set various configs per query. It's worthy but all those configs should be reset every time for next query.

#### How was this patch tested?
Added a test case.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #13121 from gatorsmile/resetCommand.
2016-05-21 20:07:34 -07:00
Reynold Xin f2ee0ed4b7 [SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate config options to existing sessions if specified
## What changes were proposed in this pull request?
Currently SparkSession.Builder use SQLContext.getOrCreate. It should probably the the other way around, i.e. all the core logic goes in SparkSession, and SQLContext just calls that. This patch does that.

This patch also makes sure config options specified in the builder are propagated to the existing (and of course the new) SparkSession.

## How was this patch tested?
Updated tests to reflect the change, and also introduced a new SparkSessionBuilderSuite that should cover all the branches.

Author: Reynold Xin <rxin@databricks.com>

Closes #13200 from rxin/SPARK-15075.
2016-05-19 21:53:26 -07:00
Sean Owen 122302cbf5 [SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into spark-tags
## What changes were proposed in this pull request?

(See https://github.com/apache/spark/pull/12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was the apparently problem last time.)

Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags`

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13074 from srowen/SPARK-15290.
2016-05-17 09:55:53 +01:00
Sean Owen f5576a052d [SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient
## What changes were proposed in this pull request?

(Retry of https://github.com/apache/spark/pull/13049)

- update to httpclient 4.5 / httpcore 4.4
- remove some defunct exclusions
- manage httpmime version to match
- update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used)

## How was this patch tested?

Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...`

Author: Sean Owen <sowen@cloudera.com>

Closes #13117 from srowen/SPARK-12972.2.
2016-05-15 15:56:46 +01:00
bomeng 81bf870848 [SPARK-14897][SQL] upgrade to jetty 9.2.16
## What changes were proposed in this pull request?

Since Jetty 8 is EOL (end of life) and has critical security issue [http://www.securityweek.com/critical-vulnerability-found-jetty-web-server], I think upgrading to 9 is necessary. I am using latest 9.2 since 9.3 requires Java 8+.

`javax.servlet` and `derby` were also upgraded since Jetty 9.2 needs corresponding version.

## How was this patch tested?

Manual test and current test cases should cover it.

Author: bomeng <bmeng@us.ibm.com>

Closes #12916 from bomeng/SPARK-14897.
2016-05-12 20:07:44 +01:00
gatorsmile 5c8fad7b9b [SPARK-15108][SQL] Describe Permanent UDTF
#### What changes were proposed in this pull request?
When Describe a UDTF, the command returns a wrong result. The command is unable to find the function, which has been created and cataloged in the catalog but not in the functionRegistry.

This PR is to correct it. If the function is not in the functionRegistry, we will check the catalog for collecting the information of the UDTF function.

#### How was this patch tested?
Added test cases to verify the results

Author: gatorsmile <gatorsmile@gmail.com>

Closes #12885 from gatorsmile/showFunction.
2016-05-06 11:43:07 -07:00
Dongjoon Hyun 2c170dd3d7 [SPARK-15134][EXAMPLE] Indent SparkSession builder patterns and update binary_classification_metrics_example.py
## What changes were proposed in this pull request?

This issue addresses the comments in SPARK-15031 and also fix java-linter errors.
- Use multiline format in SparkSession builder patterns.
- Update `binary_classification_metrics_example.py` to use `SparkSession`.
- Fix Java Linter errors (in SPARK-13745, SPARK-15031, and so far)

## How was this patch tested?

After passing the Jenkins tests and run `dev/lint-java` manually.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #12911 from dongjoon-hyun/SPARK-15134.
2016-05-05 14:37:50 -07:00
Sandeep Singh ed6f3f8a5f [SPARK-15072][SQL][REPL][EXAMPLES] Remove SparkSession.withHiveSupport
## What changes were proposed in this pull request?
Removing the `withHiveSupport` method of `SparkSession`, instead use `enableHiveSupport`

## How was this patch tested?
ran tests locally

Author: Sandeep Singh <sandeep@techaddict.me>

Closes #12851 from techaddict/SPARK-15072.
2016-05-05 14:35:15 -07:00
mcheah b7fdc23ccc [SPARK-12154] Upgrade to Jersey 2
## What changes were proposed in this pull request?

Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required to compile. The changes were relatively standard Jersey migration things.

## How was this patch tested?

I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications.

Author: mcheah <mcheah@palantir.com>

Closes #12715 from mccheah/feature/upgrade-jersey.
2016-05-05 10:51:03 +01:00
Davies Liu 348c138984 [SPARK-15095][SQL] remove HiveSessionHook from ThriftServer
## What changes were proposed in this pull request?

Remove HiveSessionHook

## How was this patch tested?

No tests needed.

Author: Davies Liu <davies@databricks.com>

Closes #12881 from davies/remove_hooks.
2016-05-03 21:59:03 -07:00
Davies Liu 028c6a5dba [SQL-15102][SQL] remove delegation token support from ThriftServer
## What changes were proposed in this pull request?

These API is only useful for Hadoop, may not work for Spark SQL.

The APIs is kept for source compatibility.

## How was this patch tested?

No unit tests needed.

Author: Davies Liu <davies@databricks.com>

Closes #12878 from davies/remove_delegate.
2016-05-03 14:40:47 -07:00
Davies Liu d6c7b2a5cc [SPARK-15095][SQL] drop binary mode in ThriftServer
## What changes were proposed in this pull request?

This PR drop the support for binary mode in ThriftServer, only HTTP mode is supported now, to reduce the maintain burden.

The code to support binary mode is still kept, just in case if we want it  in future.

## How was this patch tested?

Updated tests to use HTTP mode.

Author: Davies Liu <davies@databricks.com>

Closes #12876 from davies/hide_binary.
2016-05-03 14:15:25 -07:00
Dongjoon Hyun a744457076 [SPARK-15053][BUILD] Fix Java Lint errors on Hive-Thriftserver module
## What changes were proposed in this pull request?

This issue fixes or hides 181 Java linter errors introduced by SPARK-14987 which copied hive service code from Hive. We had better clean up these errors before releasing Spark 2.0.

- Fix UnusedImports (15 lines), RedundantModifier (14 lines), SeparatorWrap (9 lines), MethodParamPad (6 lines), FileTabCharacter (5 lines), ArrayTypeStyle (3 lines), ModifierOrder (3 lines), RedundantImport (1 line), CommentsIndentation (1 line), UpperEll (1 line), FallThrough (1 line), OneStatementPerLine (1 line), NewlineAtEndOfFile (1 line) errors.
- Ignore `LineLength` errors under `hive/service/*` (118 lines).
- Ignore `MethodName` error in `PasswdAuthenticationProvider.java` (1 line).
- Ignore `NoFinalizer` error in `ThreadWithGarbageCleanup.java` (1 line).

## How was this patch tested?

After passing Jenkins building, run `dev/lint-java` manually.
```bash
$ dev/lint-java
Checkstyle checks passed.
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #12831 from dongjoon-hyun/SPARK-15053.
2016-05-03 12:39:37 +01:00
Reynold Xin 8ebae466a3 [SPARK-15004][SQL] Remove zookeeper service discovery code in thrift-server
## What changes were proposed in this pull request?
We recently inlined Hive's thrift server code in SPARK-15004. This patch removes the code related to zookeeper service discovery, Tez, and Hive on Spark, since they are irrelevant.

## How was this patch tested?
N/A - removing dead code

Author: Reynold Xin <rxin@databricks.com>

Closes #12780 from rxin/SPARK-15004.
2016-04-29 13:32:08 -07:00
Davies Liu 7feeb82cb7 [SPARK-14987][SQL] inline hive-service (cli) into sql/hive-thriftserver
## What changes were proposed in this pull request?

This PR copy the thrift-server from hive-service-1.2 (including  TCLIService.thrift and generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements.

## How was this patch tested?

Existing tests.

Author: Davies Liu <davies@databricks.com>

Closes #12764 from davies/thrift_server.
2016-04-29 09:32:42 -07:00
Reynold Xin 054f991c43 [SPARK-14994][SQL] Remove execution hive from HiveSessionState
## What changes were proposed in this pull request?
This patch removes executionHive from HiveSessionState and HiveSharedState.

## How was this patch tested?
Updated test cases.

Author: Reynold Xin <rxin@databricks.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #12770 from rxin/SPARK-14994.
2016-04-29 01:14:02 -07:00
Yin Huai 9c7c42bc6a Revert "[SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local"
This reverts commit dae538a4d7.
2016-04-28 19:57:41 -07:00
Pravin Gadakh dae538a4d7 [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local
## What changes were proposed in this pull request?

This PR adds `since` tag into the matrix and vector classes in spark-mllib-local.

## How was this patch tested?

Scala-style checks passed.

Author: Pravin Gadakh <prgadakh@in.ibm.com>

Closes #12416 from pravingadakh/SPARK-14613.
2016-04-28 15:59:18 -07:00
hyukjinkwon f5da592fc6 [SPARK-12143][SQL] Binary type support for Hive thrift server
## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-12143

This PR adds the support for conversion between `SparkRow` in Spark and `RowSet` in Hive for `BinaryType` as `Array[Byte]` (JDBC)
## How was this patch tested?

Unittests in `HiveThriftBinaryServerSuite` (regression test)

Closes #10139

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #12733 from HyukjinKwon/SPARK-12143.
2016-04-27 17:41:05 -07:00
Andrew Or 3c5e65c339 [SPARK-14721][SQL] Remove HiveContext (part 2)
## What changes were proposed in this pull request?

This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class.

Note: A couple of things will break after this patch. These will be fixed separately.
- the python HiveContext
- all the documentation / comments referencing HiveContext
- there will be no more HiveContext in the REPL (fixed by #12589)

## How was this patch tested?

No change in functionality.

Author: Andrew Or <andrew@databricks.com>

Closes #12585 from andrewor14/delete-hive-context.
2016-04-25 13:23:05 -07:00
Reynold Xin 162e12b085 [SPARK-14877][SQL] Remove HiveMetastoreTypes class
## What changes were proposed in this pull request?
It is unnecessary as DataType.catalogString largely replaces the need for this class.

## How was this patch tested?
Mostly removing dead code and should be covered by existing tests.

Author: Reynold Xin <rxin@databricks.com>

Closes #12644 from rxin/SPARK-14877.
2016-04-23 15:41:17 -07:00
Reynold Xin 890abd1279 [SPARK-14869][SQL] Don't mask exceptions in ResolveRelations
## What changes were proposed in this pull request?
In order to support running SQL directly on files, we added some code in ResolveRelations to catch the exception thrown by catalog.lookupRelation and ignore it. This unfortunately masks all the exceptions. This patch changes the logic to simply test the table's existence.

## How was this patch tested?
I manually hacked some bugs into Spark and made sure the exceptions were being propagated up.

Author: Reynold Xin <rxin@databricks.com>

Closes #12634 from rxin/SPARK-14869.
2016-04-23 12:49:36 -07:00
Reynold Xin fddd3aee0d [SPARK-14871][SQL] Disable StatsReportListener to declutter output
## What changes were proposed in this pull request?
Spark SQL inherited from Shark to use the StatsReportListener. Unfortunately this clutters the spark-sql CLI output and makes it very difficult to read the actual query results.

## How was this patch tested?
Built and tested in spark-sql CLI.

Author: Reynold Xin <rxin@databricks.com>

Closes #12635 from rxin/SPARK-14871.
2016-04-23 12:42:37 -07:00
Reynold Xin d7d0cad0ad [SPARK-14855][SQL] Add "Exec" suffix to physical operators
## What changes were proposed in this pull request?
This patch adds "Exec" suffix to all physical operators. Before this patch, Spark's physical operators and logical operators are named the same (e.g. Project could be logical.Project or execution.Project), which caused small issues in code review and bigger issues in code refactoring.

## How was this patch tested?
N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #12617 from rxin/exec-node.
2016-04-22 17:43:56 -07:00