## What changes were proposed in this pull request?
Currently, Spark Thrift Server ignores the default database in URI. This PR supports that like the following.
```sql
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "create database testdb"
$ bin/beeline -u jdbc:hive2://localhost:10000/testdb -e "create table t(a int)"
$ bin/beeline -u jdbc:hive2://localhost:10000/testdb -e "show tables"
...
+------------+--------------+--+
| tableName | isTemporary |
+------------+--------------+--+
| t | false |
+------------+--------------+--+
1 row selected (0.347 seconds)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "show tables"
...
+------------+--------------+--+
| tableName | isTemporary |
+------------+--------------+--+
+------------+--------------+--+
No rows selected (0.098 seconds)
```
## How was this patch tested?
Manual.
Note: I tried to add a test case for this, but I cannot found a suitable testsuite for this. I'll add the testcase if some advice is given.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes#15399 from dongjoon-hyun/SPARK-17819.
## What changes were proposed in this pull request?
This expands calls to Jetty's simple `ServerConnector` constructor to explicitly specify a `ScheduledExecutorScheduler` that makes daemon threads. It should otherwise result in exactly the same configuration, because the other args are copied from the constructor that is currently called.
(I'm not sure we should change the Hive Thriftserver impl, but I did anyway.)
This also adds `sc.stop()` to the quick start guide example.
## How was this patch tested?
Existing tests; _pending_ at least manual verification of the fix.
Author: Sean Owen <sowen@cloudera.com>
Closes#15381 from srowen/SPARK-17707.
### What changes were proposed in this pull request?
Since `HiveClient` is used to interact with the Hive metastore, it should be hidden in `HiveExternalCatalog`. After moving `HiveClient` into `HiveExternalCatalog`, `HiveSharedState` becomes a wrapper of `HiveExternalCatalog`. Thus, removal of `HiveSharedState` becomes straightforward. After removal of `HiveSharedState`, the reflection logic is directly applied on the choice of `ExternalCatalog` types, based on the configuration of `CATALOG_IMPLEMENTATION`.
~~`HiveClient` is also used/invoked by the other entities besides HiveExternalCatalog, we defines the following two APIs: getClient and getNewClient~~
### How was this patch tested?
The existing test cases
Author: gatorsmile <gatorsmile@gmail.com>
Closes#14757 from gatorsmile/removeHiveClient.
## What changes were proposed in this pull request?
ThriftServer will have some thread-safe problem in **SparkSQLOperationManager**.
Add a SynchronizedMap trait for the maps in it to avoid this problem.
Details in [SPARK-16941](https://issues.apache.org/jira/browse/SPARK-16941)
## How was this patch tested?
NA
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes#14534 from SaintBacchus/SPARK-16941.
## What changes were proposed in this pull request?
Add a constant iterator which point to head of result. The header will be used to reset iterator when fetch result from first row repeatedly.
JIRA ticket https://issues.apache.org/jira/browse/SPARK-16563
## How was this patch tested?
This bug was found when using Cloudera HUE connecting to spark sql thrift server, currently SQL statement result can be only fetched for once. The fix was tested manually with Cloudera HUE, With this fix, HUE can fetch spark SQL results repeatedly through thrift server.
Author: Alice <alice.gugu@gmail.com>
Author: Alice <guhq@garena.com>
Closes#14218 from alicegugu/SparkSQLFetchResultsBug.
https://issues.apache.org/jira/browse/SPARK-16535
## What changes were proposed in this pull request?
When I scan through the pom.xml of sub projects, I found this warning as below and attached screenshot
```
Definition of groupId is redundant, because it's inherited from the parent
```
![screen shot 2016-07-13 at 3 13 11 pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png)
I've tried to remove some of the lines with groupId definition, and the build on my local machine is still ok.
```
<groupId>org.apache.spark</groupId>
```
As I just find now `<maven.version>3.3.9</maven.version>` is being used in Spark 2.x, and Maven-3 supports versionless parent elements: Maven 3 will remove the need to specify the parent version in sub modules. THIS is great (in Maven 3.1).
ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762
## How was this patch tested?
I've tested by re-building the project, and build succeeded.
Author: Xin Ren <iamshrek@126.com>
Closes#14189 from keypointt/SPARK-16535.
## What changes were proposed in this pull request?
After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
## How was this patch tested?
N/A
Author: Reynold Xin <rxin@databricks.com>
Closes#14130 from rxin/SPARK-16477.
## What changes were proposed in this pull request?
This PR makes spark-sql (backed by SparkSQLCLIDriver) respects confs set by hiveconf, which is what we do in previous versions. The change is that when we start SparkSQLCLIDriver, we explicitly set confs set through --hiveconf to SQLContext's conf (basically treating those confs as a SparkSQL conf).
## How was this patch tested?
A new test in CliSuite.
Closes#13542
Author: Cheng Hao <hao.cheng@intel.com>
Author: Yin Huai <yhuai@databricks.com>
Closes#14058 from yhuai/hiveConfThriftServer.
## What changes were proposed in this pull request?
Replace use of `commons-lang` in favor of `commons-lang3` and forbid the former via scalastyle; remove `NotImplementedException` from `comons-lang` in favor of JDK `UnsupportedOperationException`
## How was this patch tested?
Jenkins tests
Author: Sean Owen <sowen@cloudera.com>
Closes#13843 from srowen/SPARK-16129.
Returning binary mode to ThriftServer for backward compatibility.
Tested with Squirrel and Tableau.
Author: Egor Pakhomov <egor@anchorfree.com>
Closes#13667 from epahomov/SPARK-15095-2.0.
## What changes were proposed in this pull request?
Just minor doc fix.
\cc yhuai
Author: Jeff Zhang <zjffdu@apache.org>
Closes#13659 from zjffdu/doc_fix.
## What changes were proposed in this pull request?
This PR reworks on the CliSuite test cases for `LIST FILES/JARS` commands.
CC yhuai Thanks!
Author: Xin Wu <xinwu@us.ibm.com>
Closes#13361 from xwu0226/SPARK-15431-clisuite-new.
## What changes were proposed in this pull request?
`a` -> `an`
I use regex to generate potential error lines:
`grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml/*/*scala`
and review them line by line.
## How was this patch tested?
local build
`lint-java` checking
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes#13317 from zhengruifeng/a_an.
## What changes were proposed in this pull request?
SparkSession has a list of unnecessary private[sql] methods. These methods cause some trouble because private[sql] doesn't apply in Java. In the cases that they are easy to remove, we can simply remove them. This patch does that.
As part of this pull request, I also replaced a bunch of protected[sql] with private[sql], to tighten up visibility.
## How was this patch tested?
Updated test cases to reflect the changes.
Author: Reynold Xin <rxin@databricks.com>
Closes#13319 from rxin/SPARK-15552.
fixed typos for source code for components [mllib] [streaming] and [SQL]
None and obvious.
Author: lfzCarlosC <lfz.carlos@gmail.com>
Closes#13298 from lfzCarlosC/master.
## What changes were proposed in this pull request?
Currently command `ADD FILE|JAR <filepath | jarpath>` is supported natively in SparkSQL. However, when this command is run, the file/jar is added to the resources that can not be looked up by `LIST FILE(s)|JAR(s)` command because the `LIST` command is passed to Hive command processor in Spark-SQL or simply not supported in Spark-shell. There is no way users can find out what files/jars are added to the spark context.
Refer to [Hive commands](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli)
This PR is to support following commands:
`LIST (FILE[s] [filepath ...] | JAR[s] [jarfile ...])`
### For example:
##### LIST FILE(s)
```
scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
res1: org.apache.spark.sql.DataFrame = []
scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt")
res2: org.apache.spark.sql.DataFrame = []
scala> spark.sql("list file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt").show(false)
+----------------------------------------------+
|result |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
+----------------------------------------------+
scala> spark.sql("list files").show(false)
+----------------------------------------------+
|result |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt |
+----------------------------------------------+
```
##### LIST JAR(s)
```
scala> spark.sql("add jar /Users/xinwu/spark/core/src/test/resources/TestUDTF.jar")
res9: org.apache.spark.sql.DataFrame = [result: int]
scala> spark.sql("list jar TestUDTF.jar").show(false)
+---------------------------------------------+
|result |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+
scala> spark.sql("list jars").show(false)
+---------------------------------------------+
|result |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+
```
## How was this patch tested?
New test cases are added for Spark-SQL, Spark-Shell and SparkContext API code path.
Author: Xin Wu <xinwu@us.ibm.com>
Author: xin Wu <xinwu@us.ibm.com>
Closes#13212 from xwu0226/list_command.
#### What changes were proposed in this pull request?
Like `Set` Command in Hive, `Reset` is also supported by Hive. See the link: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
Below is the related Hive JIRA: https://issues.apache.org/jira/browse/HIVE-3202
This PR is to implement such a command for resetting the SQL-related configuration to the default values. One of the use case shown in HIVE-3202 is listed below:
> For the purpose of optimization we set various configs per query. It's worthy but all those configs should be reset every time for next query.
#### How was this patch tested?
Added a test case.
Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
Closes#13121 from gatorsmile/resetCommand.
## What changes were proposed in this pull request?
Currently SparkSession.Builder use SQLContext.getOrCreate. It should probably the the other way around, i.e. all the core logic goes in SparkSession, and SQLContext just calls that. This patch does that.
This patch also makes sure config options specified in the builder are propagated to the existing (and of course the new) SparkSession.
## How was this patch tested?
Updated tests to reflect the change, and also introduced a new SparkSessionBuilderSuite that should cover all the branches.
Author: Reynold Xin <rxin@databricks.com>
Closes#13200 from rxin/SPARK-15075.
## What changes were proposed in this pull request?
(See https://github.com/apache/spark/pull/12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was the apparently problem last time.)
Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags`
## How was this patch tested?
Jenkins tests.
Author: Sean Owen <sowen@cloudera.com>
Closes#13074 from srowen/SPARK-15290.
## What changes were proposed in this pull request?
(Retry of https://github.com/apache/spark/pull/13049)
- update to httpclient 4.5 / httpcore 4.4
- remove some defunct exclusions
- manage httpmime version to match
- update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used)
## How was this patch tested?
Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...`
Author: Sean Owen <sowen@cloudera.com>
Closes#13117 from srowen/SPARK-12972.2.
## What changes were proposed in this pull request?
Since Jetty 8 is EOL (end of life) and has critical security issue [http://www.securityweek.com/critical-vulnerability-found-jetty-web-server], I think upgrading to 9 is necessary. I am using latest 9.2 since 9.3 requires Java 8+.
`javax.servlet` and `derby` were also upgraded since Jetty 9.2 needs corresponding version.
## How was this patch tested?
Manual test and current test cases should cover it.
Author: bomeng <bmeng@us.ibm.com>
Closes#12916 from bomeng/SPARK-14897.
#### What changes were proposed in this pull request?
When Describe a UDTF, the command returns a wrong result. The command is unable to find the function, which has been created and cataloged in the catalog but not in the functionRegistry.
This PR is to correct it. If the function is not in the functionRegistry, we will check the catalog for collecting the information of the UDTF function.
#### How was this patch tested?
Added test cases to verify the results
Author: gatorsmile <gatorsmile@gmail.com>
Closes#12885 from gatorsmile/showFunction.
## What changes were proposed in this pull request?
This issue addresses the comments in SPARK-15031 and also fix java-linter errors.
- Use multiline format in SparkSession builder patterns.
- Update `binary_classification_metrics_example.py` to use `SparkSession`.
- Fix Java Linter errors (in SPARK-13745, SPARK-15031, and so far)
## How was this patch tested?
After passing the Jenkins tests and run `dev/lint-java` manually.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes#12911 from dongjoon-hyun/SPARK-15134.
## What changes were proposed in this pull request?
Removing the `withHiveSupport` method of `SparkSession`, instead use `enableHiveSupport`
## How was this patch tested?
ran tests locally
Author: Sandeep Singh <sandeep@techaddict.me>
Closes#12851 from techaddict/SPARK-15072.
## What changes were proposed in this pull request?
Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required to compile. The changes were relatively standard Jersey migration things.
## How was this patch tested?
I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications.
Author: mcheah <mcheah@palantir.com>
Closes#12715 from mccheah/feature/upgrade-jersey.
## What changes were proposed in this pull request?
Remove HiveSessionHook
## How was this patch tested?
No tests needed.
Author: Davies Liu <davies@databricks.com>
Closes#12881 from davies/remove_hooks.
## What changes were proposed in this pull request?
These API is only useful for Hadoop, may not work for Spark SQL.
The APIs is kept for source compatibility.
## How was this patch tested?
No unit tests needed.
Author: Davies Liu <davies@databricks.com>
Closes#12878 from davies/remove_delegate.
## What changes were proposed in this pull request?
This PR drop the support for binary mode in ThriftServer, only HTTP mode is supported now, to reduce the maintain burden.
The code to support binary mode is still kept, just in case if we want it in future.
## How was this patch tested?
Updated tests to use HTTP mode.
Author: Davies Liu <davies@databricks.com>
Closes#12876 from davies/hide_binary.
## What changes were proposed in this pull request?
We recently inlined Hive's thrift server code in SPARK-15004. This patch removes the code related to zookeeper service discovery, Tez, and Hive on Spark, since they are irrelevant.
## How was this patch tested?
N/A - removing dead code
Author: Reynold Xin <rxin@databricks.com>
Closes#12780 from rxin/SPARK-15004.
## What changes were proposed in this pull request?
This PR copy the thrift-server from hive-service-1.2 (including TCLIService.thrift and generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements.
## How was this patch tested?
Existing tests.
Author: Davies Liu <davies@databricks.com>
Closes#12764 from davies/thrift_server.
## What changes were proposed in this pull request?
This patch removes executionHive from HiveSessionState and HiveSharedState.
## How was this patch tested?
Updated test cases.
Author: Reynold Xin <rxin@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Closes#12770 from rxin/SPARK-14994.
## What changes were proposed in this pull request?
This PR adds `since` tag into the matrix and vector classes in spark-mllib-local.
## How was this patch tested?
Scala-style checks passed.
Author: Pravin Gadakh <prgadakh@in.ibm.com>
Closes#12416 from pravingadakh/SPARK-14613.
## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-12143
This PR adds the support for conversion between `SparkRow` in Spark and `RowSet` in Hive for `BinaryType` as `Array[Byte]` (JDBC)
## How was this patch tested?
Unittests in `HiveThriftBinaryServerSuite` (regression test)
Closes#10139
Author: hyukjinkwon <gurwls223@gmail.com>
Closes#12733 from HyukjinKwon/SPARK-12143.
## What changes were proposed in this pull request?
This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class.
Note: A couple of things will break after this patch. These will be fixed separately.
- the python HiveContext
- all the documentation / comments referencing HiveContext
- there will be no more HiveContext in the REPL (fixed by #12589)
## How was this patch tested?
No change in functionality.
Author: Andrew Or <andrew@databricks.com>
Closes#12585 from andrewor14/delete-hive-context.
## What changes were proposed in this pull request?
It is unnecessary as DataType.catalogString largely replaces the need for this class.
## How was this patch tested?
Mostly removing dead code and should be covered by existing tests.
Author: Reynold Xin <rxin@databricks.com>
Closes#12644 from rxin/SPARK-14877.
## What changes were proposed in this pull request?
In order to support running SQL directly on files, we added some code in ResolveRelations to catch the exception thrown by catalog.lookupRelation and ignore it. This unfortunately masks all the exceptions. This patch changes the logic to simply test the table's existence.
## How was this patch tested?
I manually hacked some bugs into Spark and made sure the exceptions were being propagated up.
Author: Reynold Xin <rxin@databricks.com>
Closes#12634 from rxin/SPARK-14869.
## What changes were proposed in this pull request?
Spark SQL inherited from Shark to use the StatsReportListener. Unfortunately this clutters the spark-sql CLI output and makes it very difficult to read the actual query results.
## How was this patch tested?
Built and tested in spark-sql CLI.
Author: Reynold Xin <rxin@databricks.com>
Closes#12635 from rxin/SPARK-14871.
## What changes were proposed in this pull request?
This patch adds "Exec" suffix to all physical operators. Before this patch, Spark's physical operators and logical operators are named the same (e.g. Project could be logical.Project or execution.Project), which caused small issues in code review and bigger issues in code refactoring.
## How was this patch tested?
N/A
Author: Reynold Xin <rxin@databricks.com>
Closes#12617 from rxin/exec-node.
## What changes were proposed in this pull request?
Add the native support for LOAD DATA DDL command that loads data into Hive table/partition.
## How was this patch tested?
`HiveDDLCommandSuite` and `HiveQuerySuite`. Besides, few Hive tests (`WindowQuerySuite`, `HiveTableScanSuite` and `HiveSerDeSuite`) also use `LOAD DATA` command.
Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
Closes#12412 from viirya/ddl-load-data.
## What changes were proposed in this pull request?
This patch removes HiveQueryExecution. As part of this, I consolidated all the describe commands into DescribeTableCommand.
## How was this patch tested?
Should be covered by existing tests.
Author: Reynold Xin <rxin@databricks.com>
Closes#12588 from rxin/SPARK-14826.
## What changes were proposed in this pull request?
Just a rename so we can get rid of `HiveContext.scala`. Note that this will conflict with #12585.
## How was this patch tested?
No change in functionality.
Author: Andrew Or <andrew@databricks.com>
Closes#12586 from andrewor14/rename-hc-object.
## What changes were proposed in this pull request?
This PR has two main changes.
1. Move Hive-specific methods from HiveContext to HiveSessionState, which help the work of removing HiveContext.
2. Create a SparkSession Class, which will later be the entry point of Spark SQL users.
## How was this patch tested?
Existing tests
This PR is trying to fix test failures of https://github.com/apache/spark/pull/12485.
Author: Andrew Or <andrew@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Closes#12522 from yhuai/spark-session.
When `Await.result` throws an exception which originated from a different thread, the resulting stacktrace doesn't include the path leading to the `Await.result` call itself, making it difficult to identify the impact of these exceptions. For example, I've seen cases where broadcast cleaning errors propagate to the main thread and crash it but the resulting stacktrace doesn't include any of the main thread's code, making it difficult to pinpoint which exception crashed that thread.
This patch addresses this issue by explicitly catching, wrapping, and re-throwing exceptions that are thrown by `Await.result`.
I tested this manually using 16b31c8251, a patch which reproduces an issue where an RPC exception which occurs while unpersisting RDDs manages to crash the main thread without any useful stacktrace, and verified that informative, full stacktraces were generated after applying the fix in this PR.
/cc rxin nongli yhuai anabranch
Author: Josh Rosen <joshrosen@databricks.com>
Closes#12433 from JoshRosen/wrap-and-rethrow-await-exceptions.
## What changes were proposed in this pull request?
This is just cleanup. This allows us to remove HiveContext later without inflating the diff too much. This PR fixes the conflicts of https://github.com/apache/spark/pull/12431. It also removes the `def hiveConf` from `HiveSqlParser`. So, we will pass the HiveConf associated with a session explicitly instead of relying on Hive's `SessionState` to pass `HiveConf`.
## How was this patch tested?
Existing tests.
Closes#12431
Author: Andrew Or <andrew@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Closes#12449 from yhuai/hiveconf.
## What changes were proposed in this pull request?
This patch implements the `CREATE TABLE` command using the `SessionCatalog`. Previously we handled only `CTAS` and `CREATE TABLE ... USING`. This requires us to refactor `CatalogTable` to accept various fields (e.g. bucket and skew columns) and pass them to Hive.
WIP: Note that I haven't verified whether this actually works yet! But I believe it does.
## How was this patch tested?
Tests will come in a future commit.
Author: Andrew Or <andrew@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Closes#12271 from andrewor14/create-table-ddl.
#### What changes were proposed in this pull request?
This PR is to address the comment: https://github.com/apache/spark/pull/12146#discussion-diff-59092238. It removes the function `isViewSupported` from `SessionCatalog`. After the removal, we still can capture the user errors if users try to drop a table using `DROP VIEW`.
#### How was this patch tested?
Modified the existing test cases
Author: gatorsmile <gatorsmile@gmail.com>
Closes#12284 from gatorsmile/followupDropTable.