Commit graph

533 commits

Author SHA1 Message Date
Max Gekk aa0d00de5e [SPARK-35018][SQL][TESTS] Check transferring of year-month intervals via Hive Thrift server
### What changes were proposed in this pull request?
1. Add a test to check that the Thrift server is able to collect year-month intervals and transfer them via the Thrift protocol.
2. Improve the similar test for day-time intervals. After the changes, the test no longer depends on the result of date subtraction. The result type of date subtraction may change in the future, so this PR makes the test tolerant of such changes.

### Why are the changes needed?
To improve test coverage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suite:
```
$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *SparkThriftServerProtocolVersionsSuite"
```

Closes #32240 from MaxGekk/year-month-interval-thrift-protocol.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-20 08:52:37 +03:00
Kent Yao 2d161cb3a1 [SPARK-35102][SQL] Make spark.sql.hive.version read-only, not deprecated and meaningful
### What changes were proposed in this pull request?

Firstly let's take a look at the definition and comment.

```
// A fake config which is only here for backward compatibility reasons. This config has no effect
// to Spark, just for reporting the builtin Hive version of Spark to existing applications that
// already rely on this config.
val FAKE_HIVE_VERSION = buildConf("spark.sql.hive.version")
  .doc(s"deprecated, please use ${HIVE_METASTORE_VERSION.key} to get the Hive version in Spark.")
  .version("1.1.1")
  .fallbackConf(HIVE_METASTORE_VERSION)
```
It is used for reporting the built-in Hive version, but the current state is unsatisfactory because its value can be changed in many ways, e.g. via `--conf` or the `SET` syntax.

It is marked as deprecated but has been kept for a long time. I guess it is hard for us to remove it, and not even necessary.

On second thought, it is actually good for us to keep it alongside `spark.sql.hive.metastore.version`. When `spark.sql.hive.metastore.version` is changed, `spark.sql.hive.version` can still report the compiled Hive version statically, which is useful when an error occurs in that case. So this parameter should be fixed to the compiled Hive version.
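As a rough illustration of what "read-only" means here, the Python sketch below models a config registry in which some keys are fixed at build time and reject later writes. The `Conf` class, its methods, the error type, and the version strings are hypothetical placeholders for illustration; this is not Spark's `SQLConf`.

```python
class Conf:
    """Hypothetical config registry; not Spark's SQLConf."""

    def __init__(self, read_only):
        # Keys fixed at build time, e.g. the compiled Hive version.
        self._read_only = dict(read_only)
        self._values = {}

    def set(self, key, value):
        # Reject attempts (such as --conf or SET) to change a read-only key.
        if key in self._read_only:
            raise ValueError(f"Cannot modify read-only config: {key}")
        self._values[key] = value

    def get(self, key):
        if key in self._read_only:
            return self._read_only[key]
        return self._values[key]


# Placeholder version strings, chosen only for the example.
conf = Conf(read_only={"spark.sql.hive.version": "2.3.8"})
conf.set("spark.sql.hive.metastore.version", "3.1.2")  # still changeable
```

With this shape, changing the metastore version never affects what the read-only key reports, which is the property the description above asks for.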

### Why are the changes needed?

`spark.sql.hive.version` is useful in certain cases and should be read-only

### Does this PR introduce _any_ user-facing change?

`spark.sql.hive.version` is now read-only.

### How was this patch tested?

new test cases

Closes #32200 from yaooqinn/SPARK-35102.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-04-19 14:40:21 +00:00
Max Gekk 1d1ed3eb25 [SPARK-35107][SQL] Parse unit-to-unit interval literals to ANSI intervals
### What changes were proposed in this pull request?
Parse the year-month interval literals like `INTERVAL '1-1' YEAR TO MONTH` to values of `YearMonthIntervalType`, and day-time interval literals to `DayTimeIntervalType` values. Currently, Spark SQL supports:
- DAY TO HOUR
- DAY TO MINUTE
- DAY TO SECOND
- HOUR TO MINUTE
- HOUR TO SECOND
- MINUTE TO SECOND

All such interval literals are converted to `DayTimeIntervalType`, and `YEAR TO MONTH` to `YearMonthIntervalType`, while losing info about the `from` and `to` units.

**Note**: the new behavior is under the SQL config `spark.sql.legacy.interval.enabled`, which is `false` by default. When the config is set to `true`, the interval literals are parsed to `CalendarIntervalType` values.
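To make the two target representations concrete, here is a minimal Python sketch (not the Spark parser) that turns a `YEAR TO MONTH` string into a total number of months and a `DAY TO SECOND` string into microseconds. The function names are hypothetical, and the sketch accepts at most six fractional digits as a simplification.

```python
import re


def parse_year_month(text):
    """Parse 'Y-M' (YEAR TO MONTH) into a total number of months."""
    m = re.fullmatch(r"([+-]?)(\d+)-(\d+)", text.strip())
    if m is None:
        raise ValueError(f"not a year-month interval: {text!r}")
    sign = -1 if m.group(1) == "-" else 1
    return sign * (int(m.group(2)) * 12 + int(m.group(3)))


def parse_day_to_second(text):
    """Parse 'D HH:MM:SS[.ffffff]' (DAY TO SECOND) into microseconds."""
    m = re.fullmatch(r"([+-]?)(\d+) (\d+):(\d+):(\d+)(?:\.(\d{1,6}))?", text.strip())
    if m is None:
        raise ValueError(f"not a day-time interval: {text!r}")
    sign, days, hours, minutes, seconds, frac = m.groups()
    # Right-pad the fraction so that '.123' means 123000 microseconds.
    micros = int((frac or "").ljust(6, "0"))
    total_seconds = ((int(days) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(seconds)
    total = total_seconds * 1_000_000 + micros
    return -total if sign == "-" else total


months = parse_year_month("1-1")                  # 1 year + 1 month
micros = parse_day_to_second("1 01:02:03.123")    # 1 day, 1 h, 2 min, 3.123 s
```

Each literal collapses to a single integer, so the original `from` and `to` units are no longer recoverable from the value, as noted above.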

Closes #32176

### Why are the changes needed?
To conform to the ANSI SQL standard, which assumes conversion of interval literals to year-month or day-time intervals, but not to a mixed interval type like Catalyst's `CalendarIntervalType`.

### Does this PR introduce _any_ user-facing change?
Yes.

Before:
```sql
spark-sql> SELECT INTERVAL '1 01:02:03.123' DAY TO SECOND;
1 days 1 hours 2 minutes 3.123 seconds
spark-sql> SELECT typeof(INTERVAL '1 01:02:03.123' DAY TO SECOND);
interval
```

After:
```sql
spark-sql> SELECT INTERVAL '1 01:02:03.123' DAY TO SECOND;
1 01:02:03.123000000
spark-sql> SELECT typeof(INTERVAL '1 01:02:03.123' DAY TO SECOND);
day-time interval
```

### How was this patch tested?
1. By running the affected test suites:
```
$ ./build/sbt "test:testOnly *.ExpressionParserSuite"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z create_view.sql"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z date.sql"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z timestamp.sql"
```
2. PostgreSQL tests are executed with `spark.sql.legacy.interval.enabled` set to `true` to keep compatibility with PostgreSQL output:
```sql
> SELECT interval '999' second;
0 years 0 mons 0 days 0 hours 16 mins 39.00 secs
```

Closes #32209 from MaxGekk/parse-ansi-interval-literals.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-19 16:00:59 +03:00
Angerszhuuuu 9b2e0d6191 [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too
### What changes were proposed in this pull request?
In the current code, if we run Spark SQL with
```
./bin/spark-sql --verbose
```
The `--verbose` flag is not passed through to the final `SparkSQLCliDriver`, so the `SessionState` never calls `setIsVerbose`.

However, the CLI options help shows:
```
CLI options:
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
```

This is inconsistent. This PR fixes the issue.
### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
When a user passes `-v` to `spark-sql`, the executed SQL is echoed to the console.

### How was this patch tested?
Added UT

Closes #32163 from AngersZhuuuu/SPARK-35086.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2021-04-15 12:59:20 +08:00
Gengliang Wang 5d126537d3 [MINOR][TESTS] Enhance the test instruction of ThriftServerQueryTestSuite
### What changes were proposed in this pull request?

Enhance the test instruction of ThriftServerQueryTestSuite:
1. how to run a single test case
2. how to regenerate golden file for a single test

### Why are the changes needed?

Better documentation.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No, just enhance the comments.

Closes #32141 from gengliangwang/updateComment.

Authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
2021-04-13 16:49:20 +08:00
Max Gekk 8f8bac6435 [SPARK-34905][SQL][TESTS] Enable ANSI intervals in SQLQueryTestSuite/ThriftServerQueryTestSuite
### What changes were proposed in this pull request?
Remove the `spark.sql.legacy.interval.enabled` settings from `SQLQueryTestSuite`/`ThriftServerQueryTestSuite`, which enables the new ANSI intervals by default.

### Why are the changes needed?
To use default settings for intervals, and test new ANSI intervals - year-month and day-time interval introduced by SPARK-27793.

### Does this PR introduce _any_ user-facing change?
Should not because this affects tests only.

### How was this patch tested?
By running the affected tests, for instance:
```
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z date.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z timestamp.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql"
```

Closes #32099 from MaxGekk/enable-ansi-intervals-sql-tests.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-04-12 09:25:51 +00:00
Max Gekk 90820b3ec3 [SPARK-35017][SQL] Transfer ANSI intervals via Hive Thrift server
### What changes were proposed in this pull request?
1. Map Catalyst's interval types to Hive's types:
    - YearMonthIntervalType -> `interval_year_month`
    - DayTimeIntervalType -> `interval_day_time`
2. Invoke `HiveResult.toHiveString()` to convert the external interval types `java.time.Period`/`java.time.Duration` to strings.
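The two steps above can be sketched in Python as a lookup table plus a string formatter. This is only an illustration: the dictionary reproduces the mapping listed above, while `day_time_to_string` is a hypothetical helper that imitates the `D HH:MM:SS.nnnnnnnnn` shape seen in the beeline output in this description, using `datetime.timedelta` in place of `java.time.Duration`. Negative durations are left out of scope.

```python
from datetime import datetime, timedelta

# Catalyst type name -> Hive type name, as listed above.
HIVE_TYPE_NAMES = {
    "YearMonthIntervalType": "interval_year_month",
    "DayTimeIntervalType": "interval_day_time",
}


def day_time_to_string(delta):
    """Format a non-negative timedelta as 'D HH:MM:SS.nnnnnnnnn'."""
    if delta < timedelta(0):
        raise ValueError("negative durations are not handled in this sketch")
    total_micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    seconds, micros = divmod(total_micros, 1_000_000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    # Microseconds are widened to nine digits to match the nanosecond padding.
    return f"{days} {hours:02d}:{minutes:02d}:{seconds:02d}.{micros * 1000:09d}"


# Same operands as the beeline query in this description.
delta = datetime(2021, 1, 1, 1, 2, 3, 1) - datetime(2020, 12, 31)
text = day_time_to_string(delta)
```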

### Why are the changes needed?
1. To be able to retrieve ANSI intervals via Hive Thrift server.
2. This fixes the issue:
```sql
 $ ./sbin/start-thriftserver.sh
 $ ./bin/beeline
Beeline version 2.3.8 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/default "" "" ""
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.2.0-SNAPSHOT)
0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
Error: java.lang.IllegalArgumentException: Unrecognized type name: day-time interval (state=,code=0)
```
3. It should unblock https://github.com/apache/spark/pull/32099 which enables `*.sql` tests in `ThriftServerQueryTestSuite`.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes:
```sql
0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
+----------------------------------------------------+
| subtracttimestamps(TIMESTAMP '2021-01-01 01:02:03.000001', DATE '2020-12-31') |
+----------------------------------------------------+
| 1 01:02:03.000001000                               |
+----------------------------------------------------+
1 row selected (1.637 seconds)
```

### How was this patch tested?
By running new test:
```
$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *SparkThriftServerProtocolVersionsSuite"
$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *SparkMetadataOperationSuite"
```
Also checked an array of an interval:
```sql
0: jdbc:hive2://localhost:10000/default> select array(timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31');
+----------------------------------------------------+
| array(subtracttimestamps(TIMESTAMP '2021-01-01 01:02:03.000001', DATE '2020-12-31')) |
+----------------------------------------------------+
| [1 01:02:03.000001000]                             |
+----------------------------------------------------+
```

Closes #32121 from MaxGekk/ansi-intervals-thrift-protocol.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-04-12 11:56:10 +09:00
Max Gekk 162f0560e6 [SPARK-34896][SQL] Return day-time interval from dates subtraction
### What changes were proposed in this pull request?
1. Add the SQL config `spark.sql.legacy.interval.enabled` which will control when Spark SQL should use `CalendarIntervalType` instead of ANSI intervals.
2. Modify the `SubtractDates` expression to return values of `DayTimeIntervalType` when `spark.sql.legacy.interval.enabled` is set to `false` (which is the default).

### Why are the changes needed?
To conform to the ANSI SQL standard which requires ANSI intervals as the result of dates subtraction, see
<img width="656" alt="Screenshot 2021-03-29 at 19 09 34" src="https://user-images.githubusercontent.com/1580697/112866455-7e2f0d00-90c2-11eb-96e6-3feb7eea7e09.png">

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
By running new tests:
```
$ build/sbt "test:testOnly *DateExpressionsSuite"
$ build/sbt "test:testOnly *ColumnExpressionSuite"
```
and some tests from `SQLQueryTestSuite`:
```
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z date.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql"
```

Closes #31996 from MaxGekk/subtract-dates-to-intervals.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-03-31 08:20:13 +03:00
Terry Kim 7953fcdb56 [SPARK-34700][SQL] SessionCatalog's temporary view related APIs should take/return more concrete types
### What changes were proposed in this pull request?

Now that all the temporary views are wrapped with `TemporaryViewRelation`(#31273, #31652, and #31825), this PR proposes to update `SessionCatalog`'s APIs for temporary views to take or return more concrete types.

APIs that will take `TemporaryViewRelation` instead of `LogicalPlan`:
```
createTempView, createGlobalTempView, alterTempViewDefinition
```

APIs that will return `TemporaryViewRelation` instead of `LogicalPlan`:
```
getRawTempView, getRawGlobalTempView
```

APIs that will return `View` instead of `LogicalPlan`:
```
getTempView, getGlobalTempView, lookupTempView
```

### Why are the changes needed?

Internal refactoring to work with more concrete types.

### Does this PR introduce _any_ user-facing change?

No, this is internal refactoring.

### How was this patch tested?

Updated existing tests affected by the refactoring.

Closes #31906 from imback82/use_temporary_view_relation.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-03-22 08:17:54 +00:00
Kent Yao 2cdedef2a0 [SPARK-34128][SQL] Suppress undesirable TTransportException warnings involved in THRIFT-4805
### What changes were proposed in this pull request?

Since Spark 3.0, the `libthrift` has been bumped up from 0.9.3 to 0.12.0.

Due to THRIFT-4805, the Spark Thrift Server prints annoying `TTransportException` stack traces. For example, the current Thrift server module test in the GitHub Actions workflow outputs more than 200MB of data for this error alone:
```java
org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
	at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
	at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
	at org.apache.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:43)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

I checked the latest `hive-service-rpc` module in Maven Central, https://mvnrepository.com/artifact/org.apache.hive/hive-service-rpc/3.1.2. It still uses version 0.9.3.

Unfortunately, when I tried the newly released `libthrift 0.14.1` (without shading it), it broke the metastore client side.

```scala
java.lang.NoSuchMethodError: org.apache.thrift.transport.TSocket.<init>(Ljava/lang/String;II)V
```
On the Thrift side, they simply muted it; see https://issues.apache.org/jira/browse/THRIFT-4805.

So in this PR, I add a filter to suppress the warning.
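The general idea of such a filter can be sketched with Python's standard `logging` module. This is an analogy only and not the code added by this PR: the `TTransportException` class below is a hypothetical Python stand-in for the Java exception, and the rule "drop records whose exception has no message" is an assumption made for the example.

```python
import logging

# Logger name taken from the log lines in this description.
NOISY_LOGGER = "org.apache.thrift.server.TThreadPoolServer"


class TTransportException(Exception):
    """Hypothetical stand-in for the Java exception of the same name."""


class SuppressEmptyTransportErrors(logging.Filter):
    """Drop records that carry a TTransportException with no message."""

    def filter(self, record):
        exc = record.exc_info[1] if record.exc_info else None
        is_noise = isinstance(exc, TTransportException) and not str(exc)
        return not is_noise  # returning False drops the record


class ListHandler(logging.Handler):
    """Collects messages so the effect of the filter is visible."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger(NOISY_LOGGER)
logger.propagate = False
logger.addFilter(SuppressEmptyTransportErrors())
handler = ListHandler()
logger.addHandler(handler)

# Suppressed: an exception without a message, like the traces above.
logger.error("Thrift error occurred during processing of message.",
             exc_info=TTransportException())
# Kept: an exception that carries a meaningful message.
logger.warning("Transport error occurred during acceptance of message.",
               exc_info=TTransportException("No underlying server socket."))
```

Filtering at the logger keeps the rest of the server's output intact and removes only the one repetitive record shape.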

### Why are the changes needed?

If the log is too large, GitHub Actions might truncate it. We need to reduce useless output.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

```build/sbt "hive-thriftserver/testOnly *ThriftServerQueryTestSuite" -Phive-thriftserver``` locally

#### before

```java
[info] - count.sql (1 second, 537 milliseconds)
[info] - decimalArithmeticOperations.sql !!! IGNORED !!!
14:09:53.233 ERROR org.apache.thrift.server.TThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
	at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
	at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
	at org.apache.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:43)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[info] - group-analytics.sql (4 seconds, 282 milliseconds)

[info] - csv-functions.sql (400 milliseconds)
14:09:24.234 ERROR org.apache.thrift.server.TThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
	at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
	at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
	at org.apache.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:43)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[info] - datetime-formatting-invalid.sql (349 milliseconds)
14:09:26.544 ERROR org.apache.thrift.server.TThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
	at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
	at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
	at org.apache.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:43)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[info] - except.sql (2 seconds, 309 milliseconds)
14:09:27.782 ERROR org.apache.thrift.server.TThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
	at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
	at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
	at org.apache.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:43)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[info] - string-functions.sql (1 second, 237 milliseconds)
14:09:27.835 WARN org.apache.spark.sql.execution.datasources.DataSource: All paths were ignored:

14:09:29.266 ERROR org.apache.thrift.server.TThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
	at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
	at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
	at org.apache.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:43)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

```

#### after

```java

[info] - null-propagation.sql (181 milliseconds)
[info] - operators.sql (1 second, 772 milliseconds)
[info] - change-column.sql (241 milliseconds)
[info] - count.sql (1 second, 665 milliseconds)
[info] - decimalArithmeticOperations.sql !!! IGNORED !!!
[info] - group-analytics.sql (3 seconds, 926 milliseconds)
[info] - inline-table.sql (247 milliseconds)
[info] - comparator.sql (223 milliseconds)
[info] - show-tblproperties.sql (148 milliseconds)
[info] - timezone.sql (105 milliseconds)
[info] - parse-schema-string.sql (193 milliseconds)
```

Closes #31895 from yaooqinn/SPARK-34128-2.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-03-19 21:15:28 -07:00
Wenchen Fan 4b4f8e2a25 [SPARK-34558][SQL][FOLLOWUP] Use final Hadoop conf to instantiate FileSystem in SharedState
### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/31671

https://github.com/apache/spark/pull/31671 introduced an unexpected behavior change: it uses a different Hadoop conf (`sparkContext.hadoopConfiguration`) to instantiate `FileSystem`, which is used to qualify the warehouse path. Before https://github.com/apache/spark/pull/31671, the Hadoop conf used to instantiate `FileSystem` was `session.sessionState.newHadoopConf()`.

More specifically, `session.sessionState.newHadoopConf()` has more conf entries:
1. it includes configs from `SharedState.initialConfigs`
2. it includes configs from `sparkContext.conf`

This PR updates `SharedState` to use the final Hadoop conf to instantiate `FileSystem`.

### Why are the changes needed?

fix behavior change

### Does this PR introduce _any_ user-facing change?

Yes, the behavior will be the same as before https://github.com/apache/spark/pull/31671

### How was this patch tested?

manually check the log of `FileSystem` and verify the passed in configs.

Closes #31868 from cloud-fan/followup.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-03-19 22:02:15 +08:00
Kousuke Saruta c5cadfefdf [SPARK-34762][BUILD] Fix the build failure with Scala 2.13 which is related to commons-cli
### What changes were proposed in this pull request?

This PR fixes the build failure with Scala 2.13 which is related to `commons-cli`.
For the last few days, the build with Scala 2.13 on GA has kept failing with the following error message.
```
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1:  error: package org.apache.commons.cli does not exist
[error] import org.apache.commons.cli.GnuParser;
```
The reason is that `mvn help` in `change-scala-version.sh` downloads the POM file of `commons-cli` but doesn't download the JAR file, leading to the build failure.

This PR also adds `commons-cli` to the dependencies explicitly because HiveThriftServer depends on it.
### Why are the changes needed?

Expect to fix the build failure with Scala 2.13.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that the build finishes successfully with Scala 2.13 on my laptop.
```
find ~/.m2 -name commons-cli -exec rm -rf {} \;
find ~/.ivy2 -name commons-cli -exec rm -rf {} \;
find ~/.cache/ -name commons-cli -exec rm -rf {} \; # For Linux
find ~/Library/Caches -name commons-cli -exec rm -rf {} \; # For macOS

dev/change-scala-version.sh 2.13
./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 clean compile test:compile
```

Closes #31862 from sarutak/commons-cli.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-03-18 12:31:50 +09:00
Angerszhuuuu badca975af [SPARK-34712][SQL][TESTS] Refactor UT about hive build in version, avoid to change every time when upgrade hive version
### What changes were proposed in this pull request?
Use `HiveUtils.builtinHiveVersion` in place of the hard-coded Hive versions in the corresponding UTs.

### Why are the changes needed?
Refactor the UTs that reference the built-in Hive version so that they do not have to change every time the Hive version is upgraded.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #31807 from AngersZhuuuu/SPARK-34712.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-03-11 12:52:29 -08:00
Kent Yao 1fac706db5 [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue
### What changes were proposed in this pull request?

fix a race issue by interrupting the thread

### Why are the changes needed?

```
21:43:26.809 WARN org.apache.thrift.server.TThreadPoolServer: Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: No underlying server socket.
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:126)
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
at org.apache.thrift.transport.TServerTransport.acceException in thread "Thread-15" java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
at java.io.BufferedInputStream.read(BufferedInputStream.java:336)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at scala.sys.process.BasicIO$.loop$1(BasicIO.scala:238)
at scala.sys.process.BasicIO$.transferFullyImpl(BasicIO.scala:246)
at scala.sys.process.BasicIO$.transferFully(BasicIO.scala:227)
at scala.sys.process.BasicIO$.$anonfun$toStdOut$1(BasicIO.scala:221)
```
When the TServer tries to `serve` after `stop`, it hangs forever with the log above.
### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

passing ci

Closes #31479 from yaooqinn/SPARK-34373.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-02-21 17:37:12 +09:00
Kent Yao 961c85166a [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause perf regression
### What changes were proposed in this pull request?

In many real-world cases, when interacting with hive catalog through Spark SQL, users may just share the `hive-site.xml` for their hive jobs and make a copy to `SPARK_HOME`/conf w/o modification. In Spark, when we generate Hadoop configurations, we will use `spark.buffer.size(65536)` to reset `io.file.buffer.size(4096)`. But when we load the hive-site.xml, we may ignore this behavior and reset `io.file.buffer.size` again according to `hive-site.xml`.

1. The configuration priority for setting Hadoop and Hive configs here is not right; the order should be `spark > spark.hive > spark.hadoop > hive > hadoop`.

2. This breaks the behavior of the `spark.buffer.size` config for tuning IO performance with HDFS if there is an existing `io.file.buffer.size` in hive-site.xml.
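The intended precedence can be sketched as a layered merge in Python. This is an illustration under assumptions, not the Spark implementation: the layer names simply reuse the labels from the priority list above, and `resolve` is a hypothetical helper.

```python
# Highest priority first, as in the order given above.
PRIORITY = ["spark", "spark.hive", "spark.hadoop", "hive", "hadoop"]


def resolve(layers):
    """Merge config layers so that higher-priority layers win."""
    merged = {}
    for name in reversed(PRIORITY):  # apply the lowest priority first
        merged.update(layers.get(name, {}))
    return merged


layers = {
    "hadoop": {"io.file.buffer.size": "4096"},  # Hadoop default
    "hive": {"io.file.buffer.size": "4096"},    # copied hive-site.xml
    "spark": {"io.file.buffer.size": "65536"},  # derived from spark.buffer.size
}
effective = resolve(layers)
```

Because the `spark` layer is applied last, the value derived from `spark.buffer.size` survives even when hive-site.xml carries its own `io.file.buffer.size`.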

### Why are the changes needed?

bugfix for configuration behavior and fix performance regression by that behavior change

### Does this PR introduce _any_ user-facing change?

This PR restores the behavior that was silently changed for users.

### How was this patch tested?

new tests

Closes #31460 from yaooqinn/SPARK-34346.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-02-05 10:13:19 +09:00
Wenchen Fan b891862fb6 [SPARK-34269][SQL] Simplify SQL view resolution
### What changes were proposed in this pull request?

Currently, SQL (temp or permanent) view resolution is done in 2 steps:
1. In `SessionCatalog`, we get the view metadata, parse the view SQL string, and wrap it with `View`.
2. At the beginning of the optimizer, we run `EliminateView`, which drops the wrapper `View` and applies some special logic to match the view schema.

Step 2 is tricky, as we need to retain the output attr expr id while also adding an extra `Project` to add cast and alias. This PR simplifies view resolution by building a complete plan (with cast and alias added) in `SessionCatalog`, so that we only have 1 step.

### Why are the changes needed?

Code simplification. It also fixes issues like https://github.com/apache/spark/pull/31352

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

Closes #31368 from cloud-fan/try.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-01-29 06:46:01 +00:00
Chao Sun 902a08b9e6 [SPARK-34052][SQL] store SQL text for a temp view created using "CACHE TABLE .. AS SELECT"
### What changes were proposed in this pull request?

This passes original SQL text to `CacheTableAsSelect` command in DSv1 and v2 so that it will be stored instead of the analyzed logical plan, similar to `CREATE VIEW` command.

In addition, this changes the behavior of dropping temporary view to also invalidate dependent caches in a cascade, when the config `SQLConf.STORE_ANALYZED_PLAN_FOR_VIEW` is false (which is the default value).

### Why are the changes needed?

Currently, after creating a temporary view with the `CACHE TABLE ... AS SELECT` command, the view can still be queried even after the source table is dropped or replaced (in v2). This can cause correctness issues.

For instance, in the following:
```sql
> CREATE TABLE t ...;
> CACHE TABLE v AS SELECT * FROM t;
> DROP TABLE t;
> SELECT * FROM v;
```
The last select query still returns the old (and stale) result instead of failing. Note that the cache is already invalidated as part of dropping table `t`, but the temporary view `v` still exists.

On the other hand, the following:
```sql
> CREATE TABLE t ...;
> CREATE TEMPORARY VIEW v AS SELECT * FROM t;
> CACHE TABLE v;
> DROP TABLE t;
> SELECT * FROM v;
```
will throw "Table or view not found" error in the last select query.

This is related to #30567 which aligns the behavior of temporary view and global view by storing the original SQL text for temporary view, as opposed to the analyzed logical plan. However, the PR only handles `CreateView` case but not the `CacheTableAsSelect` case.

This also changes the uncache logic to use cascade invalidation for temporary views created above. This aligns its behavior with how a permanent view is handled as of today, and also avoids potential issues where a dependent view becomes invalid while its data is still kept in cache.
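
Cascade invalidation can be pictured as a walk over a dependency graph. The sketch below is only an illustration under assumed data structures (`depends_on`, `cached`, and `cascade_invalidate` are hypothetical names, not Spark's cache manager API).

```python
from typing import Dict, Set


def cascade_invalidate(
    dropped: str, depends_on: Dict[str, Set[str]], cached: Set[str]
) -> Set[str]:
    """Return the cache entries that remain valid after `dropped` is removed."""
    to_visit = [dropped]
    invalid: Set[str] = set()
    while to_visit:
        name = to_visit.pop()
        if name in invalid:
            continue
        invalid.add(name)
        # Anything that reads from `name` is affected as well.
        to_visit.extend(v for v, deps in depends_on.items() if name in deps)
    return cached - invalid


# v1 reads t, v2 reads v1, w reads an unrelated table u.
depends_on = {"v1": {"t"}, "v2": {"v1"}, "w": {"u"}}
remaining = cascade_invalidate("t", depends_on, {"v1", "v2", "w"})
print(remaining)
```

Dropping `t` removes the cached data of both `v1` and its dependent `v2`, while the unrelated `w` stays cached.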

### Does this PR introduce _any_ user-facing change?

Yes, now when `SQLConf.STORE_ANALYZED_PLAN_FOR_VIEW` is set to false (the default value), whenever a table/permanent view/temp view that a cached view depends on is dropped, the cached view itself will become invalid during analysis, i.e., user will get "Table or view not found" error. In addition, when the dependent is a temp view in the previous case, the cache itself will also be invalidated.

### How was this patch tested?

Modified/Enhanced some existing tests.

Closes #31107 from sunchao/SPARK-34052.

Lead-authored-by: Chao Sun <sunchao@apple.com>
Co-authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-01-20 02:09:39 +00:00
Yuming Wang c87b0085c9 [SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8
### What changes were proposed in this pull request?

Hive 2.3.8 changes:
HIVE-19662: Upgrade Avro to 1.8.2
HIVE-24324: Remove deprecated API usage from Avro
HIVE-23980: Shade Guava from hive-exec in Hive 2.3
HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
HIVE-24512: Exclude calcite in packaging.
HIVE-22708: Fix for HttpTransport to replace String.equals
HIVE-24551: Hive should include transitive dependencies from calcite after shading it
HIVE-24553: Exclude calcite from test-jar dependency of hive-exec

### Why are the changes needed?

Upgrade Avro and Parquet to latest version.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests, plus a test PR that tries upgrading Parquet to 1.11.1 and Avro to 1.10.1: https://github.com/apache/spark/pull/30517

Closes #30657 from wangyum/SPARK-33696.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-17 21:54:35 -08:00
Kousuke Saruta bec80d7eec [SPARK-34101][SQL] Make spark-sql CLI configurable for the behavior of printing header by SET command
### What changes were proposed in this pull request?

This PR introduces a new property `spark.sql.cli.print.header` to let users change the behavior of printing header for spark-sql CLI by SET command.

### Why are the changes needed?

Like Hive CLI, spark-sql CLI accepts the `hive.cli.print.header` property, which controls whether the header is printed.
But spark-sql CLI doesn't allow users to change Hive-specific configurations dynamically by SET command.
So, it's better to support changing this behavior by SET command.

### Does this PR introduce _any_ user-facing change?

Yes. Users can dynamically change the behavior by SET command.

### How was this patch tested?

I confirmed with the following commands/queries.
```
spark-sql> select (1) as a, (2) as b, (3) as c, (4) as d;
1	2	3	4
Time taken: 3.218 seconds, Fetched 1 row(s)
spark-sql> set spark.sql.cli.print.header=true;
key	value
spark.sql.cli.print.header	true
Time taken: 1.506 seconds, Fetched 1 row(s)
spark-sql> select (1) as a, (2) as b, (3) as c, (4) as d;
a	b	c	d
1	2	3	4
Time taken: 0.79 seconds, Fetched 1 row(s)
```

Closes #31173 from sarutak/spark-sql-print-header.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-01-15 13:15:48 +09:00
yangjie01 8b1ba233f1 [SPARK-34068][CORE][SQL][MLLIB][GRAPHX] Remove redundant collection conversion
### What changes were proposed in this pull request?
Some redundant collection conversions can be removed. For version compatibility, these are cleaned up under the Scala 2.13 profile.

### Why are the changes needed?
Remove redundant collection conversion

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass the Jenkins or GitHub  Action
- Manual test `core`, `graphx`, `mllib`, `mllib-local`, `sql`, `yarn`,`kafka-0-10` in Scala 2.13 passed

Closes #31125 from LuciferYang/SPARK-34068.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-01-13 18:07:02 -06:00
fwang12 7b06acc28b [SPARK-33100][SQL][FOLLOWUP] Find correct bound of bracketed comment in spark-sql
### What changes were proposed in this pull request?

This PR helps find the correct bound of a bracketed comment in spark-sql.

Here is the log for UT of SPARK-33100 in CliSuite before:
```
2021-01-05 13:22:34.768 - stdout> spark-sql> /* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.523 - stderr> Time taken: 6.716 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.599 - stdout> test
2021-01-05 13:22:41.6 - stdout> spark-sql> ;;/* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.709 - stdout> test
2021-01-05 13:22:41.709 - stdout> spark-sql> /* SELECT 'test';*/;; SELECT 'test';
2021-01-05 13:22:41.902 - stdout> spark-sql> SELECT 'test'; -- SELECT 'test';
2021-01-05 13:22:41.902 - stderr> Time taken: 0.129 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.902 - stderr> Error in query:
2021-01-05 13:22:41.902 - stderr> mismatched input '<EOF>' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 19)
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> == SQL ==
2021-01-05 13:22:42.006 - stderr> /* SELECT 'test';*/
2021-01-05 13:22:42.006 - stderr> -------------------^^^
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> Time taken: 0.226 seconds, Fetched 1 row(s)
2021-01-05 13:22:42.006 - stdout> test
```
The root cause is that the insideBracketedComment is not accurate.

For `/* comment */`, the last character `/` is not counted as insideBracketedComment, so it would be treated as the beginning of a statement.

In this PR, this issue is fixed.

### Why are the changes needed?
To fix the issue described above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes #31054 from turboFei/SPARK-33100-followup.

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2021-01-07 20:49:37 +09:00
fwang12 a071826f72 [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql
### What changes were proposed in this pull request?
Currently, spark-sql does not support parsing SQL statements with bracketed comments.
For the SQL statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
they would be split into two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.
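
A minimal sketch of the idea, written in Python for brevity rather than taken from the CLI driver: `split_statements` is a hypothetical helper that skips semicolons inside `/* ... */`. It deliberately leaves out quotes, escapes, and `--` line comments, which a real splitter also has to handle.

```python
from typing import List


def split_statements(text: str) -> List[str]:
    """Split on ';' but ignore semicolons inside /* ... */ comments."""
    statements: List[str] = []
    start = 0            # index where the current statement begins
    in_comment = False   # True while scanning inside /* ... */
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if not in_comment and pair == "/*":
            in_comment = True
            i += 2
            continue
        if in_comment and pair == "*/":
            in_comment = False
            i += 2       # consume both '*' and '/' so the '/' is never read as code
            continue
        if not in_comment and text[i] == ";":
            piece = text[start:i].strip()
            if piece:    # drop empty fragments such as ';;'
                statements.append(piece)
            start = i + 1
        i += 1
    tail = text[start:].strip()
    if tail:
        statements.append(tail)
    return statements


print(split_statements("/* SELECT 'test'; */\nSELECT 'test';"))
```

For the input above, the sketch yields a single statement that still carries its leading comment, instead of two broken halves.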

### Why are the changes needed?
Spark-sql might split the statements inside bracketed comments and it is not correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2021-01-05 15:55:30 +09:00
Kent Yao d7dc42d5f6 [SPARK-33895][SQL] Char and Varchar fail in MetaOperation of ThriftServer
### What changes were proposed in this pull request?

```
Caused by: java.lang.IllegalArgumentException: Unrecognized type name: CHAR(10)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.toJavaSQLType(SparkGetColumnsOperation.scala:187)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$addToRowSet$1(SparkGetColumnsOperation.scala:203)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.addToRowSet(SparkGetColumnsOperation.scala:195)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$4(SparkGetColumnsOperation.scala:99)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$4$adapted(SparkGetColumnsOperation.scala:98)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
```

Meta operations target the raw table schema, so we need to handle these types there.

### Why are the changes needed?

bugfix, see the above case
### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

new tests

locally

![image](https://user-images.githubusercontent.com/8326978/103069196-cdfcc480-45f9-11eb-9c6a-d4c42123c6e3.png)

Closes #30914 from yaooqinn/SPARK-33895.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-24 07:40:38 +00:00
ulysses-you e853f068f6 [SPARK-33526][SQL][FOLLOWUP] Fix flaky test due to timeout and fix docs
### What changes were proposed in this pull request?

Make test stable and fix docs.

### Why are the changes needed?

The query sometimes times out because we set another config after setting the query timeout.
```
sbt.ForkMain$ForkError: java.sql.SQLTimeoutException: Query timed out after 0 seconds
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
	at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$13(ThriftServerWithSparkContextSuite.scala:107)
	at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$13$adapted(ThriftServerWithSparkContextSuite.scala:106)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$12(ThriftServerWithSparkContextSuite.scala:106)
	at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$12$adapted(ThriftServerWithSparkContextSuite.scala:89)
	at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$withJdbcStatement$4(SharedThriftServer.scala:95)
	at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$withJdbcStatement$4$adapted(SharedThriftServer.scala:95)
```

The reason is:
1. We execute `set spark.sql.thriftServer.queryTimeout = 1`, after which every operation is limited to 1s.
2. We execute `set spark.sql.thriftServer.interruptOnCancel = false/true`. This statement gets a timeout exception if anything stalls it beyond 1s, which is not what we expect.

Resetting the timeout before step 2 avoids this problem.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Fix test.

Closes #30897 from ulysses-you/SPARK-33526-followup.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-12-22 22:43:03 -08:00
Kent Yao 4d47ac4b4b [SPARK-33705][SQL][TEST] Fix HiveThriftHttpServerSuite flakiness
### What changes were proposed in this pull request?
TO FIX flaky tests:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132345/testReport/
```
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.JDBC query execution
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.Checks Hive version
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.SPARK-24829 Checks cast as float
```

The root cause here is a jar conflict issue.
`NewCookie.isHttpOnly` is not defined in the `jsr311-api.jar` which conflicts
The transitive artifact `jsr311-api.jar` of `hadoop-client` is excluded at the maven side. See https://issues.apache.org/jira/browse/SPARK-27179.

The Jenkins PR builder and Github Action use `SBT` as the compiler tool.

First, the exclusion rule from maven is not followed by sbt, so I was able to see `jsr311-api.jar` from maven cache to be added to the classpath directly. **This seems to be a  bug of `sbt-pom-reader` plugin but I'm not that sure.**

Then I added an `ExcludeRule` for the `hive-thriftserver` module at the SBT side and did see the `jsr311-api.jar` gone, but the CI jobs still failed with the same error.

I added a trace log in ThriftHttpServlet

```s
ERROR ThriftHttpServlet: !!!!!!!!! Suspect???????? --->
file:/home/jenkins/workspace/SparkPullRequestBuilder/assembly/target/scala-2.12/jars/jsr311-api-1.1.1.jar
```
And the log pointed out that the assembly phase copied it to `assembly/target/scala-2.12/jars/`, which will be added to the classpath too. With the help of the SBT `dependencyTree` tool, I saw `jsr311-api` again as a transitive dependency of `jersey-core` from the `yarn` module with a `test` scope. So **this seems to be another bug on the SBT side, in the `sbt-assembly` plugin.** It copied a test-scope transitive artifact to the assembly output.

In this PR, I defined some rules in SparkBuild.scala to bypass the potential bugs from the SBT side.

First, exclude the `jsr311` from all over the project and then add it back separately to the YARN module for SBT.

Additionally, the HiveThriftServerSuites were refactored to reduce flakiness too, but this is not related to the bugs I have found so far.

### Why are the changes needed?

fix test here

### Does this PR introduce _any_ user-facing change?

NO
### How was this patch tested?

passing jenkins and ga

Closes #30643 from yaooqinn/HiveThriftHttpServerSuite.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-14 05:14:38 +00:00
Yuming Wang 94bc2d61a2 [SPARK-33589][SQL][FOLLOWUP] Replace Throwable with NonFatal
### What changes were proposed in this pull request?

This PR replaces `Throwable` with `NonFatal`.

### Why are the changes needed?

Improve code.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #30744 from wangyum/SPARK-33589-2.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-13 14:52:26 -08:00
ulysses-you 5bab27e00b [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver
### What changes were proposed in this pull request?

This PR adds a new config `spark.sql.thriftServer.forceCancel` to give users a way to interrupt tasks when cancelling a statement.

### Why are the changes needed?

After [#29933](https://github.com/apache/spark/pull/29933), we support cancelling a query on timeout, but by default `SparkContext.cancelJobGroup` does not interrupt tasks and just lets them finish by themselves. In some cases this is dangerous, e.g., with data skew or a heavy shuffle: a task may keep running for a long time after the cancel, and its resources are not released.

### Does this PR introduce _any_ user-facing change?

Yes, a new config.

### How was this patch tested?

Add test.

Closes #30481 from ulysses-you/SPARK-33526.

Lead-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Co-authored-by: ulysses-you <youxiduo@weidian.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-12 00:52:33 +09:00
Dooyoung Hwang a713a7eee3 [SPARK-33655][SQL] Improve performance of processing FETCH_PRIOR
### What changes were proposed in this pull request?
Currently, when a client requests FETCH_PRIOR from Thriftserver, Thriftserver reiterates from the start position. Because Thriftserver caches a query result in an array when the THRIFTSERVER_INCREMENTAL_COLLECT feature is off, FETCH_PRIOR can be implemented without reiterating the result. A trait FetchIterator is added in order to separate the implementations for an iterator and an array. FetchIterator also supports moving the cursor to an absolute position, which will be useful for the implementation of FETCH_RELATIVE and FETCH_ABSOLUTE.
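
As a rough illustration of why an array-backed result makes FETCH_PRIOR cheap, here is a small Python sketch of a cursor over a materialized result. The class and method names (`ArrayFetchIterator`, `fetch_prior`, `take`, and so on) are assumptions for this example and are not claimed to match the trait added by the PR.

```python
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ArrayFetchIterator(Generic[T]):
    """Cursor over a fully materialized result (hypothetical names)."""

    def __init__(self, rows: Sequence[T]) -> None:
        self._rows = rows
        self._fetch_start = 0   # first row of the most recent fetch
        self._position = 0      # next row to hand out

    def fetch_absolute(self, pos: int) -> None:
        # Clamp into [0, len(rows)] and move the cursor directly.
        self._position = max(0, min(pos, len(self._rows)))
        self._fetch_start = self._position

    def fetch_next(self) -> None:
        # Continue from where the previous fetch stopped.
        self._fetch_start = self._position

    def fetch_prior(self, offset: int) -> None:
        # Step back `offset` rows from the start of the previous fetch.
        self.fetch_absolute(self._fetch_start - offset)

    def take(self, max_rows: int) -> List[T]:
        end = min(self._position + max_rows, len(self._rows))
        batch = list(self._rows[self._position:end])
        self._position = end
        return batch


it = ArrayFetchIterator(list(range(10)))
it.fetch_next()
first = it.take(3)     # rows 0..2
it.fetch_next()
second = it.take(3)    # rows 3..5
it.fetch_prior(3)      # jump back to the batch before `second`
again = it.take(3)     # rows 0..2, served by index without re-iterating
print(first, second, again)
```

Because the rows are indexable, moving backwards is just cursor arithmetic. An iterator-backed result would instead have to be restarted and advanced to the target position.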

### Why are the changes needed?
For better performance of Thriftserver.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
FetchIteratorSuite

Closes #30600 from Dooyoung-Hwang/refactor_with_fetch_iterator.

Authored-by: Dooyoung Hwang <dooyoung.hwang@sk.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-09 18:35:24 +09:00
Josh Soref c62b84a043 [MINOR] Spelling sql not core
### What changes were proposed in this pull request?

This PR intends to fix typos in the sub-modules:
* `sql/catalyst`
* `sql/hive-thriftserver`
* `sql/hive`

Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618

NOTE: The misspellings have been reported at 706a726f87 (commitcomment-44064356)

### Why are the changes needed?

Misspelled words make it harder to read / understand content.

### Does this PR introduce _any_ user-facing change?

There are various fixes to documentation, etc...

### How was this patch tested?

No testing was performed

Closes #30532 from jsoref/spelling-sql-not-core.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-12-07 08:40:29 -06:00
Dongjoon Hyun de9818f043 [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT
### What changes were proposed in this pull request?

This PR aims to update `master` branch version to 3.2.0-SNAPSHOT.

### Why are the changes needed?

Start to prepare Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

Pass the CIs.

Closes #30606 from dongjoon-hyun/SPARK-3.2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-04 14:10:42 -08:00
Linhong Liu e02324f2dd [SPARK-33142][SPARK-33647][SQL] Store SQL text for SQL temp view
### What changes were proposed in this pull request?
Currently, in Spark, a temp view is saved as its analyzed logical plan, while a permanent view
is kept in HMS with its original SQL text. As a result, permanent and temporary views have
different behaviors in some cases. In this PR we store the SQL text for temporary views in order
to unify the behavior between permanent and temporary views.

### Why are the changes needed?
to unify the behavior between permanent and temporary views

### Does this PR introduce _any_ user-facing change?
Yes, with this PR, the temporary view will be re-analyzed when it is referenced. So if the
underlying datasource changes, the view will also be updated.

### How was this patch tested?
existing and newly added test cases

Closes #30567 from linhongliu-db/SPARK-33142.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-04 06:48:49 +00:00
Terry Kim 0fd9f57dd4 [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
### What changes were proposed in this pull request?

This PR proposes to support `CACHE/UNCACHE TABLE` commands for v2 tables.

In addition, this PR proposes to migrate `CACHE/UNCACHE TABLE` to use `UnresolvedTableOrView` to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).

### Why are the changes needed?

To support `CACHE/UNCACHE TABLE` commands for v2 tables.

Note that `CACHE/UNCACHE TABLE` for v1 tables/views go through `SparkSession.table` to resolve identifier, which resolves temp views first, so there is no change in the behavior by moving to the new framework.

### Does this PR introduce _any_ user-facing change?

Yes. Now the user can run `CACHE/UNCACHE TABLE` commands on v2 tables.

### How was this patch tested?

Added/updated existing tests.

Closes #30403 from imback82/cache_table.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-11-30 05:37:10 +00:00
Yuming Wang f93d4395b2 [SPARK-33589][SQL] Close opened session if the initialization fails
### What changes were proposed in this pull request?

This PR adds a try-catch when opening a session.

### Why are the changes needed?

Close opened session if the initialization fails.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Before this pr:

```
[rootspark-3267648 spark]#  bin/beeline -u jdbc:hive2://localhost:10000/db_not_exist
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Connecting to jdbc:hive2://localhost:10000/db_not_exist
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/db_not_exist: Database 'db_not_exist' not found; (state=08S01,code=0)
Beeline version 2.3.7 by Apache Hive
beeline>
```
![image](https://user-images.githubusercontent.com/5399861/100560975-73ba5d80-32f2-11eb-8f92-b2509e7a121f.png)

After this pr:
```
[rootspark-3267648 spark]#  bin/beeline -u jdbc:hive2://localhost:10000/db_not_exist
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connecting to jdbc:hive2://localhost:10000/db_not_exist
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/db_not_exist: Failed to open new session: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'db_not_exist' not found; (state=08S01,code=0)
Beeline version 2.3.7 by Apache Hive
beeline>
```
![image](https://user-images.githubusercontent.com/5399861/100560917-479edc80-32f2-11eb-986f-7a997f1163fc.png)

Closes #30536 from wangyum/SPARK-33589.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-11-30 11:21:02 +09:00
Yuming Wang 1de3fc4282 [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2
### What changes were proposed in this pull request?

We support Hive metastore versions 0.12.0 through 3.1.2, but hive-jdbc only 0.12.0 through 2.3.7. A `TProtocolException` is thrown if we use hive-jdbc 3.x:

```
[rootspark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
Unexpected end of file when reading from HS2 server. The root cause might be too many concurrent connections. Please ask the administrator to check the number of active connections, and adjust hive.server2.thrift.max.worker.threads if applicable.
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
```
```
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:234)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)
```

This PR upgrades hive-service-rpc to 3.1.2 to fix this issue.

### Why are the changes needed?

To support hive-jdbc 3.x.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test:
```
[rootspark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.051 seconds)
0: jdbc:hive2://localhost:10000/default> insert into t1 values(1);
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.08 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t1;
+-----+
| id  |
+-----+
| 1   |
+-----+
1 row selected (0.605 seconds)
```

Closes #30478 from wangyum/SPARK-33525.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-25 12:37:59 -08:00
Gustavo Martin Morcuende 517b810dfa [SPARK-33463][SQL] Keep Job Id during incremental collect in Spark Thrift Server
### What changes were proposed in this pull request?

When enabling **spark.sql.thriftServer.incrementalCollect** Job Ids get lost and tracing queries in Spark Thrift Server ends up being too complicated.

### Why are the changes needed?

Because it will make tracing Spark Thrift Server queries easier.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The current tests are enough. No need of more tests.

Closes #30390 from gumartinm/master.

Authored-by: Gustavo Martin Morcuende <gu.martinm@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-21 08:39:16 -08:00
yangjie01 e3058ba17c [SPARK-33441][BUILD] Add unused-imports compilation check and remove all unused-imports
### What changes were proposed in this pull request?
This PR adds a new Scala compiler arg to `pom.xml` to guard against new unused imports:

- `-Ywarn-unused-import` for Scala 2.12
- `-Wconf:cat=unused-imports:e` for Scala 2.13

The other file changes remove all unused imports in Spark code.

### Why are the changes needed?
Clean up code and guard against new unused imports.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #30351 from LuciferYang/remove-imports-core-module.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-11-19 14:20:39 +09:00
artiship 1ae6d64b5f [SPARK-33358][SQL] Return code when command process failed
Exit the Spark SQL CLI processing loop if one of the commands (sub SQL statements) fails.

This is a regression at Apache Spark 3.0.0.

```
$ cat 1.sql
select * from nonexistent_table;
select 2;
```

**Apache Spark 2.4.7**
```
spark-2.4.7-bin-hadoop2.7:$ bin/spark-sql -f 1.sql
20/11/15 16:14:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error in query: Table or view not found: nonexistent_table; line 1 pos 14
```

**Apache Spark 3.0.1**
```
$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: nonexistent_table; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [nonexistent_table]

2
Time taken: 2.786 seconds, Fetched 1 row(s)
```

**Apache Hive 1.2.2**
```
apache-hive-1.2.2-bin:$ bin/hive -f 1.sql

Logging initialized using configuration in jar:file:/Users/dongjoon/APACHE/hive-release/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'nonexistent_table'
```
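
The intended control flow can be sketched as a loop that stops at the first failing statement. This is a simplified Python illustration under assumed names (`process_statements` and `fake_run` are hypothetical), not the CLI driver's code.

```python
from typing import Callable, List


def process_statements(statements: List[str], run: Callable[[str], int]) -> int:
    """Run statements in order; stop and return the first non-zero code."""
    for stmt in statements:
        code = run(stmt)
        if code != 0:
            return code   # do not continue with later statements
    return 0


executed: List[str] = []


def fake_run(stmt: str) -> int:
    # Stand-in for real execution: a statement touching the missing table fails.
    executed.append(stmt)
    return 1 if "nonexistent_table" in stmt else 0


exit_code = process_statements(
    ["select * from nonexistent_table", "select 2"], fake_run
)
print(exit_code, executed)
```

Here `select 2` is never executed and the non-zero code is returned to the caller, which corresponds to the Spark 2.4.7 and Hive outputs shown above.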

Yes. This is a regression fix.

Pass the UT.

Closes #30263 from artiship/SPARK-33358.

Authored-by: artiship <meilziner@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-15 16:57:12 -08:00
luluorta 156704ba0d [SPARK-33432][SQL] SQL parser should use active SQLConf
### What changes were proposed in this pull request?

This PR makes the SQL parser use the active SQLConf instead of the one passed in as a constructor parameter.

### Why are the changes needed?

In ANSI mode, schema string parsing should fail if the schema uses ANSI reserved keyword as attribute name:

```scala
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', map('timestampFormat',  'dd/MM/yyyy'));""").show
```

output:

> Cannot parse the data type:
> no viable alternative at input 'time'(line 1, pos 0)
>
> == SQL ==
> time Timestamp
> ^^^

But this query may accidentally succeed in certain cases because the DataType parser sticks to the configs of the first created session in the current thread:

```scala
DataType.fromDDL("time Timestamp")
val newSpark = spark.newSession()
newSpark.conf.set("spark.sql.ansi.enabled", "true")
newSpark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));""").show
```

output:

> +--------------------------------+
> |from_json({"time":"26/10/2015"})|
> +--------------------------------+
> |                   {2015-10-26 00:00...|
> +--------------------------------+
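
The difference between a captured conf and an active conf can be sketched outside of Spark. The following Python snippet is only an analogy: `CapturingParser`, `ActiveConfParser`, and `set_active_conf` are hypothetical names used to show the pattern, not Spark classes.

```python
from typing import Dict, Optional

_active_conf: Optional[Dict[str, str]] = None   # hypothetical "active session" slot


def set_active_conf(conf: Dict[str, str]) -> None:
    global _active_conf
    _active_conf = conf


class CapturingParser:
    """Remembers the conf it was constructed with (the problematic pattern)."""

    def __init__(self, conf: Dict[str, str]) -> None:
        self._conf = conf

    def ansi_enabled(self) -> bool:
        return self._conf.get("spark.sql.ansi.enabled") == "true"


class ActiveConfParser:
    """Looks up whichever conf is active at call time."""

    def ansi_enabled(self) -> bool:
        conf = _active_conf or {}
        return conf.get("spark.sql.ansi.enabled") == "true"


first_session: Dict[str, str] = {}
set_active_conf(first_session)
capturing = CapturingParser(first_session)   # created while the first session is active
active = ActiveConfParser()

second_session = {"spark.sql.ansi.enabled": "true"}
set_active_conf(second_session)              # a new session turns ANSI mode on

print(capturing.ansi_enabled(), active.ansi_enabled())
```

The capturing parser keeps answering with the first session's settings, while the lookup at call time reflects the session that is currently active.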

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New and updated UTs

Closes #30357 from luluorta/SPARK-33432.

Authored-by: luluorta <luluorta@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-14 13:37:12 -08:00
artiship 34a9a77ab5 [SPARK-33396][SQL] Spark SQL CLI prints application id when process file
### What changes were proposed in this pull request?
Modify SparkSQLCLIDriver.scala to call the cli.printMasterAndAppId method earlier, before processing the file.

### Why are the changes needed?
SPARK-25043 already introduced printing of the application id, but the file-processing case does not seem to have been covered. This small change makes spark-sql also print the application id when processing a file.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
env

```
spark version: 3.0.1
os: centos 7
```

/tmp/tmp.sql

```sql
select 1;
```

submit command:

```sh
export HADOOP_USER_NAME=my-hadoop-user
bin/spark-sql  \
--master yarn \
--deploy-mode client \
--queue my.queue.name \
--conf spark.driver.host=$(hostname -i) \
--conf spark.app.name=spark-test  \
--name "spark-test" \
-f /tmp/tmp.sql
```

execution log:

```sh
20/11/09 23:18:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.spark.client.rpc.server.address.use.ip does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.spark.client.submit.timeout.interval does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.enforce.bucketing does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.run.timeout.seconds does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.support.sql11.reserved.keywords does not exist
20/11/09 23:18:40 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
20/11/09 23:18:41 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
20/11/09 23:18:42 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/11/09 23:18:52 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!

Spark master: yarn, Application Id: application_1567136266901_27355775
1
1
Time taken: 4.974 seconds, Fetched 1 row(s)

```

Closes #30301 from artiship/SPARK-33396.

Authored-by: artiship <meilziner@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2020-11-14 20:54:17 +08:00
xuewei.linxuewei 234711a328 Revert "[SPARK-33139][SQL] protect setActionSession and clearActiveSession"
### What changes were proposed in this pull request?

In [SPARK-33139] we marked `setActiveSession` and `clearActiveSession` as deprecated APIs. It turns out that they are widely used. After discussion, we concluded that the unified view feature should work even without that PR, and that there is a risk only if users seriously abuse these two APIs. So the PR needs to be reverted.

[SPARK-33139] consists of two commits, including a follow-up. This PR reverts both.

### Why are the changes needed?

Revert.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UT.

Closes #30367 from leanken/leanken-revert-SPARK-33139.

Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-11-13 13:35:45 +00:00
yangjie01 02fd52cfbc [SPARK-33352][CORE][SQL][SS][MLLIB][AVRO][K8S] Fix procedure-like declaration compilation warnings in Scala 2.13
### What changes were proposed in this pull request?
There are two similar compilation warnings about procedure-like declaration in Scala 2.13:

```
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition
```
and

```
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `run`'s return type
```

This PR is the first part of resolving SPARK-33352:

- For constructor definitions, add `=` to convert them to function syntax.

- For method definitions without an explicit return type, add `: Unit =` to convert them to function syntax.
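
As a minimal sketch of the two rewrites listed above, the following uses a hypothetical `Counter` class (not taken from the Spark code base). The deprecated procedure-syntax forms appear only in comments, so the snippet compiles without warnings on both Scala 2.12 and 2.13.

```scala
// Hypothetical class, for illustration only.
class Counter(start: Int) {
  private var count: Int = start

  // Before (procedure syntax for a constructor):
  //   def this() { this(0) }
  // After: add `=`.
  def this() = this(0)

  // Before (procedure syntax for a method):
  //   def increment() { count += 1 }
  // After: add `: Unit =` to declare the return type explicitly.
  def increment(): Unit = { count += 1 }

  def value: Int = count
}
```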

### Why are the changes needed?
Eliminate compilation warnings in Scala 2.13. This change should also be compatible with Scala 2.12.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Jenkins or GitHub Actions.

Closes #30255 from LuciferYang/SPARK-29392-FOLLOWUP.1.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-11-08 12:51:48 -06:00
Kent Yao 82d500a05c [SPARK-33193][SQL][TEST] Hive ThriftServer JDBC Database MetaData API Behavior Auditing
### What changes were proposed in this pull request?

Add a test case that audits all JDBC metadata behaviors, to detect and prevent silent API changes coming from either the upstream hive-jdbc module or the Spark thrift server side.

Forked from my kyuubi project here https://github.com/yaooqinn/kyuubi/blob/master/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkOperationSuite.scala

### Why are the changes needed?

Make the SparkThriftServer safer to evolve.

### Does this PR introduce _any_ user-facing change?

dev only

### How was this patch tested?

new tests

Closes #30101 from yaooqinn/SPARK-33193.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-23 13:34:33 -07:00
Xuedong Luan d9ee33cfb9 [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
### What changes were proposed in this pull request?

Support query auto cancelling when running too long on thriftserver.

This is a rework of #28991, and the credit should go to the original author, leoluan2009.

Closes #28991

### Why are the changes needed?

For some use cases, we run the thriftserver as a long-running application.
Sometimes we want no query to run longer than a given time.
In these cases, we can enable auto cancel for time-consuming queries, which releases resources for other queries to run.
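
The general idea of a timeout-triggered cancel can be sketched with plain JDK scheduling. This is not the thriftserver implementation: `CancellableQuery` and `TimeoutSketch.withTimeout` are hypothetical names, and the sketch assumes the running work checks a cancellation flag cooperatively.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a running statement; not a Spark or Hive class.
final class CancellableQuery {
  private val cancelled = new AtomicBoolean(false)
  def cancel(): Unit = cancelled.set(true)
  def isCancelled: Boolean = cancelled.get()
}

object TimeoutSketch {
  // Runs `body`, and cancels `query` if `timeoutMs` milliseconds elapse first.
  def withTimeout(query: CancellableQuery, timeoutMs: Long)(body: => Unit): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task = scheduler.schedule(new Runnable {
      override def run(): Unit = query.cancel()
    }, timeoutMs, TimeUnit.MILLISECONDS)
    try body
    finally {
      task.cancel(false)      // has no effect if the timeout already fired
      scheduler.shutdownNow() // release the timer thread
    }
  }
}
```

A caller would wrap the execution, for example `TimeoutSketch.withTimeout(q, 50) { while (!q.isCancelled) Thread.sleep(5) }`, so that a query running past the limit is cancelled and its resources become available to other queries.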

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests.

Closes #29933 from maropu/pr28991.

Lead-authored-by: Xuedong Luan <luanxuedong2009@gmail.com>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Co-authored-by: Luan <luanxuedong2009@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-10-22 17:23:10 +09:00
Kent Yao 2507301705 [SPARK-33159][SQL] Use hive-service-rpc as dependency instead of inlining the generated code
### What changes were proposed in this pull request?

Hive's `hive-service-rpc` module has existed since hive-2.1.0, and it contains only the thrift IDL file and the code generated from it.

Removing the inlined code will make it easier to maintain and upgrade the built-in Hive versions.

### Why are the changes needed?

To simplify the code.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing CI

Closes #30055 from yaooqinn/SPARK-33159.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-10-16 09:37:54 -07:00
xuewei.linxuewei 306872eefa [SPARK-33139][SQL] protect setActionSession and clearActiveSession
### What changes were proposed in this pull request?

This PR is a sub-task of [SPARK-33138](https://issues.apache.org/jira/browse/SPARK-33138). In order to make SQLConf.get reliable and stable, we need to make sure users cannot pollute the SQLConf and SparkSession context by calling setActiveSession and clearActiveSession.

Changes in this PR:

* Add a legacy config, spark.sql.legacy.allowModifyActiveSession, to fall back to the old behavior if users do need to call these two APIs.
* By default, calling these two APIs throws an exception.
* Add two internal, private APIs, setActiveSessionInternal and clearActiveSessionInternal, for current internal usage.
* Change all internal references to the new internal APIs, except for SQLContext.setActive and SQLContext.clearActive.
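
The guard pattern described in this list can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's code: the object, the `String` session stand-in, the mutable flag in place of the real config lookup, and the exception type are all hypothetical.

```scala
// Hypothetical sketch; names and the exception type are assumptions.
object ActiveSessionSketch {
  // Stands in for the legacy config spark.sql.legacy.allowModifyActiveSession.
  @volatile var allowModifyActiveSession: Boolean = false

  // A String stands in for a session object.
  private val active = new ThreadLocal[String]

  // Internal entry points: always allowed (would be private to the project in practice).
  def setActiveSessionInternal(name: String): Unit = active.set(name)
  def clearActiveSessionInternal(): Unit = active.remove()

  // Public entry points: rejected unless the legacy flag is enabled.
  def setActiveSession(name: String): Unit = {
    requireLegacyFlag("setActiveSession")
    setActiveSessionInternal(name)
  }

  def clearActiveSession(): Unit = {
    requireLegacyFlag("clearActiveSession")
    clearActiveSessionInternal()
  }

  def getActiveSession: Option[String] = Option(active.get())

  private def requireLegacyFlag(api: String): Unit =
    if (!allowModifyActiveSession) {
      throw new UnsupportedOperationException(
        s"$api is not allowed unless the legacy flag is enabled")
    }
}
```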

### Why are the changes needed?

Make SQLConf.get reliable and stable.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

* Add UT in SparkSessionBuilderSuite to test the legacy config
* Existing test

Closes #30042 from leanken/leanken-SPARK-33139.

Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-10-16 06:05:17 +00:00
Denis Pyshev ba69d68d91 [SPARK-33080][BUILD] Replace fatal warnings snippet
### What changes were proposed in this pull request?

The current build-file solution for failing the build on compilation warnings (excluding deprecation warnings) is not portable beyond SBT 1.3.13: the build import fails with a compilation error under SBT 1.4. It can be replaced with a more robust and maintainable approach, especially since Scala 2.13.2 provides similar built-in functionality.

Additionally, warnings were fixed so that the build passes, with as few changes as possible:
- warnings in the 2.12 compilation are fixed in code;
- warnings in the 2.13 compilation are covered by configuration, to be addressed separately.

### Why are the changes needed?

Unblocks upgrade to SBT after 1.3.13.
Enhances build file maintainability.
Allows fine-tuning of the warnings configuration in the scope of Scala 2.13 compilation.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`build/sbt`'s `compile` and `Test/compile` for both Scala 2.12 and 2.13 profiles.

Closes #29995 from gemelen/feature/warnings-reporter.

Authored-by: Denis Pyshev <git@gemelen.net>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-10-15 14:49:43 -05:00
Prashant Sharma 304ca1ec93 [SPARK-33129][BUILD][DOCS] Updating the build/sbt references to test-only with testOnly for SBT 1.3.x
### What changes were proposed in this pull request?

Replace `test-only` with `testOnly` in docs across the project.

### Why are the changes needed?

Since the sbt version has been updated, the older way of running tests, i.e. `test-only`, is no longer valid.

### Does this PR introduce _any_ user-facing change?

docs update.

### How was this patch tested?

Manually.

Closes #30028 from ScrapCodes/fix-build/sbt-sample.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-13 09:21:06 -07:00
Yuming Wang 5e170140b0 [SPARK-33107][SQL] Remove hive-2.3 workaround code
### What changes were proposed in this pull request?

This PR removes `hive-2.3` workaround code.

### Why are the changes needed?

Make the code clearer and more readable.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

Closes #29996 from wangyum/SPARK-33107.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-10 16:41:42 -07:00
Dongjoon Hyun a127387a53 [SPARK-33082][SQL] Remove hive-1.2 workaround code
### What changes were proposed in this pull request?

This PR removes old Hive-1.2 profile related workaround code.

### Why are the changes needed?

To simplify the code.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CI.

Closes #29961 from dongjoon-hyun/SPARK-HIVE12.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-07 12:27:23 -07:00
Kent Yao 17d309dfac [SPARK-32963][SQL] empty string should be consistent for schema name in SparkGetSchemasOperation
### What changes were proposed in this pull request?
This PR makes the empty string, when used as the schema name pattern, match the global temp view database in the same way as it matches other databases.

This PR also adds new tests covering different kinds of wildcards to verify SparkGetSchemasOperation.

### Why are the changes needed?

When the schema name is the empty string, it is treated as ".*" and can match all databases in the catalog.
But it cannot match the global temp view database, because in that case it is not converted to ".*".
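
The intended matching rule can be sketched as a small pattern-to-regex helper. This is a hypothetical illustration, not the code in SparkGetSchemasOperation: `SchemaPatternSketch` and its methods are invented names, and the handling of `%` and `_` is assumed to follow the usual JDBC wildcard convention.

```scala
import java.util.regex.Pattern

// Hypothetical helper, for illustration only.
object SchemaPatternSketch {
  // Converts a JDBC-style pattern to a Java regex:
  // null or empty matches everything, `%` becomes `.*`, `_` becomes `.`.
  def toRegex(pattern: String): String =
    if (pattern == null || pattern.isEmpty) ".*"
    else pattern.toSeq.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString) // match every other character literally
    }.mkString

  // Applies the same converted pattern to every name, including the global temp view database.
  def matching(pattern: String, names: Seq[String]): Seq[String] = {
    val regex = toRegex(pattern)
    names.filter(_.matches(regex))
  }
}
```

The point of the sketch is that one conversion is applied uniformly: with names `default`, `db1` and `global_temp`, an empty pattern selects all three rather than only the catalog databases.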

### Does this PR introduce _any_ user-facing change?

Yes. JDBC operations like `statement.getConnection.getMetaData.getSchemas(null, "")` now also return the global temp view database in the result set.

### How was this patch tested?

new tests

Closes #29834 from yaooqinn/SPARK-32963.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-10-06 16:01:10 +00:00