Commit graph

4075 commits

Author SHA1 Message Date
Kengo Seki 59c014e120 [SPARK-30350][SQL] Fix ScalaReflection to use an empty array for getting its class object
### What changes were proposed in this pull request?

This PR fixes `ScalaReflection.arrayClassFor()` to use an empty array instead of a one-element array for getting its class object by reflection.

### Why are the changes needed?

Because it may reduce unnecessary memory allocation.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Ran the existing unit tests for sql/catalyst and confirmed that all of them succeeded.

Closes #27005 from sekikn/SPARK-30350.

Authored-by: Kengo Seki <sekikn@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-12-26 22:54:29 +09:00
gengjiaan d59e7195f6 [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
### What changes were proposed in this pull request?
The filter predicate for aggregate expression is an `ANSI SQL`.
```
<aggregate function> ::=
COUNT <left paren> <asterisk> <right paren> [ <filter clause> ]
| <general set function> [ <filter clause> ]
| <binary set function> [ <filter clause> ]
| <ordered set function> [ <filter clause> ]
| <array aggregate function> [ <filter clause> ]
| <row pattern count function> [ <filter clause> ]
```
There are some mainstream database support this syntax.
**PostgreSQL:**
https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES
For example:
```
SELECT
  year,
  count(*) FILTER (WHERE gdp_per_capita >= 40000)
FROM
  countries
GROUP BY
  year
```
```
SELECT
  year,
  code,
  gdp_per_capita,
  count(*)
    FILTER (WHERE gdp_per_capita >= 40000)
    OVER   (PARTITION BY year)
FROM
  countries
```
**jOOQ:**
https://blog.jooq.org/2014/12/30/the-awesome-postgresql-9-4-sql2003-filter-clause-for-aggregate-functions/

**Notice:**
1.This PR only supports FILTER predicate without codegen. maropu will create another PR is related to SPARK-30027 to support codegen.
2.This PR only supports FILTER predicate without DISTINCT. I will create another PR is related to SPARK-30276 to support this.
3.This PR only supports FILTER predicate that can't reference the outer query. I created ticket SPARK-30219 to support it.
4.This PR only supports FILTER predicate that can't use IN/EXISTS predicate sub-queries. I created ticket SPARK-30220 to support it.
5.Spark SQL cannot supports a SQL with nested aggregate. I created ticket SPARK-30182 to support it.

There are some show of the PR on my production environment.
```
spark-sql> desc gja_test_partition;
key     string  NULL
value   string  NULL
other   string  NULL
col2    int     NULL
# Partition Information
# col_name      data_type       comment
col2    int     NULL
Time taken: 0.79 s
```
```
spark-sql> select * from gja_test_partition;
a       A       ao      1
b       B       bo      1
c       C       co      1
d       D       do      1
e       E       eo      2
g       G       go      2
h       H       ho      2
j       J       jo      2
f       F       fo      3
k       K       ko      3
l       L       lo      4
i       I       io      4
Time taken: 1.75 s
```
```
spark-sql> select count(key), sum(col2) from gja_test_partition;
12      26
Time taken: 1.848 s
```
```
spark-sql> select count(key) filter (where col2 > 1) from gja_test_partition;
8
Time taken: 2.926 s
```
```
spark-sql> select sum(col2) filter (where col2 > 2) from gja_test_partition;
14
Time taken: 2.087 s
```
```
spark-sql> select count(key) filter (where col2 > 1), sum(col2) filter (where col2 > 2) from gja_test_partition;
8       14
Time taken: 2.847 s
```
```
spark-sql> select count(key), count(key) filter (where col2 > 1), sum(col2), sum(col2) filter (where col2 > 2) from gja_test_partition;
12      8       26      14
Time taken: 1.787 s
```
```
spark-sql> desc student;
id      int     NULL
name    string  NULL
sex     string  NULL
class_id        int     NULL
Time taken: 0.206 s
```
```
spark-sql> select * from student;
1       张三    man     1
2       李四    man     1
3       王五    man     2
4       赵六    man     2
5       钱小花  woman   1
6       赵九红  woman   2
7       郭丽丽  woman   2
Time taken: 0.786 s
```
```
spark-sql> select class_id, count(id), sum(id) from student group by class_id;
1       3       8
2       4       20
Time taken: 18.783 s
```
```
spark-sql> select class_id, count(id) filter (where sex = 'man'), sum(id) filter (where sex = 'woman') from student group by class_id;
1       2       5
2       2       13
Time taken: 3.887 s
```

### Why are the changes needed?
Add new SQL feature.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Exists UT and new UT.

Closes #26656 from beliefer/support-aggregate-clause.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-26 17:41:50 +08:00
wenfang 4d58cd77f9 [SPARK-30330][SQL] Support single quotes json parsing for get_json_object and json_tuple
### What changes were proposed in this pull request?

I execute some query as` select get_json_object(ytag, '$.y1') AS y1 from t4`; SparkSQL return null but  Hive return correct results.
In my production environment, ytag is a json wrapped by single quotes,as follows
```
{'y1': 'shuma', 'y2': 'shuma:shouji'}
{'y1': 'jiaoyu', 'y2': 'jiaoyu:gaokao'}
{'y1': 'yule', 'y2': 'yule:mingxing'}
```
Then l realized some functions including get_json_object and json_tuple does not support  single quotes json parsing.
So l provide this PR to resolve the question.

### Why are the changes needed?

Enabled for Hive compatibility

### Does this PR introduce any user-facing change?

NO

### How was this patch tested?

NEW TESTS

Closes #26965 from wenfang6/enableSingleQuotesJsonForSparkSQL.

Authored-by: wenfang <wenfang@360.cn>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-26 11:45:31 +09:00
yi.wu 6d64fc2407 [SPARK-26389][SS][FOLLOW-UP] Format config name to follow the other boolean conf naming convention
### What changes were proposed in this pull request?

Rename `spark.sql.streaming.forceDeleteTempCheckpointLocation` to `spark.sql.streaming.forceDeleteTempCheckpointLocation.enabled`.

### Why are the changes needed?

To follow the other boolean conf naming convention.

### Does this PR introduce any user-facing change?

No, as this config is newly added in 3.0.

### How was this patch tested?

Pass Jenkins.

Closes #26981 from Ngone51/SPARK-26389-FOLLOWUP.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-25 21:45:01 +08:00
Kent Yao da65a955ed [SPARK-30266][SQL] Avoid match error and int overflow in ApproximatePercentile and Percentile
### What changes were proposed in this pull request?
accuracyExpression can accept Long which may cause overflow error.
accuracyExpression can accept fractions which are implicitly floored.
accuracyExpression can accept null which is implicitly changed to 0.
percentageExpression can accept null but cause MatchError.
percentageExpression can accept ArrayType(_, nullable=true) in which the nulls are implicitly changed to zeros.

##### cases
```sql
select percentile_approx(10.0, 0.5, 2147483648); -- overflow and fail
select percentile_approx(10.0, 0.5, 4294967297); -- overflow but success
select percentile_approx(10.0, 0.5, null); -- null cast to 0
select percentile_approx(10.0, 0.5, 1.2); -- 1.2 cast to 1
select percentile_approx(10.0, null, 1); -- scala.MatchError
select percentile_approx(10.0, array(0.2, 0.4, null), 1); -- null cast to zero.
```

##### behavior before

```sql
+select percentile_approx(10.0, 0.5, 2147483648)
+org.apache.spark.sql.AnalysisException
+cannot resolve 'percentile_approx(10.0BD, CAST(0.5BD AS DOUBLE), CAST(2147483648L AS INT))' due to data type mismatch: The accuracy provided must be a positive integer literal (current value = -2147483648); line 1 pos 7
+
+select percentile_approx(10.0, 0.5, 4294967297)
+10.0
+

+select percentile_approx(10.0, 0.5, null)
+org.apache.spark.sql.AnalysisException
+cannot resolve 'percentile_approx(10.0BD, CAST(0.5BD AS DOUBLE), CAST(NULL AS INT))' due to data type mismatch: The accuracy provided must be a positive integer literal (current value = 0); line 1 pos 7
+
+select percentile_approx(10.0, 0.5, 1.2)
+10.0
+
+select percentile_approx(10.0, null, 1)
+scala.MatchError
+null
+
+
+select percentile_approx(10.0, array(0.2, 0.4, null), 1)
+[10.0,10.0,10.0]
```

##### behavior after

```sql

+select percentile_approx(10.0, 0.5, 2147483648)
+10.0
+
+select percentile_approx(10.0, 0.5, 4294967297)
+10.0
+
+select percentile_approx(10.0, 0.5, null)
+org.apache.spark.sql.AnalysisException
+cannot resolve 'percentile_approx(10.0BD, 0.5BD, NULL)' due to data type mismatch: argument 3 requires integral type, however, 'NULL' is of null type.; line 1 pos 7
+
+select percentile_approx(10.0, 0.5, 1.2)
+org.apache.spark.sql.AnalysisException
+cannot resolve 'percentile_approx(10.0BD, 0.5BD, 1.2BD)' due to data type mismatch: argument 3 requires integral type, however, '1.2BD' is of decimal(2,1) type.; line 1 pos 7
+

+select percentile_approx(10.0, null, 1)
+java.lang.IllegalArgumentException
+The value of percentage must be be between 0.0 and 1.0, but got null
+
+select percentile_approx(10.0, array(0.2, 0.4, null), 1)
+java.lang.IllegalArgumentException
+Each value of the percentage array must be be between 0.0 and 1.0, but got [0.2,0.4,null]
```

### Why are the changes needed?

bug fix

### Does this PR introduce any user-facing change?

yes, fix some improper usages of percentile_approx as cases list above

### How was this patch tested?

add ut

Closes #26905 from yaooqinn/SPARK-30266.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-25 20:03:26 +08:00
Kazuaki Ishizaki f31d9a629b [MINOR][DOC][SQL][CORE] Fix typo in document and comments
### What changes were proposed in this pull request?

Fixed typo in `docs` directory and in other directories

1. Find typo in `docs` and apply fixes to files in all directories
2. Fix `the the` -> `the`

### Why are the changes needed?

Better readability of documents

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

No test needed

Closes #26976 from kiszk/typo_20191221.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-21 14:08:58 -08:00
Sean Owen 7dff3b125d [SPARK-30272][SQL][CORE] Remove usage of Guava that breaks in 27; replace with workalikes
### What changes were proposed in this pull request?

Remove usages of Guava that no longer work in Guava 27, and replace with workalikes. I'll comment on key types of changes below.

### Why are the changes needed?

Hadoop 3.2.1 uses Guava 27, so this helps us avoid problems running on Hadoop 3.2.1+ and generally lowers our exposure to Guava.

### Does this PR introduce any user-facing change?

Should not be, but see notes below on hash codes and toString.

### How was this patch tested?

Existing tests will verify whether these changes break anything for Guava 14.
I manually built with an updated version and it compiles with Guava 27; tests running manually locally now.

Closes #26911 from srowen/SPARK-30272.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2019-12-20 08:55:04 -06:00
Kent Yao 12249fcdc7 [SPARK-30301][SQL] Fix wrong results when datetimes as fields of complex types
### What changes were proposed in this pull request?

When date and timestamp values are fields of arrays, maps, etc, we convert them to hive string using `toString`. This makes the result wrong before the default transition ’1582-10-15‘.

https://bugs.openjdk.java.net/browse/JDK-8061577?focusedCommentId=13566712&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13566712

cases to reproduce:

```sql
+-- !query 47
+select array(cast('1582-10-13' as date), date '1582-10-14', date '1582-10-15', null)
+-- !query 47 schema
+struct<array(CAST(1582-10-13 AS DATE), DATE '1582-10-14', DATE '1582-10-15', CAST(NULL AS DATE)):array<date>>
+-- !query 47 output
+[1582-10-03,1582-10-04,1582-10-15,null]
+
+
+-- !query 48
+select cast('1582-10-13' as date), date '1582-10-14', date '1582-10-15'
+-- !query 48 schema
+struct<CAST(1582-10-13 AS DATE):date,DATE '1582-10-14':date,DATE '1582-10-15':date>
+-- !query 48 output
+1582-10-13     1582-10-14      1582-10-15
```

other refencences
https://github.com/h2database/h2database/issues/831
### Why are the changes needed?

bug fix
### Does this PR introduce any user-facing change?

yes, complex types containing datetimes in `spark-sql `script and thrift server can result same as self-contained spark app or `spark-shell` script
### How was this patch tested?

add uts

Closes #26942 from yaooqinn/SPARK-30301.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-20 19:21:43 +08:00
Aman Omer 726f6d3e3c [SPARK-30184][SQL] Implement a helper method for aliasing functions
### What changes were proposed in this pull request?
This PR is to use `expressionWithAlias` for remaining functions for which alias name can be used. Remaining functions are:
`Average, First, Last, ApproximatePercentile, StddevSamp, VarianceSamp`

PR https://github.com/apache/spark/pull/26712 introduced `expressionWithAlias`
### Why are the changes needed?
Error message is wrong when alias name is used for above mentioned functions.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually

Closes #26808 from amanomer/fncAlias.

Lead-authored-by: Aman Omer <amanomer1996@gmail.com>
Co-authored-by: Aman Omer <40591404+amanomer@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-20 12:49:16 +08:00
Maxim Gekk dea18231d4 [SPARK-30309][SQL] Mark Filter as a sealed class
### What changes were proposed in this pull request?
Added the `sealed` keyword to the `Filter` class

### Why are the changes needed?
To do not miss handling of new filters in a datasource in the future. For example, `AlwaysTrue` and `AlwaysFalse` were added recently by https://github.com/apache/spark/pull/23606

### Does this PR introduce any user-facing change?
Should not.

### How was this patch tested?
By existing tests.

Closes #26950 from MaxGekk/sealed-filter.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-19 12:30:34 -08:00
Gengliang Wang ab8eb86a77 Revert "[SPARK-29629][SQL] Support typed integer literal expression"
This reverts commit 8e667db5d8.

Closes #26940 from gengliangwang/revert_Spark_29629.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-19 16:34:27 +09:00
ulysses 1e48b43a0e [SPARK-30254][SQL] Fix LikeSimplification optimizer to use a given escapeChar
Since [25001](https://github.com/apache/spark/pull/25001), spark support like escape syntax.

We should also sync the escape used by `LikeSimplification`.

Avoid optimize failed.

No.

Add UT.

Closes #26880 from ulysses-you/SPARK-30254.

Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-18 15:56:13 -08:00
Jalpan Randeri f15eee18cc [SPARK-29493][SQL] Arrow MapType support
### What changes were proposed in this pull request?
This pull request add support for Arrow MapType into Spark SQL.

### Why are the changes needed?
Without this change User's of spark are not able to query data in spark if one of columns is stored as map and Apache Arrow execution mode is preferred by user.
More info: https://issues.apache.org/jira/projects/SPARK/issues/SPARK-29493

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Introduced few unit tests around map type in existing arrow test suit

Closes #26512 from jalpan-randeri/feature-arrow-java-map-type.

Authored-by: Jalpan Randeri <randerij@amazon.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-18 23:59:27 +09:00
Aman Omer 297f406425 [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType
### What changes were proposed in this pull request?
Use `TypeCoercion.findWiderTypeForTwo()` instead of `TypeCoercion.findTightestCommonType()` while preprocessing `inputTypes` in `ArrayContains`.

### Why are the changes needed?
`TypeCoercion.findWiderTypeForTwo()` also handles cases for DecimalType.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Test cases to be added.

Closes #26811 from amanomer/29600.

Authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-18 01:30:28 +08:00
Terry Kim e75d9afb2f [SPARK-30094][SQL] Apply current namespace for the single-part table name
### What changes were proposed in this pull request?

This PR applies the current namespace for the single-part table name if the current catalog is a non-session catalog.

Note that the reason the current namespace is not applied for the session catalog is that the single-part name could be referencing a temp view which doesn't belong to any namespaces. The empty namespace for a table inside the session catalog is resolved by the session catalog implementation.

### Why are the changes needed?

It's fixing the following bug where the current namespace is not respected:
```
sql("CREATE TABLE testcat.ns.t USING foo AS SELECT 1 AS id")
sql("USE testcat.ns")
sql("SHOW CURRENT NAMESPACE").show
+-------+---------+
|catalog|namespace|
+-------+---------+
|testcat|       ns|
+-------+---------+

// `t` is not resolved since the current namespace `ns` is not used.
sql("DESCRIBE t").show
Failed to analyze query: org.apache.spark.sql.AnalysisException: Table not found: t;;
```

### Does this PR introduce any user-facing change?

Yes, the above `DESCRIBE` command will succeed.

### How was this patch tested?

Added tests.

Closes #26894 from imback82/current_namespace.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-17 11:13:27 +08:00
Wenchen Fan fdcd0e71b9 [SPARK-30192][SQL] support column position in DS v2
### What changes were proposed in this pull request?

update DS v2 API to support add/alter column with column position

### Why are the changes needed?

We have a parser rule for column position, but we fail the query if it's specified, because the builtin catalog can't support add/alter column with column position.

Since we have the catalog plugin API now, we should let the catalog implementation to decide if it supports column position or not.

### Does this PR introduce any user-facing change?

not yet

### How was this patch tested?

new tests

Closes #26817 from cloud-fan/parser.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-16 18:55:17 +08:00
Terry Kim 72f5597ce2 [SPARK-30104][SQL][FOLLOWUP] Remove LookupCatalog.AsTemporaryViewIdentifier
### What changes were proposed in this pull request?

As discussed in https://github.com/apache/spark/pull/26741#discussion_r357504518, `LookupCatalog.AsTemporaryViewIdentifier` is no longer used and can be removed.

### Why are the changes needed?

Code clean up

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Removed tests that were testing solely `AsTemporaryViewIdentifier` extractor.

Closes #26897 from imback82/30104-followup.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-16 17:43:01 +08:00
Maxim Gekk 67b644c3d7 [SPARK-30166][SQL] Eliminate compilation warnings in JSONOptions
### What changes were proposed in this pull request?
In the PR, I propose to replace `setJacksonOptions()` in `JSONOptions` by `buildJsonFactory()` which builds `JsonFactory` using `JsonFactoryBuilder`. This allows to avoid using **deprecated** feature configurations from `JsonParser.Feature`.

### Why are the changes needed?
- The changes eliminate the following compilation warnings in `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala`:
```
    Warning:Warning:line (137)Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java enum Feature is deprecated: see corresponding Javadoc for more information.
    factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, allowNumericLeadingZeros)
    Warning:Warning:line (138)Java enum ALLOW_NON_NUMERIC_NUMBERS in Java enum Feature is deprecated: see corresponding Javadoc for more information.
    factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, allowNonNumericNumbers)
    Warning:Warning:line (139)Java enum ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER in Java enum Feature is deprecated: see corresponding Javadoc for more information.
    factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER,
    Warning:Warning:line (141)Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java enum Feature is deprecated: see corresponding Javadoc for more information.
    factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars)
```
- This put together building JsonFactory and set options from JSONOptions. So, we will not forget to call `setJacksonOptions` in the future.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By `JsonSuite`, `JsonFunctionsSuite`, `JsonExpressionsSuite`.

Closes #26797 from MaxGekk/eliminate-warning.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2019-12-15 08:45:57 -06:00
Kent Yao d3ec8b1735 [SPARK-30066][SQL] Support columnar execution on interval types
### What changes were proposed in this pull request?

Columnar execution support for interval types

### Why are the changes needed?

support cache tables with interval columns
improve performance too

### Does this PR introduce any user-facing change?

Yes cache table with accept interval columns

### How was this patch tested?

add ut

Closes #26699 from yaooqinn/SPARK-30066.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-14 13:10:46 -08:00
John Ayad f197204f03 [SPARK-30236][SQL][DOCS] Clarify date and time patterns supported in docs
### What changes were proposed in this pull request?

Link to appropriate Java Class with list of date/time patterns supported

### Why are the changes needed?

Avoid confusion on the end-user's side of things, as seen in questions like [this](https://stackoverflow.com/questions/54496878/date-format-conversion-is-adding-1-year-to-the-border-dates) on StackOverflow

### Does this PR introduce any user-facing change?

Yes, Docs are updated.

### How was this patch tested?

`date_format`:
![image](https://user-images.githubusercontent.com/2394761/70796647-b5c55900-1d9a-11ea-89f9-7a8661641c09.png)

`to_unix_timestamp`:
![image](https://user-images.githubusercontent.com/2394761/70796664-c07fee00-1d9a-11ea-9029-e82d899e3f59.png)

`unix_timestamp`:
![image](https://user-images.githubusercontent.com/2394761/70796688-caa1ec80-1d9a-11ea-8868-a18c437a5d49.png)

`from_unixtime`:
![image](https://user-images.githubusercontent.com/2394761/70796703-d4c3eb00-1d9a-11ea-85fe-3c672e0cda28.png)

`to_date`:
![image](https://user-images.githubusercontent.com/2394761/70796718-dd1c2600-1d9a-11ea-81f4-a0966eeb0f1d.png)

`to_timestamp`:
![image](https://user-images.githubusercontent.com/2394761/70796735-e6a58e00-1d9a-11ea-8ef7-d3e1d9b5370f.png)

Closes #26864 from johnhany97/SPARK-30236.

Authored-by: John Ayad <johnhany97@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-14 13:08:15 -08:00
Burak Yavuz 4c37a8a3f4 [SPARK-30143][SS] Add a timeout on stopping a streaming query
### What changes were proposed in this pull request?

Add a timeout configuration for StreamingQuery.stop()

### Why are the changes needed?

The stop() method on a Streaming Query awaits the termination of the stream execution thread. However, the stream execution thread may block forever depending on the streaming source implementation (like in Kafka, which runs UninterruptibleThreads).

This causes control flow applications to hang indefinitely as well. We'd like to introduce a timeout to stop the execution thread, so that the control flow thread can decide to do an action if a timeout is hit.

### Does this PR introduce any user-facing change?

By default, no. If the timeout configuration is set, then a TimeoutException will be thrown if a stream cannot be stopped within the given timeout.

### How was this patch tested?

Unit tests

Closes #26771 from brkyvz/stopTimeout.

Lead-authored-by: Burak Yavuz <brkyvz@gmail.com>
Co-authored-by: Burak Yavuz <burak@databricks.com>
Signed-off-by: Burak Yavuz <brkyvz@gmail.com>
2019-12-13 15:16:00 -08:00
Gengliang Wang 4da9780bc0 Revert "[SPARK-30230][SQL] Like ESCAPE syntax can not use '_' and '%'"
This reverts commit cada5beef7.

Closes #26883 from gengliangwang/revert.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-13 11:23:55 -08:00
Jungtaek Lim (HeartSaVioR) 94eb66593a [SPARK-30227][SQL] Add close() on DataWriter interface
### What changes were proposed in this pull request?

This patch adds close() method to the DataWriter interface, which will become the place to cleanup the resource.

### Why are the changes needed?

The lifecycle of DataWriter instance ends at either commit() or abort(). That makes datasource implementors to feel they can place resource cleanup in both sides, but abort() can be called when commit() fails; so they have to ensure they don't do double-cleanup if cleanup is not idempotent.

### Does this PR introduce any user-facing change?

Depends on the definition of user; if they're developers of custom DSv2 source, they have to add close() in their DataWriter implementations. It's OK to just add close() with empty content as they should have already dealt with resource cleanup in commit/abort, but they would love to migrate the resource cleanup logic to close() as it avoids double cleanup. If they're just end users using the provided DSv2 source (regardless of built-in/3rd party), no change.

### How was this patch tested?

Existing tests.

Closes #26855 from HeartSaVioR/SPARK-30227.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-13 16:12:41 +08:00
Pablo Langa cb6d2b3f83 [SPARK-30040][SQL] DROP FUNCTION should do multi-catalog resolution
### What changes were proposed in this pull request?

Add DropFunctionStatement and make DROP FUNCTION go through the same catalog/table resolution framework of v2 commands.

### Why are the changes needed?

It's important to make all the commands have the same table resolution behavior, to avoid confusing
DROP FUNCTION namespace.function

### Does this PR introduce any user-facing change?

Yes. When running DROP FUNCTION namespace.function Spark fails the command if the current catalog is set to a v2 catalog.

### How was this patch tested?

Unit tests.

Closes #26854 from planga82/feature/SPARK-30040_DropFunctionV2Catalog.

Authored-by: Pablo Langa <soypab@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-12 15:15:54 -08:00
ulysses cada5beef7 [SPARK-30230][SQL] Like ESCAPE syntax can not use '_' and '%'
### What changes were proposed in this pull request?

Since [25001](https://github.com/apache/spark/pull/25001), spark support like escape syntax.
But '%' and '_' is the reserve char in `Like` expression. We can not use them as escape char.

### Why are the changes needed?

Avoid some unexpect problem when using like escape syntax.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Add UT.

Closes #26860 from ulysses-you/SPARK-30230.

Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2019-12-12 09:52:27 -08:00
Maxim Gekk 25de90e762 [SPARK-30170][SQL][MLLIB][TESTS] Eliminate compilation warnings: part 1
### What changes were proposed in this pull request?
- Replace `Seq[String]` by `Seq[_]` in `StopWordsRemoverSuite` because `String` type is unchecked due erasure.
- Throw an exception for default case in `MLTest.checkNominalOnDF` because we don't expect other attribute types currently.
- Explicitly cast float to double in `BigDecimal(y)`. This is what the `apply()` method does for `float`s.
- Replace deprecated `verifyZeroInteractions` by `verifyNoInteractions`.
- Equivalent replacement of `\0` by `\u0000` in `CSVExprUtilsSuite`
- Import `scala.language.implicitConversions` in `CollectionExpressionsSuite`, `HashExpressionsSuite` and in `ExpressionParserSuite`.

### Why are the changes needed?
The changes fix compiler warnings showed in the JIRA ticket https://issues.apache.org/jira/browse/SPARK-30170 . Eliminating the warning highlights other warnings which could take more attention to real problems.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By existing test suites `StopWordsRemoverSuite`, `AnalysisExternalCatalogSuite`, `CSVExprUtilsSuite`, `CollectionExpressionsSuite`, `HashExpressionsSuite`, `ExpressionParserSuite` and sub-tests of `MLTest`.

Closes #26799 from MaxGekk/eliminate-warning-2.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2019-12-12 08:38:15 -06:00
root1 2936507f94 [SPARK-30150][SQL] ADD FILE, ADD JAR, LIST FILE & LIST JAR Command do not accept quoted path
### What changes were proposed in this pull request?
`add file "abc.txt"` and `add file 'abc.txt'` are not supported.
For these two spark sql gives `FileNotFoundException`.
Only `add file abc.txt` is supported currently.

After these changes path can be given as quoted text for ADD FILE, ADD JAR, LIST FILE, LIST JAR commands in spark-sql

### Why are the changes needed?

In many of the spark-sql commands (like create table ,etc )we write path in quoted format only.  To maintain this consistency we should support quoted format with this command as well.

### Does this PR introduce any user-facing change?
Yes. Now users can write path with quotes.

### How was this patch tested?
Manually tested.

Closes #26779 from iRakson/SPARK-30150.

Authored-by: root1 <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-12 17:11:21 +08:00
Terry Kim 3741a36ebf [SPARK-30104][SQL][FOLLOWUP] V2 catalog named 'global_temp' should always be masked
### What changes were proposed in this pull request?

This is a follow up to #26741 to address the following:
1. V2 catalog named `global_temp` should always be masked.
2. #26741 introduces `CatalogAndIdentifer` that supersedes `CatalogObjectIdentfier`. This PR removes `CatalogObjectIdentfier` and its usages and replace them with `CatalogAndIdentifer`.
3. `CatalogObjectIdentifier(catalog, ident) if !isSessionCatalog(catalog)` and `CatalogObjectIdentifier(catalog, ident) if isSessionCatalog(catalog)` are replaced with `NonSessionCatalogAndIdentifier` and `SessionCatalogAndIdentifier` respectively.

### Why are the changes needed?

To fix an existing with handling v2 catalog named `global_temp` and to simplify the code base.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added new tests.

Closes #26853 from imback82/lookup_table.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-12 14:47:20 +08:00
Pablo Langa 9cf9304e17 [SPARK-30038][SQL] DESCRIBE FUNCTION should do multi-catalog resolution
### What changes were proposed in this pull request?

Add DescribeFunctionsStatement and make DESCRIBE FUNCTIONS go through the same catalog/table resolution framework of v2 commands.

### Why are the changes needed?

It's important to make all the commands have the same table resolution behavior, to avoid confusing
DESCRIBE FUNCTIONS namespace.function

### Does this PR introduce any user-facing change?

Yes. When running DESCRIBE FUNCTIONS namespace.function Spark fails the command if the current catalog is set to a v2 catalog.

### How was this patch tested?

Unit tests.

Closes #26840 from planga82/feature/SPARK-30038_DescribeFunction_V2Catalog.

Authored-by: Pablo Langa <soypab@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-11 14:02:58 -08:00
Sean Owen 33f53cb2d5 [SPARK-30195][SQL][CORE][ML] Change some function, import definitions to work with stricter compiler in Scala 2.13
### What changes were proposed in this pull request?

See https://issues.apache.org/jira/browse/SPARK-30195 for the background; I won't repeat it here. This is sort of a grab-bag of related issues.

### Why are the changes needed?

To cross-compile with Scala 2.13 later.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing tests for 2.12. I've been manually checking that this actually resolves the compile problems in 2.13 separately.

Closes #26826 from srowen/SPARK-30195.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-11 12:33:58 -08:00
Maxim Gekk e933539cdd [SPARK-29864][SPARK-29920][SQL] Strict parsing of day-time strings to intervals
### What changes were proposed in this pull request?
In the PR, I propose new implementation of `fromDayTimeString` which strictly parses strings in day-time formats to intervals. New implementation accepts only strings that match to a pattern defined by the `from` and `to`. Here is the mapping of user's bounds and patterns:
- `[+|-]D+ H[H]:m[m]:s[s][.SSSSSSSSS]` for **DAY TO SECOND**
- `[+|-]D+ H[H]:m[m]` for **DAY TO MINUTE**
- `[+|-]D+ H[H]` for **DAY TO HOUR**
- `[+|-]H[H]:m[m]s[s][.SSSSSSSSS]` for **HOUR TO SECOND**
- `[+|-]H[H]:m[m]` for **HOUR TO MINUTE**
- `[+|-]m[m]:s[s][.SSSSSSSSS]` for **MINUTE TO SECOND**

Closes #26327
Closes #26358

### Why are the changes needed?
- Improve user experience with Spark SQL, and respect to the bound specified by users.
- Behave the same as other broadly used DBMS - Oracle and MySQL.

### Does this PR introduce any user-facing change?
Yes, before:
```sql
spark-sql> SELECT INTERVAL '10 11:12:13.123' HOUR TO MINUTE;
interval 1 weeks 3 days 11 hours 12 minutes
```
After:
```sql
spark-sql> SELECT INTERVAL '10 11:12:13.123' HOUR TO MINUTE;
Error in query:
requirement failed: Interval string must match day-time format of '^(?<sign>[+|-])?(?<hour>\d{1,2}):(?<minute>\d{1,2})$': 10 11:12:13.123(line 1, pos 16)

== SQL ==
SELECT INTERVAL '10 11:12:13.123' HOUR TO MINUTE
----------------^^^
```

### How was this patch tested?
- Added tests to `IntervalUtilsSuite`
- By `ExpressionParserSuite`
- Updated `literals.sql`

Closes #26473 from MaxGekk/strict-from-daytime-string.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-12 01:08:53 +08:00
Terry Kim beae14d5ed [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
### What changes were proposed in this pull request?

`global_temp` is used as a database name to access global temp views. The current catalog lookup logic considers only the first element of multi-part name when it resolves a catalog. This results in using the session catalog even `global_temp` is used as a table name under v2 catalog. This PR addresses this by making sure multi-part name has two elements before using the session catalog.

### Why are the changes needed?

Currently, 'global_temp' can be used as a table name in certain commands (CREATE) but not in others (DESCRIBE):
```
// Assume "spark.sql.globalTempDatabase" is set to "global_temp".
sql(s"CREATE TABLE testcat.t (id bigint, data string) USING foo")
sql(s"CREATE TABLE testcat.global_temp (id bigint, data string) USING foo")
sql("USE testcat")

sql(s"DESCRIBE TABLE t").show
+---------------+---------+-------+
|       col_name|data_type|comment|
+---------------+---------+-------+
|             id|   bigint|       |
|           data|   string|       |
|               |         |       |
| # Partitioning|         |       |
|Not partitioned|         |       |
+---------------+---------+-------+

sql(s"DESCRIBE TABLE global_temp").show
org.apache.spark.sql.AnalysisException: Table not found: global_temp;;
  'DescribeTable 'UnresolvedV2Relation [global_temp], org.apache.spark.sql.connector.InMemoryTableSessionCatalog2f1af64f, `global_temp`, false
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:47)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:46)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:122)
```

### Does this PR introduce any user-facing change?

Yes, `sql(s"DESCRIBE TABLE global_temp").show` in the above example now displays:
```
+---------------+---------+-------+
|       col_name|data_type|comment|
+---------------+---------+-------+
|             id|   bigint|       |
|           data|   string|       |
|               |         |       |
| # Partitioning|         |       |
|Not partitioned|         |       |
+---------------+---------+-------+
```
instead of throwing an exception.

### How was this patch tested?

Added new tests.

Closes #26741 from imback82/global_temp.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-11 16:56:42 +08:00
Sean Owen 3cc55f6a0a [SPARK-29392][CORE][SQL][FOLLOWUP] More removal of 'foo Symbol syntax for Scala 2.13
### What changes were proposed in this pull request?

Another continuation of https://github.com/apache/spark/pull/26748

### Why are the changes needed?

To cleanly cross compile with Scala 2.13.

### Does this PR introduce any user-facing change?

None.

### How was this patch tested?

Existing tests

Closes #26842 from srowen/SPARK-29392.4.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-10 19:41:24 -08:00
Kent Yao 8f0eb7dc86 [SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal
### What changes were proposed in this pull request?
The types decimal and numeric are equivalent. Both types are part of the SQL standard.

the real type is  4 bytes, variable-precision, inexact, 6 decimal digits precision, same as our float, part of the SQL standard.

### Why are the changes needed?

improve sql standard support
other dbs
https://www.postgresql.org/docs/9.3/datatype-numeric.html
https://prestodb.io/docs/current/language/types.html#floating-point
http://www.sqlservertutorial.net/sql-server-basics/sql-server-data-types/
MySQL treats REAL as a synonym for DOUBLE PRECISION (a nonstandard variation), unless the REAL_AS_FLOAT SQL mode is enabled.
In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC.

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

add ut

Closes #26537 from yaooqinn/SPARK-29587.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-11 02:22:08 +08:00
Kent Yao 24c4ce1e64 [SPARK-28351][SQL][FOLLOWUP] Remove 'DELETE FROM' from unsupportedHiveNativeCommands
### What changes were proposed in this pull request?

Minor change, rm `DELETE FROM` from unsupported hive native operation, because it is supported in parser.

### Why are the changes needed?
clear ambiguous ambiguous

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

no

Closes #26836 from yaooqinn/SPARK-28351.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-10 09:54:50 -08:00
Yuanjian Li d9b3069412 [SPARK-30125][SQL] Remove PostgreSQL dialect
### What changes were proposed in this pull request?
Reprocess all PostgreSQL dialect related PRs, listing in order:

- #25158: PostgreSQL integral division support [revert]
- #25170: UT changes for the integral division support [revert]
- #25458: Accept "true", "yes", "1", "false", "no", "0", and unique prefixes as input and trim input for the boolean data type. [revert]
- #25697: Combine below 2 feature tags into "spark.sql.dialect" [revert]
- #26112: Date substraction support [keep the ANSI-compliant part]
- #26444: Rename config "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" [revert]
- #26463: Cast to boolean support for PostgreSQL dialect [revert]
- #26584: Make the behavior of Postgre dialect independent of ansi mode config [keep the ANSI-compliant part]

### Why are the changes needed?
As the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html, we need to remove PostgreSQL dialect form code base for several reasons:
1. The current approach makes the codebase complicated and hard to maintain.
2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now.

### Does this PR introduce any user-facing change?
Yes, the config `spark.sql.dialect` will be removed.

### How was this patch tested?
Existing UT.

Closes #26763 from xuanyuanking/SPARK-30125.

Lead-authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-11 01:22:34 +08:00
fuwhu c2f29d5ea5 [SPARK-30138][SQL] Separate configuration key of max iterations for analyzer and optimizer
### What changes were proposed in this pull request?
separate the configuration keys "spark.sql.optimizer.maxIterations" and "spark.sql.analyzer.maxIterations".

### Why are the changes needed?
Currently, both Analyzer and Optimizer use conf "spark.sql.optimizer.maxIterations" to set the max iterations to run, which is a little confusing.
It is clearer to add a new conf "spark.sql.analyzer.maxIterations" for analyzer max iterations.

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
Existing unit tests.

Closes #26766 from fuwhu/SPARK-30138.

Authored-by: fuwhu <bestwwg@163.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-12-09 19:43:32 +09:00
Aman Omer dcea7a4c9a [SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or()
### What changes were proposed in this pull request?
This PR introduces a method `expressionWithAlias` in class `FunctionRegistry` which is used to register function's constructor. Currently, `expressionWithAlias` is used to register `BoolAnd` & `BoolOr`.

### Why are the changes needed?
Error message is wrong when alias name is used for `BoolAnd` & `BoolOr`.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested manually.

For query,
`select every('true');`

Output before this PR,

> Error in query: cannot resolve 'bool_and('true')' due to data type mismatch: Input to function 'bool_and' should have been boolean, but it's [string].; line 1 pos 7;

After this PR,

> Error in query: cannot resolve 'every('true')' due to data type mismatch: Input to function 'every' should have been boolean, but it's [string].; line 1 pos 7;

Closes #26712 from amanomer/29883.

Authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-09 13:23:16 +08:00
Pablo Langa bca9de6684 [SPARK-29922][SQL] SHOW FUNCTIONS should do multi-catalog resolution
### What changes were proposed in this pull request?

Add ShowFunctionsStatement and make SHOW FUNCTIONS go through the same catalog/table resolution framework of v2 commands.

We don’t have this methods in the catalog to implement an V2 command
* catalog.listFunctions

### Why are the changes needed?

It's important to make all the commands have the same table resolution behavior, to avoid confusing
`SHOW FUNCTIONS LIKE namespace.function`

### Does this PR introduce any user-facing change?

Yes. When running SHOW FUNCTIONS LIKE namespace.function Spark fails the command if the current catalog is set to a v2 catalog.

### How was this patch tested?

Unit tests.

Closes #26667 from planga82/feature/SPARK-29922_ShowFunctions_V2Catalog.

Authored-by: Pablo Langa <soypab@gmail.com>
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
2019-12-08 20:15:09 -08:00
Kent Yao e88d74052b [SPARK-30147][SQL] Trim the string when cast string type to booleans
### What changes were proposed in this pull request?

Now, we trim the string when casting string value to those `canCast` types values, e.g. int, double, decimal, interval, date, timestamps, except for boolean.
This behavior makes type cast and coercion inconsistency in Spark.
Not fitting ANSI SQL standard either.
```
If TD is boolean, then
Case:
a) If SD is character string, then SV is replaced by
    TRIM ( BOTH ' ' FROM VE )
    Case:
    i) If the rules for literal in Subclause 5.3, “literal”, can be applied to SV to determine a valid
value of the data type TD, then let TV be that value.
   ii) Otherwise, an exception condition is raised: data exception — invalid character value for cast.
b) If SD is boolean, then TV is SV
```
In this pull request, we trim all the whitespaces from both ends of the string before converting it to a bool value. This behavior is as same as others, but a bit different from sql standard, which trim only spaces.

### Why are the changes needed?

Type cast/coercion consistency

### Does this PR introduce any user-facing change?

yes, string with whitespaces in both ends will be trimmed before converted to booleans.

e.g. `select cast('\t true' as boolean)` results `true` now, before this pr it's `null`
### How was this patch tested?

add unit tests

Closes #26776 from yaooqinn/SPARK-30147.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-12-07 15:03:51 +09:00
Sean Owen a30ec19a73 [SPARK-30155][SQL] Rename parse() to parseString() to avoid conflict in Scala 2.13
### What changes were proposed in this pull request?

Rename internal method LegacyTypeStringParser.parse() to parseString().

### Why are the changes needed?

In Scala 2.13, the parse() definition clashes with supertype declarations.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #26784 from srowen/SPARK-30155.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-06 16:16:28 -08:00
wuyi 58be82ad4b [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax
### What changes were proposed in this pull request?

In this PR, we propose to use the value of `spark.sql.source.default` as the provider for `CREATE TABLE` syntax instead of `hive` in Spark 3.0.

And to help the migration, we introduce a legacy conf `spark.sql.legacy.respectHiveDefaultProvider.enabled` and set its default to `false`.

### Why are the changes needed?

1. Currently, `CREATE TABLE` syntax use hive provider to create table while `DataFrameWriter.saveAsTable` API using the value of `spark.sql.source.default` as a provider to create table. It would be better to make them consistent.

2. User may gets confused in some cases. For example:

```
CREATE TABLE t1 (c1 INT) USING PARQUET;
CREATE TABLE t2 (c1 INT);
```

In these two DDLs, use may think that `t2` should also use parquet as default provider since Spark always advertise parquet as the default format. However, it's hive in this case.

On the other hand, if we omit the USING clause in a CTAS statement, we do pick parquet by default if `spark.sql.hive.convertCATS=true`:

```
CREATE TABLE t3 USING PARQUET AS SELECT 1 AS VALUE;
CREATE TABLE t4 AS SELECT 1 AS VALUE;
```
And these two cases together can be really confusing.

3. Now, Spark SQL is very independent and popular. We do not need to be fully consistent with Hive's behavior.

### Does this PR introduce any user-facing change?

Yes, before this PR, using `CREATE TABLE` syntax will use hive provider. But now, it use the value of `spark.sql.source.default` as its provider.

### How was this patch tested?

Added tests in `DDLParserSuite` and `HiveDDlSuite`.

Closes #26736 from Ngone51/dev-create-table-using-parquet-by-default.

Lead-authored-by: wuyi <yi.wu@databricks.com>
Co-authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-07 02:15:25 +08:00
Sean Owen c8ed71b3cd [SPARK-30011][SQL] Inline 2.12 "AsIfIntegral" classes, not present in 2.13
### What changes were proposed in this pull request?

Classes like DoubleAsIfIntegral are not found in Scala 2.13, but used in the current build. This change 'inlines' the 2.12 implementation and makes it work with both 2.12 and 2.13.

### Why are the changes needed?

To cross-compile with 2.13.

### Does this PR introduce any user-facing change?

It should not as it copies in 2.12's existing behavior.

### How was this patch tested?

Existing tests.

Closes #26769 from srowen/SPARK-30011.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-06 08:15:38 -08:00
gengjiaan 187f3c1773 [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
## What changes were proposed in this pull request?

The syntax 'LIKE predicate: ESCAPE clause' is a ANSI SQL.
For example:

```
select 'abcSpark_13sd' LIKE '%Spark\\_%';             //true
select 'abcSpark_13sd' LIKE '%Spark/_%';              //false
select 'abcSpark_13sd' LIKE '%Spark"_%';              //false
select 'abcSpark_13sd' LIKE '%Spark/_%' ESCAPE '/';   //true
select 'abcSpark_13sd' LIKE '%Spark"_%' ESCAPE '"';   //true
select 'abcSpark%13sd' LIKE '%Spark\\%%';             //true
select 'abcSpark%13sd' LIKE '%Spark/%%';              //false
select 'abcSpark%13sd' LIKE '%Spark"%%';              //false
select 'abcSpark%13sd' LIKE '%Spark/%%' ESCAPE '/';   //true
select 'abcSpark%13sd' LIKE '%Spark"%%' ESCAPE '"';   //true
select 'abcSpark\\13sd' LIKE '%Spark\\\\_%';          //true
select 'abcSpark/13sd' LIKE '%Spark//_%';             //false
select 'abcSpark"13sd' LIKE '%Spark""_%';             //false
select 'abcSpark/13sd' LIKE '%Spark//_%' ESCAPE '/';  //true
select 'abcSpark"13sd' LIKE '%Spark""_%' ESCAPE '"';  //true
```
But Spark SQL only supports 'LIKE predicate'.

Note: If the input string or pattern string is null, then the result is null too.

There are some mainstream database support the syntax.

**PostgreSQL:**
https://www.postgresql.org/docs/11/functions-matching.html

**Vertica:**
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Predicates/LIKE-predicate.htm?zoom_highlight=like%20escape

**MySQL:**
https://dev.mysql.com/doc/refman/5.6/en/string-comparison-functions.html

**Oracle:**
https://docs.oracle.com/en/database/oracle/oracle-database/19/jjdbc/JDBC-reference-information.html#GUID-5D371A5B-D7F6-42EB-8C0D-D317F3C53708
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-0779657B-06A8-441F-90C5-044B47862A0A

## How was this patch tested?

Exists UT and new UT.

This PR merged to my production environment and runs above sql:
```
spark-sql> select 'abcSpark_13sd' LIKE '%Spark\\_%';
true
Time taken: 0.119 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark_13sd' LIKE '%Spark/_%';
false
Time taken: 0.103 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark_13sd' LIKE '%Spark"_%';
false
Time taken: 0.096 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark_13sd' LIKE '%Spark/_%' ESCAPE '/';
true
Time taken: 0.096 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark_13sd' LIKE '%Spark"_%' ESCAPE '"';
true
Time taken: 0.092 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark\\%%';
true
Time taken: 0.109 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark/%%';
false
Time taken: 0.1 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark"%%';
false
Time taken: 0.081 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark/%%' ESCAPE '/';
true
Time taken: 0.095 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark"%%' ESCAPE '"';
true
Time taken: 0.113 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark\\13sd' LIKE '%Spark\\\\_%';
true
Time taken: 0.078 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark/13sd' LIKE '%Spark//_%';
false
Time taken: 0.067 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark"13sd' LIKE '%Spark""_%';
false
Time taken: 0.084 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark/13sd' LIKE '%Spark//_%' ESCAPE '/';
true
Time taken: 0.091 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark"13sd' LIKE '%Spark""_%' ESCAPE '"';
true
Time taken: 0.091 seconds, Fetched 1 row(s)
```
I create a table and its schema is:
```
spark-sql> desc formatted gja_test;
key     string  NULL
value   string  NULL
other   string  NULL

# Detailed Table Information
Database        test
Table   gja_test
Owner   test
Created Time    Wed Apr 10 11:06:15 CST 2019
Last Access     Thu Jan 01 08:00:00 CST 1970
Created By      Spark 2.4.1-SNAPSHOT
Type    MANAGED
Provider        hive
Table Properties        [transient_lastDdlTime=1563443838]
Statistics      26 bytes
Location        hdfs://namenode.xxx:9000/home/test/hive/warehouse/test.db/gja_test
Serde Library   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat     org.apache.hadoop.mapred.TextInputFormat
OutputFormat    org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties      [field.delim=   , serialization.format= ]
Partition Provider      Catalog
Time taken: 0.642 seconds, Fetched 21 row(s)
```
Table `gja_test` exists three rows of data.
```
spark-sql> select * from gja_test;
a       A       ao
b       B       bo
"__     """__   "
Time taken: 0.665 seconds, Fetched 3 row(s)
```
At finally, I test this function:
```
spark-sql> select * from gja_test where key like value escape '"';
"__     """__   "
Time taken: 0.687 seconds, Fetched 1 row(s)
```

Closes #25001 from beliefer/ansi-sql-like.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2019-12-06 00:07:38 -08:00
Terry Kim b86d4bb931 [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables
### What changes were proposed in this pull request?

This PR makes `Analyzer.ResolveRelations` responsible for looking up both v1 and v2 tables from the session catalog and create an appropriate relation.

### Why are the changes needed?

Currently there are two issues:
1. As described in [SPARK-29966](https://issues.apache.org/jira/browse/SPARK-29966), the logic for resolving relation can load a table twice, which is a perf regression (e.g., Hive metastore can be accessed twice).
2. As described in [SPARK-30001](https://issues.apache.org/jira/browse/SPARK-30001), if a catalog name is specified for v1 tables, the query fails:
```
scala> sql("create table t using csv as select 1 as i")
res2: org.apache.spark.sql.DataFrame = []

scala> sql("select * from t").show
+---+
|  i|
+---+
|  1|
+---+

scala> sql("select * from spark_catalog.t").show
org.apache.spark.sql.AnalysisException: Table or view not found: spark_catalog.t; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [spark_catalog, t]
```

### Does this PR introduce any user-facing change?

Yes. Now the catalog name is resolved correctly:
```
scala> sql("create table t using csv as select 1 as i")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("select * from t").show
+---+
|  i|
+---+
|  1|
+---+

scala> sql("select * from spark_catalog.t").show
+---+
|  i|
+---+
|  1|
+---+
```

### How was this patch tested?

Added new tests.

Closes #26684 from imback82/resolve_relation.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-06 15:45:13 +08:00
Sean Owen 7782b61a31 [SPARK-29392][CORE][SQL][FOLLOWUP] Avoid deprecated (in 2.13) Symbol syntax 'foo in favor of simpler expression, where it generated deprecation warnings
TL;DR - this is more of the same change in https://github.com/apache/spark/pull/26748

I told you it'd be iterative!

Closes #26765 from srowen/SPARK-29392.3.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-05 13:48:29 -08:00
Kent Yao b9cae37750 [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
# What changes were proposed in this pull request?
Add an analyzer rule to convert unresolved `Add`, `Subtract`, etc. to `TimeAdd`, `DateAdd`, etc. according to the following policy:
```scala
 /**
   * For [[Add]]:
   * 1. if both side are interval, stays the same;
   * 2. else if one side is interval, turns it to [[TimeAdd]];
   * 3. else if one side is date, turns it to [[DateAdd]] ;
   * 4. else stays the same.
   *
   * For [[Subtract]]:
   * 1. if both side are interval, stays the same;
   * 2. else if the right side is an interval, turns it to [[TimeSub]];
   * 3. else if one side is timestamp, turns it to [[SubtractTimestamps]];
   * 4. else if the right side is date, turns it to [[DateDiff]]/[[SubtractDates]];
   * 5. else if the left side is date, turns it to [[DateSub]];
   * 6. else turns it to stays the same.
   *
   * For [[Multiply]]:
   * 1. If one side is interval, turns it to [[MultiplyInterval]];
   * 2. otherwise, stays the same.
   *
   * For [[Divide]]:
   * 1. If the left side is interval, turns it to [[DivideInterval]];
   * 2. otherwise, stays the same.
   */
```
Besides, we change datetime functions from implicit cast types to strict ones, all available type coercions happen in `DateTimeOperations` coercion rule.
### Why are the changes needed?

Feature Parity between PostgreSQL and Spark, and make the null semantic consistent with Spark.

### Does this PR introduce any user-facing change?

1. date_add/date_sub functions only accept int/tinynit/smallint as the second arg, double/string etc, are forbidden like hive, which produce weird results.

### How was this patch tested?

add ut

Closes #26412 from yaooqinn/SPARK-29774.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-05 22:03:44 +08:00
Kent Yao 332e252a14 [SPARK-29425][SQL] The ownership of a database should be respected
### What changes were proposed in this pull request?

Keep the owner of a database when executing alter database commands

### Why are the changes needed?

Spark will inadvertently delete the owner of a database for executing databases ddls

### Does this PR introduce any user-facing change?

NO

### How was this patch tested?

add and modify uts

Closes #26080 from yaooqinn/SPARK-29425.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-05 16:14:27 +08:00
turbofei 0ab922c1eb [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery
### What changes were proposed in this pull request?
There is an issue for InSubquery expression.
For example, there are two tables `ta` and `tb` created by the below statements.
```
 sql("create table ta(id Decimal(18,0)) using parquet")
 sql("create table tb(id Decimal(19,0)) using parquet")
```
This statement below would thrown dataType mismatch exception.

```
 sql("select * from ta where id in (select id from tb)").show()
```
However, this similar statement could execute successfully.

```
 sql("select * from ta where id in ((select id from tb))").show()
```
The root cause is that, for `InSubquery` expression, it does not find a common type for two decimalType like `In` expression.
Besides that, for `InSubquery` expression, it also does not find a common type for DecimalType and double/float/bigInt.
In this PR, I fix this issue by finding widerType for `InSubquery` expression when DecimalType is involved.

### Why are the changes needed?
Some InSubquery would throw dataType mismatch exception.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Unit test.

Closes #26485 from turboFei/SPARK-29860-in-subquery.

Authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-05 16:00:16 +08:00
Sean Owen ebd83a544e [SPARK-30009][CORE][SQL][FOLLOWUP] Remove OrderingUtil and Utils.nanSafeCompare{Doubles,Floats} and use java.lang.{Double,Float}.compare directly
### What changes were proposed in this pull request?

Follow up on https://github.com/apache/spark/pull/26654#discussion_r353826162
Instead of OrderingUtil or Utils.nanSafeCompare{Doubles,Floats}, just use java.lang.{Double,Float}.compare directly. All work identically w.r.t. NaN when used to `compare`.

### Why are the changes needed?

Simplification of the previous change, which existed to support Scala 2.13 migration.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing tests

Closes #26761 from srowen/SPARK-30009.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-05 11:27:25 +08:00
Sean Owen 2ceed6f32c [SPARK-29392][CORE][SQL][FOLLOWUP] Avoid deprecated (in 2.13) Symbol syntax 'foo in favor of simpler expression, where it generated deprecation warnings
### What changes were proposed in this pull request?

Where it generates a deprecation warning in Scala 2.13, replace Symbol shorthand syntax `'foo` with an equivalent.

### Why are the changes needed?

Symbol syntax `'foo` is deprecated in Scala 2.13. The lines changed below otherwise generate about 440 warnings when building for 2.13.

The previous PR directly replaced many usages with `Symbol("foo")`. But it's also used to specify Columns via implicit conversion (`.select('foo)`) or even where simple Strings are used (`.as('foo)`), as it's kind of an abstraction for interned Strings.

While I find this syntax confusing and would like to deprecate it, here I just replaced it where it generates a build warning (not sure why all occurrences don't): `$"foo"` or just `"foo"`.

### Does this PR introduce any user-facing change?

Should not change behavior.

### How was this patch tested?

Existing tests.

Closes #26748 from srowen/SPARK-29392.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-04 15:03:26 -08:00
Sean Owen 4193d2f4cc [SPARK-30012][CORE][SQL] Change classes extending scala collection classes to work with 2.13
### What changes were proposed in this pull request?

Move some classes extending Scala collections into parallel source trees, to support 2.13; other minor collection-related modifications.

Modify some classes extending Scala collections to work with 2.13 as well as 2.12. In many cases, this means introducing parallel source trees, as the type hierarchy changed in ways that one class can't support both.

### Why are the changes needed?

To support building for Scala 2.13 in the future.

### Does this PR introduce any user-facing change?

There should be no behavior change.

### How was this patch tested?

Existing tests. Note that the 2.13 changes are not tested by the PR builder, of course. They compile in 2.13 but can't even be tested locally. Later, once the project can be compiled for 2.13, thus tested, it's possible the 2.13 implementations will need updates.

Closes #26728 from srowen/SPARK-30012.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-03 08:59:43 -08:00
Kent Yao 65552a81d1 [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
### What changes were proposed in this pull request?

`UnaryPositive` only accepts numeric and interval as we defined, but what we do for this in  `AstBuider.visitArithmeticUnary` is just bypassing it.

This should not be omitted for the type checking requirement.

### Why are the changes needed?

bug fix, you can find a pre-discussion here https://github.com/apache/spark/pull/26578#discussion_r347350398

### Does this PR introduce any user-facing change?
yes,  +non-numeric-or-interval is now invalid.
```
-- !query 14
select +date '1900-01-01'
-- !query 14 schema
struct<DATE '1900-01-01':date>
-- !query 14 output
1900-01-01

-- !query 15
select +timestamp '1900-01-01'
-- !query 15 schema
struct<TIMESTAMP '1900-01-01 00:00:00':timestamp>
-- !query 15 output
1900-01-01 00:00:00

-- !query 16
select +map(1, 2)
-- !query 16 schema
struct<map(1, 2):map<int,int>>
-- !query 16 output
{1:2}

-- !query 17
select +array(1,2)
-- !query 17 schema
struct<array(1, 2):array<int>>
-- !query 17 output
[1,2]

-- !query 18
select -'1'
-- !query 18 schema
struct<(- CAST(1 AS DOUBLE)):double>
-- !query 18 output
-1.0

-- !query 19
select -X'1'
-- !query 19 schema
struct<>
-- !query 19 output
org.apache.spark.sql.AnalysisException
cannot resolve '(- X'01')' due to data type mismatch: argument 1 requires (numeric or interval) type, however, 'X'01'' is of binary type.; line 1 pos 7

-- !query 20
select +X'1'
-- !query 20 schema
struct<X'01':binary>
-- !query 20 output
```

### How was this patch tested?

add ut check

Closes #26716 from yaooqinn/SPARK-30083.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-03 23:42:21 +08:00
Kent Yao 39291cff95 [SPARK-30048][SQL] Enable aggregates with interval type values for RelationalGroupedDataset
### What changes were proposed in this pull request?

Now the min/max/sum/avg are support for intervals, we should also enable it in RelationalGroupedDataset

### Why are the changes needed?

API consistency improvement

### Does this PR introduce any user-facing change?

yes, Dataset support min/max/sum/avg(mean) on intervals
### How was this patch tested?

add ut

Closes #26681 from yaooqinn/SPARK-30048.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-03 18:40:14 +08:00
herman d7b268ab32 [SPARK-29348][SQL] Add observable Metrics for Streaming queries
### What changes were proposed in this pull request?
Observable metrics are named arbitrary aggregate functions that can be defined on a query (Dataframe). As soon as the execution of a Dataframe reaches a completion point (e.g. finishes batch query or reaches streaming epoch) a named event is emitted that contains the metrics for the data processed since the last completion point.

A user can observe these metrics by attaching a listener to spark session, it depends on the execution mode which listener to attach:
- Batch: `QueryExecutionListener`. This will be called when the query completes. A user can access the metrics by using the `QueryExecution.observedMetrics` map.
- (Micro-batch) Streaming: `StreamingQueryListener`. This will be called when the streaming query completes an epoch. A user can access the metrics by using the `StreamingQueryProgress.observedMetrics` map. Please note that we currently do not support continuous execution streaming.

### Why are the changes needed?
This enabled observable metrics.

### Does this PR introduce any user-facing change?
Yes. It adds the `observe` method to `Dataset`.

### How was this patch tested?
- Added unit tests for the `CollectMetrics` logical node to the `AnalysisSuite`.
- Added unit tests for `StreamingProgress` JSON serialization to the `StreamingQueryStatusAndProgressSuite`.
- Added integration tests for streaming to the `StreamingQueryListenerSuite`.
- Added integration tests for batch to the `DataFrameCallbackSuite`.

Closes #26127 from hvanhovell/SPARK-29348.

Authored-by: herman <herman@databricks.com>
Signed-off-by: herman <herman@databricks.com>
2019-12-03 11:25:49 +01:00
Ali Afroozeh 68034a8056 [SPARK-30072][SQL] Create dedicated planner for subqueries
### What changes were proposed in this pull request?

This PR changes subquery planning by calling the planner and plan preparation rules on the subquery plan directly. Before we were creating a `QueryExecution` instance for subqueries to get the executedPlan. This would re-run analysis and optimization on the subqueries plan. Running the analysis again on an optimized query plan can have unwanted consequences, as some rules, for example `DecimalPrecision`, are not idempotent.

As an example, consider the expression `1.7 * avg(a)` which after applying the `DecimalPrecision` rule becomes:

```
promote_precision(1.7) * promote_precision(avg(a))
```

After the optimization, more specifically the constant folding rule, this expression becomes:

```
1.7 * promote_precision(avg(a))
```

Now if we run the analyzer on this optimized query again, we will get:

```
promote_precision(1.7) * promote_precision(promote_precision(avg(a)))
```

Which will later optimized as:

```
1.7 * promote_precision(promote_precision(avg(a)))
```

As can be seen, re-running the analysis and optimization on this expression results in an expression with extra nested promote_preceision nodes. Adding unneeded nodes to the plan is problematic because it can eliminate situations where we can reuse the plan.

We opted to introduce dedicated planners for subuqueries, instead of making the DecimalPrecision rule idempotent, because this eliminates this entire category of problems. Another benefit is that planning time for subqueries is reduced.

### How was this patch tested?
Unit tests

Closes #26705 from dbaliafroozeh/CreateDedicatedPlannerForSubqueries.

Authored-by: Ali Afroozeh <ali.afroozeh@databricks.com>
Signed-off-by: herman <herman@databricks.com>
2019-12-02 20:56:40 +01:00
Wenchen Fan e271664a01 [MINOR][SQL] Rename config name to spark.sql.analyzer.failAmbiguousSelfJoin.enabled
### What changes were proposed in this pull request?

add `.enabled` postfix to `spark.sql.analyzer.failAmbiguousSelfJoin`.

### Why are the changes needed?

to follow the existing naming style

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

not needed

Closes #26694 from cloud-fan/conf.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-02 21:05:06 +08:00
Kent Yao 4e073f3c50 [SPARK-30047][SQL] Support interval types in UnsafeRow
### What changes were proposed in this pull request?

Optimize aggregates on interval values from sort-based to hash-based, and we can use the `org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch` for better performance.

### Why are the changes needed?

improve aggerates

### Does this PR introduce any user-facing change?
no

### How was this patch tested?

add ut and existing ones

Closes #26680 from yaooqinn/SPARK-30047.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-02 20:47:23 +08:00
LantaoJin 04a5b8f5f8 [SPARK-29839][SQL] Supporting STORED AS in CREATE TABLE LIKE
### What changes were proposed in this pull request?
In SPARK-29421 (#26097) , we can specify a different table provider for `CREATE TABLE LIKE` via `USING provider`.
Hive support `STORED AS` new file format syntax:
```sql
CREATE TABLE tbl(a int) STORED AS TEXTFILE;
CREATE TABLE tbl2 LIKE tbl STORED AS PARQUET;
```
For Hive compatibility, we should also support `STORED AS` in `CREATE TABLE LIKE`.

### Why are the changes needed?
See https://github.com/apache/spark/pull/26097#issue-327424759

### Does this PR introduce any user-facing change?
Add a new syntax based on current CTL:
CREATE TABLE tbl2 LIKE tbl [STORED AS hiveFormat];

### How was this patch tested?
Add UTs.

Closes #26466 from LantaoJin/SPARK-29839.

Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-02 16:11:58 +08:00
Yuanjian Li 169415ffac [SPARK-30025][CORE] Continuous shuffle block fetching should be disabled by default when the old fetch protocol is used
### What changes were proposed in this pull request?
Disable continuous shuffle block fetching when the old fetch protocol in use.

### Why are the changes needed?
The new feature of continuous shuffle block fetching depends on the latest version of the shuffle fetch protocol. We should keep this constraint in `BlockStoreShuffleReader.fetchContinuousBlocksInBatch`.

### Does this PR introduce any user-facing change?
Users will not get the exception related to continuous shuffle block fetching when old version of the external shuffle service is used.

### How was this patch tested?
Existing UT.

Closes #26663 from xuanyuanking/SPARK-30025.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-02 15:59:12 +08:00
HyukjinKwon 51e69feb49 [SPARK-29851][SQL][FOLLOW-UP] Use foreach instead of misusing map
### What changes were proposed in this pull request?

This PR proposes to use foreach instead of misusing map as a small followup of #26476. This could cause some weird errors potentially and it's not a good practice anyway. See also SPARK-16694

### Why are the changes needed?
To avoid potential issues like SPARK-16694

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests should cover.

Closes #26729 from HyukjinKwon/SPARK-29851.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-02 13:40:00 +09:00
Yuanjian Li d1465a1b0d [SPARK-30074][SQL] The maxNumPostShufflePartitions config should obey reducePostShufflePartitions enabled
### What changes were proposed in this pull request?
1. Make maxNumPostShufflePartitions config obey reducePostShfflePartitions config.
2. Update the description for all the SQLConf affected by `spark.sql.adaptive.enabled`.

### Why are the changes needed?
Make the relation between these confs clearer.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UT.

Closes #26664 from xuanyuanking/SPARK-9853-follow.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-02 12:37:06 +08:00
wuyi 87ebfaf003 [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double
### What changes were proposed in this pull request?

For a literal number with an exponent(e.g. 1e-45, 1E2), we'd parse it to Double by default rather than Decimal. And user could still use  `spark.sql.legacy.exponentLiteralToDecimal.enabled=true` to fall back to previous behavior.

### Why are the changes needed?

According to ANSI standard of SQL, we see that the (part of) definition of `literal` :

```
<approximate numeric literal> ::=
    <mantissa> E <exponent>
```
which indicates that a literal number with an exponent should be approximate numeric(e.g. Double) rather than exact numeric(e.g. Decimal).

And when we test Presto, we found that Presto also conforms to this standard:

```
presto:default> select typeof(1E2);
 _col0
--------
 double
(1 row)
```

```
presto:default> select typeof(1.2);
    _col0
--------------
 decimal(2,1)
(1 row)
```

We also find that, actually, literals like `1E2` are parsed as Double before Spark2.1, but changed to Decimal after #14828 due to *The difference between the two confuses most users* as it said. But we also see support(from DB2 test) of original behavior at #14828 (comment).

Although, we also see that PostgreSQL has its own implementation:

```
postgres=# select pg_typeof(1E2);
 pg_typeof
-----------
 numeric
(1 row)

postgres=# select pg_typeof(1.2);
 pg_typeof
-----------
 numeric
(1 row)
```

We still think that Spark should also conform to this standard while considering SQL standard and Spark own history and majority DBMS and also user experience.

### Does this PR introduce any user-facing change?

Yes.

For `1E2`, before this PR:

```
scala> spark.sql("select 1E2")
res0: org.apache.spark.sql.DataFrame = [1E+2: decimal(1,-2)]
```

After this PR:

```
scala> spark.sql("select 1E2")
res0: org.apache.spark.sql.DataFrame = [100.0: double]
```

And for `1E-45`, before this PR:

```
org.apache.spark.sql.catalyst.parser.ParseException:
decimal can only support precision up to 38
== SQL ==
select 1E-45
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:131)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:76)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:605)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:605)
  ... 47 elided
```

after this PR:

```
scala> spark.sql("select 1E-45");
res1: org.apache.spark.sql.DataFrame = [1.0E-45: double]
```

And before this PR, user may feel super weird to see that `select 1e40` works but `select 1e-40 fails`. And now, both of them work well.

### How was this patch tested?

updated `literals.sql.out` and `ansi/literals.sql.out`

Closes #26595 from Ngone51/SPARK-29956.

Authored-by: wuyi <ngone_5451@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-02 11:34:56 +08:00
Yuming Wang 708ab57f37 [SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column
## What changes were proposed in this pull request?

[HIVE-12063](https://issues.apache.org/jira/browse/HIVE-12063) improved pad decimal numbers with trailing zeros to the scale of the column. The following description is copied from the description of HIVE-12063.

> HIVE-7373 was to address the problems of trimming tailing zeros by Hive, which caused many problems including treating 0.0, 0.00 and so on as 0, which has different precision/scale. Please refer to HIVE-7373 description. However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. HIVE-11835 was resolved recently to address one of the problems, where 0.0, 0.00, and so on cannot be read into decimal(1,1).
 However, HIVE-11835 didn't address the problem of showing as 0 in query result for any decimal values such as 0.0, 0.00, etc. This causes confusion as 0 and 0.0 have different precision/scale than 0.
The proposal here is to pad zeros for query result to the type's scale. This not only removes the confusion described above, but also aligns with many other DBs. Internal decimal number representation doesn't change, however.

**Spark SQL**:
```sql
// bin/spark-sql
spark-sql> select cast(1 as decimal(38, 18));
1
spark-sql>

// bin/beeline
0: jdbc:hive2://localhost:10000/default> select cast(1 as decimal(38, 18));
+----------------------------+--+
| CAST(1 AS DECIMAL(38,18))  |
+----------------------------+--+
| 1.000000000000000000       |
+----------------------------+--+

// bin/spark-shell
scala> spark.sql("select cast(1 as decimal(38, 18))").show(false)
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|1.000000000000000000     |
+-------------------------+

// bin/pyspark
>>> spark.sql("select cast(1 as decimal(38, 18))").show()
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|     1.000000000000000000|
+-------------------------+

// bin/sparkR
> showDF(sql("SELECT cast(1 as decimal(38, 18))"))
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|     1.000000000000000000|
+-------------------------+
```

**PostgreSQL**:
```sql
postgres=# select cast(1 as decimal(38, 18));
       numeric
----------------------
 1.000000000000000000
(1 row)
```
**Presto**:
```sql
presto> select cast(1 as decimal(38, 18));
        _col0
----------------------
 1.000000000000000000
(1 row)
```

## How was this patch tested?

unit tests and manual test:
```sql
spark-sql> select cast(1 as decimal(38, 18));
1.000000000000000000
```
Spark SQL Upgrading Guide:
![image](https://user-images.githubusercontent.com/5399861/69649620-4405c380-10a8-11ea-84b1-6ee675663b98.png)

Closes #26697 from wangyum/SPARK-28461.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-02 09:02:39 +09:00
Dongjoon Hyun 9cd174a7c9 Revert "[SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column"
This reverts commit 19af1fe3a2.
2019-11-27 11:07:08 -08:00
fuwhu 16da714ea5 [SPARK-29979][SQL][FOLLOW-UP] improve the output of DesribeTableExec
### What changes were proposed in this pull request?
refine the output of "DESC TABLE" command.

After this PR, the output of "DESC TABLE" command is like below :

```
id                        bigint
data                      string

# Partitioning
Part 0                    id

# Detailed Table Information
Name                      testca.table_name
Comment                   this is a test table
Location                  /tmp/testcat/table_name
Provider                  foo
Table Properties          [bar=baz]
```

### Why are the changes needed?
Currently, "DESC TABLE" will show reserved properties (eg. location, comment) in the "Table Property" section.
Since reserved properties are different from common properties, displaying reserved properties together with other table detailed information and displaying other properties in single field should be reasonable, and it is consistent with hive and DescribeTableCommand action.

### Does this PR introduce any user-facing change?
yes, the output of "DESC TABLE" command is refined as above.

### How was this patch tested?
Update existing unit tests.

Closes #26677 from fuwhu/SPARK-29979-FOLLOWUP-1.

Authored-by: fuwhu <bestwwg@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-27 23:16:53 +08:00
Yuming Wang 19af1fe3a2 [SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column
## What changes were proposed in this pull request?

[HIVE-12063](https://issues.apache.org/jira/browse/HIVE-12063) improved pad decimal numbers with trailing zeros to the scale of the column. The following description is copied from the description of HIVE-12063.

> HIVE-7373 was to address the problems of trimming tailing zeros by Hive, which caused many problems including treating 0.0, 0.00 and so on as 0, which has different precision/scale. Please refer to HIVE-7373 description. However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. HIVE-11835 was resolved recently to address one of the problems, where 0.0, 0.00, and so on cannot be read into decimal(1,1).
 However, HIVE-11835 didn't address the problem of showing as 0 in query result for any decimal values such as 0.0, 0.00, etc. This causes confusion as 0 and 0.0 have different precision/scale than 0.
The proposal here is to pad zeros for query result to the type's scale. This not only removes the confusion described above, but also aligns with many other DBs. Internal decimal number representation doesn't change, however.

**Spark SQL**:
```sql
// bin/spark-sql
spark-sql> select cast(1 as decimal(38, 18));
1
spark-sql>

// bin/beeline
0: jdbc:hive2://localhost:10000/default> select cast(1 as decimal(38, 18));
+----------------------------+--+
| CAST(1 AS DECIMAL(38,18))  |
+----------------------------+--+
| 1.000000000000000000       |
+----------------------------+--+

// bin/spark-shell
scala> spark.sql("select cast(1 as decimal(38, 18))").show(false)
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|1.000000000000000000     |
+-------------------------+

// bin/pyspark
>>> spark.sql("select cast(1 as decimal(38, 18))").show()
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|     1.000000000000000000|
+-------------------------+

// bin/sparkR
> showDF(sql("SELECT cast(1 as decimal(38, 18))"))
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|     1.000000000000000000|
+-------------------------+
```

**PostgreSQL**:
```sql
postgres=# select cast(1 as decimal(38, 18));
       numeric
----------------------
 1.000000000000000000
(1 row)
```
**Presto**:
```sql
presto> select cast(1 as decimal(38, 18));
        _col0
----------------------
 1.000000000000000000
(1 row)
```

## How was this patch tested?

unit tests and manual test:
```sql
spark-sql> select cast(1 as decimal(38, 18));
1.000000000000000000
```
Spark SQL Upgrading Guide:
![image](https://user-images.githubusercontent.com/5399861/69649620-4405c380-10a8-11ea-84b1-6ee675663b98.png)

Closes #25214 from wangyum/SPARK-28461.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-27 18:13:33 +09:00
wuyi a58d91b159 [SPARK-29768][SQL] Column pruning through nondeterministic expressions
### What changes were proposed in this pull request?

Support columnar pruning through non-deterministic expressions.

### Why are the changes needed?

In some cases, columns can still be pruned even though nondeterministic expressions appears.
e.g. for the plan  `Filter('a = 1, Project(Seq('a, rand() as 'r), LogicalRelation('a, 'b)))`, we shall still prune column b while non-deterministic expression appears.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added a new test file: `ScanOperationSuite`.
Added test in `FileSourceStrategySuite` to verify the right prune behavior for both DS v1 and v2.

Closes #26629 from Ngone51/SPARK-29768.

Authored-by: wuyi <ngone_5451@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-27 15:37:01 +08:00
Kent Yao 4fd585d2c5 [SPARK-30008][SQL] The dataType of collect_list/collect_set aggs should be ArrayType(_, false)
### What changes were proposed in this pull request?

```scala
// Do not allow null values. We follow the semantics of Hive's collect_list/collect_set here.
// See: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
```
These two functions do not allow null values as they are defined, so their elements should not contain null.

### Why are the changes needed?

Casting collect_list(a) to ArrayType(_, false) fails before this fix.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

add ut

Closes #26651 from yaooqinn/SPARK-30008.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-26 20:40:21 -08:00
Kent Yao ed0c33fdd4 [SPARK-30026][SQL] Whitespaces can be identified as delimiters in interval string
### What changes were proposed in this pull request?

We are now able to handle whitespaces for integral and fractional types, and the leading or trailing whitespaces for interval, date, and timestamps. But the current interval parser is not able to identify whitespaces as separates as PostgreSQL can do.

This PR makes the whitespaces handling be consistent for nterval values.
Typed interval literal, multi-unit representation, and casting from strings are all supported.

```sql
postgres=# select interval E'1 \t day';
 interval
----------
 1 day
(1 row)

postgres=# select interval E'1\t' day;
 interval
----------
 1 day
(1 row)
```

### Why are the changes needed?

Whitespace handling should be consistent for interval value, and across different types in Spark.
PostgreSQL feature parity.

### Does this PR introduce any user-facing change?
Yes, the interval string of multi-units values which separated by whitespaces can be valid now.

### How was this patch tested?
add ut.

Closes #26662 from yaooqinn/SPARK-30026.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-27 01:20:38 +08:00
Sean Owen 29018025ba [SPARK-30009][CORE][SQL] Support different floating-point Ordering for Scala 2.12 / 2.13
### What changes were proposed in this pull request?

Make separate source trees for Scala 2.12/2.13 in order to accommodate mutually-incompatible support for Ordering of double, float.

Note: This isn't the last change that will need a split source tree for 2.13. But this particular change could go several ways:

- (Split source tree)
- Inline the Scala 2.12 implementation
- Reflection

For this change alone any are possible, and splitting the source tree is a bit overkill. But if it will be necessary for other JIRAs (see umbrella SPARK-25075), then it might be the easiest way to implement this.

### Why are the changes needed?

Scala 2.13 split Ordering.Double into Ordering.Double.TotalOrdering and Ordering.Double.IeeeOrdering. Neither can be used in a single build that supports 2.12 and 2.13.

TotalOrdering works like java.lang.Double.compare. IeeeOrdering works like Scala 2.12 Ordering.Double. They differ in how NaN is handled - compares always above other values? or always compares as 'false'? In theory they have different uses: TotalOrdering is important if floating-point values are sorted. IeeeOrdering behaves like 2.12 and JVM comparison operators.

I chose TotalOrdering as I think we care more about stable sorting, and because elsewhere we rely on java.lang comparisons. It is also possible to support with two methods.

### Does this PR introduce any user-facing change?

Pending tests, will see if it obviously affects any sort order. We need to see if it changes NaN sort order.

### How was this patch tested?

Existing tests so far.

Closes #26654 from srowen/SPARK-30009.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-26 08:25:53 -08:00
Huaxin Gao 373c2c3f44 [SPARK-29862][SQL] CREATE (OR REPLACE) ... VIEW should look up catalog/table like v2 commands
### What changes were proposed in this pull request?
Add CreateViewStatement and make CREARE VIEW  go through the same catalog/table resolution framework of v2 commands.

### Why are the changes needed?
It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g.
```
USE my_catalog
DESC v // success and describe the view v from my_catalog
CREATE VIEW v AS SELECT 1 // report view not found as there is no view v in the session catalog
```
### Does this PR introduce any user-facing change?
Yes. When running CREATE VIEW ...  Spark fails the command if the current catalog is set to a v2 catalog, or the view name specified a v2 catalog.

### How was this patch tested?
unit tests

Closes #26649 from huaxingao/spark-29862.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-26 14:10:46 +08:00
Kent Yao 8b0121bea8 [MINOR][DOC] Fix the CalendarIntervalType description
### What changes were proposed in this pull request?

fix the overdue and incorrect description about CalendarIntervalType

### Why are the changes needed?

api doc correctness

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

no

Closes #26659 from yaooqinn/intervaldoc.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-26 12:49:56 +08:00
fuwhu 29ebd9336c [SPARK-29979][SQL] Add basic/reserved property key constants in TableCatalog and SupportsNamespaces
### What changes were proposed in this pull request?
Add "comment" and "location" property key constants in TableCatalog and SupportNamespaces.
And update code of implementation classes to use these constants instead of hard code.

### Why are the changes needed?
Currently, some basic/reserved keys (eg. "location", "comment") of table and namespace properties are hard coded or defined in specific logical plan implementation class.
These keys can be centralized in TableCatalog and SupportsNamespaces interface and shared across different implementation classes.

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
Existing unit test

Closes #26617 from fuwhu/SPARK-29979.

Authored-by: fuwhu <bestwwg@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-26 01:24:43 +08:00
Kent Yao de21f28f8a [SPARK-29986][SQL] casting string to date/timestamp/interval should trim all whitespaces
### What changes were proposed in this pull request?

A java like string trim method trims all whitespaces that less or equal than 0x20. currently, our UTF8String handle the space =0x20 ONLY. This is not suitable for many cases in Spark, like trim for interval strings, date, timestamps, PostgreSQL like cast string to boolean.

### Why are the changes needed?

improve the white spaces handling in UTF8String, also with some bugs fixed

### Does this PR introduce any user-facing change?

yes,
string with `control character` at either end can be convert to date/timestamp and interval now

### How was this patch tested?

add ut

Closes #26626 from yaooqinn/SPARK-29986.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-25 14:37:04 +08:00
Kent Yao 5cf475d288 [SPARK-30000][SQL] Trim the string when cast string type to decimals
### What changes were proposed in this pull request?

https://bugs.openjdk.java.net/browse/JDK-8170259
https://bugs.openjdk.java.net/browse/JDK-8170563

When we cast string type to decimal type, we rely on java.math. BigDecimal. It can't accept leading and training spaces, as you can see in the above links. This behavior is not consistent with other numeric types now. we need to fix it and keep consistency.

### Why are the changes needed?

make string to numeric types be consistent

### Does this PR introduce any user-facing change?

yes, string removed trailing or leading white spaces will be able to convert to decimal if the trimmed is valid

### How was this patch tested?

1. modify ut

#### Benchmark
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.benchmark

import org.apache.spark.benchmark.Benchmark

/**
 * Benchmark trim the string when casting string type to Boolean/Numeric types.
 * To run this benchmark:
 * {{{
 *   1. without sbt:
 *      bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
 *   2. build/sbt "sql/test:runMain <this class>"
 *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
 *      Results will be written to "benchmarks/CastBenchmark-results.txt".
 * }}}
 */
object CastBenchmark extends SqlBasedBenchmark {

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    val title = "Cast String to Integral"
    runBenchmark(title) {
      withTempPath { dir =>
        val N = 500L << 14
        val df = spark.range(N)
        val types = Seq("decimal")
        (1 to 5).by(2).foreach { i =>
          df.selectExpr(s"concat(id, '${" " * i}') as str")
            .write.mode("overwrite").parquet(dir + i.toString)
        }

        val benchmark = new Benchmark(title, N, minNumIters = 5, output = output)
        Seq(true, false).foreach { trim =>
          types.foreach { t =>
            val str = if (trim) "trim(str)" else "str"
            val expr = s"cast($str as $t) as c_$t"
            (1 to 5).by(2).foreach { i =>
              benchmark.addCase(expr + s" - with $i spaces") { _ =>
                spark.read.parquet(dir + i.toString).selectExpr(expr).collect()
              }
            }
          }
        }
        benchmark.run()
      }
    }
  }
}

```

#### string trim vs not trim
```java
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.1
[info] Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
[info] Cast String to Integral:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] cast(trim(str) as decimal) as c_decimal - with 1 spaces           3362           5486         NaN          2.4         410.4       1.0X
[info] cast(trim(str) as decimal) as c_decimal - with 3 spaces           3251           5655         NaN          2.5         396.8       1.0X
[info] cast(trim(str) as decimal) as c_decimal - with 5 spaces           3208           5725         NaN          2.6         391.7       1.0X
[info] cast(str as decimal) as c_decimal - with 1 spaces          13962          16233        1354          0.6        1704.3       0.2X
[info] cast(str as decimal) as c_decimal - with 3 spaces          14273          14444         179          0.6        1742.4       0.2X
[info] cast(str as decimal) as c_decimal - with 5 spaces          14318          14535         125          0.6        1747.8       0.2X
```
#### string trim vs this fix
```java
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.1
[info] Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
[info] Cast String to Integral:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] cast(trim(str) as decimal) as c_decimal - with 1 spaces           3265           6299         NaN          2.5         398.6       1.0X
[info] cast(trim(str) as decimal) as c_decimal - with 3 spaces           3183           6241         693          2.6         388.5       1.0X
[info] cast(trim(str) as decimal) as c_decimal - with 5 spaces           3167           5923        1151          2.6         386.7       1.0X
[info] cast(str as decimal) as c_decimal - with 1 spaces           3161           5838        1126          2.6         385.9       1.0X
[info] cast(str as decimal) as c_decimal - with 3 spaces           3046           3457         837          2.7         371.8       1.1X
[info] cast(str as decimal) as c_decimal - with 5 spaces           3053           4445         NaN          2.7         372.7       1.1X
[info]
```

Closes #26640 from yaooqinn/SPARK-30000.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-25 12:47:07 +08:00
Sean Owen 13896e4eae [SPARK-30013][SQL] For scala 2.13, omit parens in various BigDecimal value() methods
### What changes were proposed in this pull request?

Omit parens on calls like BigDecimal.longValue()

### Why are the changes needed?

For some reason, this won't compile in Scala 2.13. The calls are otherwise equivalent in 2.12.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests

Closes #26653 from srowen/SPARK-30013.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-24 18:23:34 -08:00
Takeshi Yamamuro 3f3a18fff1 [SPARK-24690][SQL] Add a config to control plan stats computation in LogicalRelation
### What changes were proposed in this pull request?

This pr proposes a new independent config so that `LogicalRelation` could use `rowCount` to compute data statistics in logical plans even if CBO disabled. In the master, we currently cannot enable `StarSchemaDetection.reorderStarJoins` because we need to turn off CBO to enable it but `StarSchemaDetection` internally references the `rowCount` that is used in LogicalRelation if CBO disabled.

### Why are the changes needed?

Plan stats are pretty useful other than CBO, e.g., star-schema detector and dynamic partition pruning.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added tests in `DataFrameJoinSuite`.

Closes #21668 from maropu/PlanStatsConf.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-24 08:30:24 -08:00
Dongjoon Hyun 13338eaa95 [SPARK-29554][SQL][FOLLOWUP] Rename Version to SparkVersion
### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/26209 .
This renames class `Version` to class `SparkVersion`.

### Why are the changes needed?

According to the review comment, this uses more specific class name.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #26647 from dongjoon-hyun/SPARK-29554.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-23 19:53:52 -08:00
HyukjinKwon fc7a37b147 [SPARK-30003][SQL] Do not throw stack overflow exception in non-root unknown hint resolution
### What changes were proposed in this pull request?
This is rather a followup of https://github.com/apache/spark/pull/25464 (see https://github.com/apache/spark/pull/25464/files#r349543286)

It will cause an infinite recursion via mapping children - we should return the hint rather than its parent plan in unknown hint resolution.

### Why are the changes needed?

Prevent Stack over flow during hint resolution.

### Does this PR introduce any user-facing change?

Yes, it avoids stack overflow exception It was caused by https://github.com/apache/spark/pull/25464 and this is only in the master.

No behaviour changes to end users as it happened only in the master.

### How was this patch tested?

Unittest was added.

Closes #26642 from HyukjinKwon/SPARK-30003.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-23 17:24:56 +09:00
Wenchen Fan 6e581cf164 [SPARK-29893][SQL][FOLLOWUP] code cleanup for local shuffle reader
### What changes were proposed in this pull request?

A few cleanups for https://github.com/apache/spark/pull/26516:
1. move the calculating of partition start indices from the RDD to the rule. We can reuse code from "shrink number of reducers" in the future if we split partitions by size.
2. only check extra shuffles when adding local readers to the probe side.
3. add comments.
4. simplify the config name: `optimizedLocalShuffleReader` -> `localShuffleReader`

### Why are the changes needed?

make code more maintainable.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #26625 from cloud-fan/aqe.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-11-22 10:26:54 -08:00
Wenchen Fan 6b4b6a87cd [SPARK-29558][SQL] ResolveTables and ResolveRelations should be order-insensitive
### What changes were proposed in this pull request?

Make `ResolveRelations` call `ResolveTables` at the beginning, and make `ResolveTables` call `ResolveTempViews`(newly added) at the beginning, to ensure the relation resolution priority.

### Why are the changes needed?

To resolve an `UnresolvedRelation`, the general process is:
1. try to resolve to (global) temp view first. If it's not a temp view, move on
2. if the table name specifies a catalog, lookup the table from the specified catalog. Otherwise, lookup table from the current catalog.
3. when looking up table from session catalog, return a v1 relation if the table provider is v1.

Currently, this process is done by 2 rules: `ResolveTables` and `ResolveRelations`. To avoid rule conflicts, we add a lot of checks:
1. `ResolveTables` only resolves `UnresolvedRelation` if it's not a temp view and the resolved table is not v1.
2. `ResolveRelations` only resolves `UnresolvedRelation` if the table name has less than 2 parts.

This requires to run `ResolveTables` before `ResolveRelations`, otherwise we may resolve a v2 table to a v1 relation.

To clearly guarantee the resolution priority, and avoid massive changes, this PR proposes to call one rule in another rule to ensure the rule execution order. Now the process is simple:
1. first run `ResolveTempViews`, see if we can resolve relation to temp view
2. then run `ResolveTables`, see if we can resolve relation to v2 tables.
3. finally run `ResolveRelations`, see if we can resolve relation to v1 tables.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #26214 from cloud-fan/resolve.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Ryan Blue <blue@apache.org>
2019-11-21 09:47:42 -08:00
Ximo Guanter 54c5087a3a [SPARK-29248][SQL] provider number of partitions when creating v2 data writer factory
### What changes were proposed in this pull request?
When implementing a ScanBuilder, we require the implementor to provide the schema of the data and the number of partitions.

However, when someone is implementing WriteBuilder we only pass them the schema, but not the number of partitions. This is an asymetrical developer experience.

This PR adds a PhysicalWriteInfo interface that is passed to createBatchWriterFactory and createStreamingWriterFactory that adds the number of partitions of the data that is going to be written.

### Why are the changes needed?
Passing in the number of partitions on the WriteBuilder would enable data sources to provision their write targets before starting to write. For example:

it could be used to provision a Kafka topic with a specific number of partitions
it could be used to scale a microservice prior to sending the data to it
it could be used to create a DsV2 that sends the data to another spark cluster (currently not possible since the reader wouldn't be able to know the number of partitions)
### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tests passed

Closes #26591 from edrevo/temp.

Authored-by: Ximo Guanter <joaquin.guantergonzalbez@telefonica.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-22 00:19:25 +08:00
Takeshi Yamamuro cdcd43cbf2 [SPARK-29977][SQL] Remove newMutableProjection/newOrdering/newNaturalAscendingOrdering from SparkPlan
### What changes were proposed in this pull request?

This is to refactor `SparkPlan` code; it mainly removed `newMutableProjection`/`newOrdering`/`newNaturalAscendingOrdering` from `SparkPlan`.
The other modifications are listed below;
 - Move `BaseOrdering` from `o.a.s.sqlcatalyst.expressions.codegen.GenerateOrdering.scala` to `o.a.s.sqlcatalyst.expressions.ordering.scala`
 - `RowOrdering` extends `CodeGeneratorWithInterpretedFallback ` for `BaseOrdering`
 - Remove the unused variables (`subexpressionEliminationEnabled` and `codeGenFallBack`) from `SparkPlan`

### Why are the changes needed?

For better code/test coverage.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing.

Closes #26615 from maropu/RefactorOrdering.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-21 23:51:12 +08:00
Kent Yao 7a70670345 [SPARK-29961][SQL] Implement builtin function - typeof
### What changes were proposed in this pull request?
Add typeof function for Spark to get the underlying type of value.
```sql
-- !query 0
select typeof(1)
-- !query 0 schema
struct<typeof(1):string>
-- !query 0 output
int

-- !query 1
select typeof(1.2)
-- !query 1 schema
struct<typeof(1.2):string>
-- !query 1 output
decimal(2,1)

-- !query 2
select typeof(array(1, 2))
-- !query 2 schema
struct<typeof(array(1, 2)):string>
-- !query 2 output
array<int>

-- !query 3
select typeof(a) from (values (1), (2), (3.1)) t(a)
-- !query 3 schema
struct<typeof(a):string>
-- !query 3 output
decimal(11,1)
decimal(11,1)
decimal(11,1)

```

##### presto

```sql
presto> select typeof(array[1]);
     _col0
----------------
 array(integer)
(1 row)
```
##### PostgreSQL

```sql
postgres=# select pg_typeof(a) from (values (1), (2), (3.0)) t(a);
 pg_typeof
-----------
 numeric
 numeric
 numeric
(3 rows)
```
##### impala
https://issues.apache.org/jira/browse/IMPALA-1597

### Why are the changes needed?
a function which is better we have to help us debug, test, develop ...

### Does this PR introduce any user-facing change?

add a new function
### How was this patch tested?

add ut and example

Closes #26599 from yaooqinn/SPARK-29961.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-21 10:28:32 +09:00
LantaoJin 06e203b856 [SPARK-29911][SQL] Uncache cached tables when session closed
### What changes were proposed in this pull request?
The local temporary view is session-scoped. Its lifetime is the lifetime of the session that created it.  But now cache data is cross-session. Its lifetime is the lifetime of the Spark application. That's will cause the memory leak if cache a local temporary view in memory when the session closed.
In this PR, we uncache the cached data of local temporary view when session closed. This PR doesn't impact the cached data of global temp view and persisted view.

How to reproduce:
1. create a local temporary view v1
2. cache it in memory
3. close session without drop table v1.

The application will hold the memory forever. In a long running thrift server scenario. It's worse.
```shell
0: jdbc:hive2://localhost:10000> CACHE TABLE testCacheTable AS SELECT 1;
CACHE TABLE testCacheTable AS SELECT 1;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (1.498 seconds)
0: jdbc:hive2://localhost:10000> !close
!close
Closing: 0: jdbc:hive2://localhost:10000
0: jdbc:hive2://localhost:10000 (closed)> !connect 'jdbc:hive2://localhost:10000'
!connect 'jdbc:hive2://localhost:10000'
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000:
lajin
Enter password for jdbc:hive2://localhost:10000:
***
Connected to: Spark SQL (version 3.0.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://localhost:10000> select * from testCacheTable;
select * from testCacheTable;
Error: Error running query: org.apache.spark.sql.AnalysisException: Table or view not found: testCacheTable; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [testCacheTable] (state=,code=0)
```
<img width="1047" alt="Screen Shot 2019-11-15 at 2 03 49 PM" src="https://user-images.githubusercontent.com/1853780/68923527-7ca8c180-07b9-11ea-9cc7-74f276c46840.png">

### Why are the changes needed?
Resolve memory leak for thrift server

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manual test in UI storage tab
And add an UT

Closes #26543 from LantaoJin/SPARK-29911.

Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-20 18:19:30 -06:00
Sean Owen 1febd373ea [MINOR][TESTS] Replace JVM assert with JUnit Assert in tests
### What changes were proposed in this pull request?

Use JUnit assertions in tests uniformly, not JVM assert() statements.

### Why are the changes needed?

assert() statements do not produce as useful errors when they fail, and, if they were somehow disabled, would fail to test anything.

### Does this PR introduce any user-facing change?

No. The assertion logic should be identical.

### How was this patch tested?

Existing tests.

Closes #26581 from srowen/assertToJUnit.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-20 14:04:15 -06:00
Yuanjian Li 23b3c4fafd [SPARK-29951][SQL] Make the behavior of Postgre dialect independent of ansi mode config
### What changes were proposed in this pull request?
Fix the inconsistent behavior of build-in function SQL LEFT/RIGHT.

### Why are the changes needed?
As the comment in https://github.com/apache/spark/pull/26497#discussion_r345708065, Postgre dialect should not be affected by the ANSI mode config.
During reran the existing tests, only the LEFT/RIGHT build-in SQL function broke the assumption. We fix this by following https://www.postgresql.org/docs/12/sql-keywords-appendix.html: `LEFT/RIGHT reserved (can be function or type)`

### Does this PR introduce any user-facing change?
Yes, the Postgre dialect will not be affected by the ANSI mode config.

### How was this patch tested?
Existing UT.

Closes #26584 from xuanyuanking/SPARK-29951.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-21 00:56:48 +08:00
Takeshi Yamamuro 0032d85153 [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
### What changes were proposed in this pull request?

This is to refactor Predicate code; it mainly removed `newPredicate` from `SparkPlan`.
Modifications are listed below;
 - Move `Predicate` from `o.a.s.sqlcatalyst.expressions.codegen.GeneratePredicate.scala` to `o.a.s.sqlcatalyst.expressions.predicates.scala`
 - To resolve the name conflict,  rename `o.a.s.sqlcatalyst.expressions.codegen.Predicate` to `o.a.s.sqlcatalyst.expressions.BasePredicate`
 - Extend `CodeGeneratorWithInterpretedFallback ` for `BasePredicate`

This comes from the cloud-fan suggestion: https://github.com/apache/spark/pull/26420#discussion_r348005497

### Why are the changes needed?

For better code/test coverage.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #26604 from maropu/RefactorPredicate.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-20 21:13:51 +08:00
Nikita Konda 5a70af7a6c [SPARK-29029][SQL] Use AttributeMap in PhysicalOperation.collectProjectsAndFilters
### What changes were proposed in this pull request?

This PR fixes the issue of substituting aliases while collecting filters in  `PhysicalOperation.collectProjectsAndFilters`. When the `AttributeReference` in alias map differs from the `AttributeReference` in filter condition only in qualifier, it does not substitute alias and throws exception saying `key videoid#47L not found` in the following scenario.

```
[1] Project [userid#0]
+- [2] Filter (isnotnull(videoid#47L) && NOT (videoid#47L = 30))
   +- [3] Project [factorial(videoid#1) AS videoid#47L, userid#0]
      +- [4] Filter (isnotnull(avebitrate#2) && (avebitrate#2 < 10))
         +- [5] Relation[userid#0,videoid#1,avebitrate#2]
```

### Why are the changes needed?

We need to use `AttributeMap` where the key is `AttributeReference`'s `ExprId` instead of `Map[Attribute, Expression]` while collecting and substituting aliases in `PhysicalOperation.collectProjectsAndFilters`.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
New unit tests were added in `TestPhysicalOperation` which reproduces the bug

Closes #25761 from nikitagkonda/SPARK-29029-use-attributemap-for-aliasmap-in-physicaloperation.

Authored-by: Nikita Konda <nikita.konda@workday.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-19 20:01:42 -08:00
Wenchen Fan 9e58b10c8e [SPARK-29945][SQL] do not handle negative sign specially in the parser
### What changes were proposed in this pull request?

Remove the special handling of the negative sign in the parser (interval literal and type constructor)

### Why are the changes needed?

The negative sign is an operator (UnaryMinus). We don't need to handle it specially, which is kind of doing constant folding at parser side.

### Does this PR introduce any user-facing change?

The error message becomes a little different. Now it reports type mismatch for the `-` operator.

### How was this patch tested?

existing tests

Closes #26578 from cloud-fan/interval.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-11-20 11:08:04 +09:00
Maxim Gekk 40b8a08b8b [SPARK-29963][SQL][TESTS] Check formatting timestamps up to microsecond precision by JSON/CSV datasource
### What changes were proposed in this pull request?
In the PR, I propose to add tests from the commit 47cb1f359a for Spark 2.4 that check formatting of timestamp strings for various seconds fractions.

### Why are the changes needed?
To make sure that current behavior is the same as in Spark 2.4

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By running `CSVSuite`, `JsonFunctionsSuite` and `TimestampFormatterSuite`.

Closes #26601 from MaxGekk/format-timestamp-micros-tests.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-20 10:34:25 +09:00
Kent Yao 79ed4ae2db [SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point
### What changes were proposed in this pull request?

Currently, we support to parse '1. second' to 1s or even '. second' to 0s.

```sql
-- !query 118
select interval '1. seconds'
-- !query 118 schema
struct<1 seconds:interval>
-- !query 118 output
1 seconds

-- !query 119
select interval '. seconds'
-- !query 119 schema
struct<0 seconds:interval>
-- !query 119 output
0 seconds
```

```sql
postgres=# select interval '1. second';
ERROR:  invalid input syntax for type interval: "1. second"
LINE 1: select interval '1. second';

postgres=# select interval '. second';
ERROR:  invalid input syntax for type interval: ". second"
LINE 1: select interval '. second';
```
We fix this by fixing the new interval parser's VALUE_FRACTIONAL_PART state

With further digging, we found that 1. is valid in python, r, scala, and presto and so on... so this PR
ONLY forbid the invalid interval value in the form of  '. seconds'.

### Why are the changes needed?

bug fix

### Does this PR introduce any user-facing change?

yes, now we treat '. second' .... as invalid intervals
### How was this patch tested?

add ut

Closes #26573 from yaooqinn/SPARK-29926.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-19 21:01:26 +08:00
Wenchen Fan 16134d6d0f [SPARK-29948][SQL] make the default alias consistent between date, timestamp and interval
### What changes were proposed in this pull request?

Update `Literal.sql` to make date, timestamp and interval consistent. They should all use the `TYPE 'value'` format.

### Why are the changes needed?

Make the default alias consistent. For example, without this patch we will see
```
scala> sql("select interval '1 day', date '2000-10-10'").show
+------+-----------------+
|1 days|DATE '2000-10-10'|
+------+-----------------+
|1 days|       2000-10-10|
+------+-----------------+
```

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #26579 from cloud-fan/sql.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-19 15:37:35 +08:00
Terry Kim 3d45779b68 [SPARK-29728][SQL] Datasource V2: Support ALTER TABLE RENAME TO
### What changes were proposed in this pull request?

This PR adds `ALTER TABLE a.b.c RENAME TO x.y.x` support for V2 catalogs.

### Why are the changes needed?

The current implementation doesn't support this command V2 catalogs.

### Does this PR introduce any user-facing change?

Yes, now the renaming table works for v2 catalogs:
```
scala> spark.sql("SHOW TABLES IN testcat.ns1.ns2").show
+---------+---------+
|namespace|tableName|
+---------+---------+
|  ns1.ns2|      old|
+---------+---------+

scala> spark.sql("ALTER TABLE testcat.ns1.ns2.old RENAME TO testcat.ns1.ns2.new").show

scala> spark.sql("SHOW TABLES IN testcat.ns1.ns2").show
+---------+---------+
|namespace|tableName|
+---------+---------+
|  ns1.ns2|      new|
+---------+---------+
```
### How was this patch tested?

Added unit tests.

Closes #26539 from imback82/rename_table.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-19 12:03:29 +08:00
HyukjinKwon 882f54b0a3 [SPARK-29870][SQL][FOLLOW-UP] Keep CalendarInterval's toString
### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/26418. This PR removed `CalendarInterval`'s `toString` with an unfinished changes.

### Why are the changes needed?

1. Ideally we should make each PR isolated and separate targeting one issue without touching unrelated codes.

2. There are some other places where the string formats were exposed to users. For example:

    ```scala
    scala> sql("select interval 1 days as a").selectExpr("to_csv(struct(a))").show()
    ```
    ```
    +--------------------------+
    |to_csv(named_struct(a, a))|
    +--------------------------+
    |      "CalendarInterval...|
    +--------------------------+
    ```

3.  Such fixes:

    ```diff
     private def writeMapData(
        map: MapData, mapType: MapType, fieldWriter: ValueWriter): Unit = {
      val keyArray = map.keyArray()
    + val keyString = mapType.keyType match {
    +   case CalendarIntervalType =>
    +    (i: Int) => IntervalUtils.toMultiUnitsString(keyArray.getInterval(i))
    +   case _ => (i: Int) => keyArray.get(i, mapType.keyType).toString
    + }
    ```

    can cause performance regression due to type dispatch for each map.

### Does this PR introduce any user-facing change?

Yes, see 2. case above.

### How was this patch tested?

Manually tested.

Closes #26572 from HyukjinKwon/SPARK-29783.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-19 09:11:41 +09:00
HyukjinKwon 8469614c05 [SPARK-25694][SQL][FOLLOW-UP] Move 'spark.sql.defaultUrlStreamHandlerFactory.enabled' into StaticSQLConf.scala
### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/26530 and proposes to move the configuration `spark.sql.defaultUrlStreamHandlerFactory.enabled` to `StaticSQLConf.scala` for consistency.

### Why are the changes needed?

To put the similar configurations together and for readability.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually tested as described in https://github.com/apache/spark/pull/26530.

Closes #26570 from HyukjinKwon/SPARK-25694.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-19 09:08:20 +09:00
Kent Yao ae6b711b26 [SPARK-29941][SQL] Add ansi type aliases for char and decimal
### What changes were proposed in this pull request?

Checked with SQL Standard and PostgreSQL

> CHAR is equivalent to CHARACTER. DEC is equivalent to DECIMAL. INT is equivalent to INTEGER. VARCHAR is equivalent to CHARACTER VARYING. ...

```sql
postgres=# select dec '1.0';
numeric
---------
1.0
(1 row)

postgres=# select CHARACTER '. second';
  bpchar
----------
 . second
(1 row)

postgres=# select CHAR '. second';
  bpchar
----------
 . second
(1 row)
```

### Why are the changes needed?

For better ansi support
### Does this PR introduce any user-facing change?

yes, we add character as char and dec as decimal

### How was this patch tested?

add ut

Closes #26574 from yaooqinn/SPARK-29941.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-18 23:30:31 +08:00
fuwhu c32e228689 [SPARK-29859][SQL] ALTER DATABASE (SET LOCATION) should look up catalog like v2 commands
### What changes were proposed in this pull request?
Add AlterNamespaceSetLocationStatement, AlterNamespaceSetLocation, AlterNamespaceSetLocationExec to make ALTER DATABASE (SET LOCATION) look up catalog like v2 commands.
And also refine the code of AlterNamespaceSetProperties, AlterNamespaceSetPropertiesExec, DescribeNamespace, DescribeNamespaceExec to use SupportsNamespaces instead of CatalogPlugin for catalog parameter.

### Why are the changes needed?
It's important to make all the commands have the same catalog/namespace resolution behavior, to avoid confusing end-users.

### Does this PR introduce any user-facing change?
Yes, add "ALTER NAMESPACE ... SET LOCATION" whose function is same as "ALTER DATABASE ... SET LOCATION" and "ALTER SCHEMA ... SET LOCATION".

### How was this patch tested?
New unit tests

Closes #26562 from fuwhu/SPARK-29859.

Authored-by: fuwhu <bestwwg@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-18 20:40:23 +08:00
Kent Yao 50f6d930da [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
### What changes were proposed in this pull request?

We now have two different implementation for multi-units interval strings to CalendarInterval type values.

One is used to covert interval string literals to CalendarInterval. This approach will re-delegate the interval string to spark parser which handles the string as a `singleInterval` -> `multiUnitsInterval` -> eventually call `IntervalUtils.fromUnitStrings`

The other is used in `Cast`, which eventually calls `IntervalUtils.stringToInterval`. This approach is ~10 times faster than the other.

We should unify these two for better performance and simple logic. this pr uses the 2nd approach.

### Why are the changes needed?

We should unify these two for better performance and simple logic.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

we shall not fail on existing uts

Closes #26491 from yaooqinn/SPARK-29870.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-18 15:50:06 +08:00