ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Kent Yao	f926809a1f	[SPARK-29390][SQL] Add the justify_days(), justify_hours() and justify_interval() functions ### What changes were proposed in this pull request? Add three interval functions, justify_days, justify_hours and justify_interval, to justify interval values. ### Why are the changes needed? For feature parity with PostgreSQL, which provides the same three functions: - `justify_days(interval)` returns `interval`: adjusts the interval so 30-day time periods are represented as months, e.g. `justify_days(interval '35 days')` gives `1 mon 5 days`. - `justify_hours(interval)` returns `interval`: adjusts the interval so 24-hour time periods are represented as days, e.g. `justify_hours(interval '27 hours')` gives `1 day 03:00:00`. - `justify_interval(interval)` returns `interval`: adjusts the interval using justify_days and justify_hours, with additional sign adjustments, e.g. `justify_interval(interval '1 mon -1 hour')` gives `29 days 23:00:00`. ### Does this PR introduce any user-facing change? Yes, new interval functions are added. ### How was this patch tested? Added UTs. Closes #26465 from yaooqinn/SPARK-29390. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2019-11-13 15:04:39 +09:00
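The three functions operate on an interval made of months, days and microseconds. The Java sketch below is a hypothetical model, not Spark's `IntervalUtils` code: the class `Justify` and the value type `Iv` are illustrative names, and it assumes 30-day months, 24-hour days, and PostgreSQL-style sign adjustment for `justify_interval`. It reproduces the three examples above.

```java
// Hypothetical sketch of interval justification (not Spark's implementation).
// Assumes an interval is (months, days, microseconds), a 30-day month and a 24-hour day.
// Java's / and % truncate toward zero.
final class Justify {
    static final long MICROS_PER_DAY = 24L * 60 * 60 * 1_000_000L;
    static final int DAYS_PER_MONTH = 30;

    // Illustrative immutable value type standing in for CalendarInterval.
    static final class Iv {
        final int months;
        final int days;
        final long micros;

        Iv(int months, int days, long micros) {
            this.months = months;
            this.days = days;
            this.micros = micros;
        }
    }

    // Move whole 30-day periods from days into months.
    static Iv justifyDays(Iv i) {
        return new Iv(i.months + i.days / DAYS_PER_MONTH, i.days % DAYS_PER_MONTH, i.micros);
    }

    // Move whole 24-hour periods from microseconds into days.
    static Iv justifyHours(Iv i) {
        return new Iv(i.months, (int) (i.days + i.micros / MICROS_PER_DAY), i.micros % MICROS_PER_DAY);
    }

    // Fold time into days and days into months, then make the fields agree in sign.
    static Iv justifyInterval(Iv i) {
        long micros = i.micros % MICROS_PER_DAY;
        long days = i.days + i.micros / MICROS_PER_DAY;
        int months = (int) (i.months + days / DAYS_PER_MONTH);
        days %= DAYS_PER_MONTH;
        // Borrow a month when days (or the time part, if days == 0) disagree in sign with months.
        if (months > 0 && (days < 0 || (days == 0 && micros < 0))) {
            months--;
            days += DAYS_PER_MONTH;
        } else if (months < 0 && (days > 0 || (days == 0 && micros > 0))) {
            months++;
            days -= DAYS_PER_MONTH;
        }
        // Borrow a day when the time part disagrees in sign with days.
        if (days > 0 && micros < 0) {
            days--;
            micros += MICROS_PER_DAY;
        } else if (days < 0 && micros > 0) {
            days++;
            micros -= MICROS_PER_DAY;
        }
        return new Iv(months, (int) days, micros);
    }
}
```

With `1 mon -1 hour`, no whole day or month can be folded, so the sign adjustment borrows one month (30 days) and then one day, leaving 29 days and 23 hours.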
HyukjinKwon	80fbc382a6	Revert "[SPARK-29462] The data type of "array()" should be array<null>" This reverts commit `0dcd739534`.	2019-11-13 13:12:20 +09:00
angerszhu	eb79af8dae	[SPARK-29145][SQL][FOLLOW-UP] Move tests from `SubquerySuite` to `subquery/in-subquery/in-joins.sql` ### What changes were proposed in this pull request? Move the tests as suggested in the review comment https://github.com/apache/spark/pull/25854#discussion_r342383272 ### Why are the changes needed? To follow up on that review comment; no functional change. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Added test cases. Closes #26406 from AngersZhuuuu/SPARK-29145-FOLLOWUP. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-12 17:34:03 -08:00
Ankitraj	45e212e161	[SPARK-29570][WEBUI] Improve tooltip for Executor Tab for Shuffle Write, Blacklisted, Logs, Threaddump columns ### What changes were proposed in this pull request? All tooltip messages are displayed in the centre. ### Why are the changes needed? Sometimes tooltips hide the column data, and the tooltip display position is inconsistent across the UI. ### Does this PR introduce any user-facing change? Yes. ![Screenshot 2019-10-26 at 3 08 51 AM](https://user-images.githubusercontent.com/8948111/67606124-04dd0d80-f79e-11e9-865a-b7e9bffc9890.png) ### How was this patch tested? Manual test. Closes #26263 from 07ARB/SPARK-29570. Lead-authored-by: Ankitraj <8948111+07ARB@users.noreply.github.com> Co-authored-by: 07ARB <ankitrajboudh@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-11-12 18:49:54 -06:00
Wenchen Fan	030e5d987e	[SPARK-29789][SQL] should not parse the bucket column name when creating v2 tables ### What changes were proposed in this pull request? When creating v2 expressions, we have public Java APIs as well as internal Scala APIs. All of these APIs take a string column name and parse it to `NamedReference`. This is convenient for end-users, but not for internal development. For example, the query plan already contains the parsed partition/bucket column names, and it's tricky if we need to quote the names before creating v2 expressions. This PR proposes to change the internal Scala APIs to take `NamedReference` directly, with a new method to create `NamedReference` with the exact name parts. The public Java APIs are not changed. ### Why are the changes needed? To fix a bug, and to make it easier to create v2 expressions correctly in the future. ### Does this PR introduce any user-facing change? Yes, now v2 CREATE TABLE works as expected. ### How was this patch tested? A new test. Closes #26425 from cloud-fan/extract. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Ryan Blue <blue@apple.com>	2019-11-12 12:25:45 -08:00
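A minimal sketch of why re-parsing a column name is fragile, assuming SQL-style backtick quoting where a literal backtick inside a quoted part is doubled. `NameParts.quote` and `NameParts.parse` are hypothetical helpers, not Spark APIs: a single name part such as `a.b` comes back as two parts unless it is quoted first, which is the quoting burden that passing exact name parts avoids.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers (not Spark's API) illustrating lossless round-tripping of
// multi-part names through a string. Assumes backtick quoting with doubled backticks.
final class NameParts {
    // Quote every part so dots and backticks inside a part survive re-parsing.
    static String quote(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append('`').append(parts.get(i).replace("`", "``")).append('`');
        }
        return sb.toString();
    }

    // Split on dots outside backtick quotes; "``" inside quotes is a literal backtick.
    static List<String> parse(String name) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '`') {
                if (quoted && i + 1 < name.length() && name.charAt(i + 1) == '`') {
                    cur.append('`');
                    i++; // skip the second backtick of the escaped pair
                } else {
                    quoted = !quoted;
                }
            } else if (c == '.' && !quoted) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }
}
```

Passing the parts directly sidesteps both the quoting step and the chance of forgetting it.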
Wenchen Fan	414cade011	[SPARK-29850][SQL] sort-merge-join an empty table should not leak memory ### What changes were proposed in this pull request? When `HashAggregateExec` is whole-stage code-generated, create the hash map only when we begin to process inputs. ### Why are the changes needed? Sort-merge join completes directly if the left side table is empty. If there is an aggregate on the right side, the aggregate is not triggered at all, but its hash map is created during codegen and can't be released. ### Does this PR introduce any user-facing change? No ### How was this patch tested? A new test. Closes #26471 from cloud-fan/memory. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-13 01:00:30 +08:00
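The idea of the fix, in a hypothetical Java form (not Spark's generated code): allocate the hash map on the first input row rather than when the operator is set up, so an aggregate that never receives input never holds memory. `LazyAggregate` and its methods are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the aggregation map is created lazily on the first input row,
// so an operator that is never fed (e.g. the right side of a join that ends early)
// never allocates it.
final class LazyAggregate {
    private Map<String, Long> counts; // stays null until the first row arrives

    void process(String key) {
        if (counts == null) {
            counts = new HashMap<>(); // created only when we begin to process inputs
        }
        counts.merge(key, 1L, Long::sum);
    }

    boolean allocated() {
        return counts != null;
    }

    long count(String key) {
        return counts == null ? 0L : counts.getOrDefault(key, 0L);
    }
}
```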
Kent Yao	d99398e9f5	[SPARK-29855][SQL] typed literals with negative sign with proper result or exception ### What changes were proposed in this pull request? Before this PR: ```sql -- !query 83 select -integer '7' -- !query 83 schema struct<7:int> -- !query 83 output 7 -- !query 86 select -date '1999-01-01' -- !query 86 schema struct<DATE '1999-01-01':date> -- !query 86 output 1999-01-01 -- !query 87 select -timestamp '1999-01-01' -- !query 87 schema struct<TIMESTAMP('1999-01-01 00:00:00'):timestamp> -- !query 87 output 1999-01-01 00:00:00 ``` The integer result should be -7, and the date and timestamp results are confusing; they should throw exceptions instead. ### Why are the changes needed? Bug fix. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Added UTs. Closes #26479 from yaooqinn/SPARK-29855. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-12 23:53:07 +09:00
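A sketch of the intended semantics only, not Spark's parser code: `SignedTypedLiteral.negate` is a hypothetical helper that negates a numeric typed literal and rejects a sign before a DATE or TIMESTAMP literal, assuming an exception is the desired outcome as stated above.

```java
import java.util.Locale;

// Hypothetical sketch of how a leading minus on a typed literal should behave.
final class SignedTypedLiteral {
    static Object negate(String type, String value) {
        switch (type.toLowerCase(Locale.ROOT)) {
            case "integer":
                return -Integer.parseInt(value); // -integer '7' is -7, not 7
            case "date":
            case "timestamp":
                // A sign before a datetime literal has no meaning, so reject it
                // instead of silently dropping it.
                throw new IllegalArgumentException(
                        "Cannot apply '-' to " + type.toUpperCase(Locale.ROOT) + " literal '" + value + "'");
            default:
                throw new UnsupportedOperationException("type not covered by this sketch: " + type);
        }
    }
}
```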
Pablo Langa	37e387a22d	[SPARK-29519][SQL] SHOW TBLPROPERTIES should do multi-catalog resolution ### What changes were proposed in this pull request? Add ShowTablePropertiesStatement and make SHOW TBLPROPERTIES go through the same catalog/table resolution framework of v2 commands. ### Why are the changes needed? It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g. ``` USE my_catalog DESC t // success and describe the table t from my_catalog SHOW TBLPROPERTIES t // report table not found as there is no table t in the session catalog ``` ### Does this PR introduce any user-facing change? Yes. When running SHOW TBLPROPERTIES, Spark fails the command if the current catalog is set to a v2 catalog, or if the table name specifies a v2 catalog. ### How was this patch tested? Unit tests. Closes #26176 from planga82/feature/SPARK-29519_SHOW_TBLPROPERTIES_datasourceV2. Authored-by: Pablo Langa <soypab@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-12 13:31:28 +08:00
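The resolution rule that every v2 command is meant to share can be sketched as follows, under the assumption that a leading name part matching a registered catalog selects that catalog, and otherwise the current catalog (set by `USE`) applies. `CatalogResolver` is a hypothetical class; Spark's actual resolution also handles the session catalog and namespaces in more detail.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of multi-catalog name resolution shared by all commands.
final class CatalogResolver {
    private final Set<String> catalogs;
    private final String currentCatalog;

    CatalogResolver(Set<String> catalogs, String currentCatalog) {
        this.catalogs = catalogs;
        this.currentCatalog = currentCatalog;
    }

    // Returns [catalog name, remaining identifier joined with '.'].
    List<String> resolve(List<String> nameParts) {
        if (nameParts.size() > 1 && catalogs.contains(nameParts.get(0))) {
            return Arrays.asList(nameParts.get(0), String.join(".", nameParts.subList(1, nameParts.size())));
        }
        return Arrays.asList(currentCatalog, String.join(".", nameParts));
    }
}
```

If DESC and SHOW TBLPROPERTIES both go through the same `resolve`, `t` after `USE my_catalog` refers to the same table in both commands.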
Jungtaek Lim (HeartSaVioR)	c941362cb9	[SPARK-26154][SS] Streaming left/right outer join should not return outer nulls for already matched rows ### What changes were proposed in this pull request? This patch fixes the edge case of streaming left/right outer join described below: Suppose the query is `select * from A join B on A.id = B.id AND (A.ts <= B.ts AND B.ts <= A.ts + interval 5 seconds)` and there are two rows, L1 (from A) and R1 (from B), such that L1.id = R1.id and L1.ts = R1.ts (think of a self-join). Then Spark processes L1 and R1 as below: - row L1 and row R1 are joined at batch 1 - row R1 is evicted at batch 2 due to the join and watermark condition, whereas row L1 is not evicted - row L1 is evicted at batch 3 due to the join and watermark condition. When determining outer rows to match with null, Spark applies an assumption described in a code comment, as below: ``` Checking whether the current row matches a key in the right side state, and that key has any value which satisfies the filter function when joined. If it doesn't, we know we can join with null, since there was never (including this batch) a match within the watermark period. If it does, there must have been a match at some point, so we know we can't join with null. ``` But as the edge case above shows, the assumption is not correct. As there is no optimizing assumption that is free of such edge cases, we have to track whether each row has been matched before, and join a row with null only when it was never matched. To track matching, the patch adds a new state to the streaming join state manager that marks whether each row has been matched. We use this information when evicting rows, since evicted rows are the candidates to be joined with null. This approach introduces a new state format which is not compatible with the old one: queries with the old state format will keep running but will still have the issue, and they must discard the checkpoint and rerun for this patch to take effect. ### Why are the changes needed? This patch fixes a correctness issue. ### Does this PR introduce any user-facing change? No from a compatibility viewpoint, but we'll encourage end users to discard the old checkpoint and rerun the query if they run a stream-stream outer join query with an old checkpoint, which might make the answer "yes". ### How was this patch tested? Added a UT which fails on current Spark and passes with this patch. Also passed existing streaming join UTs. Closes #26108 from HeartSaVioR/SPARK-26154-shorten-alternative. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-11-11 15:47:17 -08:00
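A minimal sketch of the matched-flag bookkeeping, not Spark's state store code: `OuterJoinState` is hypothetical and keeps only left-side row ids with a matched flag. In the edge case, L1 is marked matched at batch 1, so when it is evicted later no null-padded row is emitted, even though R1 has already left the state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each buffered left-side row carries a "matched" flag, and on
// eviction a null-padded outer row is emitted only for rows that were never matched.
final class OuterJoinState {
    private final Map<String, Boolean> leftRows = new LinkedHashMap<>(); // row id -> matched?

    void addLeft(String rowId) {
        leftRows.putIfAbsent(rowId, false);
    }

    // Called whenever a left row joins with some right row.
    void markMatched(String rowId) {
        leftRows.computeIfPresent(rowId, (id, matched) -> true);
    }

    // Evicts the given rows and returns those that must be joined with nulls.
    List<String> evict(List<String> rowIds) {
        List<String> nullJoined = new ArrayList<>();
        for (String id : rowIds) {
            Boolean matched = leftRows.remove(id);
            if (Boolean.FALSE.equals(matched)) {
                nullJoined.add(id);
            }
        }
        return nullJoined;
    }
}
```

The decision no longer depends on what is still in the right side's state, which is exactly what breaks when R1 is evicted before L1.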
Marcelo Vanzin	9753a8e330	[SPARK-29766][SQL] Do metrics aggregation asynchronously in SQL listener This unblocks the event handling thread, which should help avoid dropped events when large queries are running. Existing unit tests should already cover this code. Closes #26405 from vanzin/SPARK-29766. Authored-by: Marcelo Vanzin <vanzin@cloudera.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-11 14:20:34 -08:00
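A hypothetical sketch of the pattern (not the actual `SQLAppStatusListener` code): the event-handling thread only enqueues the aggregation on a single background thread and returns immediately. `AsyncMetricsListener` and its methods are illustrative names, and summing an array stands in for the real metric aggregation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: expensive aggregation moves off the event-handling thread.
final class AsyncMetricsListener {
    private final ExecutorService aggregator = Executors.newSingleThreadExecutor();
    private final ConcurrentHashMap<Long, Long> totals = new ConcurrentHashMap<>();

    // Runs on the event thread: enqueue the work and return immediately.
    void onExecutionEnd(long executionId, long[] taskMetrics) {
        aggregator.execute(() -> {
            long sum = 0;
            for (long m : taskMetrics) {
                sum += m; // stands in for the costly per-task metric aggregation
            }
            totals.put(executionId, sum);
        });
    }

    // Lets queued aggregations finish, e.g. when the listener is stopped.
    void stop() {
        aggregator.shutdown();
        try {
            aggregator.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    Long total(long executionId) {
        return totals.get(executionId);
    }
}
```

A single worker thread keeps aggregations in submission order; readers must tolerate a result that is not available yet.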
DB Tsai	a6a2748585	[SPARK-29805][SQL] Enable nested schema pruning and nested pruning on expressions by default ### What changes were proposed in this pull request? Enable nested schema pruning and nested pruning on expressions by default. We have been using these features in production at Apple for a couple of months with great success. For some jobs, they reduced the amount of data read by more than 8x and made the jobs 21x faster in wall-clock time. ### Why are the changes needed? Better performance. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests. Closes #26443 from dbtsai/enableNestedSchemaPrunning. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2019-11-11 19:11:05 +00:00
Takeshi Yamamuro	cceb2d6f11	[SPARK-29825][SQL][TESTS] Add join-related configs in `inner-join.sql` and `postgreSQL/join.sql` ### What changes were proposed in this pull request? For better test coverage, this PR adds join-related configs to `inner-join.sql` and `postgreSQL/join.sql`. These join-related configs were copied from the other join-related tests in `SQLQueryTestSuite` (e.g., https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql#L2-L4). ### Why are the changes needed? Better test coverage. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests. Closes #26459 from maropu/AddJoinConds. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-11 10:21:33 -08:00
Kent Yao	d06a9cc4bd	[SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values ### What changes were proposed in this pull request? With the latest string-to-interval optimization (https://github.com/apache/spark/pull/26256), some interval strings cannot be cast when there are spaces between signs and unit values. After the `PARSE_SIGN` state, the parser goes directly to `PARSE_UNIT_VALUE`, which treats a space character as the end of the value. So when white spaces come before the real unit value, parsing fails; we add a new state, `TRIM_VALUE`, to trim all these spaces. To reproduce on revisions since https://github.com/apache/spark/pull/26256 was merged: ```sql select cast(v as interval) from values ('+ 1 second') t(v); select cast(v as interval) from values ('- 1 second') t(v); ``` ### Why are the changes needed? Bug fix. ### Does this PR introduce any user-facing change? No ### How was this patch tested? 1. UT 2. New benchmark test Closes #26449 from yaooqinn/SPARK-29605. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-11 21:53:33 +08:00
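A heavily reduced parser sketch showing where a `TRIM_VALUE` state fits. It is hypothetical, not Spark's `stringToInterval`: it accepts only an optional sign, a whole number and the unit `second`/`seconds`, and returns microseconds. The `TRIM_VALUE` state skips spaces between the sign and the number, which is the step that was missing.

```java
// Hypothetical, reduced state machine: "<sign?> <spaces?> <digits> <spaces> second[s]".
final class SignedSecondsParser {
    private enum State { PARSE_SIGN, TRIM_VALUE, PARSE_UNIT_VALUE, TRIM_UNIT, PARSE_UNIT }

    static long parseMicros(String input) {
        String s = input.trim();
        State state = State.PARSE_SIGN;
        boolean negative = false;
        long value = 0;
        int i = 0;
        while (i < s.length() && state != State.PARSE_UNIT) {
            char c = s.charAt(i);
            switch (state) {
                case PARSE_SIGN:
                    if (c == '-' || c == '+') {
                        negative = c == '-';
                        i++;
                    }
                    state = State.TRIM_VALUE;
                    break;
                case TRIM_VALUE: // the fix: skip spaces between the sign and the number
                    if (c == ' ') i++;
                    else state = State.PARSE_UNIT_VALUE;
                    break;
                case PARSE_UNIT_VALUE:
                    if (Character.isDigit(c)) {
                        value = value * 10 + (c - '0');
                        i++;
                    } else if (c == ' ') {
                        state = State.TRIM_UNIT;
                    } else {
                        throw new IllegalArgumentException("invalid value in: " + input);
                    }
                    break;
                case TRIM_UNIT:
                    if (c == ' ') i++;
                    else state = State.PARSE_UNIT;
                    break;
                default:
                    throw new IllegalStateException("unreachable: " + state);
            }
        }
        String unit = s.substring(i);
        if (state != State.PARSE_UNIT || !(unit.equals("second") || unit.equals("seconds"))) {
            throw new IllegalArgumentException("cannot parse: " + input);
        }
        long micros = value * 1_000_000L;
        return negative ? -micros : micros;
    }
}
```

Without `TRIM_VALUE`, the space after `+` would reach `PARSE_UNIT_VALUE` first and end an empty value, which is the failure reported above.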
lajin	4de7131cff	[SPARK-29421][SQL] Supporting Create Table Like Using Provider ### What changes were proposed in this pull request? Hive supports specifying a new file format with STORED AS in CREATE TABLE LIKE: ```sql CREATE TABLE tbl(a int) STORED AS TEXTFILE; CREATE TABLE tbl2 LIKE tbl STORED AS PARQUET; ``` We add a similar syntax for Spark, split into two features: 1. specify a different table provider in CREATE TABLE LIKE 2. Hive compatibility. In this PR, we address the first one: - [ ] Using `USING provider` to specify a different table provider in CREATE TABLE LIKE. - [ ] Using `STORED AS file_format` in CREATE TABLE LIKE to address Hive compatibility. ### Why are the changes needed? The CREATE TABLE tb1 LIKE tb2 command creates an empty table tb1 based on the definition of table tb2. The most common use case is to create tb1 with the same schema as tb2. However, this command also copies the file format from tb2, so it cannot change the input/output format and serde. The ability to change the file format is useful in scenarios such as upgrading a table from a low-performance file format to a high-performance one (Parquet, ORC). ### Does this PR introduce any user-facing change? It adds a new syntax based on the current CREATE TABLE LIKE: ```sql CREATE TABLE tbl2 LIKE tbl [USING parquet]; ``` ### How was this patch tested? Modified some existing UTs. Closes #26097 from LantaoJin/SPARK-29421. Authored-by: lajin <lajin@ebay.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-11 15:25:56 +08:00
Maxim Gekk	18440151b0	[SPARK-29393][SQL] Add `make_interval` function ### What changes were proposed in this pull request? In the PR, I propose a new expression `MakeInterval` and register it as the function `make_interval`. The function accepts the following parameters: - `years` - the number of years in the interval, positive or negative. The parameter is multiplied by 12, and added to interval's `months`. - `months` - the number of months in the interval, positive or negative. - `weeks` - the number of weeks in the interval, positive or negative. The parameter is multiplied by 7, and added to interval's `days`. - `days` - the number of days in the interval, positive or negative. It is added to interval's `days`. - `hours`, `mins` - the number of hours and minutes. The parameters can be negative or positive. They are converted to microseconds and added to interval's `microseconds`. - `seconds` - the number of seconds with the fractional part in microseconds precision. Like `hours` and `mins`, it is converted to microseconds and added to interval's `microseconds`. For example: ```sql spark-sql> select make_interval(2019, 11, 1, 1, 12, 30, 01.001001); 2019 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds ``` ### Why are the changes needed? - To improve user experience with Spark SQL, and allow users to make `INTERVAL` columns from other columns containing `years`, `months` ... `seconds`. Currently, users can make an `INTERVAL` column from other columns only by constructing a `STRING` column and casting it to `INTERVAL`. Have a look at the `IntervalBenchmark` as an example. - To maintain feature parity with PostgreSQL, which provides such a function: ```sql # SELECT make_interval(2019, 11); make_interval -------------------- 2019 years 11 mons ``` ### Does this PR introduce any user-facing change? No ### How was this patch tested? - By new tests for the `MakeInterval` expression in `IntervalExpressionsSuite` - By tests in `interval.sql` Closes #26446 from MaxGekk/make_interval. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-10 14:34:52 -08:00
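The field arithmetic of `make_interval` can be sketched as follows. `MakeIntervalSketch.makeInterval` is a hypothetical helper returning `{months, days, microseconds}` as `long`s (Spark stores months and days as `int`); it assumes the parameter order `years, months, weeks, days, hours, mins, secs` implied by the example. Seconds are taken as a `BigDecimal` so the microsecond fraction is exact.

```java
import java.math.BigDecimal;

// Hypothetical sketch of make_interval's arithmetic (not Spark's MakeInterval code).
final class MakeIntervalSketch {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long MICROS_PER_MINUTE = 60L * MICROS_PER_SECOND;
    static final long MICROS_PER_HOUR = 60L * MICROS_PER_MINUTE;

    // Returns {months, days, microseconds}; the *Exact calls throw on long overflow.
    static long[] makeInterval(int years, int months, int weeks, int days,
                               int hours, int mins, BigDecimal secs) {
        long totalMonths = Math.addExact(Math.multiplyExact((long) years, 12L), (long) months);
        long totalDays = Math.addExact(Math.multiplyExact((long) weeks, 7L), (long) days);
        // movePointRight(6) converts seconds to microseconds; longValueExact throws
        // ArithmeticException if precision finer than a microsecond remains.
        long secMicros = secs.movePointRight(6).longValueExact();
        long micros = Math.addExact(
                Math.addExact(Math.multiplyExact((long) hours, MICROS_PER_HOUR),
                              Math.multiplyExact((long) mins, MICROS_PER_MINUTE)),
                secMicros);
        return new long[] {totalMonths, totalDays, micros};
    }
}
```

For the example above, `makeInterval(2019, 11, 1, 1, 12, 30, new BigDecimal("01.001001"))` yields 24239 months (2019 years 11 months), 8 days and 45001001001 microseconds (12 hours 30 minutes 1.001001 seconds).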
Pavithra Ramachandran	e2ca7f396f	[SPARK-29601][WEBUI] JDBC ODBC Tab Statement column provide ellipsis for big SQL statement ### What changes were proposed in this pull request? Provide an ellipsis in the Statement column, just like the description on the Jobs page. ### Why are the changes needed? When a query is executed, the whole query statement is displayed no matter how big it is. Big queries cover a large portion of the page, and with multiple queries it is difficult to scroll down to view them all. ### Does this PR introduce any user-facing change? No Before: ![Screenshot from 2019-11-01 23-15-23](https://user-images.githubusercontent.com/51401130/68064468-ebaa0300-fd41-11e9-8787-c5144c1468d4.png) After: ![Screenshot from 2019-11-02 07-07-21](https://user-images.githubusercontent.com/51401130/68064471-f19fe400-fd41-11e9-85c6-65f0faa64cc3.png) ### How was this patch tested? Manual Closes #26364 from PavithraRamachandran/ellipse_JDBC. Authored-by: Pavithra Ramachandran <pavi.rams@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-11-10 13:08:26 -06:00
Maxim Gekk	d4de01f567	[SPARK-29408][SQL] Support `-` before `interval` in interval literals ### What changes were proposed in this pull request? - `SqlBase.g4` is modified to support a negative sign `-` in the interval type constructor from a string and in interval literals - Negate the interval in `AstBuilder` if a sign is present. - Interval related SQL statements are moved from `inputs/datetime.sql` to the new file `inputs/interval.sql` For example: ```sql spark-sql> select -interval '-1 month 1 day -1 second'; 1 months -1 days 1 seconds spark-sql> select -interval -1 month 1 day -1 second; 1 months -1 days 1 seconds ``` ### Why are the changes needed? For feature parity with PostgreSQL which supports that: ```sql # select -interval '-1 month 1 day -1 second'; ?column? ------------------------- 1 mon -1 days +00:00:01 (1 row) ``` ### Does this PR introduce any user-facing change? No ### How was this patch tested? - Added tests to `ExpressionParserSuite` - by `interval.sql` Closes #26438 from MaxGekk/negative-interval. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-10 10:10:04 -08:00
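Negation, as in the examples above, flips the sign of each interval field independently. A hypothetical sketch (`NegateInterval` is an illustrative name, not Spark's API) using `Math.negateExact`, which throws instead of silently overflowing on `MIN_VALUE`:

```java
// Hypothetical sketch: negate an interval field by field; returns {months, days, micros}.
final class NegateInterval {
    static long[] negate(int months, int days, long micros) {
        return new long[] {Math.negateExact(months), Math.negateExact(days), Math.negateExact(micros)};
    }
}
```

`-interval '-1 month 1 day -1 second'` corresponds to `negate(-1, 1, -1_000_000L)`, i.e. `1 months -1 days 1 seconds`.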
Maxim Gekk	7ddcb5b46d	[SPARK-29819][SQL] Introduce an enum for interval units ### What changes were proposed in this pull request? In the PR, I propose an enumeration for interval units with the values `YEAR`, `MONTH`, `WEEK`, `DAY`, `HOUR`, `MINUTE`, `SECOND`, `MILLISECOND`, `MICROSECOND` and `NANOSECOND`. ### Why are the changes needed? - This should prevent typos in interval unit names - Stronger type checking of unit parameters. ### Does this PR introduce any user-facing change? No ### How was this patch tested? By existing test suites `ExpressionParserSuite` and `IntervalUtilsSuite` Closes #26455 from MaxGekk/interval-unit-enum. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-10 08:41:55 -08:00
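A hypothetical Java sketch of such an enum (Spark's actual enum is defined in Scala and may differ): parsing a unit name into a constant turns a typo into an immediate error instead of a silently unmatched string. The singular/plural handling in `fromString` is an assumption for illustration.

```java
import java.util.Locale;

// Hypothetical sketch of an interval-unit enum with strict parsing.
enum IntervalUnit {
    YEAR, MONTH, WEEK, DAY, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, NANOSECOND;

    // Accepts singular or plural names, case-insensitively, e.g. "Seconds" -> SECOND.
    static IntervalUnit fromString(String s) {
        String upper = s.trim().toUpperCase(Locale.ROOT);
        if (upper.endsWith("S")) {
            upper = upper.substring(0, upper.length() - 1); // no unit name ends in S when singular
        }
        return IntervalUnit.valueOf(upper); // throws IllegalArgumentException for unknown names
    }
}
```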
Huaxin Gao	57b954e825	[SPARK-29730][SQL] ALTER VIEW QUERY should look up catalog/table like v2 commands Add AlterViewAsStatement and make ALTER VIEW ... QUERY go through the same catalog/table resolution framework of v2 commands. It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g. ``` USE my_catalog DESC v // success and describe the view v from my_catalog ALTER VIEW v AS SELECT 1 // report view not found as there is no view v in the session catalog ``` Yes. When running ALTER VIEW ... QUERY, Spark fails the command if the current catalog is set to a v2 catalog, or if the view name specifies a v2 catalog. Unit tests. Closes #26453 from huaxingao/spark-29730. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-09 17:06:09 -08:00
Xiao Li	1e2d76e80a	[HOT-FIX] Fix the SQLBase.g4 ### What changes were proposed in this pull request? Remove the duplicate code. See the build failure: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/986/ ### Why are the changes needed? Fix the compilation ### Does this PR introduce any user-facing change? No ### How was this patch tested? The existing tests Closes #26445 from gatorsmile/hotfixPraser. Authored-by: Xiao Li <gatorsmile@gmail.com> Signed-off-by: Xiao Li <gatorsmile@gmail.com>	2019-11-08 22:39:07 -08:00
xy_xin	7cfd589868	[SPARK-28893][SQL] Support MERGE INTO in the parser and add the corresponding logical plan ### What changes were proposed in this pull request? This PR supports MERGE INTO in the parser and adds the corresponding logical plan. The SQL syntax is: ``` MERGE INTO [ds_catalog.][multi_part_namespaces.]target_table [AS target_alias] USING [ds_catalog.][multi_part_namespaces.]source_table | subquery [AS source_alias] ON <merge_condition> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ] [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ] [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ] ``` where ``` <matched_action> = DELETE | UPDATE SET * | UPDATE SET column1 = value1 [, column2 = value2 ...] <not_matched_action> = INSERT * | INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...]) ``` ### Why are the changes needed? This is the starting work for introducing `MERGE INTO` support for the builtin data sources, and the design work for `MERGE INTO` support in DSV2. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? New test cases. Closes #26167 from xianyinxin/SPARK-28893. Authored-by: xy_xin <xianyin.xxy@alibaba-inc.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-09 11:45:24 +08:00
Liang-Chi Hsieh	70987d8144	[SPARK-29680][SQL][FOLLOWUP] Replace qualifiedName with multipartIdentifier ### What changes were proposed in this pull request? Replace qualifiedName with multipartIdentifier in parser rules of DDL commands. ### Why are the changes needed? Some DDL rules use `qualifiedName` for identifiers. We should use `multipartIdentifier` because it can catch invalid identifiers such as `test-table` and `test-col`. ### Does this PR introduce any user-facing change? Yes. Invalid identifiers such as `test-table` are now caught after this change. ### How was this patch tested? Unit tests. Closes #26419 from viirya/SPARK-29680-followup2. Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com> Co-authored-by: Liang-Chi Hsieh <liangchi@uber.com> Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>	2019-11-08 14:18:06 -08:00
Kent Yao	e026412d9c	[SPARK-29679][SQL] Make interval type comparable and orderable ### What changes were proposed in this pull request? The interval type now supports >, >=, <, <=, =, <=>, ORDER BY, min, max, etc. ### Why are the changes needed? Part of SPARK-27764 Feature Parity between PostgreSQL and Spark ### Does this PR introduce any user-facing change? Yes, intervals can now be compared. ### How was this patch tested? Added UTs. Closes #26337 from yaooqinn/SPARK-29679. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-08 22:45:11 +08:00
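Comparing intervals needs a single ordering across months, days and microseconds. The sketch below assumes PostgreSQL's rule, where an interval is ordered by its total length with 30-day months and 24-hour days; Spark's exact rule may differ. `IntervalOrder` is hypothetical, and `BigInteger` avoids overflow.

```java
import java.math.BigInteger;

// Hypothetical sketch: order intervals by total microseconds, treating a month as 30 days.
final class IntervalOrder {
    static final long MICROS_PER_DAY = 86_400_000_000L;

    static BigInteger totalMicros(int months, int days, long micros) {
        BigInteger totalDays = BigInteger.valueOf(months)
                .multiply(BigInteger.valueOf(30))
                .add(BigInteger.valueOf(days));
        return totalDays.multiply(BigInteger.valueOf(MICROS_PER_DAY)).add(BigInteger.valueOf(micros));
    }

    // Negative, zero or positive, like Comparator.compare.
    static int compare(int m1, int d1, long u1, int m2, int d2, long u2) {
        return totalMicros(m1, d1, u1).compareTo(totalMicros(m2, d2, u2));
    }
}
```

Under this rule, `1 month` and `30 days` compare as equal, so ordering and `min`/`max` treat them as ties.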
Kent Yao	e7f7990bc3	[SPARK-29688][SQL] Support average for interval type values ### What changes were proposed in this pull request? The avg aggregate now supports interval values. ### Why are the changes needed? Part of SPARK-27764 Feature Parity between PostgreSQL and Spark ### Does this PR introduce any user-facing change? Yes, avg can now be applied to intervals. ### How was this patch tested? Added UTs. Closes #26347 from yaooqinn/SPARK-29688. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-08 21:55:07 +08:00
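A sketch of averaging intervals, not Spark's `Average` implementation: sum field by field, divide by the count, and, assuming PostgreSQL-style division, spill the fractional months into 30-day days and the fractional days into microseconds. `IntervalAvg` is hypothetical; doubles are used for brevity, which loses precision for very large microsecond sums.

```java
// Hypothetical sketch of AVG over intervals given as {months, days, microseconds}.
final class IntervalAvg {
    static final long MICROS_PER_DAY = 86_400_000_000L;
    static final int DAYS_PER_MONTH = 30;

    // Returns null for no input, as SQL AVG does.
    static long[] avg(long[][] intervals) {
        if (intervals.length == 0) {
            return null;
        }
        long months = 0, days = 0, micros = 0;
        for (long[] iv : intervals) {
            months = Math.addExact(months, iv[0]);
            days = Math.addExact(days, iv[1]);
            micros = Math.addExact(micros, iv[2]);
        }
        double n = intervals.length;
        double monthsWithFraction = months / n;
        long m = (long) monthsWithFraction; // truncate toward zero
        double daysWithFraction = days / n + DAYS_PER_MONTH * (monthsWithFraction - m);
        long d = (long) daysWithFraction;
        long u = Math.round(micros / n + MICROS_PER_DAY * (daysWithFraction - d));
        return new long[] {m, d, u};
    }
}
```

For example, the average of `1 month` and `2 months` is `1 month 15 days`.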
davidvrba	afc943ff8a	[SPARK-28477][SQL] Rewrite CaseWhen with single branch to If ### What changes were proposed in this pull request? `org.apache.spark.sql.functions` does not have an `if` function, so conditions are expressed using `when`/`otherwise`. However, `If` (which is available in SQL) has more efficient codegen. This PR rewrites `when`/`otherwise` conditions to `If` when possible (`when`/`otherwise` with a single branch). ### Why are the changes needed? It is an optimization. Here is a simple performance comparison (tested in local mode with 4 cores): ``` val df = spark.range(10000000000L).withColumn("x", rand) val resultA = df.withColumn("r", when($"x" < 0.5, lit(1)).otherwise(lit(0))).agg(sum($"r")) val resultB = df.withColumn("r", expr("if(x < 0.5, 1, 0)")).agg(sum($"r")) resultA.collect() // takes 56s to finish resultB.collect() // takes 30s to finish ``` ### Does this PR introduce any user-facing change? No ### How was this patch tested? A new test is added. Closes #26294 from davidvrba/spark-28477_rewriteCaseWhenToIf. Authored-by: davidvrba <vrba.dave@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-08 21:25:48 +08:00
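The rewrite can be illustrated on a toy expression tree (not Catalyst; `CaseWhenToIf` and its nested classes are hypothetical). A `CASE WHEN` with exactly one branch becomes `If`, and a missing `ELSE` becomes a NULL literal, since that is what `CASE WHEN` returns when no branch matches.

```java
import java.util.List;

// Toy expression tree illustrating a single-branch CASE WHEN -> If rewrite rule.
final class CaseWhenToIf {
    interface Expr {}

    static final class Lit implements Expr {
        final Object value;
        Lit(Object value) { this.value = value; }
        @Override public String toString() { return String.valueOf(value); }
    }

    static final class Branch {
        final Expr cond;
        final Expr value;
        Branch(Expr cond, Expr value) { this.cond = cond; this.value = value; }
    }

    static final class CaseWhen implements Expr {
        final List<Branch> branches;
        final Expr elseValue; // null when there is no ELSE / otherwise()
        CaseWhen(List<Branch> branches, Expr elseValue) { this.branches = branches; this.elseValue = elseValue; }
    }

    static final class If implements Expr {
        final Expr cond, ifTrue, ifFalse;
        If(Expr cond, Expr ifTrue, Expr ifFalse) { this.cond = cond; this.ifTrue = ifTrue; this.ifFalse = ifFalse; }
        @Override public String toString() { return "if(" + cond + ", " + ifTrue + ", " + ifFalse + ")"; }
    }

    // Rewrites only the single-branch case; anything else is returned unchanged.
    static Expr rewrite(Expr e) {
        if (e instanceof CaseWhen) {
            CaseWhen cw = (CaseWhen) e;
            if (cw.branches.size() == 1) {
                Branch b = cw.branches.get(0);
                Expr otherwise = cw.elseValue != null ? cw.elseValue : new Lit(null);
                return new If(b.cond, b.value, otherwise);
            }
        }
        return e;
    }
}
```

Applied to the `when($"x" < 0.5, lit(1)).otherwise(lit(0))` example, the rule produces the same shape as `if(x < 0.5, 1, 0)`.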
ulysses	7759f7179c	[SPARK-29772][TESTS][SQL] Add withNamespace in SQLTestUtils ### What changes were proposed in this pull request? V2 catalogs support namespaces, so we should add a `withNamespace` helper like `withDatabase`. ### Why are the changes needed? Makes tests easier to write. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added UTs. Closes #26411 from ulysses-you/Add-test-with-namespace. Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-08 11:53:44 +08:00
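A Java analogue of the loan pattern such a helper follows (the real helper is Scala in `SQLTestUtils`; `TestNamespaces.withNamespace` here is hypothetical). The body runs first, and the listed namespaces are always dropped afterwards, even if the body throws. `sql` stands in for something like `spark.sql(...)`.

```java
import java.util.function.Consumer;

// Hypothetical sketch of a withNamespace-style test helper (loan pattern).
final class TestNamespaces {
    static void withNamespace(Consumer<String> sql, Runnable body, String... namespaces) {
        try {
            body.run();
        } finally {
            // Clean up even when the test body fails.
            for (String ns : namespaces) {
                sql.accept("DROP NAMESPACE IF EXISTS " + ns + " CASCADE");
            }
        }
    }
}
```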
Kent Yao	0a03839366	[SPARK-29787][SQL] Move methods add/subtract/negate from CalendarInterval to IntervalUtils ### What changes were proposed in this pull request? Move the add/subtract/negate methods from CalendarInterval to IntervalUtils ### Why are the changes needed? As suggested in https://github.com/apache/spark/pull/26410#discussion_r343125468 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Added UTs and moved some existing ones. Closes #26423 from yaooqinn/SPARK-29787. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-08 10:28:58 +08:00
Dongjoon Hyun	da848b1897	[SPARK-29796][SQL][TESTS] `HiveExternalCatalogVersionsSuite` should ignore preview release ### What changes were proposed in this pull request? This excludes `preview` releases to fix `HiveExternalCatalogVersionsSuite`. Since yesterday, the new preview release has been breaking the `branch-2.4` PRBuilder. A new release (especially a `preview`) should not affect `branch-2.4`. - https://github.com/apache/spark/pull/26417 (Failed 4 times) ### Why are the changes needed? BEFORE ```scala scala> scala.io.Source.fromURL("https://dist.apache.org/repos/dist/release/spark/").mkString.split("\n").filter(_.contains("""<li><a href="spark-""")).map("""<a href="spark-(\d.\d.\d)/">""".r.findFirstMatchIn(_).get.group(1)) java.util.NoSuchElementException: None.get ``` AFTER ```scala scala> scala.io.Source.fromURL("https://dist.apache.org/repos/dist/release/spark/").mkString.split("\n").filter(_.contains("""<li><a href="spark-""")).filterNot(_.contains("preview")).map("""<a href="spark-(\d.\d.\d)/">""".r.findFirstMatchIn(_).get.group(1)) res5: Array[String] = Array(2.3.4, 2.4.4) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This should pass the PRBuilder. Closes #26428 from dongjoon-hyun/SPARK-HiveExternalCatalogVersionsSuite. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-07 10:28:32 -08:00
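A Java analogue of the AFTER one-liner, run on a listing string instead of fetching dist.apache.org (`ReleaseVersions` is hypothetical). A preview link such as `spark-3.0.0-preview/` does not match the version pattern, which is why the unguarded `.get` threw; filtering previews out first avoids that. The dots in the pattern are escaped here, which is slightly stricter than the original regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract x.y.z release versions from an Apache dist listing.
final class ReleaseVersions {
    private static final Pattern VERSION = Pattern.compile("<a href=\"spark-(\\d\\.\\d\\.\\d)/\">");

    static List<String> versions(String html) {
        List<String> out = new ArrayList<>();
        for (String line : html.split("\n")) {
            if (!line.contains("<li><a href=\"spark-") || line.contains("preview")) {
                continue; // keep only release links, skipping previews as the fix does
            }
            Matcher m = VERSION.matcher(line);
            if (!m.find()) {
                // mirrors `.get` on a missing match: an unexpected link still fails loudly
                throw new IllegalStateException("unexpected release link: " + line);
            }
            out.add(m.group(1));
        }
        return out;
    }
}
```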
Kent Yao	9562b26914	[SPARK-29757][SQL] Move calendar interval constants together ### What changes were proposed in this pull request? ```java public static final int YEARS_PER_DECADE = 10; public static final int YEARS_PER_CENTURY = 100; public static final int YEARS_PER_MILLENNIUM = 1000; public static final byte MONTHS_PER_QUARTER = 3; public static final int MONTHS_PER_YEAR = 12; public static final byte DAYS_PER_WEEK = 7; public static final long DAYS_PER_MONTH = 30L; public static final long HOURS_PER_DAY = 24L; public static final long MINUTES_PER_HOUR = 60L; public static final long SECONDS_PER_MINUTE = 60L; public static final long SECONDS_PER_HOUR = MINUTES_PER_HOUR * SECONDS_PER_MINUTE; public static final long SECONDS_PER_DAY = HOURS_PER_DAY * SECONDS_PER_HOUR; public static final long MILLIS_PER_SECOND = 1000L; public static final long MILLIS_PER_MINUTE = SECONDS_PER_MINUTE * MILLIS_PER_SECOND; public static final long MILLIS_PER_HOUR = MINUTES_PER_HOUR * MILLIS_PER_MINUTE; public static final long MILLIS_PER_DAY = HOURS_PER_DAY * MILLIS_PER_HOUR; public static final long MICROS_PER_MILLIS = 1000L; public static final long MICROS_PER_SECOND = MILLIS_PER_SECOND * MICROS_PER_MILLIS; public static final long MICROS_PER_MINUTE = SECONDS_PER_MINUTE * MICROS_PER_SECOND; public static final long MICROS_PER_HOUR = MINUTES_PER_HOUR * MICROS_PER_MINUTE; public static final long MICROS_PER_DAY = HOURS_PER_DAY * MICROS_PER_HOUR; public static final long MICROS_PER_MONTH = DAYS_PER_MONTH * MICROS_PER_DAY; /* 365.25 days per year assumes leap year every four years */ public static final long MICROS_PER_YEAR = (36525L * MICROS_PER_DAY) / 100; public static final long NANOS_PER_MICROS = 1000L; public static final long NANOS_PER_MILLIS = MICROS_PER_MILLIS * NANOS_PER_MICROS; public static final long NANOS_PER_SECOND = MILLIS_PER_SECOND * NANOS_PER_MILLIS; ``` The above constants are defined in IntervalUtils, DateTimeUtils, and CalendarInterval; some of them are redundant, and some of them are cross-referenced. ### Why are the changes needed? To simplify code, enhance consistency and reduce risks ### Does this PR introduce any user-facing change? No ### How was this patch tested? Modified UTs. Closes #26399 from yaooqinn/SPARK-29757. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-07 19:48:19 +08:00
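A quick arithmetic check of a few derived constants from the list above; the values in the comments are computed here, not copied from Spark. It also shows that twelve 30-day months (360 days) are not equal to `MICROS_PER_YEAR`, which assumes 365.25 days.

```java
// Arithmetic check of derived interval constants (illustrative class name).
final class IntervalConstants {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long MICROS_PER_DAY = 24L * 60L * 60L * MICROS_PER_SECOND; // 86,400,000,000
    static final long MICROS_PER_MONTH = 30L * MICROS_PER_DAY;              // 2,592,000,000,000
    // 365.25 days per year assumes a leap year every four years.
    static final long MICROS_PER_YEAR = (36525L * MICROS_PER_DAY) / 100;     // 31,557,600,000,000
}
```

Because `12 * MICROS_PER_MONTH` is 31,104,000,000,000, code that mixes the month-based and year-based constants can disagree by 5.25 days per year.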
Wenchen Fan	9b61f90987	[SPARK-29761][SQL] do not output leading 'interval' in CalendarInterval.toString ### What changes were proposed in this pull request? Remove the leading "interval" in `CalendarInterval.toString`. ### Why are the changes needed? Although an "interval" prefix is allowed when casting a string to an interval, it's not recommended. This is also consistent with pgsql: ``` cloud0fan=# select interval '1' day; interval ---------- 1 day (1 row) ``` ### Does this PR introduce any user-facing change? Yes, when displaying a DataFrame with an interval column, the output is different. ### How was this patch tested? Updated tests. Closes #26401 from cloud-fan/interval. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-07 15:44:50 +08:00
Maxim Gekk	29dc59ac29	[SPARK-29605][SQL] Optimize string to interval casting ### What changes were proposed in this pull request? In the PR, I propose a new function `stringToInterval()` in `IntervalUtils` for converting `UTF8String` to `CalendarInterval`. The function is used in casting a `STRING` column to an `INTERVAL` column. ### Why are the changes needed? The proposed implementation is ~10 times faster. For example, parsing 9 interval units on JDK 8 (columns: Best Time(ms), Avg Time(ms), Stdev(ms), Rate(M/s), Per Row(ns), Relative): Before: ``` 9 units w/ interval 14004 14125 116 0.1 14003.6 0.0X 9 units w/o interval 13785 14056 290 0.1 13784.9 0.0X ``` After: ``` 9 units w/ interval 1343 1344 1 0.7 1343.0 0.3X 9 units w/o interval 1345 1349 8 0.7 1344.6 0.3X ``` ### Does this PR introduce any user-facing change? No ### How was this patch tested? - By new tests for `stringToInterval` in `IntervalUtilsSuite` - By existing tests Closes #26256 from MaxGekk/string-to-interval. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-07 12:39:52 +08:00
Kent Yao	3437862975	[SPARK-29387][SQL][FOLLOWUP] Fix issues of the multiply and divide for intervals ### What changes were proposed in this pull request? Handle the inconsistency between literals and columns when dividing by zero, and fix the null-literal issue too. ### Why are the changes needed? Bug fix. ### 1 Handle the inconsistency between literals and columns when dividing by zero ```sql -- !query 24 select k, v, cast(k as interval) / v, cast(k as interval) * v from VALUES ('1 seconds', 1), ('2 seconds', 0), ('3 seconds', null), (null, null), (null, 0) t(k, v) -- !query 24 schema struct<k:string,v:int,divide_interval(CAST(k AS INTERVAL), CAST(v AS DOUBLE)):interval,multiply_interval(CAST(k AS INTERVAL), CAST(v AS DOUBLE)):interval> -- !query 24 output 1 seconds 1 interval 1 seconds interval 1 seconds 2 seconds 0 interval 0 microseconds interval 0 microseconds 3 seconds NULL NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL ``` ```sql -- !query 21 select interval '1 year 2 month' / 0 -- !query 21 schema struct<divide_interval(interval 1 years 2 months, CAST(0 AS DOUBLE)):interval> -- !query 21 output NULL ``` In the first case, interval '2 seconds' / 0 produces `interval 0 microseconds`; in the second case, the result is `null`. ### 2 Null literal issues ```sql -- !query 20 select interval '1 year 2 month' / null -- !query 20 schema struct<> -- !query 20 output org.apache.spark.sql.AnalysisException cannot resolve '(interval 1 years 2 months / NULL)' due to data type mismatch: differing types in '(interval 1 years 2 months / NULL)' (interval and null).; line 1 pos 7 -- !query 22 select interval '4 months 2 weeks 6 days' * null -- !query 22 schema struct<> -- !query 22 output org.apache.spark.sql.AnalysisException cannot resolve '(interval 4 months 20 days * NULL)' due to data type mismatch: differing types in '(interval 4 months 20 days * NULL)' (interval and null).; line 1 pos 7 -- !query 23 select null * interval '4 months 2 weeks 6 days' -- !query 23 schema struct<> -- !query 23 output org.apache.spark.sql.AnalysisException cannot resolve '(NULL * interval 4 months 20 days)' due to data type mismatch: differing types in '(NULL * interval 4 months 20 days)' (null and interval).; line 1 pos 7 ``` Dividing or multiplying an interval by a null literal raises an error, whereas null values in a column work fine, as in the first case. ### Does this PR introduce any user-facing change? No (arguably yes, but it is just a follow-up). ### How was this patch tested? Added UTs. cc cloud-fan MaxGekk maropu Closes #26410 from yaooqinn/SPARK-29387. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-07 12:19:03 +08:00
Wenchen Fan	1f3863c856	[SPARK-29759][SQL] LocalShuffleReaderExec.outputPartitioning should use the corrected attributes ### What changes were proposed in this pull request? Update `LocalShuffleReaderExec.outputPartitioning` to use attributes from `ReusedQueryStage`. This also removes the `doCanonicalize` override in the local/coalesced shuffle readers, as these two operators change the output partitioning. It's not safe to strip them in the canonicalized query plan. ### Why are the changes needed? We will have an invalid output partitioning if we don't fix it. ### Does this PR introduce any user-facing change? no ### How was this patch tested? existing tests Closes #26400 from cloud-fan/aqe. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Xiao Li <gatorsmile@gmail.com>	2019-11-06 14:33:52 -08:00
Jungtaek Lim (HeartSaVioR)	782992c7ed	[SPARK-29642][SS] Change the element type of underlying array to UnsafeRow for ContinuousRecordEndpoint ### What changes were proposed in this pull request? This patch fixes the bug that `ContinuousMemoryStream[String]` throws a ClassCastException when casting String to UTF8String. This is because ContinuousMemoryStream and ContinuousRecordEndpoint use the original input as-is for the underlying data structure of Row, and encoding is missing here. To force encoding, this patch changes the element type of the underlying array to UnsafeRow instead of Any for ContinuousRecordEndpoint; ContinuousMemoryStream and TextSocketContinuousStream are modified to reflect the change. ### Why are the changes needed? Above section describes the bug. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Add new UT to check for availability on a couple of types. Closes #26300 from HeartSaVioR/SPARK-29642. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-11-06 10:37:00 -08:00
Wenchen Fan	411015300e	[SPARK-29752][SQL][TEST] make AdaptiveQueryExecSuite more robust ### What changes were proposed in this pull request? Instead of checking the exact number of local shuffle readers, we should check whether the number of shuffles is equal to the number of local readers. ### Why are the changes needed? AQE is known to have randomness. We may pick a different build side for a broadcast join depending on which query stage finishes first. The build-side decision may add or remove shuffles downstream, so it's flaky to check the exact number of local shuffle readers. ### Does this PR introduce any user-facing change? no ### How was this patch tested? test only PR. Closes #26394 from cloud-fan/test. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Xiao Li <gatorsmile@gmail.com>	2019-11-06 10:27:39 -08:00
shahid	90df858a26	[SPARK-29725][SQL][TESTS] Add ThriftServerPageSuite ### What changes were proposed in this pull request? Added UT for the classes `ThriftServerPage.scala` and `ThriftServerSessionPage.scala` ### Why are the changes needed? Currently, there are no UTs for testing Thriftserver UI page ### Does this PR introduce any user-facing change? No ### How was this patch tested? UT Closes #26403 from shahidki31/ut. Authored-by: shahid <shahidki31@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-06 20:59:45 +09:00
Aman Omer	0dcd739534	[SPARK-29462] The data type of "array()" should be array<null> ### What changes were proposed in this pull request? During creation of an array, if CreateArray does not get any children to set the data type for the array, it will create an array of null type. ### Why are the changes needed? When an empty array is created, it should be declared as array<null>. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Tested manually Closes #26324 from amanomer/29462. Authored-by: Aman Omer <amanomer1996@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-06 18:39:46 +09:00
Liang-Chi Hsieh	6233958ab6	[SPARK-29680][SQL] Remove ALTER TABLE CHANGE COLUMN syntax ### What changes were proposed in this pull request? This patch removes v1 ALTER TABLE CHANGE COLUMN syntax. ### Why are the changes needed? Since in v2 we have ALTER TABLE CHANGE COLUMN and ALTER TABLE RENAME COLUMN, this old syntax is not necessary now and can be confusing. The v2 ALTER TABLE CHANGE COLUMN should fall back to v1 AlterTableChangeColumnCommand (#26354). ### Does this PR introduce any user-facing change? Yes, the old v1 ALTER TABLE CHANGE COLUMN syntax is removed. ### How was this patch tested? Unit tests. Closes #26338 from viirya/SPARK-29680. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-06 10:42:44 +08:00
Takeshi Yamamuro	20b9d8259b	[SPARK-29714][SQL][TESTS] Port insert.sql ### What changes were proposed in this pull request? This PR ports insert.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/insert.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/insert.out ### Why are the changes needed? To check behaviour differences between Spark and PostgreSQL ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins. And, Comparison with PgSQL results Closes #26360 from maropu/InsertTest. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-05 16:44:54 -08:00
Maxim Gekk	4c53ac1822	[SPARK-29387][SQL] Support `*` and `/` operators for intervals ### What changes were proposed in this pull request? Added new expressions `MultiplyInterval` and `DivideInterval` to multiply/divide an interval by a numeric. Updated `TypeCoercion.DateTimeOperations` to turn the `Multiply`/`Divide` expressions of `CalendarIntervalType` and `NumericType` into `MultiplyInterval`/`DivideInterval`. To support the new operations, added new methods `multiply()` and `divide()` to `CalendarInterval`. ### Why are the changes needed? - To maintain feature parity with PostgreSQL, which supports multiplication and division of intervals by doubles: ```sql # select interval '1 hour' / double precision '1.5'; ?column? ---------- 00:40:00 ``` - To conform to the SQL standard, which defines those operations: `numeric * interval`, `interval * numeric` and `interval / numeric`. See [4.5.3 Operations involving datetimes and intervals](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt). - Improve Spark SQL UX and allow users to adjust interval columns. For example: ```sql spark-sql> select (timestamp'now' - timestamp'yesterday') * 1.3; interval 2 days 10 hours 39 minutes 38 seconds 568 milliseconds 900 microseconds ``` ### Does this PR introduce any user-facing change? Yes, previously the following query failed with the error: ```sql spark-sql> select interval 1 hour 30 minutes * 1.5; Error in query: cannot resolve '(interval 1 hours 30 minutes * 1.5BD)' due to data type mismatch: differing types in '(interval 1 hours 30 minutes * 1.5BD)' (interval and decimal(2,1)).; line 1 pos 7; ``` After: ```sql spark-sql> select interval 1 hour 30 minutes * 1.5; interval 2 hours 15 minutes ``` ### How was this patch tested? 
- Added tests for the `multiply()` and `divide()` methods to `CalendarIntervalSuite.java` - New test suite `IntervalExpressionsSuite` - by tests for `Multiply` -> `MultiplyInterval` and `Divide` -> `DivideInterval` in `TypeCoercionSuite` - updated `datetime.sql` Closes #26132 from MaxGekk/interval-mul-div. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-06 00:37:43 +08:00
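To make the arithmetic behind `interval * numeric` and `interval / numeric` concrete, here is a minimal Python sketch. It is not the actual `CalendarInterval.multiply()`/`divide()` code. It assumes an interval is a (months, days, microseconds) triple, that a month counts as 30 days, and that each scaled field is truncated toward zero with its fractional part pushed down into the next smaller field. `None` stands in for SQL NULL and is returned for a null or zero divisor, the consistent behavior that the follow-up commit 3437862975 in this log aims for.

```python
from typing import NamedTuple, Optional

DAYS_PER_MONTH = 30  # assumption: a month is treated as 30 days
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


class Interval(NamedTuple):
    months: int
    days: int
    micros: int


def _from_doubles(months: float, days: float, micros: float) -> Interval:
    # Truncate each field toward zero and carry the fractional remainder
    # down into the next smaller field (months -> days -> microseconds).
    whole_months = int(months)
    days += DAYS_PER_MONTH * (months - whole_months)
    whole_days = int(days)
    micros += MICROS_PER_DAY * (days - whole_days)
    return Interval(whole_months, whole_days, int(micros))


def multiply(iv: Optional[Interval], num: Optional[float]) -> Optional[Interval]:
    if iv is None or num is None:
        return None
    return _from_doubles(iv.months * num, iv.days * num, iv.micros * num)


def divide(iv: Optional[Interval], num: Optional[float]) -> Optional[Interval]:
    # A null or zero divisor yields None (NULL) in this sketch.
    if iv is None or num is None or num == 0:
        return None
    return _from_doubles(iv.months / num, iv.days / num, iv.micros / num)
```

With this model, `multiply(Interval(0, 0, 5_400_000_000), 1.5)` gives 8,100,000,000 microseconds (2 hours 15 minutes), matching the `interval 1 hour 30 minutes * 1.5` example above, and `divide(Interval(0, 0, 3_600_000_000), 1.5)` gives 40 minutes, matching the PostgreSQL output quoted above. The carry-down rule is a design choice: it keeps a result like `1 month * 1.5` as `1 month 15 days` instead of dropping the fraction.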
Takeshi Yamamuro	41be5125a1	[SPARK-29648][SQL][TESTS] Port limit.sql ### What changes were proposed in this pull request? This PR ports limit.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/limit.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/limit.out ### Why are the changes needed? To check behaviour differences between Spark and PostgreSQL ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins. And, Comparison with PgSQL results Closes #26311 from maropu/SPARK-29648. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 22:12:27 -08:00
Huaxin Gao	02eecfec99	[SPARK-29695][SQL] ALTER TABLE (SerDe properties) should look up catalog/table like v2 commands ### What changes were proposed in this pull request? Add AlterTableSerDePropertiesStatement and make ALTER TABLE ... SET SERDE/SERDEPROPERTIES go through the same catalog/table resolution framework as v2 commands. ### Why are the changes needed? It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g. ``` USE my_catalog DESC t // success and describe the table t from my_catalog ALTER TABLE t SET SERDE 'org.apache.class' // report table not found as there is no table t in the session catalog ``` ### Does this PR introduce any user-facing change? Yes. When running ALTER TABLE ... SET SERDE/SERDEPROPERTIES, Spark fails the command if the current catalog is set to a v2 catalog, or the table name specifies a v2 catalog. ### How was this patch tested? Unit tests. Closes #26374 from huaxingao/spark_29695. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 21:42:39 -08:00
Terry Kim	66619b84d8	[SPARK-29630][SQL] Disallow creating a permanent view that references a temporary view in an expression ### What changes were proposed in this pull request? Disallow creating a permanent view that references a temporary view in expressions. ### Why are the changes needed? Creating a permanent view that references a temporary view is currently disallowed. For example, ```SQL # The following throws org.apache.spark.sql.AnalysisException # Not allowed to create a permanent view `per_view` by referencing a temporary view `tmp`; CREATE VIEW per_view AS SELECT t1.a, t2.b FROM base_table t1, (SELECT * FROM tmp) t2; ``` However, the following is currently allowed: ```SQL CREATE VIEW per_view AS SELECT * FROM base_table WHERE EXISTS (SELECT * FROM tmp); ``` This PR fixes the bug where temporary views used inside expressions are not checked. ### Does this PR introduce any user-facing change? Yes. Now the following SQL query throws an exception as expected: ```SQL # The following throws org.apache.spark.sql.AnalysisException # Not allowed to create a permanent view `per_view` by referencing a temporary view `tmp`; CREATE VIEW per_view AS SELECT * FROM base_table WHERE EXISTS (SELECT * FROM tmp); ``` ### How was this patch tested? Added new unit tests. Closes #26361 from imback82/spark-29630. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-05 13:19:46 +08:00
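The bug is that the check only walks the plan's child operators and never looks inside expressions such as `EXISTS (SELECT * FROM tmp)`. The following minimal Python sketch shows the shape of the fix using a hypothetical `Plan` node that keeps subquery plans (plans nested in expressions) separately from ordinary children; it is not Spark's analyzer code, and `ValueError` stands in for `AnalysisException`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Plan:
    """Hypothetical logical-plan node: an optional relation name plus children."""
    relation: Optional[str] = None
    children: List["Plan"] = field(default_factory=list)
    # Plans nested inside expressions, e.g. the subquery of EXISTS (...).
    subqueries: List["Plan"] = field(default_factory=list)


def referenced_temp_views(plan: Plan, temp_views: Set[str]) -> List[str]:
    found = []
    if plan.relation in temp_views:
        found.append(plan.relation)
    # Visiting only plan.children reproduces the pre-fix behavior; the fix
    # is to also descend into plans referenced from expressions.
    for child in plan.children + plan.subqueries:
        found.extend(referenced_temp_views(child, temp_views))
    return found


def check_permanent_view(name: str, plan: Plan, temp_views: Set[str]) -> None:
    refs = referenced_temp_views(plan, temp_views)
    if refs:
        raise ValueError(
            f"Not allowed to create a permanent view `{name}` by "
            f"referencing a temporary view `{refs[0]}`")
```

Here the `WHERE EXISTS` query from the commit message maps to `Plan(children=[Plan(relation="base_table")], subqueries=[Plan(relation="tmp")])`, which the check now rejects.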
Takeshi Yamamuro	942a057934	[SPARK-29696][SQL][TESTS] Port groupingsets.sql ### What changes were proposed in this pull request? This PR ports groupingsets.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/groupingsets.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/groupingsets.out ### Why are the changes needed? To check behaviour differences between Spark and PostgreSQL ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins. And, Comparison with PgSQL results Closes #26352 from maropu/GgroupingSets. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 19:06:28 -08:00
Terry Kim	bc65c54f6b	[SPARK-29734][SQL] Datasource V2: Support SHOW CURRENT NAMESPACE ### What changes were proposed in this pull request? This PR introduces a new SQL command: `SHOW CURRENT NAMESPACE`. ### Why are the changes needed? Datasource V2 supports multiple catalogs/namespaces and having `SHOW CURRENT NAMESPACE` to retrieve the current catalog/namespace info would be useful. ### Does this PR introduce any user-facing change? Yes, the user can perform the following: ``` scala> spark.sql("SHOW CURRENT NAMESPACE").show +-------------+---------+ | catalog|namespace| +-------------+---------+ |spark_catalog| default| +-------------+---------+ scala> spark.sql("USE testcat.ns1.ns2").show scala> spark.sql("SHOW CURRENT NAMESPACE").show +-------+---------+ |catalog|namespace| +-------+---------+ |testcat| ns1.ns2| +-------+---------+ ``` ### How was this patch tested? Added unit tests. Closes #26379 from imback82/show_current_catalog. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 18:05:10 -08:00
Jungtaek Lim (HeartSaVioR)	ba2bc4b0e0	[SPARK-20568][SS] Provide option to clean up completed files in streaming query ## What changes were proposed in this pull request? This patch adds an option to clean up files that were completed in a previous batch. `cleanSource` -> "archive" / "delete" / "off" The default value is "off", in which case Spark does nothing. If "delete" is specified, Spark simply deletes the input files. If "archive" is specified, Spark requires the additional config `sourceArchiveDir`, the directory to which input files are moved. When archiving (via move), the original path of each input file is retained as a sub-path under the archive directory. Note that this only applies to micro-batch queries, since for batch queries all input files must be kept to get the same result across multiple query executions. ## How was this patch tested? Added UT. Manual test against local disk as well as HDFS. Closes #22952 from HeartSaVioR/SPARK-20568. Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Co-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com> Co-authored-by: Jungtaek Lim <kabhwan@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-11-04 15:16:10 -08:00
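The `cleanSource` modes and the "keep the original path as a sub-path" rule can be sketched as a small planning function. This is a minimal Python illustration, not Spark's implementation: the helper names `archive_path` and `clean_completed` are hypothetical, and the sketch only returns the planned actions instead of touching the filesystem. In an actual query, `cleanSource` and `sourceArchiveDir` are set as file source options, e.g. `.option("cleanSource", "archive")`.

```python
import posixpath
from typing import List, Optional, Tuple


def archive_path(source_file: str, archive_dir: str) -> str:
    # Hypothetical helper: keep the full original path as a sub-path under
    # archive_dir, e.g. /data/in/a.txt -> /archive/data/in/a.txt
    return posixpath.join(archive_dir, source_file.lstrip("/"))


def clean_completed(files: List[str], mode: str,
                    archive_dir: Optional[str] = None
                    ) -> List[Tuple[str, str, Optional[str]]]:
    """Return planned (action, src, dst) tuples for completed input files."""
    if mode == "off":
        return []  # default: leave completed files alone
    if mode == "delete":
        return [("delete", f, None) for f in files]
    if mode == "archive":
        if not archive_dir:
            raise ValueError("cleanSource=archive requires sourceArchiveDir")
        return [("move", f, archive_path(f, archive_dir)) for f in files]
    raise ValueError(f"unknown cleanSource value: {mode}")
```

Separating "plan" from "execute" keeps the mode logic easy to check; a real implementation also has to deal with failures during the move or delete, which this sketch ignores.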
yong.tian1	04536b21db	[SPARK-28552][SQL] Case-insensitive database URLs in JdbcDialect ## What changes were proposed in this pull request? This PR makes dialect matching via the JDBC URL prefix case-insensitive. When I use a JDBC URL such as ```jdbc:MySQL://localhost/db``` to query data through Spark SQL, the result is wrong, even though MySQL accepts such a URL. Because Spark SQL matches MySQLDialect by the prefix ```jdbc:mysql```, ```jdbc:MySQL``` is not matched with the correct dialect. Therefore, identifying the corresponding dialect through the JDBC URL should be case-insensitive. https://issues.apache.org/jira/browse/SPARK-28552 ## How was this patch tested? UT. Closes #25287 from teeyog/sql_dialect. Lead-authored-by: yong.tian1 <yong.tian1@dmall.com> Co-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com> Co-authored-by: Chris Martin <chris@cmartinit.co.uk> Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Co-authored-by: Kent Yao <yaooqinn@hotmail.com> Co-authored-by: teeyog <teeyog@gmail.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Co-authored-by: Ryan Blue <blue@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2019-11-05 08:15:29 +09:00
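The fix amounts to normalizing the URL's case before comparing prefixes. Here is a minimal Python sketch with a hypothetical registry that maps lowercase URL prefixes to dialect names; the registry contents and the `NoopDialect` fallback name are illustrative assumptions, not Spark's `JdbcDialects` API.

```python
# Hypothetical registry: lowercase URL prefix -> dialect name.
DIALECT_PREFIXES = {
    "jdbc:mysql": "MySQLDialect",
    "jdbc:postgresql": "PostgresDialect",
}


def find_dialect(url: str) -> str:
    # Lowercase the URL once, so "jdbc:MySQL://..." matches "jdbc:mysql".
    lowered = url.lower()
    for prefix, dialect in DIALECT_PREFIXES.items():
        if lowered.startswith(prefix):
            return dialect
    return "NoopDialect"  # assumed fallback when nothing matches
```

On the JVM, the equivalent normalization would typically use `toLowerCase(Locale.ROOT)`, so that matching does not depend on the default locale's casing rules.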
Wenchen Fan	326b789340	[SPARK-29743][SQL] sample should set needCopyResult to true if its child is ### What changes were proposed in this pull request? `SampleExec` has a bug that it sets `needCopyResult` to false as long as the `withReplacement` parameter is false. This causes problems if its child needs to copy the result, e.g. a join. ### Why are the changes needed? to fix a correctness issue ### Does this PR introduce any user-facing change? Yes, the result will be corrected. ### How was this patch tested? a new test Closes #26387 from cloud-fan/sample-bug. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 10:56:37 -08:00
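The commit message does not spell out why a missing copy corrupts results. The following Python analogy (not Spark code) shows the failure mode under an assumption: a child operator reuses one mutable row buffer for every output row, and a consumer holds on to rows without copying them. It also encodes the corrected rule the commit title implies, namely that sample must copy whenever its child requires it or when sampling with replacement. All function names here are hypothetical.

```python
def rows_reusing_buffer(values):
    """Yield the SAME mutable list for every row, like an operator that
    reuses a single output buffer to avoid per-row allocation."""
    buf = [None]
    for v in values:
        buf[0] = v
        yield buf


def sample(rows, keep, copy_result):
    # Hypothetical sampler whose output is buffered downstream; without a
    # copy, every kept entry aliases the child's reused buffer.
    kept = []
    for i, row in enumerate(rows):
        if keep(i):
            kept.append(list(row) if copy_result else row)
    return [r[0] for r in kept]


def need_copy_result(child_needs_copy, with_replacement):
    # Corrected rule implied by the commit: copy if the child needs it,
    # not only when sampling with replacement.
    return child_needs_copy or with_replacement
```

Keeping every other row from `rows_reusing_buffer([1, 2, 3, 4])` gives `[1, 3]` with copying, but `[4, 4]` without it, because both kept entries point at the buffer that was last overwritten with `4`.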
angerszhu	e524a3a223	[SPARK-29742][BUILD] Update checkstyle plugin's check dir scope ### What changes were proposed in this pull request? The current checkstyle configuration doesn't cover all source folders. To support multiple Hive versions, we have separate version-specific Hive folders, and they should be checked too. ### Why are the changes needed? Fix build bug ### Does this PR introduce any user-facing change? NO ### How was this patch tested? NO Closes #26385 from AngersZhuuuu/SPARK-29742. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 09:08:47 -08:00
Kent Yao	44b8fbcc58	[SPARK-29663][SQL] Support sum with interval type values ### What changes were proposed in this pull request? `sum` supports interval values ### Why are the changes needed? Part of SPARK-27764 Feature Parity between PostgreSQL and Spark ### Does this PR introduce any user-facing change? yes, sum can evaluate intervals ### How was this patch tested? add ut Closes #26325 from yaooqinn/SPARK-29663. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-05 01:05:07 +08:00
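A minimal Python sketch of what summing intervals could look like, assuming an interval is a (months, days, microseconds) triple added field by field with no normalization (for example, 40 days is not turned into a month), and that `None` models SQL NULL. Following standard SQL aggregate semantics, NULL inputs are skipped and the sum over no non-null values is NULL. This illustrates the semantics only; it is not the code added by the commit.

```python
from functools import reduce
from typing import Iterable, NamedTuple, Optional


class Interval(NamedTuple):
    months: int
    days: int
    micros: int


def add(a: Interval, b: Interval) -> Interval:
    # Field-by-field addition; no carrying between fields (an assumption).
    return Interval(a.months + b.months, a.days + b.days, a.micros + b.micros)


def sum_intervals(values: Iterable[Optional[Interval]]) -> Optional[Interval]:
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # SUM over only NULLs (or no rows) is NULL
    return reduce(add, non_null)
```

For example, summing `Interval(1, 2, 3)`, `None`, and `Interval(0, 40, 10)` gives `Interval(1, 42, 13)`.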


8624 commits