ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
ulysses	7759f7179c	[SPARK-29772][TESTS][SQL] Add withNamespace in SQLTestUtils ### What changes were proposed in this pull request? V2 catalog support namespace, we should add `withNamespace` like `withDatabase`. ### Why are the changes needed? Make test easy. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Add UT. Closes #26411 from ulysses-you/Add-test-with-namespace. Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-08 11:53:44 +08:00
Kent Yao	0a03839366	[SPARK-29787][SQL] Move methods add/subtract/negate from CalendarInterval to IntervalUtils ### What changes were proposed in this pull request? Move method add/subtract/negate from CalendarInterval to IntervalUtils ### Why are the changes needed? https://github.com/apache/spark/pull/26410#discussion_r343125468 suggested here ### Does this PR introduce any user-facing change? no ### How was this patch tested? add uts and move some Closes #26423 from yaooqinn/SPARK-29787. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-08 10:28:58 +08:00
Dongjoon Hyun	da848b1897	[SPARK-29796][SQL][TESTS] `HiveExternalCatalogVersionsSuite` should ignore preview release ### What changes were proposed in this pull request? This aims to exclude the `preview` release to recover `HiveExternalCatalogVersionsSuite`. Currently, new preview release breaks `branch-2.4` PRBuilder since yesterday. New release (especially `preview`) should not affect `branch-2.4`. - https://github.com/apache/spark/pull/26417 (Failed 4 times) ### Why are the changes needed? BEFORE ```scala scala> scala.io.Source.fromURL("https://dist.apache.org/repos/dist/release/spark/").mkString.split("\n").filter(_.contains("""<li><a href="spark-""")).map("""<a href="spark-(\d.\d.\d)/">""".r.findFirstMatchIn(_).get.group(1)) java.util.NoSuchElementException: None.get ``` AFTER ```scala scala> scala.io.Source.fromURL("https://dist.apache.org/repos/dist/release/spark/").mkString.split("\n").filter(_.contains("""<li><a href="spark-""")).filterNot(_.contains("preview")).map("""<a href="spark-(\d.\d.\d)/">""".r.findFirstMatchIn(_).get.group(1)) res5: Array[String] = Array(2.3.4, 2.4.4) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This should pass the PRBuilder. Closes #26428 from dongjoon-hyun/SPARK-HiveExternalCatalogVersionsSuite. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-07 10:28:32 -08:00
Kent Yao	9562b26914	[SPARK-29757][SQL] Move calendar interval constants together ### What changes were proposed in this pull request? ```java public static final int YEARS_PER_DECADE = 10; public static final int YEARS_PER_CENTURY = 100; public static final int YEARS_PER_MILLENNIUM = 1000; public static final byte MONTHS_PER_QUARTER = 3; public static final int MONTHS_PER_YEAR = 12; public static final byte DAYS_PER_WEEK = 7; public static final long DAYS_PER_MONTH = 30L; public static final long HOURS_PER_DAY = 24L; public static final long MINUTES_PER_HOUR = 60L; public static final long SECONDS_PER_MINUTE = 60L; public static final long SECONDS_PER_HOUR = MINUTES_PER_HOUR * SECONDS_PER_MINUTE; public static final long SECONDS_PER_DAY = HOURS_PER_DAY * SECONDS_PER_HOUR; public static final long MILLIS_PER_SECOND = 1000L; public static final long MILLIS_PER_MINUTE = SECONDS_PER_MINUTE * MILLIS_PER_SECOND; public static final long MILLIS_PER_HOUR = MINUTES_PER_HOUR * MILLIS_PER_MINUTE; public static final long MILLIS_PER_DAY = HOURS_PER_DAY * MILLIS_PER_HOUR; public static final long MICROS_PER_MILLIS = 1000L; public static final long MICROS_PER_SECOND = MILLIS_PER_SECOND * MICROS_PER_MILLIS; public static final long MICROS_PER_MINUTE = SECONDS_PER_MINUTE * MICROS_PER_SECOND; public static final long MICROS_PER_HOUR = MINUTES_PER_HOUR * MICROS_PER_MINUTE; public static final long MICROS_PER_DAY = HOURS_PER_DAY * MICROS_PER_HOUR; public static final long MICROS_PER_MONTH = DAYS_PER_MONTH * MICROS_PER_DAY; /* 365.25 days per year assumes leap year every four years */ public static final long MICROS_PER_YEAR = (36525L * MICROS_PER_DAY) / 100; public static final long NANOS_PER_MICROS = 1000L; public static final long NANOS_PER_MILLIS = MICROS_PER_MILLIS * NANOS_PER_MICROS; public static final long NANOS_PER_SECOND = MILLIS_PER_SECOND * NANOS_PER_MILLIS; ``` The above parameters are defined in IntervalUtils, DateTimeUtils, and CalendarInterval, some of them are redundant, 
some of them are cross-referenced. ### Why are the changes needed? To simplify code, enhance consistency and reduce risks ### Does this PR introduce any user-facing change? no ### How was this patch tested? modified uts Closes #26399 from yaooqinn/SPARK-29757. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-07 19:48:19 +08:00
Wenchen Fan	9b61f90987	[SPARK-29761][SQL] do not output leading 'interval' in CalendarInterval.toString ### What changes were proposed in this pull request? remove the leading "interval" in `CalendarInterval.toString`. ### Why are the changes needed? Although it's allowed to have "interval" prefix when casting string to int, it's not recommended. This is also consistent with pgsql: ``` cloud0fan=# select interval '1' day; interval ---------- 1 day (1 row) ``` ### Does this PR introduce any user-facing change? yes, when display a dataframe with interval type column, the result is different. ### How was this patch tested? updated tests. Closes #26401 from cloud-fan/interval. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-07 15:44:50 +08:00
Maxim Gekk	29dc59ac29	[SPARK-29605][SQL] Optimize string to interval casting ### What changes were proposed in this pull request? In the PR, I propose new function `stringToInterval()` in `IntervalUtils` for converting `UTF8String` to `CalendarInterval`. The function is used in casting a `STRING` column to an `INTERVAL` column. ### Why are the changes needed? The proposed implementation is ~10 times faster. For example, parsing 9 interval units on JDK 8: Before: ``` 9 units w/ interval 14004 14125 116 0.1 14003.6 0.0X 9 units w/o interval 13785 14056 290 0.1 13784.9 0.0X ``` After: ``` 9 units w/ interval 1343 1344 1 0.7 1343.0 0.3X 9 units w/o interval 1345 1349 8 0.7 1344.6 0.3X ``` ### Does this PR introduce any user-facing change? No ### How was this patch tested? - By new tests for `stringToInterval` in `IntervalUtilsSuite` - By existing tests Closes #26256 from MaxGekk/string-to-interval. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-07 12:39:52 +08:00
Kent Yao	3437862975	[SPARK-29387][SQL][FOLLOWUP] Fix issues of the multiply and divide for intervals ### What changes were proposed in this pull request? Handle the inconsistence dividing zeros between literals and columns. fix the null issue too. ### Why are the changes needed? BUG FIX ### 1 Handle the inconsistence dividing zeros between literals and columns ```sql -- !query 24 select k, v, cast(k as interval) / v, cast(k as interval) * v from VALUES ('1 seconds', 1), ('2 seconds', 0), ('3 seconds', null), (null, null), (null, 0) t(k, v) -- !query 24 schema struct<k:string,v:int,divide_interval(CAST(k AS INTERVAL), CAST(v AS DOUBLE)):interval,multiply_interval(CAST(k AS INTERVAL), CAST(v AS DOUBLE)):interval> -- !query 24 output 1 seconds 1 interval 1 seconds interval 1 seconds 2 seconds 0 interval 0 microseconds interval 0 microseconds 3 seconds NULL NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL ``` ```sql -- !query 21 select interval '1 year 2 month' / 0 -- !query 21 schema struct<divide_interval(interval 1 years 2 months, CAST(0 AS DOUBLE)):interval> -- !query 21 output NULL ``` in the first case, interval ’2 seconds ‘ / 0, it produces `interval 0 microseconds ` in the second case, it is `null` ### 2 null literal issues ```sql -- !query 20 select interval '1 year 2 month' / null -- !query 20 schema struct<> -- !query 20 output org.apache.spark.sql.AnalysisException cannot resolve '(interval 1 years 2 months / NULL)' due to data type mismatch: differing types in '(interval 1 years 2 months / NULL)' (interval and null).; line 1 pos 7 -- !query 22 select interval '4 months 2 weeks 6 days' * null -- !query 22 schema struct<> -- !query 22 output org.apache.spark.sql.AnalysisException cannot resolve '(interval 4 months 20 days * NULL)' due to data type mismatch: differing types in '(interval 4 months 20 days * NULL)' (interval and null).; line 1 pos 7 -- !query 23 select null * interval '4 months 2 weeks 6 days' -- !query 23 schema struct<> -- !query 23 output 
org.apache.spark.sql.AnalysisException cannot resolve '(NULL * interval 4 months 20 days)' due to data type mismatch: differing types in '(NULL * interval 4 months 20 days)' (null and interval).; line 1 pos 7 ``` dividing or multiplying null literals, error occurs; where in column is fine as the first case ### Does this PR introduce any user-facing change? NO, maybe yes, but it is just a follow-up ### How was this patch tested? add uts cc cloud-fan MaxGekk maropu Closes #26410 from yaooqinn/SPARK-29387. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-07 12:19:03 +08:00
Wenchen Fan	1f3863c856	[SPARK-29759][SQL] LocalShuffleReaderExec.outputPartitioning should use the corrected attributes ### What changes were proposed in this pull request? Update `LocalShuffleReaderExec.outputPartitioning` to use attributes from `ReusedQueryStage`. This also removes the override `doCanonicalize` in local/coalesced shuffle reader, as these 2 operators change the output partitioning. It's not safe to strip them in the canonicalized query plan. ### Why are the changes needed? We will have an invalid output partitioning if we don't fix it. ### Does this PR introduce any user-facing change? no ### How was this patch tested? existing tests Closes #26400 from cloud-fan/aqe. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Xiao Li <gatorsmile@gmail.com>	2019-11-06 14:33:52 -08:00
Jungtaek Lim (HeartSaVioR)	782992c7ed	[SPARK-29642][SS] Change the element type of underlying array to UnsafeRow for ContinuousRecordEndpoint ### What changes were proposed in this pull request? This patch fixes the bug that `ContinuousMemoryStream[String]` throws error regarding ClassCastException - cast String to UTFString. This is because ContinuousMemoryStream and ContinuousRecordEndpoint uses origin input as it is for underlying data structure of Row, and encoding is missing here. To force encoding, this patch changes the element type of underlying array to UnsafeRow instead of Any for ContinuousRecordEndpoint - ContinuousMemoryStream and TextSocketContinuousStream are modified to reflect the change. ### Why are the changes needed? Above section describes the bug. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Add new UT to check for availability on couple of types. Closes #26300 from HeartSaVioR/SPARK-29642. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-11-06 10:37:00 -08:00
Wenchen Fan	411015300e	[SPARK-29752][SQL][TEST] make AdaptiveQueryExecSuite more robust ### What changes were proposed in this pull request? instead of checking the exact number of local shuffle readers, we should check whether the number of shuffles is equal to the number of local readers. ### Why are the changes needed? AQE is known to have randomness. We may pick different build side for broadcast join depending on which query stage finishes first. The decision to build side may add/remove shuffles downstream, so it's flaky to check the exact number of local shuffle readers. ### Does this PR introduce any user-facing change? no ### How was this patch tested? test only PR. Closes #26394 from cloud-fan/test. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Xiao Li <gatorsmile@gmail.com>	2019-11-06 10:27:39 -08:00
shahid	90df858a26	[SPARK-29725][SQL][TESTS] Add ThriftServerPageSuite ### What changes were proposed in this pull request? Added UT for the classes `ThriftServerPage.scala` and `ThriftServerSessionPage.scala` ### Why are the changes needed? Currently, there are no UTs for testing Thriftserver UI page ### Does this PR introduce any user-facing change? No ### How was this patch tested? UT Closes #26403 from shahidki31/ut. Authored-by: shahid <shahidki31@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-06 20:59:45 +09:00
Aman Omer	0dcd739534	[SPARK-29462] The data type of "array()" should be array<null> ### What changes were proposed in this pull request? During creation of an array, if CreateArray does not get any children to set the data type of the array, it will create an array of null type. ### Why are the changes needed? When an empty array is created, it should be declared as array<null>. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Tested manually Closes #26324 from amanomer/29462. Authored-by: Aman Omer <amanomer1996@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-06 18:39:46 +09:00
Liang-Chi Hsieh	6233958ab6	[SPARK-29680][SQL] Remove ALTER TABLE CHANGE COLUMN syntax ### What changes were proposed in this pull request? This patch removes v1 ALTER TABLE CHANGE COLUMN syntax. ### Why are the changes needed? Since in v2 we have ALTER TABLE CHANGE COLUMN and ALTER TABLE RENAME COLUMN, this old syntax is not necessary now and can be confusing. The v2 ALTER TABLE CHANGE COLUMN should fallback to v1 AlterTableChangeColumnCommand (#26354). ### Does this PR introduce any user-facing change? Yes, the old v1 ALTER TABLE CHANGE COLUMN syntax is removed. ### How was this patch tested? Unit tests. Closes #26338 from viirya/SPARK-29680. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-06 10:42:44 +08:00
Takeshi Yamamuro	20b9d8259b	[SPARK-29714][SQL][TESTS] Port insert.sql ### What changes were proposed in this pull request? This PR ports insert.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/insert.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/insert.out ### Why are the changes needed? To check behaviour differences between Spark and PostgreSQL ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins. And, Comparison with PgSQL results Closes #26360 from maropu/InsertTest. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-05 16:44:54 -08:00
Maxim Gekk	4c53ac1822	[SPARK-29387][SQL] Support `*` and `/` operators for intervals ### What changes were proposed in this pull request? Added new expressions `MultiplyInterval` and `DivideInterval` to multiply/divide an interval by a numeric. Updated `TypeCoercion.DateTimeOperations` to turn the `Multiply`/`Divide` expressions of `CalendarIntervalType` and `NumericType` to `MultiplyInterval`/`DivideInterval`. To support new operations, added new methods `multiply()` and `divide()` to `CalendarInterval`. ### Why are the changes needed? - To maintain feature parity with PostgreSQL which supports multiplication and division of intervals by doubles: ```sql # select interval '1 hour' / double precision '1.5'; ?column? ---------- 00:40:00 ``` - To conform the SQL standard which defines those operations: `numeric * interval`, `interval * numeric` and `interval / numeric`. See [4.5.3 Operations involving datetimes and intervals](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt). - Improve Spark SQL UX and allow users to adjust interval columns. For example: ```sql spark-sql> select (timestamp'now' - timestamp'yesterday') * 1.3; interval 2 days 10 hours 39 minutes 38 seconds 568 milliseconds 900 microseconds ``` ### Does this PR introduce any user-facing change? Yes, previously the following query fails with the error: ```sql spark-sql> select interval 1 hour 30 minutes * 1.5; Error in query: cannot resolve '(interval 1 hours 30 minutes * 1.5BD)' due to data type mismatch: differing types in '(interval 1 hours 30 minutes * 1.5BD)' (interval and decimal(2,1)).; line 1 pos 7; ``` After: ```sql spark-sql> select interval 1 hour 30 minutes * 1.5; interval 2 hours 15 minutes ``` ### How was this patch tested? 
- Added tests for the `multiply()` and `divide()` methods to `CalendarIntervalSuite.java` - New test suite `IntervalExpressionsSuite` - by tests for `Multiply` -> `MultiplyInterval` and `Divide` -> `DivideInterval` in `TypeCoercionSuite` - updated `datetime.sql` Closes #26132 from MaxGekk/interval-mul-div. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-06 00:37:43 +08:00
Takeshi Yamamuro	41be5125a1	[SPARK-29648][SQL][TESTS] Port limit.sql ### What changes were proposed in this pull request? This PR ports limit.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/limit.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/limit.out ### Why are the changes needed? To check behaviour differences between Spark and PostgreSQL ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins. And, Comparison with PgSQL results Closes #26311 from maropu/SPARK-29648. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 22:12:27 -08:00
Huaxin Gao	02eecfec99	[SPARK-29695][SQL] ALTER TABLE (SerDe properties) should look up catalog/table like v2 commands ### What changes were proposed in this pull request? Add AlterTableSerDePropertiesStatement and make ALTER TABLE ... SET SERDE/SERDEPROPERTIES go through the same catalog/table resolution framework of v2 commands. ### Why are the changes needed? It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g. ``` USE my_catalog DESC t // success and describe the table t from my_catalog ALTER TABLE t SET SERDE 'org.apache.class' // report table not found as there is no table t in the session catalog ``` ### Does this PR introduce any user-facing change? Yes. When running ALTER TABLE ... SET SERDE/SERDEPROPERTIES, Spark fails the command if the current catalog is set to a v2 catalog, or the table name specified a v2 catalog. ### How was this patch tested? Unit tests. Closes #26374 from huaxingao/spark_29695. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 21:42:39 -08:00
Terry Kim	66619b84d8	[SPARK-29630][SQL] Disallow creating a permanent view that references a temporary view in an expression ### What changes were proposed in this pull request? Disallow creating a permanent view that references a temporary view in expressions. ### Why are the changes needed? Creating a permanent view that references a temporary view is currently disallowed. For example, ```SQL # The following throws org.apache.spark.sql.AnalysisException # Not allowed to create a permanent view `per_view` by referencing a temporary view `tmp`; CREATE VIEW per_view AS SELECT t1.a, t2.b FROM base_table t1, (SELECT * FROM tmp) t2" ``` However, the following is allowed. ```SQL CREATE VIEW per_view AS SELECT * FROM base_table WHERE EXISTS (SELECT * FROM tmp); ``` This PR fixes the bug where temporary views used inside expressions are not checked. ### Does this PR introduce any user-facing change? Yes. Now the following SQL query throws an exception as expected: ```SQL # The following throws org.apache.spark.sql.AnalysisException # Not allowed to create a permanent view `per_view` by referencing a temporary view `tmp`; CREATE VIEW per_view AS SELECT * FROM base_table WHERE EXISTS (SELECT * FROM tmp); ``` ### How was this patch tested? Added new unit tests. Closes #26361 from imback82/spark-29630. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-05 13:19:46 +08:00
Takeshi Yamamuro	942a057934	[SPARK-29696][SQL][TESTS] Port groupingsets.sql ### What changes were proposed in this pull request? This PR ports groupingsets.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/groupingsets.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/groupingsets.out ### Why are the changes needed? To check behaviour differences between Spark and PostgreSQL ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins. And, Comparison with PgSQL results Closes #26352 from maropu/GgroupingSets. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 19:06:28 -08:00
Terry Kim	bc65c54f6b	[SPARK-29734][SQL] Datasource V2: Support SHOW CURRENT NAMESPACE ### What changes were proposed in this pull request? This PR introduces a new SQL command: `SHOW CURRENT NAMESPACE`. ### Why are the changes needed? Datasource V2 supports multiple catalogs/namespaces and having `SHOW CURRENT NAMESPACE` to retrieve the current catalog/namespace info would be useful. ### Does this PR introduce any user-facing change? Yes, the user can perform the following: ``` scala> spark.sql("SHOW CURRENT NAMESPACE").show +-------------+---------+ | catalog|namespace| +-------------+---------+ |spark_catalog| default| +-------------+---------+ scala> spark.sql("USE testcat.ns1.ns2").show scala> spark.sql("SHOW CURRENT NAMESPACE").show +-------+---------+ |catalog|namespace| +-------+---------+ |testcat| ns1.ns2| +-------+---------+ ``` ### How was this patch tested? Added unit tests. Closes #26379 from imback82/show_current_catalog. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 18:05:10 -08:00
Jungtaek Lim (HeartSaVioR)	ba2bc4b0e0	[SPARK-20568][SS] Provide option to clean up completed files in streaming query ## What changes were proposed in this pull request? This patch adds the option to clean up files which are completed in previous batch. `cleanSource` -> "archive" / "delete" / "off" The default value is "off", which Spark will do nothing. If "delete" is specified, Spark will simply delete input files. If "archive" is specified, Spark will require additional config `sourceArchiveDir` which will be used to move input files to there. When archiving (via move) the path of input files are retained to the archived paths as sub-path. Note that it is only applied to "micro-batch", since for batch all input files must be kept to get same result across multiple query executions. ## How was this patch tested? Added UT. Manual test against local disk as well as HDFS. Closes #22952 from HeartSaVioR/SPARK-20568. Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Co-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com> Co-authored-by: Jungtaek Lim <kabhwan@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-11-04 15:16:10 -08:00
yong.tian1	04536b21db	[SPARK-28552][SQL] Case-insensitive database URLs in JdbcDialect ## What changes were proposed in this pull request? This pr proposes to be case insensitive when matching dialects via jdbc url prefix. When I use jdbc url such as: ```jdbc: MySQL://localhost/db``` to query data through sparksql, the result is wrong, but MySQL supports such url writing. because sparksql matches MySQLDialect by prefix ```jdbc:mysql```, so ```jdbc: MySQL``` is not matched with the correct dialect. Therefore, it should be case insensitive when identifying the corresponding dialect through jdbc url https://issues.apache.org/jira/browse/SPARK-28552 ## How was this patch tested? UT. Closes #25287 from teeyog/sql_dialect. Lead-authored-by: yong.tian1 <yong.tian1@dmall.com> Co-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com> Co-authored-by: Chris Martin <chris@cmartinit.co.uk> Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Co-authored-by: Kent Yao <yaooqinn@hotmail.com> Co-authored-by: teeyog <teeyog@gmail.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Co-authored-by: Ryan Blue <blue@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2019-11-05 08:15:29 +09:00
Wenchen Fan	326b789340	[SPARK-29743][SQL] sample should set needCopyResult to true if its child is ### What changes were proposed in this pull request? `SampleExec` has a bug that it sets `needCopyResult` to false as long as the `withReplacement` parameter is false. This causes problems if its child needs to copy the result, e.g. a join. ### Why are the changes needed? to fix a correctness issue ### Does this PR introduce any user-facing change? Yes, the result will be corrected. ### How was this patch tested? a new test Closes #26387 from cloud-fan/sample-bug. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 10:56:37 -08:00
angerszhu	e524a3a223	[SPARK-29742][BUILD] Update checkstyle plugin's check dir scope ### What changes were proposed in this pull request? Current checkstyle checking folder can't cover all folder. Since for support multi version hive, we have some divided hive folder. We should check it too. ### Why are the changes needed? Fix build bug ### Does this PR introduce any user-facing change? NO ### How was this patch tested? NO Closes #26385 from AngersZhuuuu/SPARK-29742. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-04 09:08:47 -08:00
Kent Yao	44b8fbcc58	[SPARK-29663][SQL] Support sum with interval type values ### What changes were proposed in this pull request? sum support interval values ### Why are the changes needed? Part of SPARK-27764 Feature Parity between PostgreSQL and Spark ### Does this PR introduce any user-facing change? yes, sum can evaluate intervals ### How was this patch tested? add ut Closes #26325 from yaooqinn/SPARK-29663. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-05 01:05:07 +08:00
Terry Kim	d4ea211187	[SPARK-29678][SQL] ALTER TABLE (ADD PARTITION) should look up catalog/table like v2 commands ### What changes were proposed in this pull request? Add AlterTableAddPartitionStatement and make ALTER TABLE ... ADD PARTITION go through the same catalog/table resolution framework of v2 commands. ### Why are the changes needed? It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g. ``` USE my_catalog DESC t // success and describe the table t from my_catalog ALTER TABLE t ADD PARTITION (id=1) // report table not found as there is no table t in the session catalog ``` ### Does this PR introduce any user-facing change? Yes. When running ALTER TABLE ... ADD PARTITION, Spark fails the command if the current catalog is set to a v2 catalog, or the table name specified a v2 catalog. ### How was this patch tested? Unit tests Closes #26369 from imback82/spark-29678. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-04 23:56:47 +08:00
shahid	9023c69db8	[SPARK-29590][WEBUI] JDBC/ODBC tab in the spark UI support hide tables, to make it consistent with other tabs ### What changes were proposed in this pull request? Currently, JDBC/ODBC tab in the WEBUI doesn't support hiding table. Other tabs in the web ui like, Jobs, stages, SQL etc supports hiding table (refer https://github.com/apache/spark/pull/22592). In this PR, added the support for hide table in the jdbc/odbc tab also. ### Why are the changes needed? Spark ui about the contents of the form need to have hidden and show features, when the table records very much. Because sometimes you do not care about the record of the table, you just want to see the contents of the next table, but you have to scroll the scroll bar for a long time to see the contents of the next table. ### Does this PR introduce any user-facing change? No, except support of hide table ### How was this patch tested? Manually tested ![Screenshot 2019-11-01 at 12 10 05 PM](https://user-images.githubusercontent.com/23054875/68007364-61aa5d80-fca1-11e9-841e-c5a7382871fa.png) ![Screenshot 2019-11-01 at 12 10 43 PM](https://user-images.githubusercontent.com/23054875/68007355-5a834f80-fca1-11e9-844a-f4ba1a333db7.png) Closes #26353 from shahidki31/hideTable. Authored-by: shahid <shahidki31@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-11-04 09:44:10 -06:00
Maxim Gekk	50538600ec	[SPARK-29736][TESTS] Improve stability of tests for special datetime values ### What changes were proposed in this pull request? - Retry the tests for special date-time values on failure. The tests can potentially fail when reference values were taken before midnight and test code resolves special values after midnight. The retry can guarantee that the tests run during the same day. - Simplify getting of the current timestamp via `Instant.now()`. This should avoid any issues of converting current local datetime to an instant. For example, the same local time can be mapped to 2 instants when clocks are turned backward 1 hour on daylight saving date. - Extract common code to SQLHelper - Set the tested zoneId to the session time zone in `DateTimeUtilsSuite`. ### Why are the changes needed? To make the tests more stable. ### Does this PR introduce any user-facing change? No ### How was this patch tested? By existing test suites `Date`/`TimestampFormatterSuite` and `DateTimeUtilsSuite`. Closes #26380 from MaxGekk/retry-on-fail. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-04 16:59:32 +08:00
Liang-Chi Hsieh	afb055ba19	[SPARK-29353][SQL] Fallback AlterTableAlterColumnStatement to v1 AlterTableChangeColumnCommand ### What changes were proposed in this pull request? If the resolved table is v1 table, AlterTableAlterColumnStatement fallbacks to v1 AlterTableChangeColumnCommand. ### Why are the changes needed? To make the catalog/table lookup logic consistent. ### Does this PR introduce any user-facing change? Yes, a ALTER TABLE ALTER COLUMN command previously fails on v1 tables. After this, it falls back to v1 AlterTableChangeColumnCommand. ### How was this patch tested? Unit test. Closes #26354 from viirya/SPARK-29353. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-04 15:02:27 +08:00
Maxim Gekk	fb60c2a170	[SPARK-29671][SQL] Simplify string representation of intervals ### What changes were proposed in this pull request? In the PR, I propose to change `CalendarInterval.toString`: - to skip the `week` unit - to convert `milliseconds` and `microseconds` as the fractional part of the `seconds` unit. ### Why are the changes needed? To improve readability. ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? - By `CalendarIntervalSuite` and `IntervalUtilsSuite` - `literals.sql`, `datetime.sql` and `interval.sql` Closes #26367 from MaxGekk/interval-to-string-format. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-03 22:56:59 -08:00
wangguangxin.cn	83c39d15e1	[SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation ### What changes were proposed in this pull request? This is somewhat a complement of https://github.com/apache/spark/pull/21853. A `Sort` without `Limit` in a `Join` subquery is useless; the same holds in `GroupBy` when the aggregation function is order-irrelevant, such as `count` or `sum`. This PR tries to remove this kind of `Sort` operator in the SQL optimizer. ### Why are the changes needed? For example, `select count(1) from (select a from test1 order by a)` is equal to `select count(1) from (select a from test1)`. `select * from (select a from test1 order by a) t1 join (select b from test2) t2 on t1.a = t2.b` is equal to `select * from (select a from test1) t1 join (select b from test2) t2 on t1.a = t2.b`. Removing the useless `Sort` operator can improve performance. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Adding new UT `RemoveSortInSubquerySuite.scala` Closes #26011 from WangGuangxin/remove_sorts. Authored-by: wangguangxin.cn <wangguangxin.cn@bytedance.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-04 14:52:19 +08:00
Kent Yao	5ba17d09ac	[SPARK-29722][SQL] Non reversed keywords should be able to be used in high order functions ### What changes were proposed in this pull request? Support using non-reserved keywords in higher-order functions. ### Why are the changes needed? The keywords are non-reserved. ### Does this PR introduce any user-facing change? Yes, all non-reserved keywords can be used in higher-order functions correctly. ### How was this patch tested? Added UTs. Closes #26366 from yaooqinn/SPARK-29722. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-04 14:52:14 +09:00
Maxim Gekk	80a89873b2	[SPARK-29733][TESTS] Fix wrong order of parameters passed to `assertEquals` ### What changes were proposed in this pull request? The `assertEquals` method of JUnit Assert requires the first parameter to be the expected value. In this PR, I propose to change the order of parameters when the expected value is passed as the second parameter. ### Why are the changes needed? Wrong order of assert parameters confuses when the assert fails and the parameters have special string representation. For example: ```java assertEquals(input1.add(input2), new CalendarInterval(5, 5, 367200000000L)); ``` ``` java.lang.AssertionError: Expected :interval 5 months 5 days 101 hours Actual :interval 5 months 5 days 102 hours ``` ### Does this PR introduce any user-facing change? No ### How was this patch tested? By existing tests. Closes #26377 from MaxGekk/fix-order-in-assert-equals. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-03 11:21:28 -08:00
Wenchen Fan	31ae446e9c	[SPARK-29623][SQL] do not allow multiple unit TO unit statements in interval literal syntax ### What changes were proposed in this pull request? re-arrange the parser rules to make it clear that multiple unit TO unit statement like `SELECT INTERVAL '1-1' YEAR TO MONTH '2-2' YEAR TO MONTH` is not allowed. ### Why are the changes needed? This is definitely an accident that we support such a weird syntax in the past. It's not supported by any other DBs and I can't think of any use case of it. Also no test covers this syntax in the current codebase. ### Does this PR introduce any user-facing change? Yes, and a migration guide item is added. ### How was this patch tested? new tests. Closes #26285 from cloud-fan/syntax. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-02 21:35:56 +08:00
DylanGuedes	f53be0a05e	[SPARK-29109][SQL][TESTS] Port window.sql (Part 3) ### What changes were proposed in this pull request? This PR ports window.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/window.sql#L564-L911 The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/window.out ### Why are the changes needed? To ensure compatibility with PostgreSQL. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins. And, Comparison with PgSQL results. Closes #26274 from DylanGuedes/spark-29109. Authored-by: DylanGuedes <djmgguedes@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2019-11-01 22:05:40 +09:00
Huaxin Gao	14337f68e3	[SPARK-29643][SQL] ALTER TABLE/VIEW (DROP PARTITION) should look up catalog/table like v2 commands ### What changes were proposed in this pull request? Add AlterTableDropPartitionStatement and make ALTER TABLE/VIEW ... DROP PARTITION go through the same catalog/table resolution framework of v2 commands. ### Why are the changes needed? It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g. ``` USE my_catalog DESC t // success and describe the table t from my_catalog ALTER TABLE t DROP PARTITION (id=1) // report table not found as there is no table t in the session catalog ``` ### Does this PR introduce any user-facing change? Yes. When running ALTER TABLE/VIEW ... DROP PARTITION, Spark fails the command if the current catalog is set to a v2 catalog, or the table name specified a v2 catalog. ### How was this patch tested? Unit tests. Closes #26303 from huaxingao/spark-29643. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-01 18:29:04 +08:00
Liu,Linhong	a4382f7fe1	[SPARK-29486][SQL] CalendarInterval should have 3 fields: months, days and microseconds ### What changes were proposed in this pull request? Currently, CalendarInterval has 2 fields: months and microseconds. This PR tries to change it to 3 fields: months, days and microseconds. This is because one logical day interval may have a different number of microseconds (daylight saving). ### Why are the changes needed? One logical day interval may have a different number of microseconds (daylight saving). For example, in PST timezone, there will be 25 hours from 2019-11-2 12:00:00 to 2019-11-3 12:00:00 ### Does this PR introduce any user-facing change? no ### How was this patch tested? Unit tests and newly added test cases Closes #26134 from LinhongLiu/calendarinterval. Authored-by: Liu,Linhong <liulinhong@baidu.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-11-01 18:12:33 +08:00
Huaxin Gao	ae7450d1c9	[SPARK-29676][SQL] ALTER TABLE (RENAME PARTITION) should look up catalog/table like v2 commands ### What changes were proposed in this pull request? Add AlterTableRenamePartitionStatement and make ALTER TABLE ... RENAME TO PARTITION go through the same catalog/table resolution framework of v2 commands. ### Why are the changes needed? It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g. ``` USE my_catalog DESC t // success and describe the table t from my_catalog ALTER TABLE t PARTITION (id=1) RENAME TO PARTITION (id=2) // report table not found as there is no table t in the session catalog ``` ### Does this PR introduce any user-facing change? Yes. When running ALTER TABLE ... RENAME TO PARTITION, Spark fails the command if the current catalog is set to a v2 catalog, or the table name specified a v2 catalog. ### How was this patch tested? Unit tests. Closes #26350 from huaxingao/spark_29676. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>	2019-10-31 20:28:31 -07:00
ulysses	8a8ac00271	[SPARK-29687][SQL] Fix JDBC metrics counter data type ### What changes were proposed in this pull request? Fix JDBC metrics counter data type. Related pull request [26109](https://github.com/apache/spark/pull/26109). ### Why are the changes needed? Avoid overflow. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UT. Closes #26346 from ulysses-you/SPARK-29687. Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2019-11-01 08:35:00 +09:00
ulysses	888cc4601a	[SPARK-29675][SQL] Add exception when isolationLevel is Illegal ### What changes were proposed in this pull request? Currently, when we use the JDBC API and set an illegal isolationLevel option, Spark throws a `scala.MatchError`, which is not user-friendly. So we should throw an IllegalArgumentException instead. ### Why are the changes needed? Make the exception user-friendly. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Add UT. Closes #26334 from ulysses-you/SPARK-29675. Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-31 09:02:13 -07:00
Wenchen Fan	faf220aad9	[SPARK-29277][SQL][test-hadoop3.2] Add early DSv2 filter and projection pushdown Bring back https://github.com/apache/spark/pull/25955 ### What changes were proposed in this pull request? This adds a new rule, `V2ScanRelationPushDown`, to push filters and projections in to a new `DataSourceV2ScanRelation` in the optimizer. That scan is then used when converting to a physical scan node. The new relation correctly reports stats based on the scan. To run scan pushdown before rules where stats are used, this adds a new optimizer override, `earlyScanPushDownRules` and a batch for early pushdown in the optimizer, before cost-based join reordering. The other early pushdown rule, `PruneFileSourcePartitions`, is moved into the early pushdown rule set. This also moves pushdown helper methods from `DataSourceV2Strategy` into a util class. ### Why are the changes needed? This is needed for DSv2 sources to supply stats for cost-based rules in the optimizer. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This updates the implementation of stats from `DataSourceV2Relation` so tests will fail if stats are accessed before early pushdown for v2 relations. Closes #26341 from cloud-fan/back. Lead-authored-by: Wenchen Fan <wenchen@databricks.com> Co-authored-by: Ryan Blue <blue@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-31 08:25:32 -07:00
jiake	cd39cd4bce	[SPARK-28560][SQL][FOLLOWUP] support the build side to local shuffle reader as far as possible in BroadcastHashJoin ### What changes were proposed in this pull request? [PR#25295](https://github.com/apache/spark/pull/25295) already implements the rule of converting the shuffle reader to a local reader for `BroadcastHashJoin` on the probe side. This PR supports converting the shuffle reader to a local reader on the build side. ### Why are the changes needed? Improve performance ### Does this PR introduce any user-facing change? No ### How was this patch tested? existing unit tests Closes #26289 from JkSelf/supportTwoSideLocalReader. Authored-by: jiake <ke.a.jia@intel.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-10-31 21:28:15 +08:00
maryannxue	4d302cb7ed	[SPARK-11150][SQL][FOLLOW-UP] Dynamic partition pruning ### What changes were proposed in this pull request? This is code cleanup PR for https://github.com/apache/spark/pull/25600, aiming to remove an unnecessary condition and to correct a code comment. ### Why are the changes needed? For code cleanup only. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Passed existing tests. Closes #26328 from maryannxue/dpp-followup. Authored-by: maryannxue <maryannxue@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-10-31 15:43:02 +08:00
Maxim Gekk	5e9a155eba	[SPARK-29520][SS] Fix checks of negative intervals ### What changes were proposed in this pull request? - Added `getDuration()` to calculate interval duration in specified time units assuming the provided days per month - Added `isNegative()` which returns `true` if the interval duration is less than 0 - Fix checking negative intervals by using `isNegative()` in structured streaming classes - Fix checking of `year-months` intervals ### Why are the changes needed? This fixes incorrect checking of negative intervals. An interval is negative when its duration is negative, not when the interval's months or microseconds is negative. Also this fixes checking of `year-month` interval support because the `month` field could be negative. ### Does this PR introduce any user-facing change? Should not ### How was this patch tested? - Added tests for the `getDuration()` and `isNegative()` methods to `IntervalUtilsSuite` - By existing SS tests Closes #26177 from MaxGekk/interval-is-positive. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-10-31 15:35:04 +08:00
Dongjoon Hyun	095f7b05fd	Revert "[SPARK-29277][SQL] Add early DSv2 filter and projection pushdown" This reverts commit `cfc80d0eb1`.	2019-10-30 23:11:22 -07:00
Terry Kim	3a06c129f4	[SPARK-29592][SQL] ALTER TABLE (set partition location) should look up catalog/table like v2 commands ### What changes were proposed in this pull request? Update `AlterTableSetLocationStatement` to store `partitionSpec` and make `ALTER TABLE a.b.c PARTITION(...) SET LOCATION 'loc'` fail if `partitionSpec` is set with unsupported message. ### Why are the changes needed? It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g. ``` USE my_catalog DESC t // success and describe the table t from my_catalog ALTER TABLE t PARTITION(...) SET LOCATION 'loc' // report set location with partition spec is not supported. ``` ### Does this PR introduce any user-facing change? yes. When running ALTER TABLE (set partition location), Spark fails the command if the current catalog is set to a v2 catalog, or the table name specified a v2 catalog. ### How was this patch tested? New unit tests Closes #26304 from imback82/alter_table_partition_loc. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-10-31 10:47:43 +08:00
Unknown	401a5f7715	[SPARK-29523][SQL] SHOW COLUMNS should do multi-catalog resolution ### What changes were proposed in this pull request? Add ShowColumnsStatement and make SHOW COLUMNS go through the same catalog/table resolution framework of v2 commands. ### Why are the changes needed? It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g. USE my_catalog DESC t // success and describe the table t from my_catalog SHOW COLUMNS FROM t // report table not found as there is no table t in the session catalog ### Does this PR introduce any user-facing change? yes. When running SHOW COLUMNS Spark fails the command if the current catalog is set to a v2 catalog, or the table name specified a v2 catalog. ### How was this patch tested? Unit tests. Closes #26182 from planga82/feature/SPARK-29523_SHOW_COLUMNS_datasourceV2. Authored-by: Unknown <soypab@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-10-31 10:13:12 +08:00
Maxim Gekk	3206a99870	[SPARK-29651][SQL] Fix parsing of interval seconds fraction ### What changes were proposed in this pull request? In the PR, I propose to extract parsing of the seconds interval units to the private method `parseNanos` in `IntervalUtils` and modify the code to correctly parse the fractional part of the seconds unit of intervals in the cases: - When the fractional part has less than 9 digits - The seconds unit is negative ### Why are the changes needed? The changes are needed to fix the issues: ```sql spark-sql> select interval '10.123456 seconds'; interval 10 seconds 123 microseconds ``` The correct result must be `interval 10 seconds 123 milliseconds 456 microseconds` ```sql spark-sql> select interval '-10.123456789 seconds'; interval -9 seconds -876 milliseconds -544 microseconds ``` but the whole interval should be negated, and the result must be `interval -10 seconds -123 milliseconds -456 microseconds`, taking into account the truncation to microseconds. ### Does this PR introduce any user-facing change? Yes. After changes: ```sql spark-sql> select interval '10.123456 seconds'; interval 10 seconds 123 milliseconds 456 microseconds spark-sql> select interval '-10.123456789 seconds'; interval -10 seconds -123 milliseconds -456 microseconds ``` ### How was this patch tested? By existing and new tests in `ExpressionParserSuite`. Closes #26313 from MaxGekk/fix-interval-nanos-parsing. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-10-31 09:20:46 +08:00
Ryan Blue	cfc80d0eb1	[SPARK-29277][SQL] Add early DSv2 filter and projection pushdown ### What changes were proposed in this pull request? This adds a new rule, `V2ScanRelationPushDown`, to push filters and projections in to a new `DataSourceV2ScanRelation` in the optimizer. That scan is then used when converting to a physical scan node. The new relation correctly reports stats based on the scan. To run scan pushdown before rules where stats are used, this adds a new optimizer override, `earlyScanPushDownRules` and a batch for early pushdown in the optimizer, before cost-based join reordering. The other early pushdown rule, `PruneFileSourcePartitions`, is moved into the early pushdown rule set. This also moves pushdown helper methods from `DataSourceV2Strategy` into a util class. ### Why are the changes needed? This is needed for DSv2 sources to supply stats for cost-based rules in the optimizer. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This updates the implementation of stats from `DataSourceV2Relation` so tests will fail if stats are accessed before early pushdown for v2 relations. Closes #25955 from rdblue/move-v2-pushdown. Authored-by: Ryan Blue <blue@apache.org> Signed-off-by: Ryan Blue <blue@apache.org>	2019-10-30 18:07:34 -07:00
Xingbo Jiang	8207c835b4	Revert "Prepare Spark release v3.0.0-preview-rc2" This reverts commit `007c873ae3`.	2019-10-30 17:45:44 -07:00

8599 commits