Commit graph

516 commits

Author SHA1 Message Date
Yuming Wang c87b0085c9 [SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8
### What changes were proposed in this pull request?

Hive 2.3.8 changes:
HIVE-19662: Upgrade Avro to 1.8.2
HIVE-24324: Remove deprecated API usage from Avro
HIVE-23980: Shade Guava from hive-exec in Hive 2.3
HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
HIVE-24512: Exclude calcite in packaging.
HIVE-22708: Fix for HttpTransport to replace String.equals
HIVE-24551: Hive should include transitive dependencies from calcite after shading it
HIVE-24553: Exclude calcite from test-jar dependency of hive-exec

### Why are the changes needed?

To allow upgrading Avro and Parquet to their latest versions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests, plus a test that tries to upgrade Parquet to 1.11.1 and Avro to 1.10.1: https://github.com/apache/spark/pull/30517

Closes #30657 from wangyum/SPARK-33696.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-17 21:54:35 -08:00
Kousuke Saruta bec80d7eec [SPARK-34101][SQL] Make spark-sql CLI configurable for the behavior of printing header by SET command
### What changes were proposed in this pull request?

This PR introduces a new property `spark.sql.cli.print.header` to let users change the behavior of printing header for spark-sql CLI by SET command.

### Why are the changes needed?

Like the Hive CLI, the spark-sql CLI accepts the `hive.cli.print.header` property to change the behavior of printing the header.
But the spark-sql CLI doesn't allow users to change Hive-specific configurations dynamically with the SET command.
So it's better to support changing this behavior via the SET command.

### Does this PR introduce _any_ user-facing change?

Yes. Users can dynamically change the behavior by SET command.

### How was this patch tested?

I confirmed with the following commands/queries.
```
spark-sql> select (1) as a, (2) as b, (3) as c, (4) as d;
1	2	3	4
Time taken: 3.218 seconds, Fetched 1 row(s)
spark-sql> set spark.sql.cli.print.header=true;
key	value
spark.sql.cli.print.header	true
Time taken: 1.506 seconds, Fetched 1 row(s)
spark-sql> select (1) as a, (2) as b, (3) as c, (4) as d;
a	b	c	d
1	2	3	4
Time taken: 0.79 seconds, Fetched 1 row(s)
```

Closes #31173 from sarutak/spark-sql-print-header.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-01-15 13:15:48 +09:00
yangjie01 8b1ba233f1 [SPARK-34068][CORE][SQL][MLLIB][GRAPHX] Remove redundant collection conversion
### What changes were proposed in this pull request?
There are some redundant collection conversions that can be removed. For version compatibility, these were cleaned up and verified with the Scala 2.13 profile.
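
A hypothetical illustration (not from the PR's diff) of the kind of no-op conversion the cleanup targets:

```scala
val xs: Seq[Int] = Seq(1, 2, 3)

// Redundant: xs is already an immutable Seq, so .toSeq returns it unchanged.
val ys = xs.toSeq.map(_ + 1)

// Equivalent after the cleanup:
val zs = xs.map(_ + 1)
```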

### Why are the changes needed?
Remove redundant collection conversion

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass the Jenkins or GitHub Action
- Manual test of `core`, `graphx`, `mllib`, `mllib-local`, `sql`, `yarn`, `kafka-0-10` in Scala 2.13 passed

Closes #31125 from LuciferYang/SPARK-34068.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-01-13 18:07:02 -06:00
fwang12 7b06acc28b [SPARK-33100][SQL][FOLLOWUP] Find correct bound of bracketed comment in spark-sql
### What changes were proposed in this pull request?

This PR help find correct bound of bracketed comment in spark-sql.

Here is the log for the UT of SPARK-33100 in CliSuite before this change:
```
2021-01-05 13:22:34.768 - stdout> spark-sql> /* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.523 - stderr> Time taken: 6.716 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.599 - stdout> test
2021-01-05 13:22:41.6 - stdout> spark-sql> ;;/* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.709 - stdout> test
2021-01-05 13:22:41.709 - stdout> spark-sql> /* SELECT 'test';*/;; SELECT 'test';
2021-01-05 13:22:41.902 - stdout> spark-sql> SELECT 'test'; -- SELECT 'test';
2021-01-05 13:22:41.902 - stderr> Time taken: 0.129 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.902 - stderr> Error in query:
2021-01-05 13:22:41.902 - stderr> mismatched input '<EOF>' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 19)
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> == SQL ==
2021-01-05 13:22:42.006 - stderr> /* SELECT 'test';*/
2021-01-05 13:22:42.006 - stderr> -------------------^^^
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> Time taken: 0.226 seconds, Fetched 1 row(s)
2021-01-05 13:22:42.006 - stdout> test
```
The root cause is that `insideBracketedComment` is not tracked accurately.

For `/* comment */`, the last character `/` is not counted as inside the bracketed comment, so it would be treated as the beginning of a statement.

In this PR, this issue is fixed; a minimal sketch of the corrected logic follows.
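
Illustrative only (not the PR's actual code): the closing `*/` is consumed as a whole, so its trailing `/` stays inside the comment instead of starting a new statement.

```scala
def splitStatements(line: String): Seq[String] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[String]
  val cur = new StringBuilder
  var depth = 0 // nesting depth of /* ... */ comments
  var i = 0
  while (i < line.length) {
    val two = line.slice(i, i + 2)
    if (two == "/*") { depth += 1; cur ++= two; i += 2 }
    else if (two == "*/" && depth > 0) { depth -= 1; cur ++= two; i += 2 } // '/' consumed here
    else if (line(i) == ';' && depth == 0) { out += cur.toString; cur.clear(); i += 1 }
    else { cur += line(i); i += 1 }
  }
  if (cur.nonEmpty) out += cur.toString
  out.toSeq
}
```

With this, `splitStatements("/* SELECT 'test';*/ SELECT 'test';")` yields a single statement with its leading comment intact.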

### Why are the changes needed?
To fix the issue described above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes #31054 from turboFei/SPARK-33100-followup.

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2021-01-07 20:49:37 +09:00
fwang12 a071826f72 [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql
### What changes were proposed in this pull request?
Currently, spark-sql cannot parse SQL statements that contain bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
the input would be split into two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then an exception is thrown because the first one is illegal.
In this PR, we ignore the content inside bracketed comments while splitting the SQL statements.
Besides, we ignore comments without any content.

### Why are the changes needed?
spark-sql may split statements inside bracketed comments, which is incorrect.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2021-01-05 15:55:30 +09:00
Kent Yao d7dc42d5f6 [SPARK-33895][SQL] Char and Varchar fail in MetaOperation of ThriftServer
### What changes were proposed in this pull request?

```
Caused by: java.lang.IllegalArgumentException: Unrecognized type name: CHAR(10)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.toJavaSQLType(SparkGetColumnsOperation.scala:187)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$addToRowSet$1(SparkGetColumnsOperation.scala:203)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.addToRowSet(SparkGetColumnsOperation.scala:195)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$4(SparkGetColumnsOperation.scala:99)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$4$adapted(SparkGetColumnsOperation.scala:98)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
```

Meta operations target the raw table schema, so we need to handle these types there.
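
A minimal sketch of the kind of handling needed (the type mapping is assumed for illustration, not the PR's actual code):

```scala
import java.sql.Types
import org.apache.spark.sql.types._

def toJavaSQLType(dt: DataType): Int = dt match {
  case CharType(_)    => Types.CHAR     // raw table schemas may carry CHAR(n)...
  case VarcharType(_) => Types.VARCHAR  // ...and VARCHAR(n)
  case StringType     => Types.VARCHAR
  case IntegerType    => Types.INTEGER
  case _              => Types.OTHER    // fall back instead of throwing
}
```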

### Why are the changes needed?

Bugfix; see the case above.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests, verified locally:

![image](https://user-images.githubusercontent.com/8326978/103069196-cdfcc480-45f9-11eb-9c6a-d4c42123c6e3.png)

Closes #30914 from yaooqinn/SPARK-33895.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-24 07:40:38 +00:00
ulysses-you e853f068f6 [SPARK-33526][SQL][FOLLOWUP] Fix flaky test due to timeout and fix docs
### What changes were proposed in this pull request?

Make test stable and fix docs.

### Why are the changes needed?

The query sometimes times out because we set another config right after setting the query timeout.
```
sbt.ForkMain$ForkError: java.sql.SQLTimeoutException: Query timed out after 0 seconds
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
	at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$13(ThriftServerWithSparkContextSuite.scala:107)
	at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$13$adapted(ThriftServerWithSparkContextSuite.scala:106)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$12(ThriftServerWithSparkContextSuite.scala:106)
	at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$12$adapted(ThriftServerWithSparkContextSuite.scala:89)
	at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$withJdbcStatement$4(SharedThriftServer.scala:95)
	at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$withJdbcStatement$4$adapted(SharedThriftServer.scala:95)
```

The reason is:
1. We execute `set spark.sql.thriftServer.queryTimeout = 1`; after that, every operation is limited to 1s.
2. We execute `set spark.sql.thriftServer.interruptOnCancel = false/true`. This SQL hits the timeout exception if anything hangs within 1s, which is not what we expect.

Resetting the timeout before step 2 avoids this problem, as sketched below.
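
A hypothetical sketch of the fixed ordering (helper shape assumed; per the config's documentation, `0` disables the timeout):

```scala
import java.sql.Statement

def runTimeoutChecks(statement: Statement): Unit = {
  statement.execute("SET spark.sql.thriftServer.queryTimeout=1")
  // ... assertions that rely on the 1s timeout ...
  statement.execute("SET spark.sql.thriftServer.queryTimeout=0") // reset first: 0 means no limit
  statement.execute("SET spark.sql.thriftServer.interruptOnCancel=false")
  // ... assertions on cancel behavior, no longer racing the 1s timeout ...
}
```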

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Fix test.

Closes #30897 from ulysses-you/SPARK-33526-followup.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-12-22 22:43:03 -08:00
Kent Yao 4d47ac4b4b [SPARK-33705][SQL][TEST] Fix HiveThriftHttpServerSuite flakiness
### What changes were proposed in this pull request?
To fix the flaky tests:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132345/testReport/
```
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.JDBC query execution
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.Checks Hive version
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.SPARK-24829 Checks cast as float
```

The root cause here is a jar conflict issue:
`NewCookie.isHttpOnly` is not defined in the conflicting `jsr311-api.jar`.
The transitive artifact `jsr311-api.jar` of `hadoop-client` is excluded on the Maven side. See https://issues.apache.org/jira/browse/SPARK-27179.

The Jenkins PR builder and GitHub Actions use SBT as the build tool.

First, the exclusion rule from Maven is not honored by SBT, so I was able to see `jsr311-api.jar` from the Maven cache being added to the classpath directly. **This seems to be a bug of the `sbt-pom-reader` plugin, but I'm not that sure.**

Then I added an `ExcludeRule` for the `hive-thriftserver` module at the SBT side and did see the `jsr311-api.jar` gone, but the CI jobs still failed with the same error.

I added a trace log in ThriftHttpServlet:

```
ERROR ThriftHttpServlet: !!!!!!!!! Suspect???????? --->
file:/home/jenkins/workspace/SparkPullRequestBuilder/assembly/target/scala-2.12/jars/jsr311-api-1.1.1.jar
```
And the log pointed out that the assembly phase copied it to `assembly/target/scala-2.12/jars/`, which is added to the classpath too. With the help of SBT's `dependencyTree` tool, I saw `jsr311-api` again as a transitive dependency of `jersey-core` from the `yarn` module with a `test` scope. So **this seems to be another bug on the SBT side, in the `sbt-assembly` plugin:** it copied a test-scope transitive artifact to the assembly output.

In this PR, I defined some rules in SparkBuild.scala to bypass the potential bugs from the SBT side.

First, exclude `jsr311` from the whole project and then add it back separately to the YARN module on the SBT side, roughly as sketched below.
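
A rough sketch of that approach in sbt syntax (coordinates and scope assumed; not the exact SparkBuild.scala change):

```scala
// Exclude jsr311-api everywhere...
lazy val jsr311 = ExclusionRule("javax.ws.rs", "jsr311-api")
libraryDependencies ~= { _.map(_.excludeAll(jsr311)) }

// ...then add it back only to the module that actually needs it (YARN), test scope:
libraryDependencies += "javax.ws.rs" % "jsr311-api" % "1.1.1" % "test"
```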

Additionally, the HiveThriftServer suites were also reworked to reduce flakiness, though that is unrelated to the bugs found so far.

### Why are the changes needed?

Fix the flaky tests described above.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Passing Jenkins and GitHub Actions.

Closes #30643 from yaooqinn/HiveThriftHttpServerSuite.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-14 05:14:38 +00:00
Yuming Wang 94bc2d61a2 [SPARK-33589][SQL][FOLLOWUP] Replace Throwable with NonFatal
### What changes were proposed in this pull request?

This PR replaces `Throwable` with `NonFatal`.
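
A minimal sketch of the pattern, with hypothetical `openSession`/`closeOpenedSession` stand-ins:

```scala
import scala.util.control.NonFatal

def openSession(): Unit = ???       // stand-in for the real session logic
def closeOpenedSession(): Unit = ()

try {
  openSession()
} catch {
  case NonFatal(e) =>
    // Fatal errors (VirtualMachineError, InterruptedException, ...) still propagate uncaught.
    closeOpenedSession()
    throw e
}
```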

### Why are the changes needed?

Improve the code: only non-fatal errors should be caught.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #30744 from wangyum/SPARK-33589-2.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-13 14:52:26 -08:00
ulysses-you 5bab27e00b [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver
### What changes were proposed in this pull request?

This PR adds a new config, `spark.sql.thriftServer.forceCancel`, to give users a way to interrupt tasks when a statement is cancelled.

### Why are the changes needed?

After [#29933](https://github.com/apache/spark/pull/29933), we support cancelling a query on timeout, but the default behavior of `SparkContext.cancelJobGroup` does not interrupt tasks; it just lets them finish on their own. In some cases this is dangerous, e.g., under data skew or a heavy shuffle: a task can keep running for a long time after the cancel, and its resources are not released.
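
For context, a minimal sketch of how interruption is wired through the Spark API (assumed usage, not the PR's actual code):

```scala
import org.apache.spark.SparkContext

def runCancellable(sc: SparkContext, forceCancel: Boolean): Unit = {
  // Interruption is declared when the job group is set...
  sc.setJobGroup("statement-1", "thriftserver query", interruptOnCancel = forceCancel)
  // ... submit the query's jobs here ...
  // ...and cancelJobGroup interrupts running tasks only if forceCancel was true.
  sc.cancelJobGroup("statement-1")
}
```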

### Does this PR introduce _any_ user-facing change?

Yes, a new config.

### How was this patch tested?

Add test.

Closes #30481 from ulysses-you/SPARK-33526.

Lead-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Co-authored-by: ulysses-you <youxiduo@weidian.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-12 00:52:33 +09:00
Dooyoung Hwang a713a7eee3 [SPARK-33655][SQL] Improve performance of processing FETCH_PRIOR
### What changes were proposed in this pull request?
Currently, when a client requests FETCH_PRIOR from the Thrift server, the server re-iterates from the start position. Because the Thrift server caches a query result in an array when the THRIFTSERVER_INCREMENTAL_COLLECT feature is off, FETCH_PRIOR can be implemented without re-iterating the result. A trait `FetchIterator` is added to separate the implementations for an iterator and an array. `FetchIterator` also supports moving the cursor to an absolute position, which will be useful for implementing FETCH_RELATIVE and FETCH_ABSOLUTE.
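
A hypothetical sketch of such an iterator over a cached array (names and shape assumed, not the PR's actual trait):

```scala
trait FetchIterator[A] extends Iterator[A] {
  def fetchAbsolute(pos: Long): Unit // move the cursor without re-iterating
  def getPosition: Long
}

class ArrayFetchIterator[A](items: Array[A]) extends FetchIterator[A] {
  private var pos = 0L
  override def fetchAbsolute(target: Long): Unit =
    pos = math.max(0L, math.min(target, items.length.toLong))
  override def getPosition: Long = pos
  override def hasNext: Boolean = pos < items.length
  override def next(): A = { val v = items(pos.toInt); pos += 1; v }
}
```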

### Why are the changes needed?
For better performance of Thriftserver.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
FetchIteratorSuite

Closes #30600 from Dooyoung-Hwang/refactor_with_fetch_iterator.

Authored-by: Dooyoung Hwang <dooyoung.hwang@sk.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-09 18:35:24 +09:00
Josh Soref c62b84a043 [MINOR] Spelling sql not core
### What changes were proposed in this pull request?

This PR intends to fix typos in the sub-modules:
* `sql/catalyst`
* `sql/hive-thriftserver`
* `sql/hive`

Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618

NOTE: The misspellings have been reported at 706a726f87 (commitcomment-44064356)

### Why are the changes needed?

Misspelled words make it harder to read / understand content.

### Does this PR introduce _any_ user-facing change?

There are various fixes to documentation, etc...

### How was this patch tested?

No testing was performed

Closes #30532 from jsoref/spelling-sql-not-core.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-12-07 08:40:29 -06:00
Dongjoon Hyun de9818f043 [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT
### What changes were proposed in this pull request?

This PR aims to update `master` branch version to 3.2.0-SNAPSHOT.

### Why are the changes needed?

Start to prepare Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

Pass the CIs.

Closes #30606 from dongjoon-hyun/SPARK-3.2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-04 14:10:42 -08:00
Linhong Liu e02324f2dd [SPARK-33142][SPARK-33647][SQL] Store SQL text for SQL temp view
### What changes were proposed in this pull request?
Currently in Spark, a temp view is saved as its analyzed logical plan, while a permanent view
is kept in the HMS with its original SQL text. As a result, permanent and temporary views behave
differently in some cases. In this PR we store the SQL text for temporary views in order
to unify the behavior between permanent and temporary views.
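
A hypothetical illustration of the resulting behavior, assuming a `spark` session (e.g., in spark-shell; table and view names made up):

```scala
spark.sql("CREATE TABLE t USING parquet AS SELECT 1 AS a")
spark.sql("CREATE TEMP VIEW v AS SELECT * FROM t")
spark.sql("ALTER TABLE t ADD COLUMNS (b INT)")
// With the SQL text stored, referencing v re-analyzes it and can see column b;
// with a frozen analyzed plan it could not.
spark.sql("SELECT * FROM v").printSchema()
```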

### Why are the changes needed?
To unify the behavior between permanent and temporary views.

### Does this PR introduce _any_ user-facing change?
Yes. With this PR, a temporary view is re-analyzed when it is referenced. So if the
underlying data source changes, the view is also updated.

### How was this patch tested?
existing and newly added test cases

Closes #30567 from linhongliu-db/SPARK-33142.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-04 06:48:49 +00:00
Terry Kim 0fd9f57dd4 [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
### What changes were proposed in this pull request?

This PR proposes to support `CACHE/UNCACHE TABLE` commands for v2 tables.

In addition, this PR proposes to migrate `CACHE/UNCACHE TABLE` to use `UnresolvedTableOrView` to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied to both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or the [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).
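
Illustrative usage, assuming a v2 catalog registered as `testcat` (names made up):

```scala
spark.sql("CACHE TABLE testcat.ns.tbl")
spark.sql("SELECT * FROM testcat.ns.tbl").show() // served from cache
spark.sql("UNCACHE TABLE testcat.ns.tbl")
```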

### Why are the changes needed?

To support `CACHE/UNCACHE TABLE` commands for v2 tables.

Note that `CACHE/UNCACHE TABLE` for v1 tables/views go through `SparkSession.table` to resolve identifier, which resolves temp views first, so there is no change in the behavior by moving to the new framework.

### Does this PR introduce _any_ user-facing change?

Yes. Now the user can run `CACHE/UNCACHE TABLE` commands on v2 tables.

### How was this patch tested?

Added/updated existing tests.

Closes #30403 from imback82/cache_table.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-11-30 05:37:10 +00:00
Yuming Wang f93d4395b2 [SPARK-33589][SQL] Close opened session if the initialization fails
### What changes were proposed in this pull request?

This PR adds a try-catch when opening a session.

### Why are the changes needed?

Close opened session if the initialization fails.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Before this pr:

```
[root@spark-3267648 spark]#  bin/beeline -u jdbc:hive2://localhost:10000/db_not_exist
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Connecting to jdbc:hive2://localhost:10000/db_not_exist
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/db_not_exist: Database 'db_not_exist' not found; (state=08S01,code=0)
Beeline version 2.3.7 by Apache Hive
beeline>
```
![image](https://user-images.githubusercontent.com/5399861/100560975-73ba5d80-32f2-11eb-8f92-b2509e7a121f.png)

After this pr:
```
[root@spark-3267648 spark]#  bin/beeline -u jdbc:hive2://localhost:10000/db_not_exist
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connecting to jdbc:hive2://localhost:10000/db_not_exist
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/db_not_exist: Failed to open new session: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'db_not_exist' not found; (state=08S01,code=0)
Beeline version 2.3.7 by Apache Hive
beeline>
```
![image](https://user-images.githubusercontent.com/5399861/100560917-479edc80-32f2-11eb-986f-7a997f1163fc.png)

Closes #30536 from wangyum/SPARK-33589.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-11-30 11:21:02 +09:00
Yuming Wang 1de3fc4282 [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2
### What changes were proposed in this pull request?

We support Hive metastore versions 0.12.0 through 3.1.2, but we support hive-jdbc versions only 0.12.0 through 2.3.7. A `TProtocolException` is thrown if we use hive-jdbc 3.x:

```
[root@spark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
Unexpected end of file when reading from HS2 server. The root cause might be too many concurrent connections. Please ask the administrator to check the number of active connections, and adjust hive.server2.thrift.max.worker.threads if applicable.
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
```
```
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:234)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)
```

This PR upgrades hive-service-rpc to 3.1.2 to fix this issue.

### Why are the changes needed?

To support hive-jdbc 3.x.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test:
```
[root@spark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.051 seconds)
0: jdbc:hive2://localhost:10000/default> insert into t1 values(1);
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.08 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t1;
+-----+
| id  |
+-----+
| 1   |
+-----+
1 row selected (0.605 seconds)
```

Closes #30478 from wangyum/SPARK-33525.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-25 12:37:59 -08:00
Gustavo Martin Morcuende 517b810dfa [SPARK-33463][SQL] Keep Job Id during incremental collect in Spark Thrift Server
### What changes were proposed in this pull request?

When **spark.sql.thriftServer.incrementalCollect** is enabled, Job Ids get lost, and tracing queries in the Spark Thrift Server ends up being too complicated.

### Why are the changes needed?

Because it makes tracing Spark Thrift Server queries easier.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The current tests are enough; no additional tests are needed.

Closes #30390 from gumartinm/master.

Authored-by: Gustavo Martin Morcuende <gu.martinm@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-21 08:39:16 -08:00
yangjie01 e3058ba17c [SPARK-33441][BUILD] Add unused-imports compilation check and remove all unused-imports
### What changes were proposed in this pull request?
This PR adds new Scala compiler args to `pom.xml` to defend against new unused imports:

- `-Ywarn-unused-import` for Scala 2.12
- `-Wconf:cat=unused-imports:e` for Scala 2.13

The other file changes remove all unused imports from the Spark code.

### Why are the changes needed?
Clean up the code and guard against new unused imports.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #30351 from LuciferYang/remove-imports-core-module.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-11-19 14:20:39 +09:00
artiship 1ae6d64b5f [SPARK-33358][SQL] Return code when command process failed
Exit the Spark SQL CLI processing loop if one of the commands (sub SQL statements) fails.
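
A minimal sketch of the intended loop behavior (hypothetical helper names):

```scala
// Stop at the first failing statement and propagate its return code.
def processLines(statements: Seq[String], processCmd: String => Int): Int = {
  var ret = 0
  val it = statements.iterator
  while (it.hasNext && ret == 0) {
    ret = processCmd(it.next())
  }
  ret // a non-zero code surfaces the failure to the caller
}
```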

This is a regression in Apache Spark 3.0.0.

```
$ cat 1.sql
select * from nonexistent_table;
select 2;
```

**Apache Spark 2.4.7**
```
spark-2.4.7-bin-hadoop2.7:$ bin/spark-sql -f 1.sql
20/11/15 16:14:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error in query: Table or view not found: nonexistent_table; line 1 pos 14
```

**Apache Spark 3.0.1**
```
$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: nonexistent_table; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [nonexistent_table]

2
Time taken: 2.786 seconds, Fetched 1 row(s)
```

**Apache Hive 1.2.2**
```
apache-hive-1.2.2-bin:$ bin/hive -f 1.sql

Logging initialized using configuration in jar:file:/Users/dongjoon/APACHE/hive-release/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'nonexistent_table'
```

Yes. This fixes a regression.

Pass the UT.

Closes #30263 from artiship/SPARK-33358.

Authored-by: artiship <meilziner@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-15 16:57:12 -08:00
luluorta 156704ba0d [SPARK-33432][SQL] SQL parser should use active SQLConf
### What changes were proposed in this pull request?

This PR makes the SQL parser use the active SQLConf instead of the one captured in constructor parameters.
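
A minimal sketch of the difference (illustrative class names; the real parser is more involved):

```scala
import org.apache.spark.sql.internal.SQLConf

class FrozenConfParser(conf: SQLConf) { // old: conf captured at construction time
  def ansiEnabled: Boolean = conf.ansiEnabled
}

class ActiveConfParser { // new: reads the active session's conf on every call
  def ansiEnabled: Boolean = SQLConf.get.ansiEnabled
}
```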

### Why are the changes needed?

In ANSI mode, schema string parsing should fail if the schema uses an ANSI reserved keyword as an attribute name:

```scala
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', map('timestampFormat',  'dd/MM/yyyy'));""").show
```

output:

> Cannot parse the data type:
> no viable alternative at input 'time'(line 1, pos 0)
>
> == SQL ==
> time Timestamp
> ^^^

But this query may accidentally succeed in certain cases, because the DataType parser sticks to the configs of the first session created in the current thread:

```scala
DataType.fromDDL("time Timestamp")
val newSpark = spark.newSession()
newSpark.conf.set("spark.sql.ansi.enabled", "true")
newSpark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));""").show
```

output:

> +--------------------------------+
> |from_json({"time":"26/10/2015"})|
> +--------------------------------+
> |                   {2015-10-26 00:00...|
> +--------------------------------+

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Newly and updated UTs

Closes #30357 from luluorta/SPARK-33432.

Authored-by: luluorta <luluorta@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-14 13:37:12 -08:00
artiship 34a9a77ab5 [SPARK-33396][SQL] Spark SQL CLI prints appliction id when process file
### What changes were proposed in this pull request?
Modify SparkSQLCLIDriver.scala to call `cli.printMasterAndAppId` before processing a file.

### Why are the changes needed?
SPARK-25043 already introduced the feature of printing the application id, but the file-processing path seems not to have been covered. This small change makes spark-sql also print the application id when processing a file.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Environment:

```
spark version: 3.0.1
os: centos 7
```

/tmp/tmp.sql

```sql
select 1;
```

submit command:

```sh
export HADOOP_USER_NAME=my-hadoop-user
bin/spark-sql  \
--master yarn \
--deploy-mode client \
--queue my.queue.name \
--conf spark.driver.host=$(hostname -i) \
--conf spark.app.name=spark-test  \
--name "spark-test" \
-f /tmp/tmp.sql
```

execution log:

```sh
20/11/09 23:18:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.spark.client.rpc.server.address.use.ip does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.spark.client.submit.timeout.interval does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.enforce.bucketing does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.run.timeout.seconds does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.support.sql11.reserved.keywords does not exist
20/11/09 23:18:40 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
20/11/09 23:18:41 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
20/11/09 23:18:42 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/11/09 23:18:52 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!

Spark master: yarn, Application Id: application_1567136266901_27355775
1
1
Time taken: 4.974 seconds, Fetched 1 row(s)

```

Closes #30301 from artiship/SPARK-33396.

Authored-by: artiship <meilziner@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2020-11-14 20:54:17 +08:00
xuewei.linxuewei 234711a328 Revert "[SPARK-33139][SQL] protect setActionSession and clearActiveSession"
### What changes were proposed in this pull request?

In [SPARK-33139] we made `setActiveSession` and `clearActiveSession` deprecated APIs. It turns out they are widely used, and after discussion we concluded that even without that PR the unified view feature should still work; there is only a risk if users really abuse these two APIs. So reverting the PR is needed.

[SPARK-33139] consists of two commits, including a follow-up. Revert them both.

### Why are the changes needed?

Revert.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UT.

Closes #30367 from leanken/leanken-revert-SPARK-33139.

Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-11-13 13:35:45 +00:00
yangjie01 02fd52cfbc [SPARK-33352][CORE][SQL][SS][MLLIB][AVRO][K8S] Fix procedure-like declaration compilation warnings in Scala 2.13
### What changes were proposed in this pull request?
There are two similar compilation warnings about procedure-like declarations in Scala 2.13:

```
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition
```
and

```
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `run`'s return type
```

This PR is the first part of resolving SPARK-33352:

- For constructor definitions, add `=` to convert to function syntax

- For method definitions without a return type, add `: Unit =` to convert to function syntax (see the example below)
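
For example (illustrative, not from the diff):

```scala
object ProcedureSyntaxExample {
  private def doWork(): Unit = println("working")

  // Deprecated procedure syntax, warned about under Scala 2.13:
  //   def run() { doWork() }

  // Function syntax with an explicit return type:
  def run(): Unit = { doWork() }
}
```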

### Why are the changes needed?
Eliminate compilation warnings in Scala 2.13; the change is also compatible with Scala 2.12.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #30255 from LuciferYang/SPARK-29392-FOLLOWUP.1.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-11-08 12:51:48 -06:00
Kent Yao 82d500a05c [SPARK-33193][SQL][TEST] Hive ThriftServer JDBC Database MetaData API Behavior Auditing
### What changes were proposed in this pull request?

Add a test case to audit all JDBC metadata behaviors, to catch and prevent potential silent API changes from either the upstream hive-jdbc module or the Spark Thrift server side.

Forked from my kyuubi project here https://github.com/yaooqinn/kyuubi/blob/master/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkOperationSuite.scala

### Why are the changes needed?

Make the SparkThriftServer safer to evolve.

### Does this PR introduce _any_ user-facing change?

dev only

### How was this patch tested?

new tests

Closes #30101 from yaooqinn/SPARK-33193.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-23 13:34:33 -07:00
Xuedong Luan d9ee33cfb9 [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
### What changes were proposed in this pull request?

Support automatic query cancellation when a query runs too long on the Thrift server.

This is the rework of #28991 and the credit should be the original author, leoluan2009.

Closes #28991

### Why are the changes needed?

In some cases, we use the Thrift server as a long-running application.
Sometimes we want no query to run longer than a given time.
In these cases, we can enable auto-cancel for time-consuming queries, which lets us release resources for other queries to run.
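
Illustrative JDBC usage (connection URL assumed; `java_method` is just a convenient way to build a slow query):

```scala
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()
stmt.setQueryTimeout(5) // the server cancels the query after ~5 seconds
stmt.execute("SELECT java_method('java.lang.Thread', 'sleep', 10000L)")
```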

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests.

Closes #29933 from maropu/pr28991.

Lead-authored-by: Xuedong Luan <luanxuedong2009@gmail.com>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Co-authored-by: Luan <luanxuedong2009@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-10-22 17:23:10 +09:00
Kent Yao 2507301705 [SPARK-33159][SQL] Use hive-service-rpc as dependency instead of inlining the generated code
### What changes were proposed in this pull request?

Hive's `hive-service-rpc` module has existed since Hive 2.1.0 and contains only the Thrift IDL file and the code generated from it.

Removing the inlined code will make it easier to maintain and upgrade built-in Hive versions.

### Why are the changes needed?

To simplify the code.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing CI

Closes #30055 from yaooqinn/SPARK-33159.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-10-16 09:37:54 -07:00
xuewei.linxuewei 306872eefa [SPARK-33139][SQL] protect setActionSession and clearActiveSession
### What changes were proposed in this pull request?

This PR is a sub-task of [SPARK-33138](https://issues.apache.org/jira/browse/SPARK-33138). In order to make `SQLConf.get` reliable and stable, we need to make sure users can't pollute the SQLConf and SparkSession context by calling `setActiveSession` and `clearActiveSession`.

Change of the PR:

* add a legacy config, `spark.sql.legacy.allowModifyActiveSession`, to fall back to the old behavior if users really need to call these two APIs
* by default, calling these two APIs throws an exception
* add two extra internal, private APIs, `setActiveSessionInternal` and `clearActiveSessionInternal`, for current internal usage (see the sketch below)
* change all internal references to the new internal APIs, except for `SQLContext.setActive` and `SQLContext.clearActive`
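
A hypothetical sketch of the guard (config key is from the list above; message and wiring assumed):

```scala
import org.apache.spark.sql.SparkSession

object ActiveSessionGuard {
  def setActiveSession(session: SparkSession): Unit = {
    val allow =
      session.conf.get("spark.sql.legacy.allowModifyActiveSession", "false").toBoolean
    if (!allow) {
      throw new UnsupportedOperationException(
        "setActiveSession is protected; set spark.sql.legacy.allowModifyActiveSession=true " +
          "to restore the old behavior")
    }
    setActiveSessionInternal(session)
  }

  private def setActiveSessionInternal(session: SparkSession): Unit = ??? // stand-in for the internal setter
}
```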

### Why are the changes needed?

Make SQLConf.get reliable and stable.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

* Add UT in SparkSessionBuilderSuite to test the legacy config
* Existing test

Closes #30042 from leanken/leanken-SPARK-33139.

Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-10-16 06:05:17 +00:00
Denis Pyshev ba69d68d91 [SPARK-33080][BUILD] Replace fatal warnings snippet
### What changes were proposed in this pull request?

The current solution in the build file for failing the build on compilation warnings (excluding deprecation warnings) is not portable past SBT 1.3.13 (the build import fails with a compilation error on SBT 1.4). It can be replaced with something more robust and maintainable, especially since Scala 2.13.2 ships similar built-in functionality.
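
An illustrative sbt setting in the spirit of the change (exact filters assumed):

```scala
// Scala 2.13's built-in warning configuration: escalate all warnings to errors
// except deprecations, replacing the custom fatal-warnings snippet.
Compile / scalacOptions += "-Wconf:cat=deprecation:w,any:e"
```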

Additionally, warnings were fixed to pass the build, with as few changes as possible:
warnings in the 2.12 compilation were fixed in code;
warnings in the 2.13 compilation are covered by configuration, to be addressed separately.

### Why are the changes needed?

Unblocks upgrade to SBT after 1.3.13.
Enhances build file maintainability.
Allows fine tune of warnings configuration in scope of Scala 2.13 compilation.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`build/sbt`'s `compile` and `Test/compile` for both Scala 2.12 and 2.13 profiles.

Closes #29995 from gemelen/feature/warnings-reporter.

Authored-by: Denis Pyshev <git@gemelen.net>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-10-15 14:49:43 -05:00
Prashant Sharma 304ca1ec93 [SPARK-33129][BUILD][DOCS] Updating the build/sbt references to test-only with testOnly for SBT 1.3.x
### What changes were proposed in this pull request?

`test-only` -> `testOnly` in docs across the project.

### Why are the changes needed?

Since the SBT version was updated, the older way of running tests, i.e. `test-only`, is no longer valid.

### Does this PR introduce _any_ user-facing change?

docs update.

### How was this patch tested?

Manually.

Closes #30028 from ScrapCodes/fix-build/sbt-sample.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-13 09:21:06 -07:00
Yuming Wang 5e170140b0 [SPARK-33107][SQL] Remove hive-2.3 workaround code
### What changes were proposed in this pull request?

This PR removes the `hive-2.3` workaround code.

### Why are the changes needed?

Make code more clear and readable.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

Closes #29996 from wangyum/SPARK-33107.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-10 16:41:42 -07:00
Dongjoon Hyun a127387a53 [SPARK-33082][SQL] Remove hive-1.2 workaround code
### What changes were proposed in this pull request?

This PR removes old Hive-1.2 profile related workaround code.

### Why are the changes needed?

To simplify the code.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CI.

Closes #29961 from dongjoon-hyun/SPARK-HIVE12.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-07 12:27:23 -07:00
Kent Yao 17d309dfac [SPARK-32963][SQL] empty string should be consistent for schema name in SparkGetSchemasOperation
### What changes were proposed in this pull request?
This PR makes an empty string for the schema name pattern match the global temp view, the same way it works for other databases.

This PR also adds new tests covering different kinds of wildcards to verify SparkGetSchemasOperation.

### Why are the changes needed?

When the schema name pattern is an empty string, it is treated as ".*" and can match all databases in the catalog,
but it cannot match the global temp view because it is not converted to ".*" there.

### Does this PR introduce _any_ user-facing change?

Yes: a JDBC operation like `statement.getConnection.getMetaData.getSchemas(null, "")` now also includes the global temp view in the result set.
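
An illustrative JDBC check (connection URL assumed):

```scala
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val schemas = conn.getMetaData.getSchemas(null, "")
while (schemas.next()) {
  println(schemas.getString("TABLE_SCHEM")) // now includes the global temp database
}
```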

### How was this patch tested?

new tests

Closes #29834 from yaooqinn/SPARK-32963.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-10-06 16:01:10 +00:00
Dongjoon Hyun 008a2ad1f8 [SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1)
### What changes were proposed in this pull request?

As of today,
- SPARK-30034 Apache Spark 3.0.0 switched its default Hive execution engine from Hive 1.2 to Hive 2.3. This removes the direct dependency to the forked Hive 1.2.1 in maven repository.
- SPARK-32981 Apache Spark 3.1.0(`master` branch) removed Hive 1.2 related artifacts from Apache Spark binary distributions.

This PR(SPARK-20202) aims to remove the following usage of unofficial Apache Hive fork completely from Apache Spark master for Apache Spark 3.1.0.
```
<hive.group>org.spark-project.hive</hive.group>
<hive.version>1.2.1.spark2</hive.version>
```

For the forked Hive 1.2.1.spark2 users, Apache Spark 2.4(LTS) and 3.0 (~ 2021.12) will provide it.

### Why are the changes needed?

- First, Apache Spark community should not use the unofficial forked release of another Apache project.
- Second, Apache Hive 1.2.1 was released on 2015-06-26, and the forked Hive `1.2.1.spark2` exposed many unfixable bugs in Apache Spark because the fork is not maintained at all. Apache Hive 2.3.0 was released on 2017-07-19 and has been used with fewer bugs compared with `1.2.1.spark2`. Many bugs still exist in the `hive-1.2` profile, and new Apache Spark unit tests keep being added with the `HiveUtils.isHive23` condition.

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change. PRBuilder will not accept `[test-hive1.2]` on master and `branch-3.1`.

### How was this patch tested?

1. SBT/Hadoop 3.2/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129366)
2. SBT/Hadoop 2.7/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129382)
3. SBT/Hadoop 3.2/Hive 1.2 (This has not been supported already due to Hive 1.2 doesn't work with Hadoop 3.2.)
4. SBT/Hadoop 2.7/Hive 1.2 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129383, This is rejected)

Closes #29936 from dongjoon-hyun/SPARK-REMOVE-HIVE1.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-05 15:29:56 -07:00
Kent Yao e2a740147c [SPARK-32874][SQL][FOLLOWUP][TEST-HIVE1.2][TEST-HADOOP2.7] Fix spark-master-test-sbt-hadoop-2.7-hive-1.2
### What changes were proposed in this pull request?

Found via discussion https://github.com/apache/spark/pull/29746#issuecomment-694726504

and the root cause is that hive-1.2 does not recognize NULL:

```scala
sbt.ForkMain$ForkError: java.sql.SQLException: Unrecognized column type: NULL
	at org.apache.hive.jdbc.JdbcColumn.typeStringToHiveType(JdbcColumn.java:160)
	at org.apache.hive.jdbc.HiveResultSetMetaData.getHiveType(HiveResultSetMetaData.java:48)
	at org.apache.hive.jdbc.HiveResultSetMetaData.getPrecision(HiveResultSetMetaData.java:86)
	at org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.$anonfun$new$35(SparkThriftServerProtocolVersionsSuite.scala:358)
	at org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.$anonfun$new$35$adapted(SparkThriftServerProtocolVersionsSuite.scala:351)
	at org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.testExecuteStatementWithProtocolVersion(SparkThriftServerProtocolVersionsSuite.scala:66)
	at org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.$anonfun$new$34(SparkThriftServerProtocolVersionsSuite.scala:351)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:189)
	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:176)
	at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:187)
	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:199)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:199)
	at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:181)
	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:61)
	at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
	at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
	at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:61)
	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:232)
	at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
	at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
	at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:232)
	at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:231)
	at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1562)
	at org.scalatest.Suite.run(Suite.scala:1112)
	at org.scalatest.Suite.run$(Suite.scala:1094)
	at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1562)
	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:236)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
	at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:236)
	at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:235)
	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:61)
	at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
	at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
	at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61)
	at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
	at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
	at sbt.ForkMain$Run$2.call(ForkMain.java:296)
	at sbt.ForkMain$Run$2.call(ForkMain.java:286)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

In this PR, we simply ignore these checks for hive 1.2

### Why are the changes needed?

fix jenkins

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test itself.

Closes #29803 from yaooqinn/SPARK-32874-F.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-09-18 11:55:27 +00:00
yangjie01 5817c584b8 [SPARK-32909][SQL] Pass all sql/hive-thriftserver module UTs in Scala 2.13
### What changes were proposed in this pull request?

This PR fixes the failed and aborted cases in the sql/hive-thriftserver module in Scala 2.13. The main changes are as follows:

- Use `s.c.Seq` instead of `Seq` in `HiveResult` because the input type may be `mutable.ArraySeq`, while `Seq` means `immutable.Seq` in Scala 2.13 (see the sketch after the log below).

- Reset the class loader after `HiveMetastoreLazyInitializationSuite` completes, because the context class loader while `HiveMetastoreLazyInitializationSuite` runs is a `NonClosableMutableURLClassLoader`, and it propagates to `HiveThriftServer2ListenerSuite`, triggering the following problem in Scala 2.13:

```
HiveThriftServer2ListenerSuite:
*** RUN ABORTED ***
  java.lang.LinkageError: loader constraint violation: loader (instance of net/bytebuddy/dynamic/loading/MultipleParentClassLoader) previously initiated loading for a different type with name "org/apache/hive/service/ServiceStateChangeListener"
  at org.mockito.codegen.HiveThriftServer2$MockitoMock$1850222569.<clinit>(Unknown Source)
  at sun.reflect.GeneratedSerializationConstructorAccessor530.newInstance(Unknown Source)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.objenesis.instantiator.sun.SunReflectionFactoryInstantiator.newInstance(SunReflectionFactoryInstantiator.java:48)
  at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73)
  at org.mockito.internal.creation.instance.ObjenesisInstantiator.newInstance(ObjenesisInstantiator.java:19)
  at org.mockito.internal.creation.bytebuddy.SubclassByteBuddyMockMaker.createMock(SubclassByteBuddyMockMaker.java:47)
  at org.mockito.internal.creation.bytebuddy.ByteBuddyMockMaker.createMock(ByteBuddyMockMaker.java:25)
  at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:35)
  at org.mockito.internal.MockitoCore.mock(MockitoCore.java:63)
  ...
```

After this PR, `HiveThriftServer2Suites` and `HiveThriftServer2ListenerSuite` were fixed and all 461 tests passed.
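
An illustrative signature in the spirit of the `HiveResult` change (assumed, not the exact diff):

```scala
// Accept the broader scala.collection.Seq so that mutable.ArraySeq inputs
// still typecheck under Scala 2.13, where a plain Seq means immutable.Seq.
def formatRow(row: scala.collection.Seq[Any]): String = row.mkString("\t")
```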

### Why are the changes needed?
We need to support a Scala 2.13 build.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Scala 2.12: Pass the Jenkins or GitHub Action

- Scala 2.13: All tests passed.

Do the following:

```
dev/change-scala-version.sh 2.13
mvn clean install -DskipTests -pl sql/hive-thriftserver -am -Phive-thriftserver -Pscala-2.13
mvn test -pl sql/hive-thriftserver -Phive -Phive-thriftserver -Pscala-2.13
```

**Before**

```
HiveThriftServer2ListenerSuite:
*** RUN ABORTED ***
```

**After**

```
Tests: succeeded 461, failed 0, canceled 0, ignored 17, pending 0
All tests passed.
```

Closes #29783 from LuciferYang/sql-thriftserver-tests.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-09-17 14:35:01 -07:00
Kent Yao 316242b768 [SPARK-32874][SQL][TEST] Enhance result set meta data check for execute statement operation with thrift server
### What changes were proposed in this pull request?

This PR adds test cases checking the result set metadata for Spark's `ExecuteStatementOperation`, to make the JDBC API more future-proof, because any server-side change may affect client compatibility.

### Why are the changes needed?

Add tests to prevent potential silent behavior changes for JDBC users.

### Does this PR introduce _any_ user-facing change?

No, test only.

### How was this patch tested?

add new test

Closes #29746 from yaooqinn/SPARK-32874.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-09-15 13:07:03 +00:00
HyukjinKwon 4a096131ee Revert "[SPARK-32772][SQL][FOLLOWUP] Remove legacy silent support mode for spark-sql CLI"
This reverts commit f1f7ae420e.
2020-09-10 14:23:10 +09:00
Kent Yao 9ab8a2c36d [SPARK-32826][SQL] Set the right column size for the null type in SparkGetColumnsOperation
### What changes were proposed in this pull request?

In Spark 3.0.0, SparkGetColumnsOperation could not recognize NULL columns; now it can, as a side effect of https://issues.apache.org/jira/browse/SPARK-32696 / f14f3742e0, but test coverage for that change was never added.

In Spark, the column size for null fields should be 1. In this PR, we set the right column size for the null type.

### Why are the changes needed?

Test coverage, plus fixing the client-side information about the null type through JDBC.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

added ut both for this pr and SPARK-32696

Closes #29687 from yaooqinn/SPARK-32826.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-09-10 04:53:38 +00:00
Yuming Wang f1f7ae420e [SPARK-32772][SQL][FOLLOWUP] Remove legacy silent support mode for spark-sql CLI
### What changes were proposed in this pull request?

Remove legacy silent support mode for spark-sql CLI.

### Why are the changes needed?

https://github.com/apache/spark/pull/29619 add new silent mode. We can remove legacy silent support mode.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test:
```
LM-SHC-16508156:spark yumwang$ bin/spark-sql -S
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
20/09/03 09:06:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/09/03 09:06:16 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
20/09/03 09:06:16 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
20/09/03 09:06:19 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
20/09/03 09:06:19 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore yumwang@10.226.196.190
spark-sql> select * from test1;
1
spark-sql> select * from test1;
1

```

Closes #29631 from wangyum/SPARK-32772.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2020-09-04 08:38:35 +08:00
Kousuke Saruta ad6b887541 [SPARK-32772][SQL] Reduce log messages for spark-sql CLI
### What changes were proposed in this pull request?

This PR reduces log messages for spark-sql CLI like spark-shell and pyspark CLI.

### Why are the changes needed?

When we launch spark-sql CLI, too many log messages are shown and it's sometimes difficult to find the result of query.
```
spark-sql> SELECT now();
20/09/02 00:11:45 INFO CodeGenerator: Code generated in 10.121625 ms
20/09/02 00:11:45 INFO SparkContext: Starting job: main at NativeMethodAccessorImpl.java:0
20/09/02 00:11:45 INFO DAGScheduler: Got job 0 (main at NativeMethodAccessorImpl.java:0) with 1 output partitions
20/09/02 00:11:45 INFO DAGScheduler: Final stage: ResultStage 0 (main at NativeMethodAccessorImpl.java:0)
20/09/02 00:11:45 INFO DAGScheduler: Parents of final stage: List()
20/09/02 00:11:45 INFO DAGScheduler: Missing parents: List()
20/09/02 00:11:45 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at main at NativeMethodAccessorImpl.java:0), which has no missing parents
20/09/02 00:11:45 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.3 KiB, free 366.3 MiB)
20/09/02 00:11:45 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.2 KiB, free 366.3 MiB)
20/09/02 00:11:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.204:42615 (size: 3.2 KiB, free: 366.3 MiB)
20/09/02 00:11:45 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1348
20/09/02 00:11:45 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at main at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
20/09/02 00:11:45 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
20/09/02 00:11:45 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (192.168.1.204, executor driver, partition 0, PROCESS_LOCAL, 7561 bytes) taskResourceAssignments Map()
20/09/02 00:11:45 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/09/02 00:11:45 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1446 bytes result sent to driver
20/09/02 00:11:45 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 238 ms on 192.168.1.204 (executor driver) (1/1)
20/09/02 00:11:45 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/09/02 00:11:45 INFO DAGScheduler: ResultStage 0 (main at NativeMethodAccessorImpl.java:0) finished in 0.343 s
20/09/02 00:11:45 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
20/09/02 00:11:45 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
20/09/02 00:11:45 INFO DAGScheduler: Job 0 finished: main at NativeMethodAccessorImpl.java:0, took 0.377489 s
2020-09-02 00:11:45.07
Time taken: 0.704 seconds, Fetched 1 row(s)
20/09/02 00:11:45 INFO SparkSQLCLIDriver: Time taken: 0.704 seconds, Fetched 1 row(s)
```
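
For reference, a minimal sketch of the general technique, assuming the log4j 1.x API that Spark shipped with at the time; the real change is more targeted than this and keeps file logging intact:

```
import org.apache.log4j.{Level, Logger}

object QuietCli {
  // Raise the root logger to WARN so routine INFO chatter from the
  // scheduler and executors stays off the interactive console.
  def quiet(): Unit = Logger.getRootLogger.setLevel(Level.WARN)
}
```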

### Does this PR introduce _any_ user-facing change?

Yes. Log messages for the spark-sql CLI are reduced as follows.
```
20/09/02 00:34:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/09/02 00:34:53 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
20/09/02 00:34:53 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
20/09/02 00:34:55 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
20/09/02 00:34:55 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore kou@192.168.1.204
Spark master: local[*], Application Id: local-1598974492822
spark-sql> SELECT now();
2020-09-02 00:35:05.258
Time taken: 2.299 seconds, Fetched 1 row(s)
```

### How was this patch tested?

Launched the spark-sql CLI and confirmed that log messages are reduced as pasted above.

Closes #29619 from sarutak/suppress-log-for-spark-sql.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-09-02 13:31:06 -07:00
Ali Smesseim 3cde392b69 [SPARK-31831][SQL][FOLLOWUP] Make the GetCatalogsOperationMock for HiveSessionImplSuite compile with the proper Hive version
### What changes were proposed in this pull request?
#29129 duplicated GetCatalogsOperationMock in the hive-version-specific subdirectories because the hive-1.2 profile would not compile otherwise. We can avoid duplicating this class by shimming the required hive-version-specific types.
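
As an illustration of the shim pattern (hypothetical names, not Spark's actual shim object), each hive-version source tree can provide an object of the same name whose type aliases point at that version's packages, so shared test code compiles under either profile:

```
// hive-2.3 variant; a hive-1.2 twin in the other source tree would alias
// org.apache.hive.service.cli.thrift.TOperationHandle instead.
object HiveVersionShims {
  type TOperationHandle = org.apache.hive.service.rpc.thrift.TOperationHandle
}
```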

### Why are the changes needed?
This is a cleanup to avoid duplication of a mock class.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This patch only changes tests.

Closes #29549 from alismess-db/get-catalogs-operation-mock-use-shim.

Authored-by: Ali Smesseim <ali.smesseim@databricks.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2020-09-02 20:23:57 +08:00
angerszhu 55ce49ed28 [SPARK-32400][SQL][TEST][FOLLOWUP][TEST-MAVEN] Fix resource loading error in HiveScriptTransformationSuite
### What changes were proposed in this pull request?
#29401 moved `test_script.py` from the sql/hive module to the sql/core module, which caused a resource loading issue in HiveScriptTransformationSuite.

### Why are the changes needed?
This issue causes the following Jenkins Maven test jobs to fail:

spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/
spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11/
spark-master-test-maven-hadoop-3.2-hive-2.3:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3/
![image](https://user-images.githubusercontent.com/46485123/91681585-71285a80-eb81-11ea-8519-99fc9783d6b9.png)

![image](https://user-images.githubusercontent.com/46485123/91681010-aaf86180-eb7f-11ea-8dbb-61365a3b0ab4.png)

Error as below:
```
 Exception thrown while executing Spark plan:
 HiveScriptTransformation [a#349299, b#349300, c#349301, d#349302, e#349303], python /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/sql/hive/file:/home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/sql/core/target/spark-sql_2.12-3.1.0-SNAPSHOT-tests.jar!/test_script.py, [a#349309, b#349310, c#349311, d#349312, e#349313], ScriptTransformationIOSchema(List(),List(),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),List((field.delim, )),List((field.delim, )),Some(org.apache.hadoop.hive.ql.exec.TextRecordReader),Some(org.apache.hadoop.hive.ql.exec.TextRecordWriter),false)
+- Project [_1#349288 AS a#349299, _2#349289 AS b#349300, _3#349290 AS c#349301, _4#349291 AS d#349302, _5#349292 AS e#349303]
   +- LocalTableScan [_1#349288, _2#349289, _3#349290, _4#349291, _5#349292]

 == Exception ==
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 18021.0 failed 1 times, most recent failure: Lost task 0.0 in stage 18021.0 (TID 37324) (192.168.10.31 executor driver): org.apache.spark.SparkException: Subprocess exited with status 2. Error: python: can't open file '/home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/sql/hive/file:/home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/sql/core/target/spark-sql_2.12-3.1.0-SNAPSHOT-tests.jar!/test_script.py': [Errno 2] No such file or directory

 at org.apache.spark.sql.execution.BaseScriptTransformationExec.checkFailureAndPropagate(BaseScriptTransformationExec.scala:180)
 at org.apache.spark.sql.execution.BaseScriptTransformationExec.checkFailureAndPropagate$(BaseScriptTransformationExec.scala:157)
 at org.apache.spark.sql.hive.execution.HiveScriptTransformationExec.checkFailureAndPropagate(HiveScriptTransformationExec.scala:49)
 at org.apache.spark.sql.hive.execution.HiveScriptTransformationExec$$anon$1.hasNext(HiveScriptTransformationExec.scala:110)
 at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
 at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:127)
 at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:480)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1426)
 at o
```
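
One common way to avoid this class of failure, sketched below under assumed names rather than as the actual fix, is to materialize the classpath resource to a temp file instead of composing a filesystem path, since a resource packed inside a jar is not a path an external process can execute:

```
import java.nio.file.{Files, StandardCopyOption}

object ScriptResource {
  // Copy a classpath resource (possibly packed inside a jar) to a real
  // temp file and return a path an external process can execute.
  def materialize(name: String): String = {
    val in = getClass.getResourceAsStream("/" + name) // e.g. "test_script.py"
    val tmp = Files.createTempFile("spark-test-", "-" + name)
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp.toAbsolutePath.toString
  }
}
```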
### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Existing UTs

Closes #29588 from AngersZhuuuu/SPARK-32400-FOLLOWUP.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-09-02 18:27:29 +09:00
Kent Yao 6dacba7fa0 [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation
### What changes were proposed in this pull request?

This PR adds the extended information of a function, including arguments, examples, notes, and the since field, to SparkGetFunctionOperation.

### Why are the changes needed?

Better user experience: it helps JDBC users gain a fuller understanding of our built-in functions.

### Does this PR introduce _any_ user-facing change?

Yes. BI tools and JDBC users will get full information on a Spark function instead of only fragmentary usage info.

e.g. date_part

#### before

```
date_part(field, source) - Extracts a part of the date/timestamp or interval source.
```
#### after

```
    Usage:
      date_part(field, source) - Extracts a part of the date/timestamp or interval source.

    Arguments:
      * field - selects which part of the source should be extracted, and supported string values are as same as the fields of the equivalent function `EXTRACT`.
      * source - a date/timestamp or interval column from where `field` should be extracted

    Examples:
      > SELECT date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
       2019
      > SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456');
       33
      > SELECT date_part('doy', DATE'2019-08-12');
       224
      > SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001');
       1.000001
      > SELECT date_part('days', interval 1 year 10 months 5 days);
       5
      > SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
       30.001001

    Note:
      The date_part function is equivalent to the SQL-standard function `EXTRACT(field FROM source)`

    Since: 3.0.0

```
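
For illustration, a JDBC client can read this extended text through the standard `DatabaseMetaData.getFunctions` REMARKS column; the connection URL below assumes a local Thrift server and is illustrative only:

```
import java.sql.DriverManager

object ShowFunctionDoc {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000")
    try {
      val rs = conn.getMetaData.getFunctions(null, null, "date_part")
      // After this change, REMARKS carries the full Usage/Arguments/
      // Examples/Note/Since text shown above.
      while (rs.next()) println(rs.getString("REMARKS"))
    } finally conn.close()
  }
}
```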

### How was this patch tested?

New tests

Closes #29577 from yaooqinn/SPARK-32733.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-08-31 11:03:01 +09:00
Kent Yao f14f3742e0 [SPARK-32696][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] Get columns operation should handle interval column properly
### What changes were proposed in this pull request?

This PR lets JDBC clients identify Spark interval columns properly.

### Why are the changes needed?

JDBC users can query interval values through the Thrift server and create views with interval columns, e.g.
```sql
CREATE global temp view view1 as select interval 1 day as i;
```
but when they try to get the details of the columns of view1, they will fail with `Unrecognized type name: INTERVAL`:

```
Caused by: java.lang.IllegalArgumentException: Unrecognized type name: INTERVAL
	at org.apache.hadoop.hive.serde2.thrift.Type.getType(Type.java:170)
	at org.apache.spark.sql.hive.thriftserver.ThriftserverShimUtils$.toJavaSQLType(ThriftserverShimUtils.scala:53)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$addToRowSet$1(SparkGetColumnsOperation.scala:157)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at org.apache.spark.sql.types.StructType.foreach(StructType.scala:102)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.addToRowSet(SparkGetColumnsOperation.scala:149)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$6(SparkGetColumnsOperation.scala:113)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$6$adapted(SparkGetColumnsOperation.scala:112)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$5(SparkGetColumnsOperation.scala:112)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$5$adapted(SparkGetColumnsOperation.scala:111)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.runInternal(SparkGetColumnsOperation.scala:111)
	... 34 more
```
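
A hedged sketch of the client-side reproduction (connection URL assumed): create the view, then ask for its column metadata. Before this fix the `getColumns` call surfaced the exception above; afterwards it returns the interval column's metadata:

```
import java.sql.DriverManager

object IntervalColumns {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000")
    try {
      conn.createStatement().execute(
        "CREATE GLOBAL TEMP VIEW view1 AS SELECT interval 1 day AS i")
      // Global temp views live in the global_temp database.
      val rs = conn.getMetaData.getColumns(null, "global_temp", "view1", "%")
      while (rs.next())
        println(rs.getString("COLUMN_NAME") + ": " + rs.getString("TYPE_NAME"))
    } finally conn.close()
  }
}
```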
### Does this PR introduce _any_ user-facing change?

YES,
#### before
![image](https://user-images.githubusercontent.com/8326978/91162239-6cd1ec80-e6fe-11ea-8c2c-914ddb325c4e.png)

#### after
![image](https://user-images.githubusercontent.com/8326978/91162025-1a90cb80-e6fe-11ea-94c4-03a6f2ec296b.png)

### How was this patch tested?

new tests

Closes #29539 from yaooqinn/SPARK-32696.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-27 06:52:34 +00:00
Kent Yao c26a97637f Revert "[SPARK-32412][SQL] Unify error handling for spark thrift server operations"

### What changes were proposed in this pull request?

This reverts commit 510a1656e6.

### Why are the changes needed?

see https://github.com/apache/spark/pull/29204#discussion_r475716547

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

pass ci tools

Closes #29531 from yaooqinn/revert.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-25 05:57:14 +00:00
Kent Yao 1b6f482adb [SPARK-32492][SQL][FOLLOWUP][TEST-MAVEN] Fix jenkins maven jobs
### What changes were proposed in this pull request?

The newly added test fails the Jenkins Maven jobs; see https://github.com/apache/spark/pull/29303#discussion_r464729021

We move the test from `ThriftServerWithSparkContextSuite` to `SparkMetadataOperationSuite`. The former uses an embedded Thrift server, where the server and the client share one JVM process, while the latter forks a new process to start the server, so the server and the client are isolated.
The sbt runner seems fine with the test in `ThriftServerWithSparkContextSuite`, but the Maven runner with the `scalatest` plugin hits a classloader issue: we switch the classloader to the one in the `sharedState`, which is not the one Hive uses to load some classes. This looks more like an issue in the Maven runner or in `scalatest`.
So in this PR, we simply move the test to bypass the issue.
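
The usual discipline around such a switch, sketched generically below rather than as Spark's code, is to restore the context classloader after the body runs, so later Hive class lookups go through the original loader:

```
object Loaders {
  // Swap the thread's context classloader for the duration of `body`
  // and always restore it afterwards.
  def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
    val saved = Thread.currentThread().getContextClassLoader
    Thread.currentThread().setContextClassLoader(loader)
    try body
    finally Thread.currentThread().setContextClassLoader(saved)
  }
}
```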

BTW, we should still test the embedded Thrift server path to verify whether this is only a Maven issue, since there could be real use cases for that API.

### Why are the changes needed?

Jenkins recovery

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

modified uts

Closes #29347 from yaooqinn/SPARK-32492-F.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-05 13:56:11 +00:00
Kent Yao 7f5326c082 [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE/DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for thriftserver client tools
### What changes were proposed in this pull request?

This PR fills in some missing fields for SparkGetColumnsOperation, including COLUMN_SIZE/DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION, and improves the test coverage.
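
These fields are part of the standard JDBC `getColumns` result set, so a client can read them directly once the server populates them; a sketch with an illustrative URL and table name:

```
import java.sql.DriverManager

object ColumnMeta {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000")
    try {
      val rs = conn.getMetaData.getColumns(null, "default", "my_table", "%")
      while (rs.next()) printf("%s size=%d digits=%d radix=%d pos=%d%n",
        rs.getString("COLUMN_NAME"), rs.getInt("COLUMN_SIZE"),
        rs.getInt("DECIMAL_DIGITS"), rs.getInt("NUM_PREC_RADIX"),
        rs.getInt("ORDINAL_POSITION"))
    } finally conn.close()
  }
}
```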

### Why are the changes needed?

make jdbc tools happier

### Does this PR introduce _any_ user-facing change?

yes,

#### before
![image](https://user-images.githubusercontent.com/8326978/88911764-e78b2180-d290-11ea-8abb-96f137f9c3c4.png)

#### after

![image](https://user-images.githubusercontent.com/8326978/88911709-d04c3400-d290-11ea-90ab-02bda3e628e9.png)

![image](https://user-images.githubusercontent.com/8326978/88912007-39cc4280-d291-11ea-96d6-1ef3abbbddec.png)

### How was this patch tested?

add unit tests

Closes #29303 from yaooqinn/SPARK-32492.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-03 12:49:34 +00:00
Kent Yao 510a1656e6 [SPARK-32412][SQL] Unify error handling for spark thrift server operations
### What changes were proposed in this pull request?

Log the error/warn message only once at the server side, for both sync and async modes.

### Why are the changes needed?

In b151194299 we stopped duplicating the error logging for SparkExecuteStatementOperation with `runInBackground=true`, but operations with `runInBackground=false` and the other metadata operations are still logged twice, once in the operation's `runInternal` method and once in ThriftCLIService.

In this PR, I propose to rework the logic into a unified error handling approach.
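
A hypothetical sketch of the idea, not Spark's actual code: funnel every operation's failure through a single handler that logs once and rethrows, so upper layers such as the Thrift service propagate without logging again:

```
import org.slf4j.LoggerFactory

object OpErrors {
  private val log = LoggerFactory.getLogger(getClass)

  // The single logging point; callers up the stack rethrow silently.
  def run[T](opName: String)(body: => T): T =
    try body catch {
      case e: Exception =>
        log.error(s"Error operating $opName", e)
        throw e
    }
}
```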

### Does this PR introduce _any_ user-facing change?

Yes. When `spark.sql.hive.thriftServer.async=false` and users call the sync APIs, the error message will be logged only once at the server side.
### How was this patch tested?

Locally verified the result in target/unit-test.log.

Added unit tests.

Closes #29204 from yaooqinn/SPARK-32412.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-07-30 06:44:13 +00:00
sychen be9f03dc71 [SPARK-32426][SQL] UI shows SQL after variable substitution
### What changes were proposed in this pull request?
When submitting SQL that contains variables, the SQL displayed in the UI does not have its variables substituted.
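
A minimal sketch of the substitution idea, with hypothetical names and a deliberately simplified `${name}` syntax; Spark's own variable substitution is richer than this:

```
object SubstituteForDisplay {
  // Replace ${name} references before handing the statement to the UI,
  // so the displayed SQL matches what actually runs.
  def apply(sql: String, vars: Map[String, String]): String =
    vars.foldLeft(sql) { case (s, (k, v)) => s.replace("${" + k + "}", v) }
}
// SubstituteForDisplay("SELECT * FROM t WHERE y = ${year}", Map("year" -> "2020"))
//   returns "SELECT * FROM t WHERE y = 2020"
```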

### Why are the changes needed?
So users can see the final executed SQL in the UI.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
manual test

Closes #29221 from cxzl25/SPARK-32426.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-25 03:30:01 -07:00