spark-instrumented-optimizer/sql
Kent Yao 202529ef23 [SPARK-21449][SPARK-23745][SQL] add ShutdownHook to close HiveClient's SessionState to delete residual dirs
### What changes were proposed in this pull request?

We initialize a Hive `SessionState` to interact with the external Hive metastore server, but we leave it behind after we are done with it.

We should close the metastore client explicitly to avoid leaking connections to the HMS,
and we should trigger the `SessionState` to close itself so that it cleans up its residual directories, fixing the issues reported in SPARK-21449 and SPARK-23745.

`hive.downloaded.resources.dir` contains transient files, such as UDF jars, that are no longer needed once the Spark application exits.
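
For illustration, here is a minimal Scala sketch of the shutdown-hook idea, not the exact code in this PR: `HiveSessionCleanup` and `registerSessionStateCloser` are hypothetical names, and a plain JVM shutdown hook stands in for whatever hook mechanism the actual change uses.

```scala
// A minimal sketch, assuming a plain JVM shutdown hook; HiveSessionCleanup and
// registerSessionStateCloser are hypothetical names, not the code from this PR.
import org.apache.hadoop.hive.ql.session.SessionState

object HiveSessionCleanup {
  def registerSessionStateCloser(state: SessionState): Unit = {
    Runtime.getRuntime.addShutdownHook(new Thread("close-hive-session-state") {
      override def run(): Unit = {
        try {
          // SessionState.close() releases the metastore connection and removes
          // session-scoped temp dirs such as hive.downloaded.resources.dir.
          state.close()
        } catch {
          case e: Throwable =>
            // Best effort: cleanup failures must not block JVM shutdown.
            System.err.println(s"Failed to close Hive SessionState: ${e.getMessage}")
        }
      }
    })
  }
}
```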

### Why are the changes needed?

1. Prevent potential metastore client leaks.

2. Clean up `hive.downloaded.resources.dir`, which Hive defines as:

```
    DOWNLOADED_RESOURCES_DIR("hive.downloaded.resources.dir",
        "${system:java.io.tmpdir}" + File.separator + "${hive.session.id}_resources",
        "Temporary local directory for added resources in the remote file system."),
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Passed Jenkins and verified locally.

Closes #31833 from yaooqinn/SPARK-21449-2.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2021-03-16 10:37:40 +08:00
| Path | Last commit | Date |
|------|-------------|------|
| catalyst | [MINOR][SQL] Remove unused variable in NewInstance.constructor | 2021-03-15 18:49:54 -07:00 |
| core | [SPARK-34739][SQL] Support add/subtract of a year-month interval to/from a timestamp | 2021-03-15 14:36:12 +03:00 |
| hive | [SPARK-21449][SPARK-23745][SQL] add ShutdownHook to close HiveClient's SessionState to delete residual dirs | 2021-03-16 10:37:40 +08:00 |
| hive-thriftserver | [SPARK-34712][SQL][TESTS] Refactor UT about hive built-in version, avoid to change every time when upgrade hive version | 2021-03-11 12:52:29 -08:00 |
| create-docs.sh | [SPARK-34010][SQL][DOCS] Use python3 instead of python in SQL documentation build | 2021-01-05 19:48:10 +09:00 |
| gen-sql-api-docs.py | [SPARK-34022][DOCS][FOLLOW-UP] Fix typo in SQL built-in function docs | 2021-01-06 09:28:22 -08:00 |
| gen-sql-config-docs.py | [SPARK-31550][SQL][DOCS] Set nondeterministic configurations with general meanings in sql configuration doc | 2020-04-27 17:08:52 +09:00 |
| gen-sql-functions-docs.py | [SPARK-31562][SQL] Update ExpressionDescription for substring, current_date, and current_timestamp | 2020-04-26 11:46:52 -07:00 |
| mkdocs.yml | [SPARK-30731] Update deprecated Mkdocs option | 2020-02-19 17:28:58 +09:00 |
| README.md | [SPARK-30510][SQL][DOCS] Publicly document Spark SQL configuration options | 2020-02-09 19:20:47 +09:00 |

# Spark SQL

This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.
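
For illustration only (not part of the original README), here is a minimal sketch showing the same query expressed through SQL and through the DataFrame API; the object, app name, and data are made up.

```scala
// Illustration only: the object, app name, and data are made up.
import org.apache.spark.sql.SparkSession

object SqlVsDataFrameExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sql-vs-dataframe").master("local[*]").getOrCreate()
    import spark.implicits._

    val people = Seq(("Alice", 29), ("Bob", 31)).toDF("name", "age")
    people.createOrReplaceTempView("people")

    // The same relational query, expressed two ways:
    spark.sql("SELECT name FROM people WHERE age > 30").show() // SQL
    people.filter($"age" > 30).select("name").show()           // DataFrame API

    spark.stop()
  }
}
```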

Spark SQL is broken up into four subprojects:

  - Catalyst (`sql/catalyst`) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
  - Execution (`sql/core`) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files.
  - Hive Support (`sql/hive`) - Includes extensions that allow users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs (see the sketch after this list).
  - HiveServer and CLI support (`sql/hive-thriftserver`) - Includes support for the SQL CLI (`bin/spark-sql`) and a HiveServer2 (for JDBC/ODBC) compatible server.
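
A hedged sketch (not from the original README) of what using the Hive support module looks like; the table and app names are made up, and it assumes the `sql/hive` module (e.g. the `spark-hive` artifact) is on the classpath with a usable metastore.

```scala
// Hedged sketch: table and app names are made up; requires the sql/hive module
// (e.g. the spark-hive artifact) on the classpath and a usable metastore.
import org.apache.spark.sql.SparkSession

object HiveSupportExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-support-example")
      .enableHiveSupport() // wires Spark SQL to the Hive Metastore catalog
      .getOrCreate()

    spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
    spark.sql("SELECT COUNT(*) AS cnt FROM src").show()

    spark.stop()
  }
}
```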

Running `./sql/create-docs.sh` generates SQL documentation for built-in functions under `sql/site`, and SQL configuration documentation that gets included as part of `configuration.md` in the main `docs` directory.