3a48ea1fe0

### What changes were proposed in this pull request?

Hive 2.3+ supports the `getTablesByType` API, which provides an efficient way to get Hive tables of a specific type. When using `HiveExternalCatalog`, we have the following mappings:

```
CatalogTableType.EXTERNAL => HiveTableType.EXTERNAL_TABLE
CatalogTableType.MANAGED  => HiveTableType.MANAGED_TABLE
CatalogTableType.VIEW     => HiveTableType.VIRTUAL_VIEW
```

Without this API, we need to achieve the same result with `getTables` + `getTablesByName` + a filter on type. This PR adds `getTablesByType` to `HiveShim`. For Hive versions that do not support this API, an `UnsupportedOperationException` is thrown, and the upper-level logic should catch the exception and fall back to the filter solution mentioned above.

Since the JDK11-related fix in Hive is not released yet, manual tests against Hive 2.3.7-SNAPSHOT were done by following the instructions of SPARK-29245.

### Why are the changes needed?

This API provides better usability and performance when we want a list of Hive tables of a specific type, for example `HiveTableType.VIRTUAL_VIEW`, which corresponds to `CatalogTableType.VIEW`.

### Does this PR introduce any user-facing change?

No, this is a support function.

### How was this patch tested?

Added tests in `VersionsSuite` and manually ran the JDK11 test with the following settings:

- Hive 2.3.6 Metastore on JDK8
- Hive 2.3.7-SNAPSHOT library built from source of the Hive 2.3 branch
- Spark built with Hive 2.3.7-SNAPSHOT on jdk-11.0.6

Closes #27952 from Eric5553/GetTableByType.

Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
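The try-then-fall-back pattern described above can be sketched in plain Scala. This is a minimal illustration only: `TableType`, `Table`, `MetastoreClient`, and `CatalogSketch` are hypothetical stand-ins, not the actual `HiveShim` or Hive metastore interfaces.

```scala
// Hypothetical stand-ins for Hive's table-type enum and table class;
// the real code works against Hive metastore client types.
sealed trait TableType
object TableType {
  case object ExternalTable extends TableType // CatalogTableType.EXTERNAL
  case object ManagedTable  extends TableType // CatalogTableType.MANAGED
  case object VirtualView   extends TableType // CatalogTableType.VIEW
}

final case class Table(name: String, tableType: TableType)

trait MetastoreClient {
  // Hive 2.3+ clients override this; older shims keep the default, which throws.
  def getTablesByType(db: String, pattern: String, tpe: TableType): Seq[Table] =
    throw new UnsupportedOperationException("getTablesByType requires Hive 2.3+")

  def getTables(db: String, pattern: String): Seq[Table]
}

object CatalogSketch {
  // Prefer the efficient server-side API; fall back to list-then-filter
  // when the shim signals that the API is unavailable.
  def listTablesByType(
      client: MetastoreClient,
      db: String,
      pattern: String,
      tpe: TableType): Seq[Table] = {
    try {
      client.getTablesByType(db, pattern, tpe)
    } catch {
      case _: UnsupportedOperationException =>
        client.getTables(db, pattern).filter(_.tableType == tpe)
    }
  }
}
```

Against a client that only implements `getTables`, the call degrades to client-side filtering without the caller needing to know which Hive version is behind the shim.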
# Spark SQL
This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.
Spark SQL is broken up into four subprojects:
- Catalyst (`sql/catalyst`) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
- Execution (`sql/core`) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, `SQLContext`, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files.
- Hive Support (`sql/hive`) - Includes extensions that allow users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.
- HiveServer and CLI support (`sql/hive-thriftserver`) - Includes support for the SQL CLI (`bin/spark-sql`) and a HiveServer2-compatible server (for JDBC/ODBC).
Running `./sql/create-docs.sh` generates SQL documentation for built-in functions under `sql/site`, and SQL configuration documentation that gets included as part of `configuration.md` in the main `docs` directory.