6093a78dbd
### What changes were proposed in this pull request?

Currently, the warehouse path gets fully qualified on the caller side when creating a database, table, partition, etc. An unqualified path is populated into the Spark and Hadoop confs, which leads to inconsistent API behaviors. We should qualify it up front.

When the value is a relative path, e.g. `spark.sql.warehouse.dir=lakehouse`, some behaviors become inconsistent. For example:

If the default database is absent at runtime, the app fails with

```java
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./lakehouse
	at org.apache.hadoop.fs.Path.initialize(Path.java:263)
	at org.apache.hadoop.fs.Path.<init>(Path.java:254)
	at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:133)
	at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:137)
	at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:150)
	at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:163)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:636)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
	... 73 more
```

If the default database is present at runtime, the app can work with it, and if we create a database, its location gets fully qualified, for example

```sql
spark-sql> create database test;
Time taken: 0.052 seconds
spark-sql> desc database test;
Database Name   test
Comment
Location        file:/Users/kentyao/Downloads/spark/spark-3.2.0-SNAPSHOT-bin-20210226/lakehouse/test.db
Owner           kentyao
Time taken: 0.023 seconds, Fetched 4 row(s)
```

Another issue is that the log becomes ambiguous, for example

```logtalk
21/02/27 13:54:17 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('datalake').
21/02/27 13:54:17 INFO SharedState: Warehouse path is 'lakehouse'.
```

### Why are the changes needed?

Fixes a bug and removes ambiguity.

### Does this PR introduce _any_ user-facing change?

Yes, the path is now resolved in the proper order: `warehouse->database->table->partition`.

### How was this patch tested?

With newly added unit tests.

Closes #31671 from yaooqinn/SPARK-34558.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
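To illustrate the "qualify up front" idea from the description above, here is a minimal sketch using only `java.net.URI`. It is not the code from this patch, and it does not use the Hadoop `Path`/`FileSystem` APIs that Spark relies on. The class `WarehousePathSketch`, its methods `qualify` and `child`, and the working directory `/tmp/app` are hypothetical names chosen for the example. It shows a relative warehouse value being resolved once, and a database location then being derived from the already-qualified warehouse path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration only; not Spark's or Hadoop's implementation.
public class WarehousePathSketch {

    /**
     * Qualifies a warehouse dir up front. Values that already carry a scheme
     * are kept; relative or scheme-less values are resolved against
     * workingDir, which is assumed to be an absolute local directory.
     */
    static URI qualify(String warehouseDir, String workingDir) {
        URI given = URI.create(warehouseDir);
        if (given.getScheme() != null) {
            return given.normalize();
        }
        // A trailing slash makes URI.resolve treat the base as a directory.
        String base = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        try {
            URI baseUri = new URI("file", null, base, null);
            return baseUri.resolve(given).normalize();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Derives a child location (e.g. a database dir) from a qualified parent. */
    static URI child(URI parent, String name) {
        String path = parent.getPath();
        String sep = path.endsWith("/") ? "" : "/";
        try {
            return new URI(parent.getScheme(), parent.getAuthority(),
                    path + sep + name, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // warehouse -> database: the warehouse is qualified first, and the
        // database location inherits the qualification.
        URI warehouse = qualify("lakehouse", "/tmp/app");
        URI database = child(warehouse, "test.db");
        System.out.println(warehouse); // file:/tmp/app/lakehouse
        System.out.println(database);  // file:/tmp/app/lakehouse/test.db
    }
}
```

Because the value is qualified once at the root, every consumer downstream sees the same absolute location, which is the consistency the description asks for.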
||
---|---|---|
benchmarks | ||
compatibility/src/test/scala/org/apache/spark/sql/hive/execution | ||
src | ||
pom.xml |