spark-instrumented-optimizer

History

hyukjinkwon 01c3dfab15 [MINOR][SQL] Add a debug log when a SQL text is used for a view ## What changes were proposed in this pull request? This took me a while to debug and find out. Looks we better at least leave a debug log that SQL text for a view will be used. Here's how I got there: Hive: ``` CREATE TABLE emp AS SELECT 'user' AS name, 'address' as address; CREATE DATABASE d100; CREATE FUNCTION d100.udf100 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper'; CREATE VIEW testview AS SELECT d100.udf100(name) FROM default.emp; ``` Spark: ``` sql("SELECT * FROM testview").show() ``` ``` scala> sql("SELECT * FROM testview").show() org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7 ``` Under the hood, it actually makes sense since the view is defined as `SELECT d100.udf100(name) FROM default.emp;` and Hive API: ``` org.apache.hadoop.hive.ql.metadata.Table.getViewExpandedText() ``` This returns a wrongly qualified SQL string for the view as below: ``` SELECT `d100.udf100`(`emp`.`name`) FROM `default`.`emp` ``` which works fine in Hive but not in Spark. ## How was this patch tested? Manually: ``` 18/09/06 19:32:48 DEBUG HiveSessionCatalog: 'SELECT `d100.udf100`(`emp`.`name`) FROM `default`.`emp`' will be used for the view(testview). ``` Closes #22351 from HyukjinKwon/minor-debug. Authored-by: hyukjinkwon <gurwls223@apache.org> Signed-off-by: hyukjinkwon <gurwls223@apache.org>	2018-09-08 12:55:44 +08:00
..
src	[MINOR][SQL] Add a debug log when a SQL text is used for a view	2018-09-08 12:55:44 +08:00
pom.xml	[SPARK-19550][BUILD][FOLLOW-UP] Remove MaxPermSize for sql module	2018-01-15 07:49:34 -06:00

hyukjinkwon 01c3dfab15 [MINOR][SQL] Add a debug log when a SQL text is used for a view

## What changes were proposed in this pull request?

This took me a while to debug and find out. Looks we better at least leave a debug log that SQL text for a view will be used.

Here's how I got there:

**Hive:**

```
CREATE TABLE emp AS SELECT 'user' AS name, 'address' as address;
CREATE DATABASE d100;
CREATE FUNCTION d100.udf100 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper';
CREATE VIEW testview AS SELECT d100.udf100(name) FROM default.emp;
```

**Spark:**

```
sql("SELECT * FROM testview").show()
```

```
scala> sql("SELECT * FROM testview").show()
org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
```

Under the hood, it actually makes sense since the view is defined as `SELECT d100.udf100(name) FROM default.emp;` and Hive API:

```
org.apache.hadoop.hive.ql.metadata.Table.getViewExpandedText()
```

This returns a wrongly qualified SQL string for the view as below:

```
SELECT `d100.udf100`(`emp`.`name`) FROM `default`.`emp`
```

which works fine in Hive but not in Spark.

## How was this patch tested?

Manually:

```
18/09/06 19:32:48 DEBUG HiveSessionCatalog: 'SELECT `d100.udf100`(`emp`.`name`) FROM `default`.`emp`' will be used for the view(testview).
```

Closes #22351 from HyukjinKwon/minor-debug.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>

2018-09-08 12:55:44 +08:00

src

[MINOR][SQL] Add a debug log when a SQL text is used for a view

2018-09-08 12:55:44 +08:00

pom.xml

[SPARK-19550][BUILD][FOLLOW-UP] Remove MaxPermSize for sql module

2018-01-15 07:49:34 -06:00