[SPARK-29630][SQL] Disallow creating a permanent view that references a temporary view in an expression

### What changes were proposed in this pull request?

Disallow creating a permanent view that references a temporary view in **expressions**.

### Why are the changes needed?

Creating a permanent view that references a temporary view is currently disallowed. For example,
```SQL
-- The following throws org.apache.spark.sql.AnalysisException:
-- Not allowed to create a permanent view `per_view` by referencing a temporary view `tmp`;
CREATE VIEW per_view AS SELECT t1.a, t2.b FROM base_table t1, (SELECT * FROM tmp) t2;
```
However, the following is allowed, because the temporary view is referenced inside an expression (an `EXISTS` subquery):
```SQL
CREATE VIEW per_view AS SELECT * FROM base_table WHERE EXISTS (SELECT * FROM tmp);
```
This PR fixes the bug where temporary views used inside expressions are not checked.
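
To illustrate why a traversal over plan children alone can miss such references, here is a minimal, self-contained sketch. It uses a toy plan model, not Spark's classes: `Expr`, `Plan`, `Relation`, `Filter`, `Exists`, and the helper names are all hypothetical and only mimic the shape of the real check.

```scala
object ViewCheckSketch {
  // Toy expression tree (hypothetical; not Spark's Expression hierarchy).
  sealed trait Expr { def children: Seq[Expr] }
  final case class Literal(value: Int) extends Expr { def children: Seq[Expr] = Nil }
  final case class And(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  // A subquery expression carries its own plan, which is NOT a plan child.
  final case class Exists(plan: Plan) extends Expr { def children: Seq[Expr] = Nil }

  // Toy logical plan (hypothetical; not Spark's LogicalPlan).
  sealed trait Plan {
    def children: Seq[Plan]
    def expressions: Seq[Expr]
  }
  final case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
    def expressions: Seq[Expr] = Nil
  }
  final case class Filter(condition: Expr, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def expressions: Seq[Expr] = Seq(condition)
  }

  // Walks plan children only, so relations inside subquery expressions are missed.
  def relationsInPlan(plan: Plan): Seq[String] = plan match {
    case Relation(name) => Seq(name)
    case other          => other.children.flatMap(relationsInPlan)
  }

  // Finds the plans embedded in an expression tree.
  def subqueryPlans(expr: Expr): Seq[Plan] = expr match {
    case Exists(plan) => Seq(plan)
    case other        => other.children.flatMap(subqueryPlans)
  }

  // Walks plan children AND recurses into subquery plans found in expressions.
  def relationsEverywhere(plan: Plan): Seq[String] = {
    val own = plan match {
      case Relation(name) => Seq(name)
      case _              => Nil
    }
    own ++
      plan.expressions.flatMap(subqueryPlans).flatMap(relationsEverywhere) ++
      plan.children.flatMap(relationsEverywhere)
  }

  // Returns the first referenced relation that the caller considers temporary.
  def firstTempView(plan: Plan, isTemp: String => Boolean): Option[String] =
    relationsEverywhere(plan).find(isTemp)
}
```

For the toy plan `Filter(Exists(Relation("tmp")), Relation("base_table"))`, `relationsInPlan` reports only `base_table`, while `relationsEverywhere` also reports `tmp`, which is the kind of reference the check needs to see.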

### Does this PR introduce any user-facing change?

Yes. Now the following SQL query throws an exception as expected:
```SQL
-- The following throws org.apache.spark.sql.AnalysisException:
-- Not allowed to create a permanent view `per_view` by referencing a temporary view `tmp`;
CREATE VIEW per_view AS SELECT * FROM base_table WHERE EXISTS (SELECT * FROM tmp);
```

### How was this patch tested?

Added new unit tests.

Closes #26361 from imback82/spark-29630.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Terry Kim 2019-11-05 13:19:46 +08:00 committed by Wenchen Fan
parent 942a057934
commit 66619b84d8
3 changed files with 415 additions and 391 deletions

@@ -190,21 +190,26 @@ case class CreateViewCommand(
     // added/generated from a temporary view.
     // 2) The temp functions are represented by multiple classes. Most are inaccessible from this
     // package (e.g., HiveGenericUDF).
-    child.collect {
-      // Disallow creating permanent views based on temporary views.
-      case UnresolvedRelation(AsTableIdentifier(ident))
+    def verify(child: LogicalPlan) {
+      child.collect {
+        // Disallow creating permanent views based on temporary views.
+        case UnresolvedRelation(AsTableIdentifier(ident))
           if sparkSession.sessionState.catalog.isTemporaryTable(ident) =>
-        // temporary views are only stored in the session catalog
-        throw new AnalysisException(s"Not allowed to create a permanent view $name by " +
-          s"referencing a temporary view $ident")
-      case other if !other.resolved => other.expressions.flatMap(_.collect {
-        // Disallow creating permanent views based on temporary UDFs.
-        case e: UnresolvedFunction
-          if sparkSession.sessionState.catalog.isTemporaryFunction(e.name) =>
           // temporary views are only stored in the session catalog
           throw new AnalysisException(s"Not allowed to create a permanent view $name by " +
-            s"referencing a temporary function `${e.name}`")
-      })
+            s"referencing a temporary view $ident")
+        case other if !other.resolved => other.expressions.flatMap(_.collect {
+          // Traverse subquery plan for any unresolved relations.
+          case e: SubqueryExpression => verify(e.plan)
+          // Disallow creating permanent views based on temporary UDFs.
+          case e: UnresolvedFunction
+            if sparkSession.sessionState.catalog.isTemporaryFunction(e.name) =>
+            throw new AnalysisException(s"Not allowed to create a permanent view $name by " +
+              s"referencing a temporary function `${e.name}`")
+        })
+      }
     }
+    verify(child)
   }
 }

@@ -177,7 +177,6 @@ DESC TABLE EXTENDED v8;
 -- [SPARK-29628] Forcibly create a temporary view in CREATE VIEW if referencing a temporary view
 CREATE VIEW v6_temp AS SELECT * FROM base_table WHERE id IN (SELECT id FROM temp_table);
 CREATE VIEW v7_temp AS SELECT t1.id, t2.a FROM base_table t1, (SELECT * FROM temp_table) t2;
--- [SPARK-29630] Not allowed to create a permanent view by referencing a temporary view in EXISTS
 CREATE VIEW v8_temp AS SELECT * FROM base_table WHERE EXISTS (SELECT 1 FROM temp_table);
 CREATE VIEW v9_temp AS SELECT * FROM base_table WHERE NOT EXISTS (SELECT 1 FROM temp_table);
@@ -232,6 +231,7 @@ CREATE VIEW nontemp4 AS SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num2 AND t2
DESC TABLE EXTENDED nontemp4;
-- [SPARK-29628] Forcibly create a temporary view in CREATE VIEW if referencing a temporary view
CREATE VIEW temporal4 AS SELECT * FROM t1 LEFT JOIN tt ON t1.num = tt.num2 AND tt.value = 'xxx';
CREATE VIEW temporal5 AS SELECT * FROM t1 WHERE num IN (SELECT num FROM t1 WHERE EXISTS (SELECT 1 FROM tt));
-- Skip the tests below because of PostgreSQL specific cases
-- SELECT relname FROM pg_class
@@ -247,10 +247,10 @@ CREATE TABLE tbl1 ( a int, b int) using parquet;
 CREATE TABLE tbl2 (c int, d int) using parquet;
 CREATE TABLE tbl3 (e int, f int) using parquet;
 CREATE TABLE tbl4 (g int, h int) using parquet;
--- Since Spark doesn't support CREATE TEMPORARY TABLE, we used CREATE TEMPORARY VIEW instead
+-- Since Spark doesn't support CREATE TEMPORARY TABLE, we used CREATE TABLE instead
 -- CREATE TEMP TABLE tmptbl (i int, j int);
-CREATE TEMP VIEW tmptbl AS SELECT * FROM VALUES
-  (1, 1) AS temptbl(i, j);
+CREATE TABLE tmptbl (i int, j int) using parquet;
+INSERT INTO tmptbl VALUES (1, 1);
 --Should be in testviewschm2
 CREATE VIEW pubview AS SELECT * FROM tbl1 WHERE tbl1.a