From d28ca9cc9808828118be64a545c3478160fdc170 Mon Sep 17 00:00:00 2001
From: Max Gekk
Date: Wed, 30 Jun 2021 09:44:52 +0300
Subject: [PATCH] [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on
 table refreshing

### What changes were proposed in this pull request?

In the PR, I propose to catch all non-fatal exceptions coming from `refreshTable()` at the final stage of table repairing, and to output an error message instead of failing with an exception.

### Why are the changes needed?

1. The uncaught exceptions from table refreshing might be considered a regression compared with previous Spark versions. Table refreshing was introduced by https://github.com/apache/spark/pull/31066.
2. This should improve the user experience with Spark SQL, for instance when `MSCK REPAIR TABLE` is performed in a chain of SQL commands where catching an exception is difficult or even impossible.

### Does this PR introduce _any_ user-facing change?

Yes. Before the changes, the `MSCK REPAIR TABLE` command could fail with the exception described in SPARK-35935. After the changes, the same command outputs an error message and completes successfully.

### How was this patch tested?

By existing test suites.

Closes #33137 from MaxGekk/msck-repair-catch-except.

Authored-by: Max Gekk
Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/execution/command/ddl.scala | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
index 0876b5f058..06c684783a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
@@ -675,7 +675,15 @@ case class RepairTableCommand(
     // This is always the case for Hive format tables, but is not true for Datasource tables created
     // before Spark 2.1 unless they are converted via `msck repair table`.
     spark.sessionState.catalog.alterTable(table.copy(tracksPartitionsInCatalog = true))
-    spark.catalog.refreshTable(tableIdentWithDB)
+    try {
+      spark.catalog.refreshTable(tableIdentWithDB)
+    } catch {
+      case NonFatal(e) =>
+        logError(s"Cannot refresh the table '$tableIdentWithDB'. A query of the table " +
+          "might return wrong results if the table was cached. To avoid such issues, " +
+          "uncache the table manually via the UNCACHE TABLE command after table " +
+          "recovery completes.", e)
+    }
     logInfo(s"Recovered all partitions: added ($addedAmount), dropped ($droppedAmount).")
     Seq.empty[Row]
   }
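
For illustration only, here is a minimal sketch of the scenario this patch targets, assuming a `SparkSession` named `spark` and a cached, partitioned table `t1` (both names are placeholders, not part of the patch):

```scala
// Minimal sketch, assuming `spark: SparkSession` and a partitioned table `t1`
// whose partitions were added directly on the file system.
spark.sql("CACHE TABLE t1")

// Before this patch, a failure inside refreshTable() aborted the command with
// an exception; after it, the error is logged and the command completes.
spark.sql("MSCK REPAIR TABLE t1")

// As the new error message recommends, uncache the table manually so that
// later queries do not read stale cached data if the refresh failed.
spark.sql("UNCACHE TABLE t1")
```

Note the catch is limited to `NonFatal(e)`, so fatal JVM errors such as `OutOfMemoryError` still propagate; only recoverable refresh failures are downgraded to a logged error.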