From b914ff7d54bd7c07e7313bb06a1fa22c36b628d2 Mon Sep 17 00:00:00 2001 From: Kousuke Saruta Date: Wed, 18 Aug 2021 13:31:22 +0900 Subject: [PATCH] [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex ### What changes were proposed in this pull request? This PR fixes an issue that ThriftServer doesn't recognize `spark.sql.redaction.string.regex`. The problem is that sensitive information included in queries can be exposed. ![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png) ![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png) ### Why are the changes needed? Bug fix. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Ran ThriftServer, connect to it and execute `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");` with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')` Then, confirmed UI. ![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png) ![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png) Closes #33743 from sarutak/thrift-redact. Authored-by: Kousuke Saruta Signed-off-by: Kousuke Saruta --- .../sql/hive/thriftserver/SparkExecuteStatementOperation.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala index f43f8e7eba..0df58857e1 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala @@ -186,10 +186,11 @@ private[hive] class SparkExecuteStatementOperation( override def runInternal(): Unit = { setState(OperationState.PENDING) logInfo(s"Submitting query '$statement' with $statementId") + val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement) HiveThriftServer2.eventManager.onStatementStart( statementId, parentSession.getSessionHandle.getSessionId.toString, - statement, + redactedStatement, statementId, parentSession.getUsername) setHasResultSet(true) // avoid no resultset for async run