From fb38887e001d33adef519d0288bd0844dcfe2bd5 Mon Sep 17 00:00:00 2001
From: Kousuke Saruta
Date: Wed, 25 Aug 2021 21:30:43 +0900
Subject: [PATCH] [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log

### What changes were proposed in this pull request?

This PR fixes an issue where sensitive information in the Spark Thrift Server log could not be redacted.
For example, a JDBC password can be exposed in the log.
```
21/08/25 18:52:37 INFO SparkExecuteStatementOperation: Submitting query 'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde")' with ca14ae38-1aaf-4bf4-a099-06b8e5337613
```

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Started the Thrift Server, connected to it, and executed `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");` with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
Then confirmed that the password is redacted in the log.
```
21/08/25 18:54:11 INFO SparkExecuteStatementOperation: Submitting query 'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password=*********(redacted))' with ffc627e2-b1a8-4d83-ab6d-d819b3ccd909
```

Closes #33832 from sarutak/fix-SPARK-36398.
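
For reviewers who want to see what the redaction regex from the test does to a statement, here is a minimal standalone sketch. It does not use Spark: `RedactionSketch` and its `redact` helper are hypothetical stand-ins written for illustration, the replacement text is copied from the redacted log line above, and the statement is a shortened version of the one used in the test.

```scala
import scala.util.matching.Regex

object RedactionSketch {
  // Replacement text as it appears in the redacted log line (assumed constant).
  val ReplacementText = "*********(redacted)"

  // Hypothetical stand-in for a redaction helper: replace every regex match
  // with the replacement text, or return the text unchanged if no pattern is set.
  def redact(pattern: Option[Regex], text: String): String = pattern match {
    case Some(r) if text != null && text.nonEmpty =>
      r.replaceAllIn(text, Regex.quoteReplacement(ReplacementText))
    case _ => text
  }

  def main(args: Array[String]): Unit = {
    // Same regex as the spark.sql.redaction.string.regex value used in the test.
    val pattern = Some("""((?i)(?<=password=))(".*")|('.*')""".r)
    // Shortened form of the CREATE TABLE statement from the test.
    val statement =
      """CREATE TABLE mytbl2(a int) OPTIONS(user="test_usr", password="abcde")"""
    // prints: CREATE TABLE mytbl2(a int) OPTIONS(user="test_usr", password=*********(redacted))
    println(redact(pattern, statement))
  }
}
```

Note that `.*` in this regex is greedy, so the match runs from the quote after `password=` to the last double quote in the statement. Any options that follow the password would be redacted as well.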

Authored-by: Kousuke Saruta
Signed-off-by: Kousuke Saruta
(cherry picked from commit b2ff01608f5ecdba19630e12478bd370f9766f7b)
Signed-off-by: Kousuke Saruta
---
 .../sql/hive/thriftserver/SparkExecuteStatementOperation.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 0df58857e1..4f4088990a 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -185,8 +185,8 @@ private[hive] class SparkExecuteStatementOperation(
 
   override def runInternal(): Unit = {
     setState(OperationState.PENDING)
-    logInfo(s"Submitting query '$statement' with $statementId")
     val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
+    logInfo(s"Submitting query '$redactedStatement' with $statementId")
     HiveThriftServer2.eventManager.onStatementStart(
       statementId,
       parentSession.getSessionHandle.getSessionId.toString,