[SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL
### What changes were proposed in this pull request? When running Spark behind a reverse proxy(e.g. Nginx, Apache HTTP server), the request URL can be encoded twice if we pass the query string directly to the constructor of `java.net.URI`: ``` > val uri = "http://localhost:8081/test" > val query = "order%5B0%5D%5Bcolumn%5D=0" // query string of URL from the reverse proxy > val rewrittenURI = URI.create(uri.toString()) > new URI(rewrittenURI.getScheme(), rewrittenURI.getAuthority(), rewrittenURI.getPath(), query, rewrittenURI.getFragment()).toString result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0 ``` In Spark's stage page, the URL of "/taskTable" contains query parameter order[0][dir]. After encoding twice, the query parameter becomes `order%255B0%255D%255Bdir%255D` and it will be decoded as `order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, there will be NullPointerException from https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176 Other than that, the other parameter may not work as expected after encoded twice. This PR is to fix the bug by calling the method `URI.create(String URL)` directly. This convenience method can avoid encoding twice on the query parameter. ``` > val uri = "http://localhost:8081/test" > val query = "order%5B0%5D%5Bcolumn%5D=0" > URI.create(s"$uri?$query").toString result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0 > URI.create(s"$uri?$query").getQuery result: order[0][column]=0 ``` ### Why are the changes needed? Fix a potential bug when Spark's reverse proxy is enabled. The bug itself is similar to https://github.com/apache/spark/pull/29271. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add a new unit test. Also, Manual UI testing for master, worker and app UI with an nginx proxy Spark config: ``` spark.ui.port 8080 spark.ui.reverseProxy=true spark.ui.reverseProxyUrl=/path/to/spark/ ``` nginx config: ``` server { listen 9000; set $SPARK_MASTER http://127.0.0.1:8080; # split spark UI path into prefix and local path within master UI location ~ ^(/path/to/spark/) { # strip prefix when forwarding request rewrite /path/to/spark(/.*) $1 break; #rewrite /path/to/spark/ "/" ; # forward to spark master UI proxy_pass $SPARK_MASTER; proxy_intercept_errors on; error_page 301 302 307 = handle_redirects; } location handle_redirects { set $saved_redirect_location '$upstream_http_location'; proxy_pass $saved_redirect_location; } } ``` Closes #30552 from gengliangwang/decodeProxyRedirect. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
This commit is contained in:
parent
c24f2b2d6a
commit
5d0045eedf
|
@ -401,17 +401,13 @@ private[spark] object JettyUtils extends Logging {
|
|||
uri.append(rest)
|
||||
}
|
||||
|
||||
val rewrittenURI = URI.create(uri.toString())
|
||||
if (query != null) {
|
||||
return new URI(
|
||||
rewrittenURI.getScheme(),
|
||||
rewrittenURI.getAuthority(),
|
||||
rewrittenURI.getPath(),
|
||||
query,
|
||||
rewrittenURI.getFragment()
|
||||
).normalize()
|
||||
val queryString = if (query == null) {
|
||||
""
|
||||
} else {
|
||||
s"?$query"
|
||||
}
|
||||
rewrittenURI.normalize()
|
||||
// SPARK-33611: use method `URI.create` to avoid percent-encoding twice on the query string.
|
||||
URI.create(uri.toString() + queryString).normalize()
|
||||
}
|
||||
|
||||
def createProxyLocationHeader(
|
||||
|
|
|
@ -216,6 +216,15 @@ class UISuite extends SparkFunSuite {
|
|||
assert(rewrittenURI === null)
|
||||
}
|
||||
|
||||
test("SPARK-33611: Avoid encoding twice on the query parameter of proxy rewrittenURI") {
|
||||
val prefix = "/worker-id"
|
||||
val target = "http://localhost:8081"
|
||||
val path = "/worker-id/json"
|
||||
val rewrittenURI =
|
||||
JettyUtils.createProxyURI(prefix, target, path, "order%5B0%5D%5Bcolumn%5D=0")
|
||||
assert(rewrittenURI.toString === "http://localhost:8081/json?order%5B0%5D%5Bcolumn%5D=0")
|
||||
}
|
||||
|
||||
test("verify rewriting location header for reverse proxy") {
|
||||
val clientRequest = mock(classOf[HttpServletRequest])
|
||||
var headerValue = "http://localhost:4040/jobs"
|
||||
|
|
Loading…
Reference in a new issue