-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32467][UI]Avoid encoding URL twice on https redirect #29271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #126720 has finished for PR 29271 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me.
zsxwing
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the original HTTP URL could be encoded already, so that the HTTPS result will be encoded twice.
It would be great if we can explain the issue more explicitly. Maybe something like this:
When we create the redirect url, we will call
getRequestURIandgetQueryString. These two methods return an encoded string. However, we pass them directly to the following URI constructor
URI(String scheme,
String authority,
String path, String query, String fragment)
As this URI constructor assumes both
pathandqueryparameters are decoded strings, it will encode them again. This makes the redirect url to be encoded twice.
| } | ||
| } | ||
|
|
||
| test("Avoid encoding URL twice on https redirect") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: miss jira number
| } | ||
| val authority = s"$redirectServer:$port" | ||
| new URI(scheme, authority, path, query, null).toString | ||
| val queryEncoding = if (request.getQueryEncoding != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we try to use the same approach in createProxyURI? Looks like we can try to create a method to share in createRedirectURI and createProxyURI.
val rewrittenURI = URI.create(uri.toString())
if (query != null) {
return new URI(
rewrittenURI.getScheme(),
rewrittenURI.getAuthority(),
rewrittenURI.getPath(),
query,
rewrittenURI.getFragment()
).normalize()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried but it doesn't do the decoding with URI.toString
scala> new URI("https://g.com/ctx%281%29?a%5B0%5D=b").normalize().toString
res28: String = https://g.com/ctx%281%29?a%5B0%5D=b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can build a URI and call get getPath and getQuery, and then build a new URI string from the decode result. ```
scala> new URI("https://g.com/ctx%281%29?a%5B0%5D=b").getPath
res31: String = /ctx(1)
scala> new URI("https://g.com/ctx%281%29?a%5B0%5D=b").getQuery
res33: String = a[0]=b
But it seems equivalent of the current code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried but it doesn't do the decoding with URI.toString
URI.toString doesn't do the decoding. It will encode characters if necessary. For example,
scala> val uri = new java.net.URI(null, null, "foo bar", "foo bar", null).toString
uri: String = foo%20bar?foo%20bar
Anyway, this doesn't matter. I tried to consolidate these two method but it didn't look better.
|
Test build #126879 has finished for PR 29271 at commit
|
| } | ||
| val authority = s"$redirectServer:$port" | ||
| new URI(scheme, authority, path, query, null).toString | ||
| val queryEncoding = if (request.getQueryEncoding != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried but it doesn't do the decoding with URI.toString
URI.toString doesn't do the decoding. It will encode characters if necessary. For example,
scala> val uri = new java.net.URI(null, null, "foo bar", "foo bar", null).toString
uri: String = foo%20bar?foo%20bar
Anyway, this doesn't matter. I tried to consolidate these two method but it didn't look better.
|
@cloud-fan @srowen @zsxwing Thanks for the review! |
|
Merging to master and 3.0 |
### What changes were proposed in this pull request? When https is enabled for Spark UI, an HTTP request will be redirected as an encoded HTTPS URL: https://github.com/apache/spark/pull/10238/files#diff-f79a5ead735b3d0b34b6b94486918e1cR312 When we create the redirect url, we will call getRequestURI and getQueryString. Both two methods may return an encoded string. However, we pass them directly to the following URI constructor ``` URI(String scheme, String authority, String path, String query, String fragment) ``` As this URI constructor assumes both path and query parameters are decoded strings, it will encode them again. This makes the redirect URL encoded twice. This problem is on stage page with HTTPS enabled. The URL of "/taskTable" contains query parameter `order%5B0%5D%5Bcolumn%5D`. After encoded it becomes `order%255B0%255D%255Bcolumn%255D` and it will be decoded as `order%5B0%5D%5Bcolumn%5D` instead of `order[0][dir]`. When the parameter `order[0][dir]` is missing, there will be an excetpion from: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176 and the stage page fail to load. To fix the problem, we can try decoding the query parameters before encoding it. This is to make sure we encode the URL ### Why are the changes needed? Fix a UI issue when HTTPS is enabled ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A new Unit test + manually test on a cluster Closes #29271 from gengliangwang/urlEncode. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Gengliang Wang <[email protected]> (cherry picked from commit 71aea02) Signed-off-by: Gengliang Wang <[email protected]>
…itten proxy URL
### What changes were proposed in this pull request?
When running Spark behind a reverse proxy(e.g. Nginx, Apache HTTP server), the request URL can be encoded twice if we pass the query string directly to the constructor of `java.net.URI`:
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0" // query string of URL from the reverse proxy
> val rewrittenURI = URI.create(uri.toString())
> new URI(rewrittenURI.getScheme(),
rewrittenURI.getAuthority(),
rewrittenURI.getPath(),
query,
rewrittenURI.getFragment()).toString
result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
```
In Spark's stage page, the URL of "/taskTable" contains query parameter order[0][dir]. After encoding twice, the query parameter becomes `order%255B0%255D%255Bdir%255D` and it will be decoded as `order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, there will be NullPointerException from https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
Other than that, the other parameter may not work as expected after encoded twice.
This PR is to fix the bug by calling the method `URI.create(String URL)` directly. This convenience method can avoid encoding twice on the query parameter.
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0"
> URI.create(s"$uri?$query").toString
result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0
> URI.create(s"$uri?$query").getQuery
result: order[0][column]=0
```
### Why are the changes needed?
Fix a potential bug when Spark's reverse proxy is enabled.
The bug itself is similar to #29271.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Add a new unit test.
Also, Manual UI testing for master, worker and app UI with an nginx proxy
Spark config:
```
spark.ui.port 8080
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
```
nginx config:
```
server {
listen 9000;
set $SPARK_MASTER http://127.0.0.1:8080;
# split spark UI path into prefix and local path within master UI
location ~ ^(/path/to/spark/) {
# strip prefix when forwarding request
rewrite /path/to/spark(/.*) $1 break;
#rewrite /path/to/spark/ "/" ;
# forward to spark master UI
proxy_pass $SPARK_MASTER;
proxy_intercept_errors on;
error_page 301 302 307 = handle_redirects;
}
location handle_redirects {
set $saved_redirect_location '$upstream_http_location';
proxy_pass $saved_redirect_location;
}
}
```
Closes #30552 from gengliangwang/decodeProxyRedirect.
Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
…itten proxy URL
### What changes were proposed in this pull request?
When running Spark behind a reverse proxy(e.g. Nginx, Apache HTTP server), the request URL can be encoded twice if we pass the query string directly to the constructor of `java.net.URI`:
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0" // query string of URL from the reverse proxy
> val rewrittenURI = URI.create(uri.toString())
> new URI(rewrittenURI.getScheme(),
rewrittenURI.getAuthority(),
rewrittenURI.getPath(),
query,
rewrittenURI.getFragment()).toString
result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
```
In Spark's stage page, the URL of "/taskTable" contains query parameter order[0][dir]. After encoding twice, the query parameter becomes `order%255B0%255D%255Bdir%255D` and it will be decoded as `order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, there will be NullPointerException from https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
Other than that, the other parameter may not work as expected after encoded twice.
This PR is to fix the bug by calling the method `URI.create(String URL)` directly. This convenience method can avoid encoding twice on the query parameter.
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0"
> URI.create(s"$uri?$query").toString
result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0
> URI.create(s"$uri?$query").getQuery
result: order[0][column]=0
```
### Why are the changes needed?
Fix a potential bug when Spark's reverse proxy is enabled.
The bug itself is similar to #29271.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Add a new unit test.
Also, Manual UI testing for master, worker and app UI with an nginx proxy
Spark config:
```
spark.ui.port 8080
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
```
nginx config:
```
server {
listen 9000;
set $SPARK_MASTER http://127.0.0.1:8080;
# split spark UI path into prefix and local path within master UI
location ~ ^(/path/to/spark/) {
# strip prefix when forwarding request
rewrite /path/to/spark(/.*) $1 break;
#rewrite /path/to/spark/ "/" ;
# forward to spark master UI
proxy_pass $SPARK_MASTER;
proxy_intercept_errors on;
error_page 301 302 307 = handle_redirects;
}
location handle_redirects {
set $saved_redirect_location '$upstream_http_location';
proxy_pass $saved_redirect_location;
}
}
```
Closes #30552 from gengliangwang/decodeProxyRedirect.
Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 5d0045e)
Signed-off-by: Gengliang Wang <[email protected]>
What changes were proposed in this pull request?
When https is enabled for Spark UI, an HTTP request will be redirected as an encoded HTTPS URL: https://github.com/apache/spark/pull/10238/files#diff-f79a5ead735b3d0b34b6b94486918e1cR312
When we create the redirect url, we will call getRequestURI and getQueryString. Both two methods may return an encoded string. However, we pass them directly to the following URI constructor
As this URI constructor assumes both path and query parameters are decoded strings, it will encode them again. This makes the redirect URL encoded twice.
This problem is on stage page with HTTPS enabled. The URL of "/taskTable" contains query parameter
order%5B0%5D%5Bcolumn%5D. After encoded it becomesorder%255B0%255D%255Bcolumn%255Dand it will be decoded asorder%5B0%5D%5Bcolumn%5Dinstead oforder[0][dir]. When the parameterorder[0][dir]is missing, there will be an excetpion from:https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
and the stage page fail to load.
To fix the problem, we can try decoding the query parameters before encoding it. This is to make sure we encode the URL
Why are the changes needed?
Fix a UI issue when HTTPS is enabled
Does this PR introduce any user-facing change?
No
How was this patch tested?
A new Unit test + manually test on a cluster