[SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL #30552
I also tried decoding twice here. It works as well.
For this one, I am not confident about decoding the whole URI, since that may break the authority part of the URI.
To be safe, let's decode only the parameter in this PR.
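To make the concern about decoding the whole URI concrete, here is a minimal sketch. The object `DecodeScope`, both helper names, and the sample URLs are hypothetical and not taken from the Spark code; the path is used instead of the authority only because the effect is easier to verify there. Note that `URLDecoder` follows form-encoding rules (it turns `+` into a space), so this is an approximation of the idea rather than the code in this PR.

```scala
import java.net.{URI, URLDecoder}

object DecodeScope {
  // Risky: decoding the whole string turns an encoded delimiter such as %3F
  // into a real '?', which moves the path/query boundary on re-parsing.
  def decodeWholeUri(raw: String): URI =
    URI.create(URLDecoder.decode(raw, "UTF-8"))

  // Narrower: decode only the query. The multi-argument constructor then
  // quotes the query a single time and the other components are rebuilt
  // from the parsed base URI.
  def decodeQueryOnly(base: String, encodedQuery: String): URI = {
    val parsed = URI.create(base)
    new URI(
      parsed.getScheme,
      parsed.getAuthority,
      parsed.getPath,
      URLDecoder.decode(encodedQuery, "UTF-8"),
      parsed.getFragment)
  }
}
```

With the illustrative input `http://localhost:8081/a%3Fb?x=1`, `decodeWholeUri` yields a URI whose path is `/a` and whose query is `b?x=1`, while `decodeQueryOnly` keeps the path intact.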
Test build #131995 has finished for PR 30552 at commit
Force-pushed from 0b84816 to 90d82c4.
Force-pushed from ef0a0f3 to b8d3056.
}
rewrittenURI.normalize()
// SPARK-33611: use method `URI.create` to avoid percent-encoding twice on the query string.
URI.create(uri.toString() + queryString).normalize()
toString() is unnecessary here BTW.
So this relies on query already being escaped. That may be fine for the logic of this code, although in another world it would have been better to let this method take unescaped query strings and manage it here.
Thanks for the suggestion, Sean.
> toString() is unnecessary here BTW.

`uri` is a StringBuilder here. The original code also uses toString().

> So this relies on query already being escaped. That may be fine for the logic of this code, although in another world it would have been better to let this method take unescaped query strings and manage it here.

The query string here should be encoded already. I am not very familiar with this topic. Is there a better solution that handles both escaped and unescaped query strings?
Oh right, OK, ignore the toString() comment.
I just mean that semantically, encoding is a URI detail. The program should probably not worry about it except when going to and from URIs. But that's not how this code is structured. It clearly already expects an escaped string as an argument, which then means callers have to deal with it. But I don't think it's necessary to change. If it is escaped, then yes, this is a fix.
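For illustration only, the alternative design described above (callers pass unescaped values and encoding happens once, at the URI boundary) might look like the sketch below. `ProxyUriSketch` and `buildUri` are hypothetical names, not part of Spark, and `URLEncoder` applies form-encoding rules (a space becomes `+`), which is an assumption that may not suit every caller.

```scala
import java.net.{URI, URLEncoder}

object ProxyUriSketch {
  // Hypothetical signature: parameters arrive as decoded key/value pairs,
  // so callers never have to think about percent-encoding.
  def buildUri(base: String, params: Seq[(String, String)]): URI = {
    def enc(s: String): String = URLEncoder.encode(s, "UTF-8")
    // Encode each key and value exactly once, then join them.
    val query = params.map { case (k, v) => s"${enc(k)}=${enc(v)}" }.mkString("&")
    // URI.create parses the string as given and does not encode it again.
    if (query.isEmpty) URI.create(base) else URI.create(s"$base?$query")
  }
}
```

A call such as `ProxyUriSketch.buildUri("http://localhost:8081/test", Seq("order[0][column]" -> "0"))` produces the same single-encoded query string that this PR expects to receive from the reverse proxy.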
@cloud-fan @srowen Thanks for the review!
Merging to master and 3.0
…itten proxy URL
### What changes were proposed in this pull request?
When running Spark behind a reverse proxy (e.g. Nginx, Apache HTTP Server), the request URL can be encoded twice if we pass the query string directly to the constructor of `java.net.URI`:
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0" // query string of URL from the reverse proxy
> val rewrittenURI = URI.create(uri.toString())
> new URI(rewrittenURI.getScheme(),
    rewrittenURI.getAuthority(),
    rewrittenURI.getPath(),
    query,
    rewrittenURI.getFragment()).toString
result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
```
In Spark's stage page, the URL of "/taskTable" contains the query parameter `order[0][dir]`. After being encoded twice, the parameter becomes `order%255B0%255D%255Bdir%255D`, which is decoded as `order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, a NullPointerException is thrown from https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
Beyond that, other parameters may not work as expected after being encoded twice.
This PR fixes the bug by calling the method `URI.create(String)` directly. This convenience method parses the string as given, so it avoids encoding the query parameter twice.
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0"
> URI.create(s"$uri?$query").toString
result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0
> URI.create(s"$uri?$query").getQuery
result: order[0][column]=0
```
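To connect the two snippets above to the missing-parameter failure, the following sketch compares what a backend would see after its single decoding pass under each approach. Everything here is illustrative: `DoubleEncodingDemo`, `viaConstructor`, `viaCreate`, and `parseParams` are hypothetical names, the value `asc` is made up, and `parseParams` is only a simplified stand-in for the decoding a servlet container performs.

```scala
import java.net.{URI, URLDecoder}

object DoubleEncodingDemo {
  val base = "http://localhost:8081/test"
  val query = "order%5B0%5D%5Bdir%5D=asc" // already encoded by the proxy

  // Old approach: the multi-argument constructor always quotes '%',
  // so an already-encoded query is encoded a second time.
  def viaConstructor(base: String, query: String): URI = {
    val u = URI.create(base)
    new URI(u.getScheme, u.getAuthority, u.getPath, query, u.getFragment)
  }

  // Fixed approach: URI.create parses the string without re-encoding it.
  def viaCreate(base: String, query: String): URI =
    URI.create(s"$base?$query")

  // Simplified stand-in for the one decoding pass applied on the server side.
  def parseParams(rawQuery: String): Map[String, String] =
    rawQuery.split("&").filter(_.nonEmpty).map { pair =>
      val idx = pair.indexOf('=')
      val (k, v) =
        if (idx < 0) (pair, "") else (pair.substring(0, idx), pair.substring(idx + 1))
      URLDecoder.decode(k, "UTF-8") -> URLDecoder.decode(v, "UTF-8")
    }.toMap
}
```

Looking up `order[0][dir]` in `parseParams(viaConstructor(base, query).getRawQuery)` finds nothing, because the key is still `order%5B0%5D%5Bdir%5D` after one decoding pass; the same lookup on the `viaCreate` result finds the value.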
### Why are the changes needed?
Fix a potential bug when Spark's reverse proxy is enabled.
The bug itself is similar to #29271.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added a new unit test.
Also manually tested the master, worker, and app UIs behind an nginx proxy.
Spark config:
```
spark.ui.port 8080
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
```
nginx config:
```
server {
  listen 9000;
  set $SPARK_MASTER http://127.0.0.1:8080;
  # split spark UI path into prefix and local path within master UI
  location ~ ^(/path/to/spark/) {
    # strip prefix when forwarding request
    rewrite /path/to/spark(/.*) $1 break;
    #rewrite /path/to/spark/ "/" ;
    # forward to spark master UI
    proxy_pass $SPARK_MASTER;
    proxy_intercept_errors on;
    error_page 301 302 307 = @handle_redirects;
  }
  location @handle_redirects {
    set $saved_redirect_location '$upstream_http_location';
    proxy_pass $saved_redirect_location;
  }
}
```
Closes #30552 from gengliangwang/decodeProxyRedirect.
Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 5d0045e)
Signed-off-by: Gengliang Wang <[email protected]>