Skip to content

Migrate some URIUtil methods to HttpScheme for proper normalization #11390

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

joakime
Copy link
Contributor

@joakime joakime commented Feb 8, 2024

  • Deprecate URIUtil methods that are not normalizing URI strings properly from component parts.
  • Add relevant HttpScheme methods to perform proper normalization of URI components.
  • Migrate all existing code to use new HttpScheme methods.

@joakime joakime added this to the 12.0.x milestone Feb 8, 2024
@joakime joakime requested a review from sbordet February 8, 2024 19:37
@joakime joakime self-assigned this Feb 8, 2024
* the returned String might not include a port, and will not include the ending {@code /} character.
* @see #appendNormalizedUri(StringBuilder, String, String, int)
*/
public static String normalizeUri(String scheme, String server, int port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer normalizeURI() with URI all capital.

* the returned String might not include a port.
* @see #appendNormalizedUri(StringBuilder, String, String, int)
*/
public static String normalizeUri(String scheme, String server, int port, String path, String query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto re URI.
I would also add fragment as this method may be used on the client too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but I hope the client isn't sending the fragment on the HTTP protocol, as that's not spec compliant.

* @param server the server hostname, may not be blank or null.
* @param port the port
*/
public static void appendNormalizedUri(StringBuilder url, String scheme, String server, int port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto re fragment.
I would rename it to concatNormalizeURI().

{
url.append(path);
if (StringUtil.isNotBlank(query))
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop braces.

@@ -357,7 +358,7 @@ static String toRedirectURI(Request request, String location)
{
// make the location an absolute URI
StringBuilder url = new StringBuilder(128);
URIUtil.appendSchemeHostPort(url, uri.getScheme(), Request.getServerName(request), Request.getServerPort(request));
HttpScheme.appendNormalizedUri(url, uri.getScheme(), Request.getServerName(request), Request.getServerPort(request));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the method that also takes the path and query, since you are adding the location in the next line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, as location already has the query considered, it is not a pure path.

@@ -217,7 +217,8 @@ protected String filterServerResponseHeader(HttpServletRequest request, Response
URI locationURI = URI.create(headerValue).normalize();
if (locationURI.isAbsolute() && isBackendLocation(locationURI))
{
StringBuilder newURI = URIUtil.newURIBuilder(request.getScheme(), request.getServerName(), request.getServerPort());
StringBuilder newURI = new StringBuilder();
HttpScheme.appendNormalizedUri(newURI, request.getScheme(), request.getServerName(), request.getServerPort());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code below this line adds path, query and fragment, so you can just use the new method with more parameters.

response.setContentLength(0);
baseRequest.getResponse().sendRedirect(_redirectCode, url, true);
baseRequest.getResponse().sendRedirect(_redirectCode, url.toString(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url is already a String, no need to call toString()?

@@ -293,7 +294,7 @@ private void attemptLogoutRedirect(ServletRequest request)
if (_logoutRedirectPath != null)
{
StringBuilder sb = new StringBuilder(128);
URIUtil.appendSchemeHostPort(sb, request.getScheme(), request.getServerName(), request.getServerPort());
HttpScheme.appendNormalizedUri(sb, request.getScheme(), request.getServerName(), request.getServerPort());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the method with more arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not appropriate to use longer version here, as the next two lines build up a path.

request.getServerPort(),
locationURI.getRawPath(),
locationURI.getRawQuery());
String component = locationURI.getRawFragment();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the fragment as parameter to the method with many parameters.

response.setContentLength(0);
response.sendRedirect(url, true);
response.sendRedirect(url.toString(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url already a String?

Copy link
Contributor

@gregw gregw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very cautious about changes like this.... they often have unintended consequences.
Can we bump this to 12.0.8, merge it immediately after 12.0.7 and give it a full month of testing?

@joakime joakime requested a review from sbordet February 13, 2024 20:14
Copy link
Contributor

@gregw gregw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge of 11.0.x to 12.0.x of PR #11389

That should have been done in it's own PR, as it is not controversial.

Deprecate URIUtil methods that are not normalizing URI strings properly from component parts.

Can you explain what is incorrect about the current normalization?

Add relevant HttpScheme methods to perform proper normalization of URI components.

HttpScheme is the wrong location for these normalization methods. Keep them in URIUtil.

Migrate all existing code to use new HttpScheme methods.

You need to motivate why we are doing this. There are behaviour changes, API changes and class/location changes all in one PR, so it is hard to pick apart why/what/how

* @param query the optional query (any encoding is assumed to be performed already)
* @param fragment the optional fragment (any encoding is assumed to be performed already)
*/
public static void concatNormalizedURI(StringBuilder url, String scheme, String server, int port, String path, String query, String fragment)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these new methods, and this one specifically, are not HttpScheme methods, but rather URIUtil. This class should only have schemes and their metadata, not arbitrary URI util methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would wind up duplicating HttpScheme in jetty-util then.
This change to move the URI behaviors that need scheme knowledge to HttpScheme was specifically requested by @sbordet

@joakime
Copy link
Contributor Author

joakime commented Feb 14, 2024

Merge of 11.0.x to 12.0.x of PR #11389

That should have been done in it's own PR, as it is not controversial.

Done, this is not relevant to this PR.

Deprecate URIUtil methods that are not normalizing URI strings properly from component parts.

Can you explain what is incorrect about the current normalization?

Note: The specific changes in this PR was requested by @sbordet

URIUtil version only works for a subset of schemes.
Scheme behavior is in HttpScheme class, but that class is jetty-http.
URIUtil version only works for schemes that are all lowercase too.
Also of note, URI spec about scheme case - https://datatracker.ietf.org/doc/html/rfc3986#section-3.1 (when we produce a URI it should be lowercase)
URIUtil version isn't used consistently across the codebase.

We've had reports that we generate URL/URI in various places that are triggering various auditing / security layers when it encounters things that are viewed as "suspicious" (a loosely defined term meaning "we do things that don't follow the specs" and is an indication of nefarious actors).
eg: Generating non-lowercase schemes, using port 80 or 443 in the URI for specific schemes (so far we have reports against http, https, ws, and wss. but there could be more), etc.
We gotten reports about this issue in our server Location response header code, websocket code, proxy code, client request generation code, HttpServletRequest getters, etc.
When auditing our own code, we also see issues in OpenID, constraint Confidential handling, and balancer servlet.

Add relevant HttpScheme methods to perform proper normalization of URI components.

HttpScheme is the wrong location for these normalization methods. Keep them in URIUtil.

The building of the normalized URI needs scheme knowledge to do a critical piece properly (the port).
If HttpScheme was in jetty-util then this PR would be much smaller.

Migrate all existing code to use new HttpScheme methods.

You need to motivate why we are doing this. There are behaviour changes, API changes and class/location changes all in one PR, so it is hard to pick apart why/what/how

Consistency, we don't normalize scheme / port combinations properly across the codebase.
The rationale was that the HttpScheme class was created to standardize things with all things HTTP scheme related, so put it there.

At this point, this PR is the completion of the task set to me from @sbordet
I can see both arguments from @sbordet and yourself.
While I can see the benefits of this PR, I'll ultimate defer to you two on the future of this PR.

@gregw
Copy link
Contributor

gregw commented Feb 15, 2024

@joakime thanks for the explanations.

I think we should do as you suggest and effectively "move" HttpScheme into org.eclip.jetty.util, or at least move the knowledge about default ports. The HttpScheme enum is rarely use the enum directly and when we do it is only really expecting http or https and we mostly convert to a string soon after that. It is already not named correctly once we added "ws" and "wss" to it.

So either we can create an org.eclipse.jetty.util.UriScheme class with metadata about known schemes and util methods to format them; or perhaps just add that to URIUtil. This can then include metadata about all common schemes, not just http. We can keep HttpScheme and it can obtain it's metadata from UriScheme/UriUtil or potentially we can deprecate it. The scheme CACHE can be moved to HttpURI.

The methods in URIUtil can then be corrected to do the normalization correctly (and new APIs added if the old ones are not good) and we could then see some explicit changes in URIUtilTest that make it clear how we have changed normalization. That way we don't end up with unused wrong normalization methods that might accidentally get used again in future, plus new normalization methods with different behaviours.

@sbordet thoughts?

@sbordet
Copy link
Contributor

sbordet commented Feb 16, 2024

@gregw the scheme and the port are related and HttpScheme (despite its name) already has constants for WS and WSS too.

Producing a proper URI string by concatenation requires scheme/port knowledge, so the concatenation logic must be close to HttpScheme.

I don't see why are you proposing to move everything to oej.util, since all the needs of these concatenations are in HTTP handling code.

I can live with deprecating HttpScheme and introducing URIScheme as a more generic replacement (and less confusing name when used for WebSocket) but I think it should stay in jetty-http.

@gregw
Copy link
Contributor

gregw commented Feb 16, 2024

@sbordet if we add URI normalization methods to oej.http, then we end up with two sets of URI normalization: in oej.util.URIUtil and oej.http.Whatever.

We already have URIUtil to do URI normalization and other URI tasks such as "Producing a proper URI string by concatenation". If it needs better knowledge of specific schemes to do scheme specific normalization, then lets move that knowledge down from oej.http to 'oej.util` and fix the normalization methods.

Adding oej.util.URIScheme and fixing URIUtil seams like the simplest way forward rather than introducing a whole new set of URI utility classes in a http specific package.

@joakime
Copy link
Contributor Author

joakime commented Feb 16, 2024

@sbordet if we add URI normalization methods to oej.http, then we end up with two sets of URI normalization: in oej.util.URIUtil and oej.http.Whatever.

Thing is jetty-util is also too far down the dependency tree for all of this.
Of all of the other jetty sub-modules that use URIUtil (that are not URIUtilTest) they all depend on jetty-http too.

Knowing this truth about our current codebase in HEAD, why not just use HttpURI instead? See Draft PR #11415 as an alternate solution.

@gregw
Copy link
Contributor

gregw commented Feb 16, 2024

@sbordet if we add URI normalization methods to oej.http, then we end up with two sets of URI normalization: in oej.util.URIUtil and oej.http.Whatever.

Thing is jetty-util is also too far down the dependency tree for all of this. Of all of the other jetty sub-modules that use URIUtil (that are not URIUtilTest) they all depend on jetty-http too.

Knowing this truth about our current codebase in HEAD, why not just use HttpURI instead? See Draft PR #11415 as an alternate solution.

@joakime HttpURI is certainly a better location for normalization than HttpScheme, especially as it already has builder methods.

However, it still does leave URIUtil with substandard normalization. We could make everything use HttpURI, but then it is poorly named as ws, file, etc. are not http.

What does "too far down" mean? I understand that currently URIUtil cannot see HttpScheme, but that does not prevent us creating URIScheme with all the meta data for commonly known schemes in it.

We can then fix the normalization that exists in URIUtil and we can also have builders in HttpURI that normalize correctly.

@joakime
Copy link
Contributor Author

joakime commented Feb 17, 2024

Closing in favor of #11415 that uses a mix of HttpURI and new URIUtil methods

@joakime joakime closed this Feb 17, 2024
@joakime joakime deleted the fix/12.0.x/uriutil-append-scheme-host-port branch February 28, 2024 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

When producing URI/URL strings follow spec and produce lowercase schemes and drop default ports
3 participants