net/http: add Request.RequestURI #2782

Closed
garyburd opened this issue Jan 25, 2012 · 7 comments

@garyburd

The net/url Parse function percent-decodes the path component of the URL.  This is a
lossy operation. 

An application cannot distinguish between "/" used as a path segment delimiter
and "/" encoded in a path segment. 

func main() {
    u, err := url.Parse("http://example.com/x/y%2fz")
    if err != nil {
        panic(err)
    }
    fmt.Println(u.Path)
}

Output: /x/y/z

The second "/" is a delimiter. The third "/" is not. 

An application cannot recover the original encoding of the path from the parsed URL.  

func main() {
    u, err := url.Parse("http://example.com/%2b%3f")
    if err != nil {
        panic(err)
    }
    // Print the decoded path; this is what produces the output shown below.
    fmt.Println(u.Path)
}

Output: /+?

The application cannot determine from the parsed URL that the "+" was
percent-encoded or that the "?" was percent-encoded in lowercase hex.

Scenarios where the information loss is a problem:

- The HTTP client in the net/http package cannot request resources with a path segment
containing "/".

- A server handler for the net/http package cannot implement OAuth 1.0 correctly. In
OAuth 1.0, the client signs parts of the request including the raw path. A handler
cannot in general verify the signature because the handler cannot get the raw path from
the parsed URL. 
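
A rough sketch of the signature problem described above. The `sign` helper is a hypothetical, deliberately simplified stand-in for OAuth 1.0 signing (an HMAC-SHA1 over the method and path only, not the real signature base string), and the key is made up. It is only meant to show that a signature computed over the raw path cannot be reproduced from the decoded path:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"net/url"
)

// sign is a hypothetical, simplified stand-in for an OAuth 1.0 signature:
// an HMAC-SHA1 over the method and the path exactly as given.
func sign(key []byte, method, path string) string {
	mac := hmac.New(sha1.New, key)
	mac.Write([]byte(method + "&" + path))
	return base64.StdEncoding.EncodeToString(mac.Sum(nil))
}

// decodedPath returns the percent-decoded path that url.Parse exposes as u.Path.
func decodedPath(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		panic(err)
	}
	return u.Path
}

func main() {
	key := []byte("shared-secret") // hypothetical key
	rawPath := "/%2b%3f"

	// The client signs the path as it appears on the wire.
	clientSig := sign(key, "GET", rawPath)
	// A handler that only has the decoded path signs "/+?" instead.
	serverSig := sign(key, "GET", decodedPath("http://example.com"+rawPath))

	fmt.Println("signatures match:", clientSig == serverSig)
}
```

The two signatures differ because the inputs differ ("/%2b%3f" versus "/+?"), so verification fails even though both sides hold the same key.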

Which compiler are you using (5g, 6g, 8g, gccgo)? 6g
Which operating system are you using? Lion
Which revision are you using?  (hg identify) 9f2be4fbbf69 weekly/weekly.2012-01-20
Please provide any additional information below.

Here's how libraries for other languages handle URL path encoding:

Python urlparse module: returns raw path
Ruby URI module: returns raw path
Java URI class: getPath() returns decoded path, getRawPath() returns raw path.
@rsc

rsc commented Jan 25, 2012

Comment 1:

I don't see anything in the RFCs that suggests it is required
to preserve these kinds of distinctions, and I think it is execrable
that people have designed web protocols that depend on them.
I don't want to make the URL interface worse by having to
handle this.  Instead, I propose to add a field
    RequestURI string
to the http.Request struct; for an incoming request, this field
will hold the entire raw URL (what followed the GET, uninterpreted)
and for an outgoing request, this field, if non-empty, takes precedence
over the URL field.
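
A minimal sketch of the incoming-request half of this proposal, using the RequestURI field as it exists in net/http today. The `parseRequest` helper is a hypothetical name; it feeds raw wire text through `http.ReadRequest` to stand in for what a server does when a request arrives:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// parseRequest reads one HTTP request from raw wire text, standing in for
// the parsing a server performs on an incoming request.
func parseRequest(wire string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(wire)))
}

func main() {
	req, err := parseRequest("GET /x/y%2fz HTTP/1.1\r\nHost: example.com\r\n\r\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.RequestURI) // raw request target: /x/y%2fz
	fmt.Println(req.URL.Path)   // percent-decoded path: /x/y/z
}
```

Note that in released versions of Go the outgoing half differs from the proposal: the package documentation states that setting RequestURI on a client request is an error.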

Labels changed: added priority-go1, removed priority-triage.

Owner changed to [email protected].

Status changed to Accepted.

@garyburd

Comment 2:

The OAuth 1.0 protocol requires access to the request URI. Preserving the encoding in
the parsed URL is only an issue if the parsed URL is the only access to the request URI.
The RFCs do require a distinction between the path delimiter "/" and an
encoded "/" in a path segment. 
The proposed RequestURI field directly addresses the OAuth scenario and can be used as a
workaround for the "/" issue.

@rsc

rsc commented Jan 25, 2012

Comment 3:

Thanks.  I'm glad the RequestURI will address the needs here.
I read the RFCs (2616 and 3986) again last night and I can't figure out
where it says that URL-processing code must preserve the distinction
between a path delimiter "/" and an encoded "/" in a path segment.
Just for my own education, can you point out the relevant RFC and
section for me?  Thanks.
Russ

@garyburd

Comment 4:

Section 2.2 of RFC 3986 defines "/" as a reserved character.
The same section says:
   The purpose of reserved characters is to provide a set of delimiting
   characters that are distinguishable from other data within a URI.
   URIs that differ in the replacement of a reserved character with its
   corresponding percent-encoded octet are not equivalent.  Percent-
   encoding a reserved character, or decoding a percent-encoded octet
   that corresponds to a reserved character, will change how the URI is
   interpreted by most applications.  Thus, characters in the reserved
   set are protected from normalization and are therefore safe to be
   used by scheme-specific and producer-specific algorithms for
   delimiting data subcomponents within a URI.
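
One way to read the quoted paragraph in code: split the raw path on the literal "/" delimiter first, then decode each segment, so that an encoded "%2f" stays data. The `segments` helper below is a hypothetical name and only a sketch of that ordering, not anything the net/url package provides:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// segments splits a raw (still percent-encoded) path on the "/" delimiter
// first and only then decodes each segment, so an encoded "%2f" stays data.
func segments(rawPath string) ([]string, error) {
	parts := strings.Split(strings.TrimPrefix(rawPath, "/"), "/")
	for i, p := range parts {
		s, err := url.PathUnescape(p)
		if err != nil {
			return nil, err
		}
		parts[i] = s
	}
	return parts, nil
}

func main() {
	segs, err := segments("/x/y%2fz")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", segs) // ["x" "y/z"]
}
```

Decoding first and splitting afterwards would instead yield three segments ("x", "y", "z"), which is the information loss the issue describes.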

@rsc

rsc commented Jan 25, 2012

Comment 5:

My reading was that that only applies before you've
split the URL into pieces, but I've been confused before.
Oh well.

@bradfitz

Comment 6:

http://golang.org/cl/5580044/

Owner changed to @bradfitz.

Status changed to Started.

@bradfitz

Comment 7:

This issue was closed by revision 899cd04.

Status changed to Fixed.

@rsc rsc added this to the Go1 milestone Apr 10, 2015
@rsc rsc removed the priority-go1 label Apr 10, 2015
rsc added a commit that referenced this issue Jun 22, 2015
Historically we have declined to try to provide real support for URLs
that contain %2F in the path, but they seem to be popping up more
often, especially in (arguably ill-considered) REST APIs that shoehorn
entire paths into individual path elements.

The obvious thing to do is to introduce a URL.RawPath field that
records the original encoding of Path and then consult it during
URL.String and URL.RequestURI. The problem with the obvious thing
is that it breaks backward compatibility: if someone parses a URL
into u, modifies u.Path, and calls u.String, they expect the result
to use the modified u.Path and not the original raw encoding.

Split the difference by treating u.RawPath as a hint: the observation
is that there are many valid encodings of u.Path. If u.RawPath is one
of them, use it. Otherwise compute the encoding of u.Path as before.

If a client does not use RawPath, the only change will be that String
selects a different valid encoding sometimes (the original passed
to Parse).

This ensures that, for example, HTTP requests use the exact
encoding passed to http.Get (or http.NewRequest, etc).

Also add new URL.EscapedPath method for access to the actual
escaped path. Clients should use EscapedPath instead of
reading RawPath directly.

All the old workarounds remain valid.

Fixes #5777.
Might help #9859.
Fixes #7356.
Fixes #8767.
Fixes #8292.
Fixes #8450.
Fixes #4860.
Fixes #10887.
Fixes #3659.
Fixes #8248.
Fixes #6658.
Reduces need for #2782.

Change-Id: I77b88f14631883a7d74b72d1cf19b0073d4f5473
Reviewed-on: https://go-review.googlesource.com/11302
Reviewed-by: Brad Fitzpatrick <[email protected]>
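
A short sketch of the RawPath-as-hint behavior the commit message describes, using the net/url API as it ships in current Go. The `describe` and `afterEdit` helpers are hypothetical names used only for illustration:

```go
package main

import (
	"fmt"
	"net/url"
)

// describe parses rawURL and reports the decoded path, the RawPath hint,
// and the escaped path that String and RequestURI use.
func describe(rawURL string) (path, rawPath, escaped string) {
	u, err := url.Parse(rawURL)
	if err != nil {
		panic(err)
	}
	return u.Path, u.RawPath, u.EscapedPath()
}

// afterEdit parses rawURL, overwrites u.Path, and returns u.String().
// The stale RawPath no longer decodes to Path, so it is ignored.
func afterEdit(rawURL, newPath string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		panic(err)
	}
	u.Path = newPath
	return u.String()
}

func main() {
	path, rawPath, escaped := describe("http://example.com/x/y%2fz")
	fmt.Println(path)    // /x/y/z
	fmt.Println(rawPath) // /x/y%2fz
	fmt.Println(escaped) // /x/y%2fz

	// Modifying Path invalidates the hint; the default encoding is used.
	fmt.Println(afterEdit("http://example.com/x/y%2fz", "/a b")) // http://example.com/a%20b
}
```

Callers that need the escaped form would normally call EscapedPath rather than read RawPath directly, as the commit message recommends.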
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.