Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ngx.escape_uri should escape RFC 3986 section 2.2 Reserved Characters #1124

Open
hunterli opened this issue Jul 26, 2017 · 16 comments
Open

ngx.escape_uri should escape RFC 3986 section 2.2 Reserved Characters #1124

hunterli opened this issue Jul 26, 2017 · 16 comments

Comments

@hunterli
Copy link

hunterli commented Jul 26, 2017

$ python -c "import urllib; print urllib.quote(\"'\")"
%27

$ resty -e "ngx.say(ngx.escape_uri(\"'\"))"
'
ngx.escape_uri should escape "'" to %27 but not.

$resty -v
resty 0.17
nginx version: openresty/1.11.2.3
built by gcc 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
built with OpenSSL 1.0.2k 26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt=-O2 --add-module=../ngx_devel_kit-0.3.0 --add-module=../echo-nginx-module-0.60 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.31 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.06 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.8 --add-module=../ngx_lua_upstream-0.06 --add-module=../headers-more-nginx-module-0.32 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.18 --add-module=../redis2-nginx-module-0.14 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.14 --add-module=../rds-csv-nginx-module-0.07 --with-ld-opt=-Wl,-rpath,/usr/local/openresty/luajit/lib --with-openssl=/tmp/openssl-1.0.2k --with-pcre=/tmp/pcre-8.39 --with-file-aio --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-http_xslt_module=dynamic --with-ipv6 --with-mail --with-mail_ssl_module --with-md5-asm --with-pcre-jit --with-sha1-asm --with-stream --with-stream_ssl_module --with-threads

@agentzh
Copy link
Member

agentzh commented Jul 26, 2017

@hunterli Will you contribute a pull request for it? Thanks!

@hunterli
Copy link
Author

@agentzh OK, I'll try it.

@agentzh
Copy link
Member

agentzh commented Jul 27, 2017

@hunterli Great! Looking forward to it!

agentzh pushed a commit that referenced this issue Jul 27, 2017
ngx.escape_uri should also escape RFC 3986 section 2.2 Reserved Characters.

Signed-off-by: Yichun Zhang (agentzh) <[email protected]>
@spacewander
Copy link
Member

I am afraid I don't think it is a bug.

According to the document, ngx.escape_uri is used to:

Escape str as a URI component.

Although its name is escape_uri.

Escape uri is different from escaping URI component.
The second one doesn't need to escape '.
Refer to RFC 3986 section 2.2:

A subset of the reserved characters (gen-delims) is used as
delimiters of the generic URI components described in Section 3

Note that the RFC only requires the gen-delims part of reserved characters. Not all the reserved characters.

What urllib.quote does is more like escaping uri instead of escaping URI component.
According to its document, urllib.quote is used to:

By default, this function is intended for quoting the path section of the URL.

Don't mix up these two differnt things.

@hunterli
Copy link
Author

hunterli commented Jul 28, 2017

@spacewander
FYI: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent said:
To be more stringent in adhering to RFC 3986 (which reserves !, ', (, ), and *), even though these characters have no formalized URI delimiting uses, the following can be safely used

@spacewander
Copy link
Member

@hunterli
Could you point out where RFC 3986 says that !, ', (, ), and * are reserved for URI component?

@spacewander
Copy link
Member

One place I found including !, ', (, ), and * is the D.2. Modifications, under Appendix D. Changes from RFC 2396 :

The mark characters that are typically unsafe to decode,
including the exclamation mark ("!"), asterisk ("*"), single-quote
("'"), and open and close parentheses ("(" and ")"), have been moved
to the reserved set in order to clarify the distinction between
reserved and unreserved and, hopefully, to answer the most common
question of scheme designers.

However, it just says ' is reserved compared with RFC 2396, doesn't say it is reserved for URI component.

According to Appendix A. Collected ABNF for URI, ' is one of the sub-delims, which could be a part of query or fragment.

@hunterli
Copy link
Author

@spacewander I test url encode function in php and lua, both of them output %27. So may be compatible with other programming language to escape sub-delims is a acceptable answer? (Sorry for poor English)

php -r "echo urlencode("test'test");"
test%27test
FYI: http://lua-users.org/wiki/StringRecipes

URL-encode a string

function url_encode(str)
if (str) then
str = string.gsub (str, "\n", "\r\n")
str = string.gsub (str, "([^%w %-%_%.%~])",
function (c) return string.format ("%%%02X", string.byte(c)) end)
str = string.gsub (str, " ", "+")
end
return str
end

@spacewander
Copy link
Member

Why not be compatible with Javascript? It doesn't escape ', too. And it doesn't strictly follow the RFC 3986, either. Do you like PHP more than Javascript? 😄

Another option is do what Go developers did: they just simply said:

we've accepted that we can't fully implement the RFC at this point for compatibility reasons.

Luckily, our previous implementation is more compatible with RFC 3986. Why should we break the compatibility with RFC 3986 and break the compatibility with former code at the same time?

If your target environment requires special encode rule, just do it in your code.

spacewander added a commit to spacewander/lua-nginx-module that referenced this issue Jul 31, 2017
…penresty#1124"

Commit f170505 breaks the compatibility
with RFC 3986. Here is two reasons:
1. Quote from RFC 3986 Section 2.2:
> A subset of the reserved characters (gen-delims) is used as
delimiters of the generic URI components described in Section 3

Note that RFC 3986 says 'a subset of the reserved characters
(gen-delims)', not all the reserved characters. The characters escaped
in that commit are 'sub-delims'. They are not required to be escaped
according to Section 2.2.

2. Refer to RFC 3986 "Appendix A.  Collected ABNF for URI", sub-delims
could be used as part of query and other components. This use case shows
that sub-delims are valid in some component of URI. Therefore, it would
be better if we don't escape them for URI component.
agentzh pushed a commit that referenced this issue Aug 1, 2017
…1124"

Commit f170505 breaks the compatibility
with RFC 3986. Here is two reasons:
1. Quote from RFC 3986 Section 2.2:
> A subset of the reserved characters (gen-delims) is used as
delimiters of the generic URI components described in Section 3

Note that RFC 3986 says 'a subset of the reserved characters
(gen-delims)', not all the reserved characters. The characters escaped
in that commit are 'sub-delims'. They are not required to be escaped
according to Section 2.2.

2. Refer to RFC 3986 "Appendix A.  Collected ABNF for URI", sub-delims
could be used as part of query and other components. This use case shows
that sub-delims are valid in some component of URI. Therefore, it would
be better if we don't escape them for URI component.

Signed-off-by: Yichun Zhang (agentzh) <[email protected]>
@hunterli
Copy link
Author

hunterli commented Aug 2, 2017

@spacewander I "Try Go" in https://golang.org:

package main

import(
"fmt"
"net/url"
)

func main() {
fmt.Println(url.QueryEscape("'"))
}

output:%27

@hunterli
Copy link
Author

hunterli commented Aug 2, 2017

@spacewander In RFC 3986 2.2 Reserved Characters:

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

For "&" and "'" both in sub-delims, ngx.escape_uri encode "&", why not "'"?

@spacewander
Copy link
Member

@hunterli

  1. Go's implementation doesn't follow the RFC 3986. And for backward compability, they don't want to fix it. See proposal: net/url: adhere to RFC3986 golang/go#16127.
  2. The original ngx.escape_uri doesn't strictly follow the RFC 3986. However, escape other sub-delims will make it less compatible with the RFC 3986 and also break the backward compability.

@hunterli
Copy link
Author

hunterli commented Aug 3, 2017

被你弄糊涂了,Sorry,中文直接上了,看不懂就放弃了。也可能是我理解错误,RFC说"Reserved Characters"都应该escape,"Reserved Characters"包含gen-delims和sub-delims,这样的话gen-delims和sub-delims都应该escape,否则就是"白马非马"了。

@spacewander
Copy link
Member

RFC说"Reserved Characters"都应该escape

Could you point out where RFC 3986 says that all reserved characters should be escaped for URI component?
Refer to Appendix A. Collected ABNF for URI, sub-delims could be a valid part of some URI components.

@hunterli
Copy link
Author

hunterli commented Aug 3, 2017

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component.

这个是什么意思呀?通常的用户不是这样用么:
http://example.com/query?test=ngx.escape("test=&'test")
如果"&","=","'"之类做为分隔符,就程序员自己放在ngx.escape函数外面吧?

Appendix A. 里面说:

query = *( pchar / "/" / "?" )

那么"&","="都不应该escape?能这样理解么?
http://example.com/query?test=value_part1=value_part2&value_part3=foo
test的值是:value_part1=value_part2&value_part3=foo
这样的URL是正确可解析的?很多WebServer应该都不这样理解吧。

@monkeyDluffy6017
Copy link
Contributor

monkeyDluffy6017 commented Apr 24, 2023

Why does the openresty implementation not match nginx?
Migrating from nginx can be confusing
I won't discuss whose implementation follows rfc, but I think openresty should try to be consistent with nginx

For openresty:
image

For nginx:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants