Allow not validating target #453
Here is another example of this: wojtekmach/req#270 (comment) (cc @thbar).
@PragTob if you URI-encode offending characters in your requests, would they still work? It works in the example above:

```elixir
iex> {:ok, conn} = Mint.HTTP.connect(:https, "api.oisemob.cityway.fr", 443)
iex> url = "/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA"
iex> url = String.replace(url, "|", URI.encode("|"))
iex> {:ok, conn, ref} = Mint.HTTP.request(conn, "GET", url, [], "")
```

Instead of making Mint more relaxed (while I'm not super looking forward to doing that), I think I can automatically URI-encode a given URL in Req. (I already do when using …) WDYT?
@wojtekmach thanks for the comment 🙏 The issue I had with URL encoding the query string was that I think it'll usually end up encoding things twice, i.e. if I exchange my sanitize query code with just …
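To make the double-encoding concern concrete, here is a minimal stdlib-only sketch (mine, not from the thread): once a value is percent-encoded, running the encoder again escapes the `%` itself and mangles the value.

```elixir
# Minimal sketch of the double-encoding problem: encoding an
# already-encoded value escapes the "%" again and mangles it.
once = URI.encode_www_form("|")
twice = URI.encode_www_form(once)

IO.inspect(once)  # "%7C"
IO.inspect(twice) # "%257C" - decodes back to "%7C", not "|"
```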
And yeah, I don't want to double encode values, I think. Or maybe I'm not getting your approach 🤔 A solution in … To illustrate, I think it may be helpful to share the code & tests we're currently using for the escaping:

SanitizeQueryStringTest

```elixir
defmodule SanitizeQueryStringTest do
  use ExUnit.Case, async: true
  import SanitizeQueryString

  describe "sanitize/1" do
    test "a normal query string survives easily" do
      query_string = "a=123"
      assert query_string == sanitize(query_string)
    end

    test "an empty query string stays an empty query string" do
      assert "" == sanitize("")
    end

    test "nil returns an empty query string" do
      assert "" == sanitize(nil)
    end

    test "macros are indeed removed from the query string" do
      assert "a=b&d=%5B%25woof%25%5D&c=%24%7Bmagic%7D&b=%25%25mega%25%25" ==
               sanitize("a=b&d=[%woof%]&c=${magic}&b=%%mega%%")
    end

    test "empty values are also handled fine" do
      assert "a=" == sanitize("a=")
      assert "a=1&b=" == sanitize("a=1&b=")
    end

    test "just keys we also leave alone" do
      assert "key" == sanitize("key")
      assert "oof=first&ha" == sanitize("oof=first&ha")
    end

    test "if somehow the query has more `=`s than it should have" do
      assert "key=val%3Dval" == sanitize("key=val=val")
      assert "key=val&key2=val%3Dvaleria" == sanitize("key=val&key2=val=valeria")
    end

    test "if it's already encoded we don't re-encode it" do
      assert "key=val%3Dval" == sanitize("key=val%3Dval")
    end

    test "`%ebuy!` while technically valid isn't something we want as ! is reserved" do
      # also bypass exploded
      assert "key=%25ebuy%21" == sanitize("key=%ebuy!")
    end

    test "leaves normal values alone even with lots of % encodings" do
      query_string =
        URI.encode_query(%{"a" => "1234", "magic" => "like-to-talk", "text" => "!!!!11232?%$"})

      assert query_string == sanitize(query_string)
    end

    test "handles [%...%] macros" do
      assert "key=%5B%25macro%25%5D" == sanitize("key=[%macro%]")
    end

    test "handles ${...} macros" do
      assert "key=%24%7Bmacro%7D" == sanitize("key=${macro}")
    end

    test "handles %%...%% macros" do
      assert "key=%25%25macro%25%25" == sanitize("key=%%macro%%")
    end

    # if you did a full string regex approach the eagerness may easily lead to this (see U modifier)
    test "does not overeagerly snatch up values in-between macros" do
      query_string = "a=%%magic%%&key=value&b=%%macro%%"
      assert "a=%25%25magic%25%25&key=value&b=%25%25macro%25%25" == sanitize(query_string)
    end

    @leave_alone_values [
      "${miss-closing",
      "%%abds%",
      "%huhu%%",
      "abc${dfg}",
      "abcd%%huha%%",
      "so[%this-is-not-start-of-the-value%]",
      "[%this-is-not-end-of-the-value%]so-much-more",
      "[%am-partial]",
      "[other-partial%]",
      "{missing-dollars}"
    ]

    for value <- @leave_alone_values do
      @value value
      test "#{value} is encoded but decoded it's the same again" do
        query_string = "some-key=#{@value}"

        sanitized = sanitize(query_string)
        decoded = URI.decode(sanitized)

        assert query_string == decoded
      end
    end

    test "and how about someone puts an entire URL in there" do
      assert "url=https%3A%2F%2Fbpi.rtactivate.com%2Ftag%2Fid%2FD1107" ==
               sanitize("url=https://bpi.rtactivate.com/tag/id/D1107")
    end

    test "and how about someone puts an entire URL in there, but escaped" do
      already_escaped = "url=https%3A%2F%2Fbpi.rtactivate.com%2Ftag%2Fid%2FD1107"
      assert already_escaped == sanitize(already_escaped)
    end
  end
end
```

SanitizeQueryString

```elixir
defmodule SanitizeQueryString do
  # _should_ be safe to do, see:
  # * https://url.spec.whatwg.org/#urlencoded-parsing
  # * https://www.rfc-editor.org/rfc/rfc3986#section-2.2
  def sanitize(query_string) when is_binary(query_string) and query_string != "" do
    query_string
    |> String.split("&")
    |> Enum.map_join("&", &sanitize_values/1)
  end

  def sanitize(_), do: ""

  defp sanitize_values(sequence) do
    sequence
    # We expect only one `=` - otherwise encode it
    |> String.split("=", parts: 2)
    |> escape_value()
  end

  defp escape_value([key, original_value]) do
    sanitized_value =
      if needs_sanitization?(original_value) do
        URI.encode_www_form(original_value)
      else
        original_value
      end

    "#{key}=#{sanitized_value}"
  end

  defp escape_value([key]), do: key

  defp needs_sanitization?(string) do
    # Went through many iterations before we got here.
    # Went this way because:
    # 1. wanted to stop playing whack-a-mole with ever new macro patterns and other weird chars
    # 2. it's not as easy as checking for unescaped characters and escaping, as "%%" is common, but not a valid escape
    !valid_percent_encoding?(string)
  end

  @valid_hex_characters ~c"0123456789abcdefABCDEF"

  # https://url.spec.whatwg.org/#percent-decode
  # percent encoding is % followed by 2 hex characters
  defp valid_percent_encoding?(<<?%, hex1, hex2, rest::binary>>)
       when hex1 in @valid_hex_characters and hex2 in @valid_hex_characters do
    valid_percent_encoding?(rest)
  end

  defp valid_percent_encoding?(<<char, rest::binary>>) do
    # Technically this _could_ be `URI.char_unescaped?/1` but bypass exploded for
    # something like `%ebuy!` which isn't a great sign for its general applicability
    if URI.char_unreserved?(char) do
      valid_percent_encoding?(rest)
    else
      false
    end
  end

  # everything was valid, so guess it's valid!
  defp valid_percent_encoding?(<<>>), do: true
end
```
Fair enough about not wanting to double-encode. Are you able to use Req's params feature by any chance?

```elixir
iex> r = Req.new(url: "/", params: [key: "val=val"]) ; Req.Request.prepare(r).url |> to_string()
"/?key=val%3Dval"
```
@wojtekmach I don't think we are. Our use cases/issues are two: …

One of our first attempts to fix it was to have …
What would be the issue in Plug, that it shouldn't blow up, i.e. have relaxed parsing? OK, I could see an argument to skip validation, similar to the recent …
I forget what the exception was, but yeah, parsing failed. I'd need to mull it over. I sadly agree with the "garbage in, garbage out" point, exactly the same as …

As for sanitization, the code I posted above does exactly that, and I'm relatively confident in the tests. I'm not 100% sure I used the correct reserved character function above (but I think …). That said, that might work, but I think I'd prefer for that to be a "non-issue" given an option in …
Even though the lack of bunny pictures is disappointing @PragTob, yeah, I think this can make sense. The option can be on by default, but allowed to be …
@whatyouhide 😂😂😂😂 I was contemplating it but thought maybe people would get annoyed when I waste their premium screen space with bunny pictures all the time. I shall make up for my past transgressions!

@whatyouhide now that I tried to make up for my past transgressions and we have that out of the way, I'm very happy to take a stab at implementing it!

Fantastic, thanks for the pictures and the help @PragTob!

@whatyouhide thanks for a fantastic library, it's the least I can do! 💚
As discussed in elixir-mint#453 I tried to follow the guidance of `case_sensitive_headers` so that these options are treated somewhat similarly.
👋
Hi there everyone, and thank you so much for mint. It's been a joy to use via req and helps us make a significant number of HTTP requests! 💚

Problem
In our context we're frequently confronted with query strings that aren't strictly standards-conformant. These contain "macros", which look something like this:

```
referrer={encSite}
```

and can occur multiple times per URL. There are many different patterns of these, usually occurring in the value part of the query params. Some more patterns I know of:

```
key=[%macro%]
key=${macro}
key=%%macro%%
key=%ebuy!
```
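For illustration, this is what those macro values look like once percent-encoded with the standard library (a sketch using `URI.encode_www_form/1`; this matches the expectations in the sanitize tests shared earlier in the thread):

```elixir
# Percent-encoding the macro values with the standard library;
# this is what a sanitized query value would carry on the wire.
for raw <- ["[%macro%]", "${macro}", "%%macro%%"] do
  IO.puts("#{raw} -> #{URI.encode_www_form(raw)}")
end
```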
This blows up in `Mint.HTTP1.Request.validate_target!/1` (see `mint/lib/mint/http1/request.ex`, lines 49 to 65 at 50b11d6).
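Roughly speaking, such a target validation walks the request target byte by byte and rejects anything outside the characters RFC 3986 allows. The sketch below is illustrative only: the module name, function name, and exact character set are my assumptions, not Mint's actual implementation.

```elixir
# Illustrative only: a simplified target check in the spirit of
# Mint.HTTP1.Request.validate_target!/1 (names and character set
# are assumptions, not Mint's actual code).
defmodule TargetCheck do
  # sub-delims, ":", "@", "/", "?" and "%" (for percent-escapes)
  @extra_chars ~c"!$&'()*+,;=:@/?%"

  def valid_target?(<<>>), do: true

  def valid_target?(<<char, rest::binary>>) do
    if URI.char_unreserved?(char) or char in @extra_chars do
      valid_target?(rest)
    else
      false
    end
  end
end

IO.inspect(TargetCheck.valid_target?("/ok?a=b"))        # true
IO.inspect(TargetCheck.valid_target?("/bad?a={macro}")) # false - "{" and "}" are not allowed
```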
While I understand enforcing standards, sometimes the real (old) applications out there make that hard.
Small example
Solution
I'd be really happy about being able to provide an option where validating the target is skipped or made more lenient. Or some way in which I could easily accomplish that. I'm not sure of the consequences of this but yeah :)
(and then get to it through Req and Finch)
I'm happy to implement code to do this, once aligned on a solution.
I also have a bunch of affected URLs in tests (running against bypass) or as examples lying around. So, I can definitely test solutions.
Workaround
The current workaround we use is that we implemented code to sanitize the query string (working similarly to `validate_target!/1`; it's fun, as the URLs we get are partially encoded and partially not, so running encode on all of it is not what we want).

The topic gets a bit harder though, as sometimes redirects will also include these broken URLs, which is also what you see in the example I posted above. As a workaround for this we're implementing our own redirect following now (so that we can sanitize the query strings on every hop). This is the part where I decided to open this issue, as I'd really like to get rid of this at least 😅 (but at best all the custom code I wrote for this).
Behavior of other HTTP clients
For funzies I tried the URL in multiple HTTP clients, and it worked in all of them: Firefox, Chrome, curl, wget.

So while I don't think it's standards-conformant, it seems to be common practice among some of the most popular/widely used HTTP clients. So, I think at least having the option to process it would be nice.
Again, thanks a lot for all your work in providing mint and friends. It's much appreciated! 💚