Allow not validating target #453
Here is another example of this: wojtekmach/req#270 (comment) (cc @thbar).
@PragTob if you URI-encode offending characters in your requests, would they still work? It works in the example above:

```elixir
iex> {:ok, conn} = Mint.HTTP.connect(:https, "api.oisemob.cityway.fr", 443)
iex> url = "/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA"
iex> url = String.replace(url, "|", URI.encode("|"))
iex> {:ok, conn, ref} = Mint.HTTP.request(conn, "GET", url, [], "")
```

Instead of making Mint more relaxed (while I'm not super looking forward to doing that), I think I can automatically URI-encode a given URL in Req. (I already do when using …) WDYT?
@wojtekmach thanks for the comment 🙏 The issue I had with URL encoding the query string was that I think it'll usually end up encoding things twice, i.e. if I exchange my sanitize query code with just …
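To make the double-encoding concern concrete, here is a minimal stdlib-only sketch (mine, not from the thread): once a value is percent-encoded, running the encoder again escapes the `%` itself and mangles the value.

```elixir
# Minimal sketch of the double-encoding problem: encoding an
# already-encoded value escapes the "%" again and mangles it.
once = URI.encode_www_form("|")
twice = URI.encode_www_form(once)

IO.inspect(once)  # "%7C"
IO.inspect(twice) # "%257C" - decodes back to "%7C", not "|"
```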
And yeah, I don't want to double encode values, I think. Or maybe I'm not getting your approach 🤔 A solution in … To illustrate, I think it may be helpful to share the code & tests we're currently using for the escaping:

SanitizeQueryStringTest

```elixir
defmodule SanitizeQueryStringTest do
  use ExUnit.Case, async: true
  import SanitizeQueryString

  describe "sanitize/1" do
    test "a normal query string survives easily" do
      query_string = "a=123"
      assert query_string == sanitize(query_string)
    end

    test "an empty query string stays an empty query string" do
      assert "" == sanitize("")
    end

    test "nil returns an empty query string" do
      assert "" == sanitize(nil)
    end

    test "macros are indeed removed from the query string" do
      assert "a=b&d=%5B%25woof%25%5D&c=%24%7Bmagic%7D&b=%25%25mega%25%25" ==
               sanitize("a=b&d=[%woof%]&c=${magic}&b=%%mega%%")
    end

    test "empty values are also handled fine" do
      assert "a=" == sanitize("a=")
      assert "a=1&b=" == sanitize("a=1&b=")
    end

    test "just keys we also leave alone" do
      assert "key" == sanitize("key")
      assert "oof=first&ha" == sanitize("oof=first&ha")
    end

    test "if somehow the query has more `=`s than it should have" do
      assert "key=val%3Dval" == sanitize("key=val=val")
      assert "key=val&key2=val%3Dvaleria" == sanitize("key=val&key2=val=valeria")
    end

    test "if it's already encoded we don't re-encode it" do
      assert "key=val%3Dval" == sanitize("key=val%3Dval")
    end

    test "`%ebuy!` while technically valid isn't something we want as ! is reserved" do
      # also bypass exploded
      assert "key=%25ebuy%21" == sanitize("key=%ebuy!")
    end

    test "leaves normal values alone even with lots of % encodings" do
      query_string =
        URI.encode_query(%{"a" => "1234", "magic" => "like-to-talk", "text" => "!!!!11232?%$"})

      assert query_string == sanitize(query_string)
    end

    test "handles [%...%] macros" do
      assert "key=%5B%25macro%25%5D" == sanitize("key=[%macro%]")
    end

    test "handles ${...} macros" do
      assert "key=%24%7Bmacro%7D" == sanitize("key=${macro}")
    end

    test "handles %%...%% macros" do
      assert "key=%25%25macro%25%25" == sanitize("key=%%macro%%")
    end

    # if you did a full string regex approach the eagerness may easily lead to this (see U modifier)
    test "does not overeagerly snatch up values in-between macros" do
      query_string = "a=%%magic%%&key=value&b=%%macro%%"
      assert "a=%25%25magic%25%25&key=value&b=%25%25macro%25%25" == sanitize(query_string)
    end

    @leave_alone_values [
      "${miss-closing",
      "%%abds%",
      "%huhu%%",
      "abc${dfg}",
      "abcd%%huha%%",
      "so[%this-is-not-start-of-the-value%]",
      "[%this-is-not-end-of-the-value%]so-much-more",
      "[%am-partial]",
      "[other-partial%]",
      "{missing-dollars}"
    ]

    for value <- @leave_alone_values do
      @value value
      test "#{value} is encoded but decoded it's the same again" do
        query_string = "some-key=#{@value}"

        sanitized = sanitize(query_string)
        decoded = URI.decode(sanitized)

        assert query_string == decoded
      end
    end

    test "and how about someone puts an entire URL in there" do
      assert "url=https%3A%2F%2Fbpi.rtactivate.com%2Ftag%2Fid%2FD1107" ==
               sanitize("url=https://bpi.rtactivate.com/tag/id/D1107")
    end

    test "and how about someone puts an entire URL in there, but escaped" do
      already_escaped = "url=https%3A%2F%2Fbpi.rtactivate.com%2Ftag%2Fid%2FD1107"
      assert already_escaped == sanitize(already_escaped)
    end
  end
end
```

SanitizeQueryString

```elixir
defmodule SanitizeQueryString do
  # _should_ be safe to do, see:
  # * https://url.spec.whatwg.org/#urlencoded-parsing
  # * https://www.rfc-editor.org/rfc/rfc3986#section-2.2
  def sanitize(query_string) when is_binary(query_string) and query_string != "" do
    query_string
    |> String.split("&")
    |> Enum.map_join("&", &sanitize_values/1)
  end

  def sanitize(_), do: ""

  defp sanitize_values(sequence) do
    sequence
    # We expect only one `=` - otherwise encode it
    |> String.split("=", parts: 2)
    |> escape_value()
  end

  defp escape_value([key, original_value]) do
    sanitized_value =
      if needs_sanitization?(original_value) do
        URI.encode_www_form(original_value)
      else
        original_value
      end

    "#{key}=#{sanitized_value}"
  end

  defp escape_value([key]), do: key

  defp needs_sanitization?(string) do
    # Went through many iterations before we got here.
    # Went this way because:
    # 1. wanted to stop playing whack-a-mole with ever new macro patterns and other weird chars
    # 2. it's not as easy as checking for unescaped characters and escaping, as "%%" is common, but not a valid escape
    !valid_percent_encoding?(string)
  end

  @valid_hex_characters ~c"0123456789abcdefABCDEF"

  # https://url.spec.whatwg.org/#percent-decode
  # percent encoding is % followed by 2 hex characters
  defp valid_percent_encoding?(<<?%, hex1, hex2, rest::binary>>)
       when hex1 in @valid_hex_characters and hex2 in @valid_hex_characters do
    valid_percent_encoding?(rest)
  end

  defp valid_percent_encoding?(<<char, rest::binary>>) do
    # Technically this _could_ be `URI.char_unescaped?/1` but bypass exploded for
    # something like `%ebuy!` which isn't a great sign for its general applicability
    if URI.char_unreserved?(char) do
      valid_percent_encoding?(rest)
    else
      false
    end
  end

  # everything was valid, so guess it's valid!
  defp valid_percent_encoding?(<<>>), do: true
end
```
Fair enough about not wanting to double-encode. Are you able to use Req's params feature by any chance?

```elixir
iex> r = Req.new(url: "/", params: [key: "val=val"]) ; Req.Request.prepare(r).url |> to_string()
"/?key=val%3Dval"
```
@wojtekmach I don't think we are. Our use cases/issues are two: …

One of our first attempts to fix it was to have …
What would be the issue in Plug, that it shouldn't blow up, i.e. have relaxed parsing? OK, I could see an argument to skip validation, similar to the recent …
I forget what the exception was, but yeah, parsing failed. I'd need to mull it over. I sadly agree with the "garbage in, garbage out" point, exactly the same as …

As for sanitization, the code I posted above does exactly that, and I'm relatively confident in the tests. I'm not 100% sure I used the correct reserved character function above (but I think …). That said, that might work, but I think I'd prefer for that to be a "non-issue" given an option in …
Even though the lack of bunny pictures is disappointing @PragTob, yeah, I think this can make sense. The option can be on by default, but allowed to be …
@whatyouhide 😂😂😂😂 I was contemplating it but thought maybe people would get annoyed when I waste their premium screen space with bunny pictures all the time. I shall make up for my past transgressions!

@whatyouhide now that I tried to make up for my past transgressions and we have that out of the way, I'm very happy to take a stab at implementing it!

Fantastic, thanks for the pictures and the help @PragTob!

@whatyouhide thanks for a fantastic library, it's the least I can do! 💚
As discussed in elixir-mint#453 I tried to follow the guidance of `case_sensitive_headers` so that these options are treated somewhat similarly.
👋
Hi there everyone, and thank you so much for mint. It's been a joy to use via req and helps us make a significant number of HTTP requests! 💚

Problem
In our context we're frequently confronted with query strings that aren't strictly standards-conformant. These contain "macros", which look something like this:

```
referrer={encSite}
```

and can occur multiple times per URL. There are many different patterns of these, usually occurring in the value part of the query params. Some more patterns I know of:

```
key=[%macro%]
key=${macro}
key=%%macro%%
key=%ebuy!
```
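For illustration, this is what those macro values look like once percent-encoded with the standard library (a sketch using `URI.encode_www_form/1`; this matches the expectations in the sanitize tests shared earlier in the thread):

```elixir
# Percent-encoding the macro values with the standard library;
# this is what a sanitized query value would carry on the wire.
for raw <- ["[%macro%]", "${macro}", "%%macro%%"] do
  IO.puts("#{raw} -> #{URI.encode_www_form(raw)}")
end
```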
This blows up in `Mint.HTTP1.Request.validate_target!/1` (see `mint/lib/mint/http1/request.ex`, lines 49 to 65 at 50b11d6).
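Roughly speaking, such a target validation walks the request target byte by byte and rejects anything outside the characters RFC 3986 allows. The sketch below is illustrative only: the module name, function name, and exact character set are my assumptions, not Mint's actual implementation.

```elixir
# Illustrative only: a simplified target check in the spirit of
# Mint.HTTP1.Request.validate_target!/1 (names and character set
# are assumptions, not Mint's actual code).
defmodule TargetCheck do
  # sub-delims, ":", "@", "/", "?" and "%" (for percent-escapes)
  @extra_chars ~c"!$&'()*+,;=:@/?%"

  def valid_target?(<<>>), do: true

  def valid_target?(<<char, rest::binary>>) do
    if URI.char_unreserved?(char) or char in @extra_chars do
      valid_target?(rest)
    else
      false
    end
  end
end

IO.inspect(TargetCheck.valid_target?("/ok?a=b"))        # true
IO.inspect(TargetCheck.valid_target?("/bad?a={macro}")) # false - "{" and "}" are not allowed
```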
While I understand enforcing standards, sometimes the real (old) applications out there make that hard.
Small example
Solution
I'd be really happy about being able to provide an option where validating the target is skipped or made more lenient. Or some way in which I could easily accomplish that. I'm not sure of the consequences of this but yeah :)
(and then get to it through Req and Finch)
I'm happy to implement code to do this, once aligned on a solution.
I also have a bunch of affected URLs in tests (running against bypass) or as examples lying around. So, I can definitely test solutions.
Workaround
The current workaround we use is that we implemented code to sanitize the query string (working similarly to `validate_target!/1`; it's fun, as the URLs we get are partially encoded and partially not, so running encode on all of it is not what we want).

The topic gets a bit harder though, as sometimes redirects will also include these broken URLs, which is also what you see in the example I posted above. As a workaround for this we're implementing our own redirect following now (so that we can sanitize the query strings on every hop). This is the part where I decided to open this issue, as I'd really like to get rid of this at least 😅 (but at best all the custom code I wrote for this).
Behavior of other HTTP clients
For funzies I tried the URL in multiple HTTP clients, and it worked in all of them: Firefox, Chrome, curl, wget.

So while I don't think it's standards-conformant, it seems to be common practice among some of the most popular/widely used HTTP clients. So, I think at least having the option to process it would be nice.
Again, thanks a lot for all your work in providing mint and friends. It's much appreciated! 💚