Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests and spec don't agree about base URL in match result inputs #133

Closed
lucacasonato opened this issue Sep 7, 2021 · 12 comments
Closed

Comments

@lucacasonato
Copy link
Member

As per spec, the second item in the inputs field of the match result should be the stringification of the parsed base url URL (it actually says to add the URL object itself, but I'll just consider that a bug).

For this test case, the stringification of the base url is 'https://example.com/".

{
    "pattern": [{ "pathname": "/foo/bar" }],
    "inputs": [ "./foo/bar", "https://example.com" ],
    "expected_match": {
      "hostname": { "input": "example.com", "groups": { "0": "example.com" } },
      "pathname": { "input": "/foo/bar", "groups": {} },
      "protocol": { "input": "https", "groups": { "0": "https" } }
    }
  }

Yet the test runner wants it to be https://example.com. This is the case in Chromium right now (it appends the URL string, not the stringification of the parsed url). See https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/url_pattern/url_pattern.cc;l=496-497;drc=1a1342c1f2d295948d79211cd860721732990ed8

I would say that the spec behaviour actually makes more sense, and that the test should be fixed to look like this:

{
    "pattern": [{ "pathname": "/foo/bar" }],
    "inputs": [ "./foo/bar", "https://example.com" ],
    "expected_match": {
      "inputs": [ "./foo/bar", "https://example.com/" ],
      "hostname": { "input": "example.com", "groups": { "0": "example.com" } },
      "pathname": { "input": "/foo/bar", "groups": {} },
      "protocol": { "input": "https", "groups": { "0": "https" } }
    }
  }

If we say this is right, then I suggest for the URLParserInit we should also change the spec & Chromium to have the inputs array contain the "processed" URLParserInit instead of the original.

@wanderview
Copy link
Member

This was discussed in issues previously at:

#34

The intent was to make the result.inputs have the original unprocessed value. So I'm inclined to fix the spec to match the implementation.

@lucacasonato
Copy link
Member Author

Ah, missed that issue. Seems fine, although I do wonder why this inputs field is needed at all then. If the input is just the unprocessed value, I'd argue there is no point in returning it at all (the user already has it, as they passed it in).

@wanderview
Copy link
Member

Its a weak use case, but the idea was the user may want to pass the result on to other code to process and it may expect the same exact input for some other comparison/operation. This matches what Regexp.exec() does, etc.

@lucacasonato
Copy link
Member Author

lucacasonato commented Sep 7, 2021

Ok - I could maybe see some value here, but it seems to me it wouldn't be much more difficult to just pass the inputs manually as user. I sorta had the expectation when I first saw the dictionary definition that these "inputs" are the normalized values, as this "plain passthrough" is not very common.

@wanderview
Copy link
Member

Its not too late to change this. I'm just not sure which is better. @domenic do you have any thoughts here?

@domenic
Copy link
Member

domenic commented Sep 7, 2021

This is a tough one. I'm afraid I can't be of much help as a tiebreaker; I see both sides.

My only contribution is that I could see the RegExp precedent cutting either way. Is inputs meant to respresent the input string, or the input URL? Clearly for RegExps it's the input string, but maybe for URLPattern the input domain is more like URLs, and we just happen to represent them as strings?

@wanderview
Copy link
Member

@jeffpossnick do you have an opinion from a web developer point of view?

@wanderview
Copy link
Member

I made a twitter poll as well:

https://twitter.com/wanderview/status/1435342537288556548

@lucacasonato
Copy link
Member Author

Looking at the poll, my opinion is obviously in the minority. I'd say let's keep behaviour like it is implemented in Chrome, and update the spec. Seems most people's expectations would be met. The normalized values are always available via the input on individual component results anyway.

@lucacasonato
Copy link
Member Author

lucacasonato commented Sep 7, 2021

Someone on Twitter did bring up this point: https://twitter.com/bhathos/status/1435353793588350979. This shows that the expectation is that the matcher matches against whatever "result.input" is - that is however not the case. The matcher matches against the normalized value of result.input. Maybe there would be more insight into the system by users if the matcher returned the normalized inputs.

I'm impartial now.

@wanderview
Copy link
Member

The matcher also doesn't operate on the entire inputs array. It operates on individual component values after normalization. Those normalized inputs are available at result.pathname.input, etc. Combined with result.inputs that lets you see what normalization took place before the matcher ran.

As I wrote in the previous issue, though, it is all a bit weird and I'm not sure if there is a perfect answer. I think the current spec is the least bad approach, though.

@wanderview
Copy link
Member

Final poll results:

  • 173 votes
  • 46.8% for original raw value
  • 17.3% for encoded value
  • 35.8% don't know

I'm going to go ahead and update the spec to use the original raw value.

wanderview added a commit that referenced this issue Sep 8, 2021
Add original baseURL string to result.inputs. (Fixes #133)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants