Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: Improve WHATWG URLSearchParams spec compliance #9484

Closed
wants to merge 2 commits into from

Conversation

TimothyGu
Copy link
Member

@TimothyGu TimothyGu commented Nov 6, 2016

Checklist
  • make -j8 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added (WHATWG URL interface is still experimental and undocumented)
  • commit message follows commit guidelines
Affected core subsystem(s)

url, test

Description of change

This PR aims to improve the general compatibility of URLSearchParams class with WHATWG's URL Standard. Specifically, the PR consists of the following parts:

  • Make URLSearchParams constructor signature the same as the spec
  • Strip leading ? in URL#search's setter
  • Spec-compliant iterable interface
    • Implementation of keys, entries, and forEach methods
    • Use a dedicated URLSearchParamsIterator object instead of a generator function for @@iterator, per spec
  • Add tests from W3C's Web Platform Tests.
    • Some tests, e.g. the ones related to %20 vs +, are still commented out.

TODOs and points of discussion:

  • Make URLSearchParams public
  • Crediting W3C for the tests
  • It might be better for iterator creation to be made generic and shared for the entire code base.
  • Investigate the applicability of a "security check" as mandated by the spec for @@iterator, keys, values, and entries
  • Multiple deviations of querystring's behaviors with the WHATWG URL Standard
    • querystring does not parse empty values properly
    • Currently, querystring uses an implementation of encodeURIComponent to encode the query string. However, the application/x-www-form-urlencoded byte serializer disagrees on certain characters such as 0x20 (space; encodeURIComponent escapes it as %20 while querystring should serialize it to +), 0x27 (left paren; encodeURIComponent does not escape but querystring should), etc.
  • Is there a reason against making all values in the hashmap arrays?

Fixes: #9302

@nodejs-github-bot nodejs-github-bot added the url Issues and PRs related to the legacy built-in url module. label Nov 6, 2016
@mscdex
Copy link
Contributor

mscdex commented Nov 6, 2016

/cc @jasnell

@jasnell
Copy link
Member

jasnell commented Nov 6, 2016

Awesome! I won't have a chance to review in depth until Monday but it's awesome to see you jump in on this. Thank you!

update(this, search);
this[searchParams][searchParams] = querystring.parse(this.search);
this[searchParams][searchParams] = querystring.parse(this[context].query);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is more for consistency with the constructor and cosmetics than anything else: it's a bit odd to see the property being read in its setter.

@TimothyGu TimothyGu changed the title Improve URLSearchParams spec compliance url: Improve WHATWG URLSearchParams spec compliance Nov 7, 2016
@TimothyGu
Copy link
Member Author

TimothyGu commented Nov 7, 2016

About making URLSearchParams public: it was indeed proposed in the EP document:

Accessible via require('url').URLSearchParams



In addition to the %20 vs + issue reported above, another bug/nonconformity in querystring is exemplified by this test case:

While the standard says to:

4.Let sequences be the result of splitting input on &.

...

6.For each byte sequence bytes in sequences, run these substeps:

  1. If bytes is the empty byte sequence, run these substeps for the next byte sequence.

which essentially means to skip all instances of leading, trailing, or consecutive ampersands, querystring treats it as =:

querystring.parse('')
// => ✓ {}
querystring.parse('&')
// => ✗ { '': '' }
// => Expected: {}
querystring.parse('=')
// => ✓ { '': '' }
querystring.parse('&=')
// => ✗ { '': [ '', '' ] }
// => Expected: { '': '' }
querystring.parse('&&')
// => ✗ { '': [ '', '' ] }
// => Expected: {}
querystring.parse('&&=')
// => ✗ { '': [ '', '', '' ] }
// => Expected: { '': '' }

// In Chrome 55 with #enable-experimental-web-platform-features
[...new URLSearchParams('')]
// => ✓ []
[...new URLSearchParams('&')]
// => ✓ []
[...new URLSearchParams('=')]
// => ✓ [ [ '', '' ] ]
[...new URLSearchParams('&=')]
// => ✓ [ [ '', '' ] ]
[...new URLSearchParams('&&')]
// => ✓ []
[...new URLSearchParams('&&=')]
// => ✓ [ [ '', '' ] ]


Another thing I realized is that currently the conversion from ECMAScript String to IDL's USVString isn't strictly done in the JS-based URLSearchParams code (while it is being done in the C++ code for URL parsing, presumably from the construction of a Utf8Value), which leads to cases such as the following:

const { URL } = require('url')

// '\dc00' is an unpaired trail Unicode surrogate
const url = new URL('https://asdf.com/?a=\udc00asdf')

url.search
// => '?a=%EF%BF%BDasdf'
// This is correct: `%EF%BF%BD` corresponds to U+FFFD REPLACEMENT CHARACTER,
// as stipulated by the Web IDL spec:
// https://heycam.github.io/webidl/#dfn-obtain-unicode

url.searchParams.set('a', '\udc00asdf')

url.search
// => '?a=%F0%90%81%A1sdf'
// Here a URL with an invalid Unicode string results, as there isn't a check for
// unpaired surrogates in the userland JS code.

url.search = 'a=\udc00asdf'

url.search
// => '?a=%EF%BF%BDasdf'
// search's setter forces a reparse, so C++ code's Unicode
// sanitation kicks in here

So in addition to making the querystring module conformant to WHATWG standards, this issue would have to be fixed somehow, either by doing some string wizardry in JS or add a C++ binding for sanitizing.

@jasnell
Copy link
Member

jasnell commented Nov 7, 2016

Yep, you're absolutely right. In the initial implementation I decided to punt on 100% compliance with the URLSearchParams in order to get it landed, knowing that we could continue to iterate on it while the feature was still experimental. My plan has been to move the processing into the node_url.cc native code to echo the main URL parsing.

@TimothyGu
Copy link
Member Author

@jasnell, have you gotten a chance to look at this PR?

@jasnell
Copy link
Member

jasnell commented Nov 13, 2016

I have not and I am sorry. I have been in Tokyo for nodefest and have been
a bit distracted. I will review tomorrow when I am back in the States

On Mon, Nov 14, 2016 at 7:32 AM Timothy Gu [email protected] wrote:

@jasnell https://github.com/jasnell, have you gotten a chance to look
at this PR?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#9484 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAa2eVZF4GtP8CNucE4vgaizwrPj_o0Eks5q95AJgaJpZM4KqbjZ
.

@TimothyGu
Copy link
Member Author

@jasnell, ping.

@TimothyGu TimothyGu force-pushed the urlsearchparams branch 2 times, most recently from d04241e to fe08218 Compare November 25, 2016 00:40
@TimothyGu
Copy link
Member Author

TimothyGu commented Nov 25, 2016

Just a heads up, in case anyone is interested I've reimplemented the core parsing and serializing routine in C++, in TimothyGu/node@urlsearchparams...urlsearchparams-cpp. I will submit a PR for that after this one is merged.

@jasnell jasnell self-assigned this Nov 25, 2016
@jasnell
Copy link
Member

jasnell commented Nov 25, 2016

@TimothyGu ... I've taken an initial look at things look good but have not yet had the opportunity to do a proper complete review. Haven't forgotten about it tho!

@TimothyGu TimothyGu force-pushed the urlsearchparams branch 4 times, most recently from 9754f70 to 796213f Compare November 27, 2016 05:08
- Make URLSearchParams constructor spec-compliant
- Strip leading `?` in URL#search's setter
- Spec-compliant iterable interface
- More precise handling of update steps as mandated by the spec
- Add class strings to URLSearchParams objects and their prototype
- Make sure `this instanceof URLSearchParams` in methods

Also included are relevant tests from W3C's Web Platform Tests
(https://github.com/w3c/web-platform-tests/tree/master/url).

Fixes: nodejs#9302
@TimothyGu
Copy link
Member Author

@jasnell, ping 3x...

@jasnell
Copy link
Member

jasnell commented Dec 4, 2016

Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI is green! Thank you very much!

@jasnell
Copy link
Member

jasnell commented Dec 4, 2016

@TimothyGu ... things look good except for a couple of linter nits. Can you run make lint and fix the errors.

@jasnell
Copy link
Member

jasnell commented Dec 4, 2016

italoacasas pushed a commit that referenced this pull request Dec 7, 2016
- Make URLSearchParams constructor spec-compliant
- Strip leading `?` in URL#search's setter
- Spec-compliant iterable interface
- More precise handling of update steps as mandated by the spec
- Add class strings to URLSearchParams objects and their prototype
- Make sure `this instanceof URLSearchParams` in methods

Also included are relevant tests from W3C's Web Platform Tests
(https://github.com/w3c/web-platform-tests/tree/master/url).

Fixes: #9302
PR-URL: #9484
Reviewed-By: James M Snell <[email protected]>
@TimothyGu TimothyGu deleted the urlsearchparams branch December 7, 2016 17:23
@stevenvachon
Copy link

Not in 7.2.1? It's not in the changelog, anyway.

Fishrock123 pushed a commit that referenced this pull request Dec 13, 2016
- Make URLSearchParams constructor spec-compliant
- Strip leading `?` in URL#search's setter
- Spec-compliant iterable interface
- More precise handling of update steps as mandated by the spec
- Add class strings to URLSearchParams objects and their prototype
- Make sure `this instanceof URLSearchParams` in methods

Also included are relevant tests from W3C's Web Platform Tests
(https://github.com/w3c/web-platform-tests/tree/master/url).

Fixes: #9302
PR-URL: #9484
Reviewed-By: James M Snell <[email protected]>
@italoacasas italoacasas mentioned this pull request Dec 15, 2016
italoacasas added a commit that referenced this pull request Dec 15, 2016
Notable changes:
- buffer:
  - fix single-character string filling (Anna Henningsen) #9837
  - handle UCS2.fill() properly on BE (Anna Henningsen) #9837
- url:
  -  including base argument in originFor (joyeecheung) #10021
  - improve URLSearchParams spec compliance (Timothy Gu) #9484
- http:
  - remove stale timeout listeners (Karl Böhlmark) #9440
- build:
  - fix node_g target (Daniel Bevenius) #10153

PR-URL: #10277
italoacasas added a commit that referenced this pull request Dec 15, 2016
Notable changes

SEMVER-MINOR
- url:
    - add inspect function to TupleOrigin (Safia Abdalla) #10039
- crypto:
    - allow adding extra certs to well-known CAs (Sam Roberts) #9139

SEMVER-PATCH
- buffer:
    - fix single-character string filling (Anna Henningsen) #9837
    - handle UCS2 .fill() properly on BE (Anna Henningsen) #9837
- url:
    - including base argument in originFor (joyeecheung) #10021
    - improve URLSearchParams spec compliance (Timothy Gu) #9484
- http:
    - remove stale timeout listeners (Karl Böhlmark) #9440
- build:
    - fix node_g target (Daniel Bevenius) #10153
- fs:
    - remove unused argument from copyObject() (Ethan Arrowood) #10041
- timers:
    - fix handling of cleared immediates (hveldstra) #9759

PR-URL: #10277
italoacasas added a commit that referenced this pull request Dec 17, 2016
Notable changes:
* **crypto**:
  - Allow adding extra certificates to well-known CAs. (Sam Roberts)
    [#9139](#9139)
* **buffer**:
  - Fix single-character string filling. (Anna Henningsen)
    [#9837](#9837)
  - Handle UCS2 `.fill()` properly on BE. (Anna Henningsen)
    [#9837](#9837)
* **url**:
  - Add inspect function to TupleOrigin. (Safia Abdalla)
    [#10039](#10039)
  - Including base argument in originFor. (joyeecheung)
    [#10021](#10021)
  - Improve URLSearchParams spec compliance. (Timothy Gu)
    [#9484](#9484)
* **http**:
  - Remove stale timeout listeners. (Karl Böhlmark)
    [#9440](#9440)
* **build**:
  - Fix node_g target. (Daniel Bevenius)
    [#10153](#10153)
* **fs**:
  - Remove unused argument from copyObject(). (EthanArrowood)
    [#10041](#10041)
* **timers**:
  - Fix handling of cleared immediates. (hveldstra)
    [#9759](#9759)
* **src**:
  - Add wrapper for process.emitWarning(). (SamRoberts)
    [#9139](#9139)
  - Fix string format mistake for 32 bit node.(Alex Newman)
    [#10082](#10082)
italoacasas added a commit that referenced this pull request Dec 19, 2016
Notable changes:
* **crypto**: The built-in list of Well-Known CAs (Certificate Authorities) can now
              be extended via a NODE_EXTRA_CA_CERTS environment variable. (Sam Roberts)
              [#9139](#9139)
* **buffer**: buffer.fill() now works properly for the UCS2 encoding on Big-Endian
              machines. (Anna Henningsen) [#9837](#9837)
* **url**:
    - Including base argument in URL.originFor() to meet specification compliance. (joyeecheung)
      [#10021](#10021)
    - Improve URLSearchParams to meet specification compliance. (Timothy Gu)
      [#9484](#9484)
* **http**: Remove stale timeout listeners in order to prevent a memory leak when using
            keep alive. (Karl Böhlmark) [#9440](#9440)
cjihrig added a commit to cjihrig/node that referenced this pull request Dec 20, 2016
Notable changes:

* buffer:
  - buffer.fill() now works properly for the UCS2 encoding on
    Big-Endian machines.
    (Anna Henningsen) nodejs#9837
* cluster:
  - disconnect() now returns a reference to the disconnected
    worker. (Sean Villars)
    nodejs#10019
* crypto:
  - The built-in list of Well-Known CAs (Certificate Authorities)
    can now be extended via a NODE_EXTRA_CA_CERTS environment
    variable. (Sam Roberts)
    nodejs#9139
* http:
  - Remove stale timeout listeners in order to prevent a memory leak
    when using keep alive. (Karl Böhlmark)
    nodejs#9440
* tls:
  - Allow obvious key/passphrase combinations. (Sam Roberts)
    nodejs#10294
* url:
  - Including base argument in URL.originFor() to meet specification
    compliance. (joyeecheung)
    nodejs#10021
  - Improve URLSearchParams to meet specification compliance.
    (Timothy Gu) nodejs#9484

PR-URL: nodejs#10277
cjihrig added a commit to cjihrig/node that referenced this pull request Dec 20, 2016
Notable changes:

* buffer:
  - buffer.fill() now works properly for the UCS2 encoding on
    Big-Endian machines.
    (Anna Henningsen) nodejs#9837
* cluster:
  - disconnect() now returns a reference to the disconnected
    worker. (Sean Villars)
    nodejs#10019
* crypto:
  - The built-in list of Well-Known CAs (Certificate Authorities)
    can now be extended via a NODE_EXTRA_CA_CERTS environment
    variable. (Sam Roberts)
    nodejs#9139
* http:
  - Remove stale timeout listeners in order to prevent a memory leak
    when using keep alive. (Karl Böhlmark)
    nodejs#9440
* tls:
  - Allow obvious key/passphrase combinations. (Sam Roberts)
    nodejs#10294
* url:
  - Including base argument in URL.originFor() to meet specification
    compliance. (joyeecheung)
    nodejs#10021
  - Improve URLSearchParams to meet specification compliance.
    (Timothy Gu) nodejs#9484

PR-URL: nodejs#10277
cjihrig added a commit that referenced this pull request Dec 20, 2016
Notable changes:

* buffer:
  - buffer.fill() now works properly for the UCS2 encoding on
    Big-Endian machines.
    (Anna Henningsen) #9837
* cluster:
  - disconnect() now returns a reference to the disconnected
    worker. (Sean Villars)
    #10019
* crypto:
  - The built-in list of Well-Known CAs (Certificate Authorities)
    can now be extended via a NODE_EXTRA_CA_CERTS environment
    variable. (Sam Roberts)
    #9139
* http:
  - Remove stale timeout listeners in order to prevent a memory leak
    when using keep alive. (Karl Böhlmark)
    #9440
* tls:
  - Allow obvious key/passphrase combinations. (Sam Roberts)
    #10294
* url:
  - Including base argument in URL.originFor() to meet specification
    compliance. (joyeecheung)
    #10021
  - Improve URLSearchParams to meet specification compliance.
    (Timothy Gu) #9484

PR-URL: #10277
imyller added a commit to imyller/meta-nodejs that referenced this pull request Dec 21, 2016
    Notable changes:

    * buffer:
      - buffer.fill() now works properly for the UCS2 encoding on
        Big-Endian machines.
        (Anna Henningsen) nodejs/node#9837
    * cluster:
      - disconnect() now returns a reference to the disconnected
        worker. (Sean Villars)
        nodejs/node#10019
    * crypto:
      - The built-in list of Well-Known CAs (Certificate Authorities)
        can now be extended via a NODE_EXTRA_CA_CERTS environment
        variable. (Sam Roberts)
        nodejs/node#9139
    * http:
      - Remove stale timeout listeners in order to prevent a memory leak
        when using keep alive. (Karl Bohlmark)
        nodejs/node#9440
    * tls:
      - Allow obvious key/passphrase combinations. (Sam Roberts)
        nodejs/node#10294
    * url:
      - Including base argument in URL.originFor() to meet specification
        compliance. (joyeecheung)
        nodejs/node#10021
      - Improve URLSearchParams to meet specification compliance.
        (Timothy Gu) nodejs/node#9484

    PR-URL: nodejs/node#10277

Signed-off-by: Ilkka Myller <[email protected]>
@hiroppy hiroppy added the whatwg-url Issues and PRs related to the WHATWG URL implementation. label Apr 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WHATWG URLSearchParams issue
8 participants