Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot use protocol-relative URL in script src attribute #531

Closed
ronosm opened this issue Jan 27, 2022 · 8 comments · Fixed by #555
Closed

Cannot use protocol-relative URL in script src attribute #531

ronosm opened this issue Jan 27, 2022 · 8 comments · Fixed by #555
Labels

Comments

@ronosm
Copy link

ronosm commented Jan 27, 2022

It's not possible to add a protocol-relative URL in the src attribute from script tag.

At naughtyHref function it is handled, but after that this code is executed:

const parsed = new URL(value);

This code generates an exception when a protocol-relative URL is used, generating an exception making allowed value as false:

} catch (e) {
allowed = false;
}

Finally, it makes the src attribute is removed from script tag.

This happens between lines 325 and 355.

@ronosm ronosm added the bug label Jan 27, 2022
@gmarinov
Copy link

gmarinov commented Feb 20, 2022

+1

here:

https://github.com/apostrophecms/sanitize-html/blob/3cdc262/index.js#L333-L338

  • there's some discussion over "protocol-relative" URLs and apparently they're considered an anti-pattern.
  • but surely referencing an https:// resource from http:// origin creates CORS issues.
  • these URLs are supported in the browser and legacy HTML content is full of them

while new URL('//my.url') breaks without protocol, i don't think it should be up to sanitize-html to enforce the scheme of URLs when no allowedScriptHostnames is defined?

i.e. const parsed = new URL(value); should move inside the if block?

https://github.com/apostrophecms/sanitize-html/blob/3cdc262/index.js#L340-L348

edit: that probably applies to iframe src too

tag @yorickgirard

@boutell
Copy link
Member

boutell commented Feb 22, 2022

Is this happening in the browser only? I believe the WHATWG URL parser in nodejs supports it.

A small subclass of the URL class to work around this issue probably wouldn't be difficult to contribute as a PR. The idea being to stub in the https: protocol but then stub it out again in toString if it was stubbed in.

@boutell
Copy link
Member

boutell commented Feb 22, 2022

(As a PR on this module that is. Modifying URL upstream is unrealistic of course.)

@gmarinov
Copy link

@boutell , I guess what I'm saying is that in the absence of explicit allow-list of domains, it shouldn't even try to police URLs.

$ node
Welcome to Node.js v16.14.0.
Type ".help" for more information.

> new URL('//google.com')
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL
    at __node_internal_captureLargerStackTrace (node:internal/errors:464:5)
    at new NodeError (node:internal/errors:371:5)
    at onParseError (node:internal/url:552:9)
    at new URL (node:internal/url:628:5) {
  input: '//google.com',
  code: 'ERR_INVALID_URL'
}

> new URL('ftp://google.com');
URL {
  href: 'ftp://google.com/',
  origin: 'ftp://google.com',
  protocol: 'ftp:',
  username: '',
  password: '',
  host: 'google.com',
  hostname: 'google.com',
  port: '',
  pathname: '/',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
> 

@boutell
Copy link
Member

boutell commented Feb 22, 2022

I don't think it would be a good idea to accept invalid URLs, but I agree that protocol relative URLs should not be considered invalid, at least by default, at least not yet. This is why I'm suggesting using a subclass wrapper for URL that accepts that particular case without reimplementing URL.

@boutell
Copy link
Member

boutell commented Feb 22, 2022 via email

@gmarinov
Copy link

it's just that till a few updates ago, sanitize-html didn't have this issue, right?

and in the browser, where the sanitized html ends up, protocol-relative URLs are not invalid.

@boutell
Copy link
Member

boutell commented Feb 22, 2022

src has always been passed through naughtyHref no matter what the tag is. I think what we're seeing is that Node 16 now has the same strict policy on protocol relative URLs that is enforced by Safari. A more tolerant subclass of Url would resolve it.

paweljq pushed a commit to paweljq/sanitize-html that referenced this issue Jul 12, 2022
paweljq added a commit to paweljq/sanitize-html that referenced this issue Jul 12, 2022
paweljq added a commit to paweljq/sanitize-html that referenced this issue Jul 12, 2022
paweljq added a commit to paweljq/sanitize-html that referenced this issue Jul 12, 2022
boutell added a commit that referenced this issue Jul 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants