Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malformed url in links creates an invalid AMP page #977

Closed
rubengonzalezmrf opened this issue Feb 26, 2018 · 9 comments
Closed

Malformed url in links creates an invalid AMP page #977

rubengonzalezmrf opened this issue Feb 26, 2018 · 9 comments
Milestone

Comments

@rubengonzalezmrf
Copy link

rubengonzalezmrf commented Feb 26, 2018

If an editor ads a link like <a href="http://foo%20bar">foo bar</a> by mistake, the generated amp page will be invalid.

I think it would be great to add an AMP_Link_Sanitizer class that checks the urls of the links and remove the tag if the url is malformed.

@westonruter
Copy link
Member

Interesting. When validating the AMP Validator just links to https://www.ampproject.org/docs/reference/spec#links which says:

The javascript: schema is disallowed.

I'm not sure how the validator is understanding a URL like http://foo%20bar, but somehow the domain is causing a validation error. I don't see why such a URL would be disallowed from looking at the spec:

https://github.com/ampproject/amphtml/blob/master/validator/validator-main.protoascii#L1436-L1470

Sanitization of the value should be done in the AMP_Tag_And_Attribute_Sanitizer, maybe in the check_attr_spec_rule_allowed_protocol method? I'm not sure if the validator is complaining about the protocol or about something else.

@rubengonzalezmrf
Copy link
Author

rubengonzalezmrf commented Feb 27, 2018

Hi, the validator is complaining about the host, the protocol seems to be correct, it's http: but looking at the validator code, in the file:

amphtml/validator/engine/parse-url.js

There's this code in line 300 which validates that the host doesn't contain any weird character, like a whitespace:

for (let ii = 0; ii < unescapedHost.length; ++ii) {
	const charCode = unescapedHost.charCodeAt(ii);
	if (!hostCharIsValid(charCode)) {
		this.isValid = false;
	}
}
return unescapedHost;

I agree it should be in AMP_Tag_And_Attribute_Sanitizer class but maybe in a new check_attr_spec_rule_allowed_host method?

@westonruter
Copy link
Member

Sounds good!

@davisshaver
Copy link
Contributor

I have run into this, too. Other examples seen in the wild during validation:

  • User types mail to: instead of mailto, breaks validation.
  • User accidentally inputs text into link, breaks validation.
  • User accidentally includes a space in the URL, which gets caught as mentioned at top.

Might be test case fodder, lemme know if you want help @rubengonzalezmrf

@westonruter
Copy link
Member

@rubengonzalezmrf from your comments I assumed you were looking to open a PR?

@rubengonzalezmrf
Copy link
Author

@westonruter yes, i've done it already: Pull request #983

@rubengonzalezmrf
Copy link
Author

rubengonzalezmrf commented Mar 2, 2018

@davisshaver if a user inputs just text in a link like "foo bar", wordpress automatically prepends the protocol http:// so it's the same case than the "http://foo bar". Same thing happens when you add a link like mail to:[email protected] it's automatically converted to http://mail to:[email protected]

So at the end all test cases are the same, the host can't contain whitespaces or other invalid chars

@davisshaver
Copy link
Contributor

@rubengonzalezmrf I encountered the issue mentioned above while testing/migrating legacy content. Seems like the editor behavior probably changed since the content was originally produced but I still think the test cases are warranted for backwards compatibility (even if they seem unlikely to happen today).

@rubengonzalezmrf
Copy link
Author

I added more test cases and checks. But i don't really know why the unit tests are not passing, the code works in my wordpress installation. I'm sure i'm missing something with the tests, but can't find what it is. Some help would be appreciated

@westonruter westonruter added this to the v0.7 milestone Mar 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants