Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selector quoted attribute values #6

Merged
merged 2 commits into from
May 25, 2016

Conversation

yannham
Copy link
Contributor

@yannham yannham commented May 24, 2016

According to reference https://www.w3.org/TR/css3-selectors/#attribute-selectors and https://www.w3.org/TR/CSS21/syndata.html#strings, attributes values in selectors can be either identifier or quoted strings.
While identifiers are syntactically heavily restricted and I don't see any problem with lambdasoup being more resilient, selectors like

soup $ "[attr='a value']"
match markups of the form
<markup attr="&#39;a value&#39;"/>

and similarly
soup $ "[attr=\"a value\"]"
match markups of the form
<markup attr="&quot;a value&quot;"/>

I see a few problems with that :

  • It is counter-intuitive and not compatible with the standard : one would expect [attr='a value'] to parse simple quotes as delimiters, and rather match <markup attr="a value">.
  • Currently, there is no mean to select attributes with values containing characters ')' or ']' since the parser understand it as final delimiters without a way to escape it.

This PR tries to solve the problem by

  1. modifying parse_quoted_string to parse string delimited by simple or double quotes, with possibility of escaping the delimiter inside the string if preceded by ''
  2. in the function that parses attributes values (parse_string), parse the string as a quoted string using parse_quoted_string if it begins with a simple or a double quote, or parse it exactly as before if it doesn't.
  3. Adding some tests to validate the behavior

@aantron
Copy link
Owner

aantron commented May 24, 2016

Thanks, I agree that this should be fixed. Will take a look at the patches in detail in a bit.

| Some c -> Buffer.add_char buffer c; Stream.junk stream; loop ()
in
loop ()
match Stream.peek stream with
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: git diff shows a dangling space at the end of this line.

@aantron
Copy link
Owner

aantron commented May 24, 2016

Looks great! Thanks again.

Could you please amend the first commit to take care of the inline nit?

Would you prefer a speedy update in OPAM, or are you comfortable using a development version for a while?

@aantron
Copy link
Owner

aantron commented May 24, 2016

Also, this intentionally does not try to correctly support newlines, as described in https://www.w3.org/TR/CSS21/syndata.html#strings, but that's fine. That can be added later if someone needs it. I just want to note it, in case someone looks in this discussion.

@yannham
Copy link
Contributor Author

yannham commented May 24, 2016

Thanks, lambda-soup is a cool library, appreciate your work on it too. I added a commit to take care of the dangling space. I don't need a speedy opam update, development version is fine. It indeed does not try to support newlines (neither the precise syntax specified in the W3C reference, which is more restrictive), because in pure OCaml using a plain newline would do just fine I think. I guess problems could appear if one try to read W3C-compliant selectors from an external source, but as you state, it can be quite easily added later if needed : the lesser, the safer ^^

@aantron
Copy link
Owner

aantron commented May 24, 2016

Could you please squash the whitespace fix into 55c3b91, giving this PR a 2-commit history? It will then be as nice and tidy as when originally submitted (thank you for that!), and ready for merge.

@yannham yannham force-pushed the selector-quoted-attribute-values branch from 28d61a6 to fadd7fc Compare May 25, 2016 00:37
@yannham
Copy link
Contributor Author

yannham commented May 25, 2016

Should be ok now

@aantron aantron merged commit 7b6772f into aantron:master May 25, 2016
@aantron
Copy link
Owner

aantron commented May 25, 2016

Much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants