Attribute values aren't normalized properly #371

Open
dralley opened this issue Mar 15, 2022 · 1 comment · May be fixed by #379

dralley commented Mar 15, 2022

Section 3.3.3 of the XML spec (the attribute value normalization section) states:

For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

  • For a character reference, append the referenced character to the normalized value.

  • For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.

  • For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.

  • For another character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read SHOULD be treated by a non-validating processor as if declared CDATA.

It is an error if an attribute value contains a reference to an entity for which no declaration has been read.

TL;DR: Per bullet point 3, a literal #xD, #xA, or #x9 (\r, \n, or \t) within an attribute value must be replaced with a space character (#x20). Only when the character is written as a character reference (e.g. &#x9;) is it kept as-is.

https://www.w3.org/TR/xml/#AVNormalize
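
For reference, here is a rough sketch of what the algorithm quoted above could look like in plain Rust. This is not quick-xml's implementation or API: `normalize_cdata_attr`, `collapse_spaces`, and `resolve_reference` are hypothetical names made up for illustration. It assumes no DTD has been read, so only the five predefined entities resolve and any other entity reference is reported as an error. It also folds in the spec's line-end normalization, so a literal `\r\n` pair yields a single space.

```rust
/// Resolves the body of a reference (the text between '&' and ';').
/// Hypothetical helper: handles character references and the five
/// predefined entities only, since no DTD is assumed.
fn resolve_reference(name: &str) -> Option<char> {
    if let Some(hex) = name.strip_prefix("#x") {
        u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
    } else if let Some(dec) = name.strip_prefix('#') {
        dec.parse::<u32>().ok().and_then(char::from_u32)
    } else {
        match name {
            "lt" => Some('<'),
            "gt" => Some('>'),
            "amp" => Some('&'),
            "apos" => Some('\''),
            "quot" => Some('"'),
            _ => None,
        }
    }
}

/// Normalizes a raw attribute value as if it were declared CDATA.
fn normalize_cdata_attr(raw: &str) -> Result<String, String> {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // Line ends are normalized first, so "\r\n" counts as one break.
            '\r' => {
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                out.push(' ');
            }
            // Literal whitespace is replaced with a space.
            '\n' | '\t' | ' ' => out.push(' '),
            '&' => {
                // Collect the reference body up to the closing ';'.
                let mut name = String::new();
                let mut terminated = false;
                for r in chars.by_ref() {
                    if r == ';' {
                        terminated = true;
                        break;
                    }
                    name.push(r);
                }
                if !terminated {
                    return Err(format!("unterminated reference: &{}", name));
                }
                // A referenced character is appended as-is, even if it is
                // whitespace such as a tab.
                match resolve_reference(&name) {
                    Some(ch) => out.push(ch),
                    None => return Err(format!("undeclared entity: &{};", name)),
                }
            }
            other => out.push(other),
        }
    }
    Ok(out)
}

/// Extra step for attributes whose declared type is not CDATA: trim
/// leading/trailing spaces and collapse runs of spaces. Only #x20 is
/// affected, so tabs that came from character references survive.
fn collapse_spaces(normalized: &str) -> String {
    normalized
        .split(' ')
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}
```

With this sketch, `normalize_cdata_attr("a\tb")` gives `a b`, while `normalize_cdata_attr("a&#x9;b")` keeps the tab. A real implementation would likely want to return a `Cow` and avoid allocating when nothing needs to change.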

Here's a half-written test which demonstrates the failure:

use quick_xml::events::Event::Start;
use quick_xml::Reader;

#[test]
fn test_attribute_value_normalization() {
    // The attribute value contains a literal tab, newline and carriage return.
    let mut r = Reader::from_str("<a attr=\"tab character\tor newline\nor return\rshould be replaced\">");
    r.trim_text(true);
    let mut buf = Vec::new();
    match r.read_event(&mut buf) {
        Ok(Start(e)) => {
            let mut attrs = e.attributes();
            match attrs.next() {
                // Per section 3.3.3, each literal whitespace character becomes a single space.
                Some(Ok(attr)) => assert_eq!(attr, ("attr".as_bytes(), "tab character or newline or return should be replaced".as_bytes()).into()),
                x => panic!("expected attribute 'attr', got {:?}", x),
            }
        }
        x => panic!("expected <a attr=\"...\">, got {:?}", x),
    }
}
---- test_attribute_value_normalization stdout ----
thread 'test_attribute_value_normalization' panicked at 'assertion failed: `(left == right)`
  left: `Attribute { key: "attr", value: Borrowed("tab character0x9or newline0xAor return0xDshould be replaced") }`,
 right: `Attribute { key: "attr", value: Borrowed("tab character or newline or return should be replaced") }`', tests/unit_tests.rs:850:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
dralley added a commit to dralley/quick-xml that referenced this issue Apr 3, 2022
@dralley dralley linked a pull request Apr 3, 2022 that will close this issue
dralley added a commit to dralley/quick-xml that referenced this issue Jun 19, 2022
dralley added a commit to dralley/quick-xml that referenced this issue Jun 20, 2022

dralley commented Jun 20, 2022

HTML doesn't seem to do any attribute normalization, so we ought to take that into account.

dralley added a commit to dralley/quick-xml that referenced this issue Jun 23, 2022
dralley added a commit to dralley/quick-xml that referenced this issue Jul 4, 2022
@dralley dralley self-assigned this Jul 15, 2022
dralley added a commit to dralley/quick-xml that referenced this issue Oct 31, 2022
dralley added a commit to dralley/quick-xml that referenced this issue Jan 7, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Jan 24, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Jan 30, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Jan 31, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Mar 13, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Jun 19, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Jul 10, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Aug 11, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Oct 23, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Nov 12, 2023
dralley added a commit to dralley/quick-xml that referenced this issue Jun 14, 2024