-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes in unescape routine #771
Conversation
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #771 +/- ##
==========================================
- Coverage 61.81% 60.16% -1.66%
==========================================
Files 41 41
Lines 16798 16138 -660
==========================================
- Hits 10384 9709 -675
- Misses 6414 6429 +15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
…dEntity` and tweak error message
XML allows any number of leading zeroes. That is not explicitly mentioned in the specification, but enforced by the conformance test suite (https://www.w3.org/XML/Test/)
src/escape.rs
Outdated
// We should not allow sign numbers, but u32::from_str_radix will accept them. | ||
// Because we cannot directly construct necessary error, we use the function | ||
// which would return it for us | ||
b'+' | b'-' => u32::from_str_radix("=", 10), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this very much, personally. I can see the appeal of using from_str_radix
and ParseIntError
, the overlap is pretty significant, but this is definitely a hack around the mismatch in semantics. I'd rather define our own ParseIntError
(or equivalent) that is easily convertible from the original and can be easily returned on its own. You can copy the variants and descriptions straight from the original.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking I realized that here we needed our own type of error in any case (ParseCharRefError
), into which I moved the variants that make sense only when parsing the character reference.
Negative sing will return ParseIntError(InvalidDigit) because we parse into unsigned number, but `+` sign is allowed by the parse method. To return the same error for both `+` and `-` we check for sign itself before parse
/// | ||
/// Currently, only `0x0` character produces this error. | ||
IllegalCharacter(u32), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// We should not allow sign numbers, but u32::from_str_radix will accept `+`. | ||
// We also handle `-` to be consistent in returned errors | ||
Some(b'+') | Some(b'-') => Err(ParseCharRefError::UnexpectedSign), | ||
_ => u32::from_str_radix(src, radix).map_err(ParseCharRefError::InvalidNumber), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Much better, thanks
This PR fixes 2 errors that I found when working on a new way to represent entity references:
quick_xml
does not allow too long numbers to represent character references. It, however, does not takes in account, that any number of leading zeroes are allowed, so the real character entity length can be any. Conformance test suite contains such tests.Attribute::unescape_value
no longer unescaped predefined entities. That was fixed and checked that no similar places remainsCorresponding test (xmltest\valid\sa\042.xml -- one of):