Fix missing range checks #23
Previously, input strings were processed byte-wise. They are now processed character-wise, so the tests that expect an `Error::InvalidChar(c)` had to be changed.
Force-pushed from f21e548 to 866e15f.
The tests fail due to rustup issues with nightly: rust-lang/rust#51699.
    }

    // Uppercase
    let c = if b >= b'A' && b <= b'Z' {
was dropping conversion to lower case intentional?
Yes, it is intentional. When I looked at `CHARSET_REV` I noticed that there are entries for both cases, so the conversion to lowercase should be redundant. But I still have to verify that `CHARSET_REV` actually satisfies all my assumptions about it; that's one of the reasons this PR is still WIP.
I like this use of `CHARSET_REV` in testing for proper range.
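To make the idea concrete, here is a minimal sketch of a reverse lookup table that carries entries for both cases and uses `-1` for everything else. The alphabet is the bech32 charset from BIP-173; the names `build_charset_rev` and `lookup` are hypothetical and are not taken from this crate, whose actual `CHARSET_REV` may be laid out differently.

```rust
/// The 32-character bech32 alphabet (from BIP-173).
const CHARSET: &[u8; 32] = b"qpzry9x8gf2tvdw0s3jn54khce6mua7l";

/// Hypothetical reverse table: maps an ASCII code to its 5-bit value,
/// or -1 if the code is not part of the alphabet.
fn build_charset_rev() -> [i8; 128] {
    let mut rev = [-1i8; 128];
    for (value, &b) in CHARSET.iter().enumerate() {
        rev[b as usize] = value as i8;
        // Register the uppercase form too, so lookups need no case conversion.
        // Digits are unchanged by `to_ascii_uppercase`, so this is a no-op for them.
        rev[b.to_ascii_uppercase() as usize] = value as i8;
    }
    rev
}

/// Looks up one character. Anything outside the table, or mapped to -1,
/// is rejected and returned as the error payload.
fn lookup(rev: &[i8; 128], c: char) -> Result<u8, char> {
    let idx = c as usize;
    if idx >= rev.len() || rev[idx] < 0 {
        return Err(c);
    }
    Ok(rev[idx] as u8)
}
```

With a table like this, the bounds check on `idx` is the range check, and the `-1` sentinel covers ASCII characters that are simply not in the alphabet.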
      // Lowercase
    - if b >= b'a' && b <= b'z' {
    + if c.is_lowercase() {
The `is_lowercase` and `is_uppercase` functions are probably slower than the simple range checks used before (since they work for all Unicode characters), but they seem much more idiomatic. The ASCII equivalent (`is_ascii_lowercase`) is still nightly-only, so if we wanted to avoid handwritten range checks we would have to use our own trait for that. But I don't see the need for such optimizations right now.
The implementation of `is_lowercase` is:
    pub fn is_lowercase(self) -> bool {
        match self {
            'a'...'z' => true,
            c if c > '\x7f' => derived_property::Lowercase(c),
            _ => false,
        }
    }
So for ASCII characters, it should be fairly performant. `is_uppercase` has a similar structure.
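For reference, the "own trait" alternative mentioned above could look roughly like the following. This is only a sketch: the trait name `AsciiCase` and its method names are hypothetical, chosen so they do not collide with the standard library's `is_ascii_lowercase`.

```rust
/// Hypothetical helper trait; not part of the crate under review.
trait AsciiCase {
    fn is_ascii_lower(self) -> bool;
    fn is_ascii_upper(self) -> bool;
}

impl AsciiCase for char {
    // Plain range checks: only 'a'..'z' and 'A'..'Z' count, nothing beyond ASCII.
    fn is_ascii_lower(self) -> bool {
        self >= 'a' && self <= 'z'
    }

    fn is_ascii_upper(self) -> bool {
        self >= 'A' && self <= 'Z'
    }
}
```

The observable difference from `is_lowercase` is limited to non-ASCII input: `'ß'.is_lowercase()` is true, while `'ß'.is_ascii_lower()` is false.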
      // Lowercase
    - if b >= b'a' && b <= b'z' {
    + if c.is_lowercase() {
          has_lower = true;
I didn't find a better (and preferably shorter/simpler) way of doing the checks for "all characters have the same case". At least none that doesn't need some bigger refactoring of the HRP processing, so I will leave it for now and maybe open another PR dedicated to refactoring.
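One possible shape for such a check, pulled out of the HRP loop into a standalone function, is sketched below. The function name `has_mixed_case` is hypothetical, and the sketch trades the single pass of the flag-based loop for two short iterator passes, so it is not necessarily a drop-in replacement.

```rust
/// Returns true if `hrp` contains both lowercase and uppercase letters.
/// Characters without case (digits, punctuation) are ignored.
fn has_mixed_case(hrp: &str) -> bool {
    let has_lower = hrp.chars().any(char::is_lowercase);
    let has_upper = hrp.chars().any(char::is_uppercase);
    has_lower && has_upper
}
```

A caller would reject the input when `has_mixed_case` returns true, which keeps the case bookkeeping out of the per-character validation loop.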
Force-pushed from 3f510a5 to 7a8c1d4.
Since this seems to be only an internal bug fix I increased the version from
    @@ -402,7 +397,7 @@ pub enum Error {
         /// The data or human-readable part is too long or too short
         InvalidLength,
         /// Some part of the string contains an invalid character
    -    InvalidChar(u8),
    +    InvalidChar(char),
Is this enum variant change a breaking change that needs a major version bump?
Good catch, sorry, I will have to bump the version to 0.5.0. Would you generally agree that it's better to process Unicode strings char-wise instead of processing their UTF-8 representation byte-wise? This change is what made the API break necessary.
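The practical difference shows up as soon as the input contains a non-ASCII character. The sketch below uses two hypothetical helpers, `first_invalid_byte` and `first_invalid_char`, and assumes the valid range is the US-ASCII range 33 to 126 that BIP-173 specifies for the human-readable part; it is not the crate's actual validation code.

```rust
/// Byte-wise scan: reports the first offending UTF-8 byte.
fn first_invalid_byte(s: &str) -> Option<u8> {
    s.bytes().find(|&b| b < 33 || b > 126)
}

/// Char-wise scan: reports the first offending character as a whole.
fn first_invalid_char(s: &str) -> Option<char> {
    s.chars().find(|&c| (c as u32) < 33 || (c as u32) > 126)
}
```

For the input `"bé"`, the byte-wise scan reports `0xC3`, the first byte of the two-byte UTF-8 encoding of `é`, which means little to a caller on its own. The char-wise scan reports `'é'` itself, which is the motivation for `InvalidChar(char)`.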
Force-pushed from 7a8c1d4 to afa37d1.
Pushed
Fixes #22.
TODO:
- `from_str_lenient` to catch all bad characters
- `Error::InvalidChar` since its signature was updated
- `-1` in `CHARSET_REV`