Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing range checks #23

Merged
merged 5 commits into from
Jul 5, 2018

Conversation

sgeisler
Copy link
Contributor

@sgeisler sgeisler commented Jun 22, 2018

Fixes #22.

TODO:

  • rewrite from_str_lenient to catch all bad characters
  • adapt tests that use Error::InvalidChar since its signature was updated
  • validate that all bad characters are -1 in CHARSET_REV
  • write tests to show the bug is fixed
  • have a look at the checks for mixed case, they seem overly complicated

sgeisler added 2 commits June 22, 2018 17:38
Formerly input strings were processed byte-wise, since it happens now character-wise the tests which result in Error::InvalidChar(c)'s had to be changed.
@sgeisler sgeisler force-pushed the fix-missing-range-checks branch from f21e548 to 866e15f Compare June 22, 2018 21:24
@sgeisler
Copy link
Contributor Author

The tests fail due to rustup issues with nightly: rust-lang/rust#51699.

}

// Uppercase
let c = if b >= b'A' && b <= b'Z' {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was dropping conversion to lower case intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is intentional. When I looked at CHARSET_REV I noticed that there are entries for both cases, so the conversion to lowercase should be redundant. But I still have to verify that CHARSET_REV actually satisfies all my assumptions about it, that's one of the reasons this PR is still WIP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this use of CHARSET_REV in testing for proper range.

// Lowercase
if b >= b'a' && b <= b'z' {

if c.is_lowercase() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_lowercase and is_uppercase functions are probably slower (since they work for all Unicode characters) than the simple range checks used before, but they seem so much more idiomatic. The ascii-equivalent (is_ascii_lowercase) is still nightly-only, so if we wanted to avoid handwritten range checks we had to use our own trait for that. But I don't see the need for such optimizations right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of is_lowercase is:

    pub fn is_lowercase(self) -> bool {
        match self {
            'a'...'z' => true,
            c if c > '\x7f' => derived_property::Lowercase(c),
            _ => false,
        }
    }

So for ASCII characters, it should be fairly performant. is_uppercase has a similar structure.

// Lowercase
if b >= b'a' && b <= b'z' {

if c.is_lowercase() {
has_lower = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a better (and preferably shorter/simpler) way of doing the checks for "all characters have the same case". At least none that doesn't need some bigger refactoring of the HRP processing, so I will leave it for now and maybe open another PR dedicated to refactoring.

@sgeisler sgeisler force-pushed the fix-missing-range-checks branch from 3f510a5 to 7a8c1d4 Compare June 25, 2018 23:05
@sgeisler sgeisler changed the title WIP: fix missing range checks Fix missing range checks Jun 25, 2018
@sgeisler
Copy link
Contributor Author

Since this seems to be only an internal bug fix I increased the version from 0.4.1 to 0.4.2. Can anyone review the changes, please? @clarkmoody @tamasblummer

@sgeisler sgeisler requested a review from clarkmoody June 25, 2018 23:07
@@ -402,7 +397,7 @@ pub enum Error {
/// The data or human-readable part is too long or too short
InvalidLength,
/// Some part of the string contains an invalid character
InvalidChar(u8),
InvalidChar(char),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this enum variant change a breaking change that needs a major version bump?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, sorry, I will have to bump the version to 0.5.0. Would you generally agree that it's better to process unicode strings char-wise instead of their UTF-8 representation byte-wise? Because this change is what made this API break necessary.

@sgeisler sgeisler force-pushed the fix-missing-range-checks branch from 7a8c1d4 to afa37d1 Compare June 27, 2018 19:25
@clarkmoody clarkmoody merged commit abb419a into rust-bitcoin:master Jul 5, 2018
@clarkmoody
Copy link
Member

Pushed 0.5.0 to crates.io

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic when decoding bech32 string containing an uppercase 'O'
3 participants