Trying to reference the Cf category generates an error #719

Closed
justinmchase opened this issue Oct 13, 2020 · 1 comment

justinmchase commented Oct 13, 2020

If I use the Unicode class cf:

static ref RE: Regex = Regex::new(r"^[\p{cf}]$").unwrap();

It generates this error:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    ^[\p{cf}]$
      ^^^^^^
error: Unicode property not found
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)', src/runtime/modules/unicode.rs:21:28

But I see the format is here in the code:
https://github.com/rust-lang/regex/blob/master/regex-syntax/src/unicode_tables/property_values.rs#L72

If I change the regex to \p{Format}, it works. This is a little confusing to me. Can someone please explain why \p{cf} doesn't work?

BurntSushi (Member) commented Oct 13, 2020

This is fixed in regex 1.4.1 on crates.io. Thank you for the bug report! Good find.

jorickert pushed a commit to jorickert/regex that referenced this issue Aug 1, 2022
'sc' refers to the 'Currency_Symbol' general category,
 but is also the abbreviation for the 'Script' property.

 Fixes rust-lang#835
 Related rust-lang#719 b1489c8
BurntSushi pushed a commit that referenced this issue Mar 4, 2023
'sc' refers to the 'Currency_Symbol' general category, but is also
the abbreviation for the 'Script' property. So when going through the
canonicalization process, it would get normalized to 'Script' before
being checked as a general category. We fix it by special casing it.

See also #719

Fixes #835, #899
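
The ordering problem described in the commit message above can be sketched in a few lines. This is only an illustration, not the crate's actual code: the two alias tables are tiny hypothetical stand-ins, and the names `normalize`, `lookup`, `resolve_buggy`, and `resolve_fixed` are made up for this example. It assumes a bare name such as `sc` is first normalized and then tried as a property name before it is tried as a general category.

```rust
// Hypothetical, heavily trimmed alias tables (assumption: the real tables
// are much larger and are generated from the Unicode data files).
const PROPERTY_ALIASES: &[(&str, &str)] = &[
    ("gc", "General_Category"),
    ("sc", "Script"),
    ("script", "Script"),
];

const GENERAL_CATEGORY_ALIASES: &[(&str, &str)] = &[
    ("sc", "Currency_Symbol"),
    ("currencysymbol", "Currency_Symbol"),
    ("lu", "Uppercase_Letter"),
];

#[derive(Debug, PartialEq)]
enum Resolved {
    Property(&'static str),
    GeneralCategory(&'static str),
}

// Simplified loose matching: drop '_', '-', and spaces, then lowercase.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|&c| !matches!(c, '_' | '-' | ' '))
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

// Linear search for an alias; returns the canonical name if present.
fn lookup(table: &[(&'static str, &'static str)], name: &str) -> Option<&'static str> {
    for &(alias, canonical) in table {
        if alias == name {
            return Some(canonical);
        }
    }
    None
}

// Buggy order: the property table wins, so "sc" becomes Script and the
// general-category table is never consulted for it.
fn resolve_buggy(name: &str) -> Option<Resolved> {
    let norm = normalize(name);
    if let Some(prop) = lookup(PROPERTY_ALIASES, &norm) {
        return Some(Resolved::Property(prop));
    }
    lookup(GENERAL_CATEGORY_ALIASES, &norm).map(Resolved::GeneralCategory)
}

// Fixed order: special-case the ambiguous abbreviation so that a bare "sc"
// skips the property lookup and is resolved as a general category.
fn resolve_fixed(name: &str) -> Option<Resolved> {
    let norm = normalize(name);
    if norm != "sc" {
        if let Some(prop) = lookup(PROPERTY_ALIASES, &norm) {
            return Some(Resolved::Property(prop));
        }
    }
    lookup(GENERAL_CATEGORY_ALIASES, &norm).map(Resolved::GeneralCategory)
}
```

In this sketch, `resolve_buggy("sc")` yields the `Script` property while `resolve_fixed("sc")` yields the `Currency_Symbol` general category, and unambiguous names such as `Script` resolve the same way in both. The special case only covers the bare-name form; a key/value form like `sc=Greek` is outside what this sketch models.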
BurntSushi pushed further commits with the same 'sc' special-casing message that referenced this issue on Mar 5, Mar 21, Apr 15, and Apr 17 (three times), 2023.