48: Char property macro 2.0 r=behnam

Replaces #41. See #41 for earlier discussion.

An example will show better than I can tell:

```rust
char_property! {
    /// Represents the Unicode character
    /// [*Bidi_Class*](http://www.unicode.org/reports/tr44/#Bidi_Class) property,
    /// also known as the *bidirectional character type*.
    ///
    /// * <http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types>
    /// * <http://www.unicode.org/reports/tr44/#Bidi_Class_Values>
    pub enum BidiClass {
        /// Any strong left-to-right character
        ///
        /// ***General Scope***
        ///
        /// LRM, most alphabetic, syllabic, Han ideographs,
        /// non-European or non-Arabic digits, ...
        LeftToRight {
            abbr => L,
            long => Left_To_Right,
            display => "Left-to-Right",
        }

        /// Any strong right-to-left (non-Arabic-type) character
        ///
        /// ***General Scope***
        ///
        /// RLM, Hebrew alphabet, and related punctuation
        RightToLeft {
            abbr => R,
            long => Right_To_Left,
            display => "Right-to-Left",
        }

        /// Any strong right-to-left (Arabic-type) character
        ///
        /// ***General Scope***
        ///
        /// ALM, Arabic, Thaana, and Syriac alphabets,
        /// most punctuation specific to those scripts, ...
        ArabicLetter {
            abbr => AL,
            long => Arabic_Letter,
            display => "Right-to-Left Arabic",
        }
    }
}

/// Abbreviated name bindings for the `BidiClass` property
pub mod abbr_names for abbr;
/// Name bindings for the `BidiClass` property as they appear in Unicode documentation
pub mod long_names for long;
```

expands to:

```rust
/// Represents the Unicode character
/// [*Bidi_Class*](http://www.unicode.org/reports/tr44/#Bidi_Class) property,
/// also known as the *bidirectional character type*.
///
/// * <http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types>
/// * <http://www.unicode.org/reports/tr44/#Bidi_Class_Values>
#[allow(bad_style)]
#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub enum BidiClass {
    /// Any strong left-to-right character
    LeftToRight,
    /// Any strong right-to-left (non-Arabic-type) character
    RightToLeft,
    /// Any strong right-to-left (Arabic-type) character
    ArabicLetter,
}

/// Abbreviated name bindings for the `BidiClass` property
#[allow(bad_style)]
pub mod abbr_names {
    pub use super::BidiClass::LeftToRight as L;
    pub use super::BidiClass::RightToLeft as R;
    pub use super::BidiClass::ArabicLetter as AL;
}

/// Name bindings for the `BidiClass` property as they appear in Unicode documentation
#[allow(bad_style)]
pub mod long_names {
    pub use super::BidiClass::LeftToRight as Left_To_Right;
    pub use super::BidiClass::RightToLeft as Right_To_Left;
    pub use super::BidiClass::ArabicLetter as Arabic_Letter;
}

#[allow(bad_style)]
#[allow(unreachable_patterns)]
impl ::std::str::FromStr for BidiClass {
    type Err = ();
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "LeftToRight" => Ok(BidiClass::LeftToRight),
            "RightToLeft" => Ok(BidiClass::RightToLeft),
            "ArabicLetter" => Ok(BidiClass::ArabicLetter),
            "L" => Ok(BidiClass::LeftToRight),
            "R" => Ok(BidiClass::RightToLeft),
            "AL" => Ok(BidiClass::ArabicLetter),
            "Left_To_Right" => Ok(BidiClass::LeftToRight),
            "Right_To_Left" => Ok(BidiClass::RightToLeft),
            "Arabic_Letter" => Ok(BidiClass::ArabicLetter),
            _ => Err(()),
        }
    }
}

#[allow(bad_style)]
#[allow(unreachable_patterns)]
impl ::std::fmt::Display for BidiClass {
    fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
        match *self {
            BidiClass::LeftToRight => write!(f, "{}", "Left-to-Right"),
            BidiClass::RightToLeft => write!(f, "{}", "Right-to-Left"),
            BidiClass::ArabicLetter => write!(f, "{}", "Right-to-Left Arabic"),
            BidiClass::LeftToRight => write!(f, "{}", "Left_To_Right".replace('_', " ")),
            BidiClass::RightToLeft => write!(f, "{}", "Right_To_Left".replace('_', " ")),
            BidiClass::ArabicLetter => write!(f, "{}", "Arabic_Letter".replace('_', " ")),
            _ => {
                write!(
                    f,
                    "{}",
                    match *self {
                        BidiClass::LeftToRight => "L",
                        BidiClass::RightToLeft => "R",
                        BidiClass::ArabicLetter => "AL",
                        BidiClass::LeftToRight => "LeftToRight",
                        BidiClass::RightToLeft => "RightToLeft",
                        BidiClass::ArabicLetter => "ArabicLetter",
                    }
                )
            }
        }
    }
}

#[allow(bad_style)]
impl ::char_property::EnumeratedCharProperty for BidiClass {
    fn abbr_name(&self) -> &'static str {
        match *self {
            BidiClass::LeftToRight => "L",
            BidiClass::RightToLeft => "R",
            BidiClass::ArabicLetter => "AL",
        }
    }
    fn all_values() -> &'static [BidiClass] {
        const VALUES: &[BidiClass] = &[
            BidiClass::LeftToRight,
            BidiClass::RightToLeft,
            BidiClass::ArabicLetter,
        ];
        VALUES
    }
}
```

All three of the `abbr`, `long`, and `display` properties of the enum are optional, and have sane fallbacks: `abbr_name` and `long_name` return `None` if unspecified, and `fmt::Display` will check, in order, for `display`, `long_name`, `abbr_name`, and the variant name until it finds one to use (stringified, of course).

`FromStr` is defined, matching against any of the provided `abbr`, `long`, and variant name.

<hr />

Important notes:

- <strike>The current format uses associated consts, so it works on beta but won't work on stable until 1.20 is stable.</strike>
- Consts have a slightly different meaning than `pub use` -- `pub use` aliases the type where `const` is a new object and if used in pattern matching is a `==` call and not a pattern match.
- For this reason I'm actually slightly leaning towards using `pub use` even once associated consts land; they're compartmentalized (so `use Property::*` doesn't pull in 3x as many symbols as there are variants). After using the const based aliasing for a little bit, I'm inclined to like the current solution of `unic::ucd::bidi::BidiClass::*` + `unic::ucd::bidi::bidi_class::abbr_names::*`.
  These really should be a `pub use` and not a `const`.
- Note that I still think `const`s are the way to go for cases like `Canonical_Combining_Class`, though.
- <strike>The current syntax could easily be adapted to use modules instead of associated consts, but was written with the associated consts so we could get a feel of how it would look with them.</strike>
- The zero-or-more meta match before an enum variant conflicts with the ident match before 1.20. See rust-lang/rust#42913, rust-lang/rust#24189.
- The only tests of the macro are rather thin and could be expanded.
- It's a macro, so the response when you stick in stuff not matching the expected pattern is cryptic at best.
- The `CharProperty` trait is pretty much the lowest common denominator. It's a starting point, and we can iterate from there.
- How and where do we want to make `CharProperty` an externally visible trait? Currently, having it in the namespace is the only way to access `abbr_name` and `long_name`.
- <strike>Earlier discussion suggested putting these into `unic::utils::char_property`. Moving it would be simple, but for now it's living in the root of `unic-utils`.</strike>
- <strike>The crate `unic-utils` is currently in the workspace by virtue of being a dependency of `unic`, but is not in any way visible to a crate depending on `unic`.</strike>
- <strike>Documentation doesn't exist.</strike>
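
To make the `pub use` aliasing discussed in the notes concrete, here is a minimal hand-written sketch of how the generated `abbr_names` module might be used from calling code. It is not the macro's actual output: the enum and alias module are copied by hand from the expansion shown earlier, the `FromStr` impl is condensed, and `is_strong_rtl` is a hypothetical helper invented for illustration.

```rust
use std::str::FromStr;

/// Hand-written stand-in for the enum that `char_property!` generates.
#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub enum BidiClass {
    LeftToRight,
    RightToLeft,
    ArabicLetter,
}

/// Abbreviated aliases; each name re-exports a variant rather than
/// defining a new constant.
#[allow(nonstandard_style)]
pub mod abbr_names {
    pub use super::BidiClass::ArabicLetter as AL;
    pub use super::BidiClass::LeftToRight as L;
    pub use super::BidiClass::RightToLeft as R;
}

/// Condensed version of the generated `FromStr`: accepts the variant
/// name, the abbreviated name, or the long name.
impl FromStr for BidiClass {
    type Err = ();

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "LeftToRight" | "L" | "Left_To_Right" => Ok(BidiClass::LeftToRight),
            "RightToLeft" | "R" | "Right_To_Left" => Ok(BidiClass::RightToLeft),
            "ArabicLetter" | "AL" | "Arabic_Letter" => Ok(BidiClass::ArabicLetter),
            _ => Err(()),
        }
    }
}

/// Hypothetical helper (not part of the crate): because the aliases are
/// re-exported variants, they can be used directly as `match` patterns.
pub fn is_strong_rtl(class: BidiClass) -> bool {
    use self::abbr_names::*;
    match class {
        R | AL => true,
        L => false,
    }
}
```

Since the short names live in their own module, a caller opts into them with a glob import of `abbr_names` and is not forced to pull them in alongside the variants themselves.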