-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create wrapper structs and interfaces for properties data structures and payloads #1333
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please remove the 4th commit from this PR.
@@ -14,7 +14,7 @@ components/icu4x/ @unicode-org/icu4x-owners | |||
components/locale_canonicalizer/ @dminor @zbraniecki | |||
components/locid/ @zbraniecki @nciric | |||
components/plurals/ @zbraniecki @sffc | |||
components/uniset/ @echeran @iainireland | |||
components/codepointset/ echeran @iainireland |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please restore yourself
@@ -468,7 +468,7 @@ jobs: | |||
- components/locid | |||
- components/plurals | |||
- components/datetime | |||
- utils/uniset | |||
- utils/codepointset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to update the gh_pages branch
@@ -57,7 +57,7 @@ ICU4X 0.1’s selection of components aimed to: | |||
|
|||
- Validate low-level models around [data management](https://github.com/unicode-org/icu4x/blob/master/docs/design/data_pipeline.md) | |||
- Establish project culture using simple building blocks such as [Locale](https://docs.rs/icu_locid/0.1.0/icu_locid/) and [Plural Rules](https://docs.rs/icu_plurals/0.1.0/icu_plurals/) | |||
- Introduce a low-level foundation for string operations with [UnicodeSet](https://docs.rs/icu_datetime/0.1.0/icu_datetime/) | |||
- Introduce a low-level foundation for string operations with [UnicodeSet](https://docs.rs/icu_codepointset/0.1.0/icu_codepointset/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Change this to go to icu_uniset/0.1.0
@@ -117,7 +117,7 @@ Mozilla currently maintains [its own segmentation engine](https://searchfox.org/ | |||
|
|||
Replacing lwbrk with ICU4C for layout needs would require a substantial effort to customize data management to handle Mozilla’s custom data. | |||
|
|||
ICU4X offers an environment in which we are developing a new Unicode UAX#14/UAX#29 compatible segmentation API which will use the foundational [UnicodeSet API](https://docs.rs/icu_uniset/0.1.0/icu_uniset/), and fit the needs of both Layout and ECMA-402. | |||
ICU4X offers an environment in which we are developing a new Unicode UAX#14/UAX#29 compatible segmentation API which will use the foundational [CodePointSet API](https://docs.rs/icu_codepointset/0.1.0/icu_codepointset/), and fit the needs of both Layout and ECMA-402. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either revert or link to the staging docs
@@ -33,7 +33,7 @@ The data struct definitions should live in the crate that uses them. By conventi | |||
|
|||
- `icu::decimal::provider::DecimalSymbolsV1` | |||
- `icu::locale_canonicalizer::provider::LikelySubtagsV1` | |||
- `icu::uniset::provider::UnicodePropertyV1` | |||
- `icu::codepointset::provider::UnicodePropertyV1` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This should be icu::properties anyway
//! It is an implementation of the existing [ICU4C UnicodeSet API](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1UnicodeSet.html). | ||
//! It is an implementation of the existing [ICU4C CodePointSet API](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1CodePointSet.html). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This one should not change
@@ -70,6 +66,9 @@ | |||
|
|||
#![cfg_attr(not(any(test, feature = "std")), no_std)] | |||
|
|||
extern crate alloc; | |||
|
|||
pub mod api_util; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Make the mod non-pub
|
||
fn contains_u32(&self, code_point: u32) -> bool; | ||
|
||
// return type cannot be `impl ExactSizeIterator<Item = RangeInclusive<u32>>;`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type needs to go the trait as an associated type. However, we should discuss whether we want this all in one trait or whether to split this into smaller traits.
pub trait CodePointSetLike {
type Iter: ExactSizeIterator<Item = RangeInclusive<u32>>
}
self.payload.get().inv_list.contains_u32(code_point) | ||
} | ||
|
||
fn iter_ranges(&self) -> Vec<u32> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This becomes -> Self::Iter
fn iter_ranges(&self) -> Vec<u32>; | ||
} | ||
|
||
impl CodePointSetLike for CodePointSetData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the fully qualified iterator type here
@@ -70,6 +66,9 @@ | |||
|
|||
#![cfg_attr(not(any(test, feature = "std")), no_std)] | |||
|
|||
extern crate alloc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to delete this after removing the Vec
pub payload: DataPayload<UnicodePropertyV1Marker>, | ||
} | ||
|
||
pub trait CodePointSetLike { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: This trait should be in icu_codepointset
@@ -468,7 +468,7 @@ jobs: | |||
- components/locid | |||
- components/plurals | |||
- components/datetime | |||
- utils/uniset | |||
- utils/codepointset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, let's NOT change the crate name. Only change the type name
//! properties. See the [`sets`] module for more details. | ||
//! | ||
//! APIs that return a [`CodePointTrie`] exist for certain enumerated properties. See the | ||
//! [`maps`] module for more details. | ||
//! | ||
//! # Examples | ||
//! | ||
//! ## Property data as `UnicodeSet`s | ||
//! ## Property data as `CodePointSet`s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed: CodePointInversionList
Fixes #1239