Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create wrapper structs and interfaces for properties data structures and payloads #1333

Closed
wants to merge 4 commits into from

Conversation

echeran
Copy link
Contributor

@echeran echeran commented Nov 19, 2021

Fixes #1239

@echeran echeran requested a review from sffc November 19, 2021 21:29
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please remove the 4th commit from this PR.

@@ -14,7 +14,7 @@ components/icu4x/ @unicode-org/icu4x-owners
components/locale_canonicalizer/ @dminor @zbraniecki
components/locid/ @zbraniecki @nciric
components/plurals/ @zbraniecki @sffc
components/uniset/ @echeran @iainireland
components/codepointset/ echeran @iainireland
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please restore yourself

@@ -468,7 +468,7 @@ jobs:
- components/locid
- components/plurals
- components/datetime
- utils/uniset
- utils/codepointset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to update the gh_pages branch

@@ -57,7 +57,7 @@ ICU4X 0.1’s selection of components aimed to:

- Validate low-level models around [data management](https://github.com/unicode-org/icu4x/blob/master/docs/design/data_pipeline.md)
- Establish project culture using simple building blocks such as [Locale](https://docs.rs/icu_locid/0.1.0/icu_locid/) and [Plural Rules](https://docs.rs/icu_plurals/0.1.0/icu_plurals/)
- Introduce a low-level foundation for string operations with [UnicodeSet](https://docs.rs/icu_datetime/0.1.0/icu_datetime/)
- Introduce a low-level foundation for string operations with [UnicodeSet](https://docs.rs/icu_codepointset/0.1.0/icu_codepointset/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Change this to go to icu_uniset/0.1.0

@@ -117,7 +117,7 @@ Mozilla currently maintains [its own segmentation engine](https://searchfox.org/

Replacing lwbrk with ICU4C for layout needs would require a substantial effort to customize data management to handle Mozilla’s custom data.

ICU4X offers an environment in which we are developing a new Unicode UAX#14/UAX#29 compatible segmentation API which will use the foundational [UnicodeSet API](https://docs.rs/icu_uniset/0.1.0/icu_uniset/), and fit the needs of both Layout and ECMA-402.
ICU4X offers an environment in which we are developing a new Unicode UAX#14/UAX#29 compatible segmentation API which will use the foundational [CodePointSet API](https://docs.rs/icu_codepointset/0.1.0/icu_codepointset/), and fit the needs of both Layout and ECMA-402.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either revert or link to the staging docs

@@ -33,7 +33,7 @@ The data struct definitions should live in the crate that uses them. By conventi

- `icu::decimal::provider::DecimalSymbolsV1`
- `icu::locale_canonicalizer::provider::LikelySubtagsV1`
- `icu::uniset::provider::UnicodePropertyV1`
- `icu::codepointset::provider::UnicodePropertyV1`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This should be icu::properties anyway

//! It is an implementation of the existing [ICU4C UnicodeSet API](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1UnicodeSet.html).
//! It is an implementation of the existing [ICU4C CodePointSet API](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1CodePointSet.html).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This one should not change

@@ -70,6 +66,9 @@

#![cfg_attr(not(any(test, feature = "std")), no_std)]

extern crate alloc;

pub mod api_util;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Make the mod non-pub


fn contains_u32(&self, code_point: u32) -> bool;

// return type cannot be `impl ExactSizeIterator<Item = RangeInclusive<u32>>;`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type needs to go the trait as an associated type. However, we should discuss whether we want this all in one trait or whether to split this into smaller traits.

pub trait CodePointSetLike {
    type Iter: ExactSizeIterator<Item = RangeInclusive<u32>>
}

self.payload.get().inv_list.contains_u32(code_point)
}

fn iter_ranges(&self) -> Vec<u32> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This becomes -> Self::Iter

fn iter_ranges(&self) -> Vec<u32>;
}

impl CodePointSetLike for CodePointSetData {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the fully qualified iterator type here

@@ -70,6 +66,9 @@

#![cfg_attr(not(any(test, feature = "std")), no_std)]

extern crate alloc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to delete this after removing the Vec

pub payload: DataPayload<UnicodePropertyV1Marker>,
}

pub trait CodePointSetLike {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: This trait should be in icu_codepointset

@@ -468,7 +468,7 @@ jobs:
- components/locid
- components/plurals
- components/datetime
- utils/uniset
- utils/codepointset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, let's NOT change the crate name. Only change the type name

//! properties. See the [`sets`] module for more details.
//!
//! APIs that return a [`CodePointTrie`] exist for certain enumerated properties. See the
//! [`maps`] module for more details.
//!
//! # Examples
//!
//! ## Property data as `UnicodeSet`s
//! ## Property data as `CodePointSet`s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed: CodePointInversionList

@echeran
Copy link
Contributor Author

echeran commented Aug 9, 2022

Closing this old draft PR because it has been superseded by the subsequent proper design work and PRs #2112 and #2140 and other related PRs.

@echeran echeran closed this Aug 9, 2022
@echeran echeran deleted the props-return-types branch June 8, 2023 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revisit return values of unicode property functions
2 participants