Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove allocating datetime skeletons (DateSkeletonPatternsV1, datetime/skeletons@1) #1678

Closed
4 tasks done
Manishearth opened this issue Mar 8, 2022 · 12 comments · Fixed by #5612
Closed
4 tasks done
Assignees
Labels
C-datetime Component: datetime, calendars, time zones help wanted Issue needs an assignee question Unresolved questions; type unclear

Comments

@Manishearth
Copy link
Member

Manishearth commented Mar 8, 2022

part of #876, part of #856

DateSkeletonPatternsV1 is one of our final remaining non-zero-copy types. It's a rather complicated tree:

pub struct DateSkeletonPatternsV1<'data>(
    pub LiteMap<SkeletonV1, PatternPlurals<'data>>,
);

pub struct SkeletonV1(pub Skeleton);
pub struct Skeleton(pub(crate) SmallVec<[fields::Field; 5]>);

// Has a ULE type already
pub struct Field {
    pub symbol: FieldSymbol,
    pub length: FieldLength,
}

pub enum PatternPlurals<'data> {
    /// A collection of pattern variants for when plurals differ.
    MultipleVariants(PluralPattern<'data>),
    /// A single pattern.
    SinglePattern(Pattern<'data>),
}

pub struct PluralPattern<'data> {
    /// The field that 'variants' are predicated on.
    pivot_field: Week,
    pub(crate) zero: Option<Pattern<'data>>,
    pub(crate) one: Option<Pattern<'data>>,
    pub(crate) two: Option<Pattern<'data>>,
    pub(crate) few: Option<Pattern<'data>>,
    pub(crate) many: Option<Pattern<'data>>,
    pub(crate) other: Pattern<'data>,
}

pub struct Pattern<'data> {
    pub items: ZeroVec<'data, PatternItem>,
    pub(crate) time_granularity: TimeGranularity,
}

There are two parts to this: firstly, the skeleton needs to be made zero-copy, and then PatternPlurals. Both need to be VarULE or ULE to work inside a ZeroMap.

Skeleton

Skeleton seems easy, we replace Skeleton with #[make_varule] struct Skeleton(ZeroVec<'data, Fields>). I'm a bit worried that this will make lookup rather slow (since Ord will be slower): but also as far as I can tell, we never use the LiteMap as an actual map at use time, we only iterate the map in order.

Another option is to replace Skeleton with the unparsed skeleton string.

PatternPlurals

I'd rather not write a custom ULE type here, but ultimately, we can. I do think, however, we can get most of the benefits by restructuring this type a bit.

Basically, this type can be structured as

#[make_varule]
struct  PatternPlurals<'data> {
    pivot_field: Option<Week>, // this needs a ULE impl for Option, which can be done.
    patterns: VarZeroVec<'data, VarZeroSlice<Option<Pattern>>>,
}

#[make_varule]
pub struct Pattern<'data> {
    pub(crate) time_granularity: TimeGranularity,
    pub items: ZeroVec<'data, PatternItem>,
}

We will need an AsULE implementation on Option<T: AsULE> as well as a VarULE implementation on Option<T: VarULE> in cases where we can guarantee that T never has length zero (this can be done with an additional trait).

Then, PatternPlurals::patterns becomes a vector that must have at least one non-None element in the beginning, and the rest of the elements can be nonexistent or None (or we can guarantee that it either has length 1 or length 6).

Thoughts?

Feedback requested:

@Manishearth Manishearth added the needs-approval One or more stakeholders need to approve proposal label Mar 8, 2022
@sffc
Copy link
Member

sffc commented Mar 8, 2022

I wish we could avoid making PatternPlurals a ULE.

This part of the datetime provider may be changing as part of #1317. In that issue, the skeletons will no longer be an unlimited map; there may be a limited number of possible skeletons, such that we can just make a big struct and let Postcard do the work instead of ZeroVec.

As a general rule, if we start having very deeply nested ZeroMaps, we should rethink the architecture of the data payload. Putting more into the Postcard part of the deserialization is also better for CrabBake, since CrabBake can bypass Postcard but it can't bypass ZeroVec.

@Manishearth
Copy link
Member Author

As a general rule, if we start having very deeply nested ZeroMaps, we should rethink the architecture of the data payload. Putting more into the Postcard part of the deserialization is also better for CrabBake, since CrabBake can bypass Postcard but it can't bypass ZeroVec.

I strongly agree with this. Though one hazard with making it a big struct is that adding fields in the future is a breaking change. We could still make that work with an array but we're back to nesting zerovecs and needing ULEs.

I accidentally posted this issue before I was done but I was considering advocating for a case where we do more of the parsing at formatter construction time instead of mandating it be zero-copy. But if this is changing in #1317, that works too.

If we're going for a big struct, we can block this on #1317 and move on. @gregtatum , do you have an idea of how long #1317 will take? Moving everything to zero-copy does block CrabBake, and while we can have components opt out of CrabBake I think it would be nice if DTF didn't need to.

@sffc
Copy link
Member

sffc commented Mar 8, 2022

Moving everything to zero-copy does block CrabBake, and while we can have components opt out of CrabBake I think it would be nice if DTF didn't need to.

My preference for a short-term solution, if we need it, would be to put skeletons behind a feature flag, so that DTF with default features is zero-copy.

@Manishearth
Copy link
Member Author

Yeah I think I can just make it so that CrabBake's provider is unable to provide certain keys

@gregtatum
Copy link
Member

Another option is to replace Skeleton with the unparsed skeleton string.

I'm not sure with Zero copy if this adds anything, as you would then have to parse the skeleton, and do similar validation work. I'm not as familiar with the implementation details of zero copy to really speak with expertise on it though.

@Manishearth
Copy link
Member Author

I'm not sure with Zero copy if this adds anything, as you would then have to parse the skeleton, and do similar validation work. I'm not as familiar with the implementation details of zero copy to really speak with expertise on it though.

Yep; we don't get any zc perf benefit, it just enables crab-bake (which requires everything to be zerocopy)

@Manishearth
Copy link
Member Author

Manishearth commented Mar 10, 2022

I would also prefer to not go down the "data zero-copy but we allocate afterwards" route though

@zbraniecki
Copy link
Member

that looks good to me. I'd suggest making sure that the PatternPlurals encapsulation is really good so that even internal consumers never think of it as an array and never have to reason about which plural category 3rd element is.

@gregtatum
Copy link
Member

it just enables crab-bake

In that case that seems fine to me.

@nordzilla
Copy link
Member

LGTM after reading the comments. I don't think I have anything else to add.

@sffc sffc added question Unresolved questions; type unclear C-datetime Component: datetime, calendars, time zones and removed needs-approval One or more stakeholders need to approve proposal labels Mar 31, 2022
@sffc sffc added this to the ICU4X 1.1 milestone May 26, 2022
@sffc sffc added the help wanted Issue needs an assignee label May 26, 2022
@robertbastian
Copy link
Member

This needs to be removed

@robertbastian robertbastian removed their assignment Jul 11, 2024
@sffc sffc changed the title Zero-copy datetime skeletons (DateSkeletonPatternsV1, datetime/skeletons@1) Remove allocating datetime skeletons (DateSkeletonPatternsV1, datetime/skeletons@1) Sep 17, 2024
@sffc
Copy link
Member

sffc commented Sep 28, 2024

The data has been removed, and I'm removing the marker in #5612

To track a longer-term home for these things, I created #5611

@sffc sffc closed this as completed in e682005 Sep 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-datetime Component: datetime, calendars, time zones help wanted Issue needs an assignee question Unresolved questions; type unclear
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants