Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving or getting rid of the DataPayload RcStruct variant #1262

Closed
1 of 2 tasks
Manishearth opened this issue Nov 4, 2021 · 24 comments · Fixed by #1299
Closed
1 of 2 tasks

Improving or getting rid of the DataPayload RcStruct variant #1262

Manishearth opened this issue Nov 4, 2021 · 24 comments · Fixed by #1299
Assignees
Labels
C-data-infra Component: provider, datagen, fallback, adapters help wanted Issue needs an assignee S-medium Size: Less than a week (larger bug fix or enhancement) T-techdebt Type: ICU4X code health and tech debt

Comments

@Manishearth
Copy link
Member

Manishearth commented Nov 4, 2021

Related: #1151

(I've been discussing this with @sffc and wanted to have this on file so I don't forget the details)

DataPayload has the following innards:

pub(crate) enum DataPayloadInner<'data, M>
where
M: DataMarker<'data>,
{
RcStruct(Yoke<M::Yokeable, Rc<M::Cart>>),
Owned(Yoke<M::Yokeable, ()>),
RcBuf(Yoke<M::Yokeable, Rc<[u8]>>),
}

The Owned and RcBuf variants are relatively easy to deal with and make use of Yoke as originally intended. RcStruct is weird, it's essentially intended for creating a local data struct (say, Foo<'a>), and storing a borrowed reference to it. Typically, RcStruct will be something like RcStruct<Yoke<Foo<'static>, Rc<Foo<'data>>>, where 'data is borrowing from potentially local data.

This has a couple problems:

  • We have an extra lifetime that shows up anywhere. We've also unfortunately named it 'data (I think this is my fault), which we also use in data structs to refer to the lifetime of their backing zero-copy buffer. This lifetime in general complicates a lot of stuff
  • Projections are less useful (Converging a payload from two different projections #1151)
  • We have an extra associated type (M::Cart) to deal with, making code more confusing. We have to deal with two kinds of "cart" types -- the allocating cart types that Yoke typically expects, and the "inner" cart types that data payloads use (that get allocated behind an Rc)

There are a couple things we can do here to improve the status quo:

Keep the lifetime, but get rid of M::Cart

This fixes #1151. Essentially, the only reason we need cart in the yoke is to call its destructor. RcStruct can instead be written as:

RcStruct(Yoke<M::Yokeable, Rc<dyn Drop + 'data>>)

and have the same effect. Note that dyn Drop isn't actually legal, but we can define a dummy internal trait with no methods that is blanket implemented (trait objects always have destructors in their vtables), and add Yoke helpers that can erase carts with it.

Get rid of RcStruct

We can get rid of this entirely. This cleans up the lifetimes and types significantly (and fixes #1151), but it means that if someone wishes to create a DataPayload from locally borrowed data, they must construct it as Owned data. We could potentially add a ToOwned implementation (or something similar) to the ZeroCopyFrom custom derive that makes this pleasant to use when working with borrowed data.

We can potentially make this more pleasant in the future; another way (this may be more expensive than ToOwned , worth measuring) would be to postcard serialize the object and then construct the data payload as an RcBuf deserialization.

I am leaning towards getting rid of RcStruct entirely.

Feedback requested from

@Manishearth Manishearth added C-data-infra Component: provider, datagen, fallback, adapters discuss Discuss at a future ICU4X-SC meeting labels Nov 4, 2021
@sffc
Copy link
Member

sffc commented Nov 4, 2021

In the future, or perhaps as part of this issue, I would like to see the ability to plug a custom allocator into RcBuf. This could help optimize certain cases where clients want to use local data that we remove when adding RcStruct.

I also want to point out that in many cases RcBuf can be used without Serde. For example, a struct that contains only a single string can put that string as bytes into the RcBuf and then point to it; it doesn't need the Postcard machinery. A struct containing several strings could put a VarZeroVec to in the RcBuf. So we only really need to use Postcard in more complex cases.

@Manishearth
Copy link
Member Author

Manishearth commented Nov 4, 2021

Good point, we can get away with just using EncodeAsVarULE for this (which will be derivable eventually)

@sffc
Copy link
Member

sffc commented Nov 4, 2021

Good point, we can get away with just using EncodeAsVarULE for this (which will be derivable eventually)

Yes; I think this works for simpler types. For complex types that aren't compatible with EncodeAsVarULE, we can use Postcard.

@sffc
Copy link
Member

sffc commented Nov 5, 2021

2021-11-05:

Conclusion: move forward with "Get rid of RcStruct"

@sffc sffc added S-medium Size: Less than a week (larger bug fix or enhancement) T-techdebt Type: ICU4X code health and tech debt v1 and removed discuss Discuss at a future ICU4X-SC meeting labels Nov 5, 2021
@sffc sffc added this to the ICU4X 0.5 milestone Nov 5, 2021
@Manishearth
Copy link
Member Author

Prioritization input: unlikely to be blocking people in November, perhaps later

@sffc sffc added the help wanted Issue needs an assignee label Nov 5, 2021
@Manishearth
Copy link
Member Author

DataPayload::from_partial_owned() is used in some tricky ways for upcasting currently:

fn upcast(other: DataPayload<'static, M>) -> DataPayload<'static, ErasedDataStructMarker> {
use crate::data_provider::DataPayloadInner::*;
let cart: Rc<dyn ErasedDataStruct> = match other.inner {
RcStruct(yoke) => Rc::from(yoke),
Owned(yoke) => Rc::from(yoke),
RcBuf(yoke) => Rc::from(yoke),
};
DataPayload::from_partial_owned(cart)

fn upcast(other: DataPayload<'data, M>) -> DataPayload<'data, SerdeSeDataStructMarker> {
use crate::data_provider::DataPayloadInner::*;
let cart: Rc<dyn SerdeSeDataStruct<'data> + 'data> = match other.inner {
RcStruct(yoke) => Rc::from(yoke),
Owned(yoke) => Rc::from(yoke),
RcBuf(yoke) => Rc::from(yoke),
};
DataPayload::from_partial_owned(cart)

Looking at this we may still need a dyn Drop variant (though there may be ways to get rid of this: @sffc do you have ideas?), but we might still be able to get rid of the lifetime.

@Manishearth
Copy link
Member Author

Hmm, I think we can handle these by using projections instead, actually. Annoyingly this means that the steps are more inextricably linked so it's harder to do incrementally.

@Manishearth
Copy link
Member Author

I have this, but I'm hitting the HRTB resolution error (same one that's affecting my ZCF code) again.

Need to file an issue.

impl<'data, M> crate::dynutil::UpcastDataPayload<'data, M> for SerdeSeDataStructMarker
where
    M: DataMarker<'data>,
    M::Cart: IsCovariant<'data>,
    for<'a> &'a <M::Yokeable as Yokeable<'a>>::Output: serde::Serialize,
    for<'a> &'a <M::Yokeable as Yokeable<'a>>::Output: IsCovariant<'a>,
{
    fn upcast(other: DataPayload<'data, M>) -> DataPayload<'data, SerdeSeDataStructMarker> {
         use crate::data_provider::DataPayloadInner;
         match other.inner {
            DataPayloadInner::RcStruct(yoke) => {
                // same thing as before
                let cart: Rc<dyn SerdeSeDataStruct<'data> + 'data> = Rc::from(yoke);
                DataPayload::from_partial_owned(cart)
            }
            DataPayloadInner::Owned(yoke) => {
                let inner = DataPayloadInner::Owned(yoke.project(|payload , _| SerdeSeDataStructDynRef(Box::new(payload))));
                DataPayload {inner}
            }
            DataPayloadInner::RcBuf(yoke) => unimplemented!(),
         }
    }
}

@sffc
Copy link
Member

sffc commented Nov 5, 2021

Yeah, ideally we can solve upcast with a projection. One problem though is that you can't upcast or downcast something to ErasedDataStruct that has a non-static lifetime, which projection requires since projection operates on the shortened lifetime. Note that currently upcasting to ErasedDataStruct is only implemented on DataPayload<'static, M>.

@Manishearth
Copy link
Member Author

Manishearth commented Nov 5, 2021

Filed rust-lang/rust#90638 about the ICE/error

Current progress stashed in https://github.com/Manishearth/icu4x/tree/rm-rcstruct

@sffc
Copy link
Member

sffc commented Nov 9, 2021

Another use case: for Unicode Properties FFI, I want to map-project CodePointMapResult<'data, Script> to CodePointMapResult<'data, u32> in order to type-erase the Script, but I can't do that because of the Cart.

@Manishearth
Copy link
Member Author

Given that getting rid of the lifetime is going to be an annoying PR, do you think it's worth first getting rid of the Cart in a separate PR? That sounds much more manageable and gets us the immediate things we need without requiring a much gnarlier set of changes.

@sffc
Copy link
Member

sffc commented Nov 9, 2021

Does the dyn Drop + 'a solution work with the upcast/downcast?

@Manishearth
Copy link
Member Author

Yes because the cart stays the same type, I think

@sffc
Copy link
Member

sffc commented Nov 9, 2021

Cool. It solves some short term problems, but does it get us closer to the eventual goal of removing the lifetime? If so, then go for it.

@Manishearth
Copy link
Member Author

It doesn't get us further, and we're going to be getting rid of Cart anyway, so it makes that PR a little bit simpler :)

But it might not be worth doing if we're going to clean up that lifetime soon.

@sffc
Copy link
Member

sffc commented Nov 9, 2021

But it might not be worth doing if we're going to clean up that lifetime soon.

The approach of getting rid of RcStruct is blocked on an ICE, right? Are there any other workarounds, and do you have a sense of when the ICE might be fixed?

@Manishearth
Copy link
Member Author

So, I'm not sure. The incremental approach I was taking is blocked on an ICE but it's quite possible there's another approach that isn't.

I actually think we may be able to get rid of the lifetime and the cart without immediately deleting RcStruct as well (and only using RcStruct for upcasting), This would require getting rid of the cart before the lifetime so I think it's worth trying!

@Manishearth
Copy link
Member Author

Manishearth commented Nov 10, 2021

The steps would be:

  1. stop using Cart in RcStruct, use an erased cart instead
  2. Remove Cart and the DataMarker lifetime
  3. Remove the lifetime from RcStruct
  4. now try to remove RcStruct

@Manishearth
Copy link
Member Author

Manishearth commented Nov 10, 2021

I might take a crack at step 1, especially since it's purely internal

@Manishearth
Copy link
Member Author

Finished step 0: #1278

@Manishearth
Copy link
Member Author

Steps 1-2 were easier than I thought: #1279

@Manishearth
Copy link
Member Author

Step 3: #1297

@Manishearth
Copy link
Member Author

Step 4: #1299

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-data-infra Component: provider, datagen, fallback, adapters help wanted Issue needs an assignee S-medium Size: Less than a week (larger bug fix or enhancement) T-techdebt Type: ICU4X code health and tech debt
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants