Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field serialization proxy via newtypes #198

Closed
wants to merge 4 commits into from
Closed

Conversation

arcnmx
Copy link

@arcnmx arcnmx commented Jan 1, 2016

This change adds two new attributes: serializer and deserializer. They can be used to specify a newtype that performs {de,}serialization functionality without actually making it part of your struct type definition.

Mainly looking for design and implementation feedback. I did this pretty quickly and haven't tested it much with real-life workloads yet. I may have missed some cases (enum variants? tuples?).

serializer

An expression that resolves to a function or newtype that, when called, produces a value to be serialized. The prototype is roughly for<T: serde::Serialize> FnOnce(&ValueType) -> T.

struct Serializer<'a>(&'a u8);

impl<'a> serde::Serialize for Serializer<'a> { ... }

#[derive(Serialize)]
struct S {
    #[serde(serializer = "Serializer")]
    v: u8,
}

deserializer

A name or path to a type that satisfies T: serde::Deserialize + Into<ValueType>

struct Deserializer(u8);

impl serde::Deserialize for Deserializer { ... }
impl Into<u8> for Deserializer { ... }

#[derive(Deserialize)]
struct S {
    #[serde(deserializer = "Deserializer")]
    v: u8,
}

@oli-obk
Copy link
Member

oli-obk commented Jan 2, 2016

Very interesting concept. Maybe something like this could be used to implement the various feature requests like skipping fields. But that's for a future PR. It would certainly stop the current attribute inflation and simplify the design of the current attributes. One potential flaw with your current code might be generics.

@arcnmx
Copy link
Author

arcnmx commented Jan 2, 2016

The main motivation here was that I'm constantly making two or three copies of structures:

  1. real public-facing version without the implementation details of newtypes for serde
    • This can be mitigated by modifying upstream libraries (or serde itself), but that doesn't work all the time, or when there are different domain-specific ways to serialize something, etc.
  2. A version with the newtypes defined inside Deserialize::deserialize and then mapping between the two.
  3. Sometimes another newtype-riddled version for Serialize with all members being references instead to avoid copies.
    • This can be sort of remedied by making 2. CoW-like.

@oli-obk

Maybe something like this could be used to implement the various feature requests like skipping fields.

While the current serde design doesn't really allow for this, I saw a "pre-mapkey-serialize-check-hook" kind of proposal that would certainly make that work; you could then implement skip_serializing_if_empty in a more generic way with this! default already can be done, I believe?

One potential flaw with your current code might be generics.

Hm, what about generics? serializer and deserializer / Into<T> should work alright due to inference, and you can use deserializer = "Deserializer::<u8>" or whatever is necessary if the type itself is generic.

@arcnmx
Copy link
Author

arcnmx commented Jan 3, 2016

Fun fact: For serializer's value, since it's an expression, you can provide as complex logic as you want. You can even reference self* or other unrelated fields of the struct to determine how to serialize something.

* actually self.value due to the hidden visitor struct. If we want to actually officially support this, we should probably do an ident replacement over the provided expression to hide that implementation detail.

I mean just look at this fine and very readable exemplar that actually works:

#[derive(Serialize)]
struct S {
    a: u8,
    #[serde(serializer = "|b: &u8| self.value.a + *b")]
    b: u8,
}

Note: while I think this could be potentially useful in certain scenarios, it needs more design and consideration. I'd suggest documenting that only an ident or path is allowed in this field, and anything else is not guaranteed to be compatible with later versions until a design has been finalized.

@erickt
Copy link
Member

erickt commented Jan 6, 2016

Interesting! This is a neat idea. I've long wanted to have #[serde(default = $expr)] for those cases where there isn't a Default impl for a field. This would cut down a lot on all those excessive newtypes.

Another option would be to use functions instead of types, as in:

struct Foo {
    #[serde(serializer = "some_ser_fn")]
    #[serde(deserializer = "some_de_fn")]
    x: u8,
}

And we'd codegen calls out to functions that had this interface:

fn some_ser_fn<T>(ser: &mut T, value: u8) -> Result<(), T::Error>
where T: Serializer {
    ...
}

fn some_de_fn<T>(de: &mut T) -> Result<u8, T::Error>
where T: Deserializer {
    ...
}

Would that make thing simpler, or more complicated? That would mean that you wouldn't have to implement Serialize/Deserialize, but if you did, then you'd have to hop through one more hoop.

I'm not sure if this could implement "skip_if_empty" though. That check needs to happen before we serialize the field name, but the obvious implementation of this we would have already serialized the field name at this point.

@arcnmx
Copy link
Author

arcnmx commented Jan 6, 2016

@erickt

Would that make thing simpler, or more complicated? That would mean that you wouldn't have to implement Serialize/Deserialize, but if you did, then you'd have to hop through one more hoop.

Hard to say. I considered it, but don't remember why I went with this approach. Let's break it down...

  • serializer: I want to say this is complicated because you're passing a value along to serialize_struct_elt() that takes a T: Serialize? Is it possible to do some extra codegen magic to get this to work? It might be possible to make struct SomeContainer<S: Serializer, F: FnOnce(&ValueType, &mut S) -> etc>(F) impl Serialize or something? I feel like the generic S gets in the way of that...
  • deserializer: This one's not a problem. I think that might be the more flexible approach since it would allow you to provide additional data like #[serde(deserializer = "DefaultDeserializer(false).deserialize")]. We could maybe(?) use a new trait instead to avoid having to write .deserialize? If only we could manually impl Fn on stable...

This is all very confusing to think about...

I'm not sure if this could implement "skip_if_empty" though. That check needs to happen before we serialize the field name, but the obvious implementation of this we would have already serialized the field name at this point.

Yeah... I saw previous discussion about adjusting the traits to allow this in an issue somewhere, but that's a bit out of scope of this PR. It would be very nice though if possible! This may be one of those examples where using Serialize rather than a function is more future-proof if more methods were to be added to the trait.

I've long wanted to have #[serde(default = $expr)] for those cases where there isn't a Default impl for a field.

While I'd like to make this possible via this PR's deserializer attribute, this also wouldn't be very hard to add on its own. Personally though I've found the "serializing-things-that-don't-natively-impl-Serializer-properly" to be a much more common cause of newtypes than "wrong-Default".

@oli-obk
Copy link
Member

oli-obk commented Jan 11, 2016

I'm not sure if this could implement "skip_if_empty" though. That check needs to happen before we serialize the field name, but the obvious implementation of this we would have already serialized the field name at this point.

Well... if there's a method on the value that simply returns true or false depending on whether it should be serialized, then this method could be checked before serializing the field name.

While I'd like to make this possible via this PR's deserializer attribute, this also wouldn't be very hard to add on its own.

This might be a bit too magical, but the default could automatically be taken if the Deserializer type implements Default. Since the Deserializer type should only exist for the newtype deserialization scheme, and not for actual usage this wouldn't really be suprising though (it would be nice to have a check in the future checking if the Deserializer type is private).

@erickt
Copy link
Member

erickt commented Jan 11, 2016

@oli-obk: Another option would be when serializing fields, the Serializer passes an outer closure to the Serialize impl. If it decides to serialize the value, then it passes an inner closure to outer one. The outer one actually serializes the field name, then calls the inner one to do the actual serialization. Not calling the closure is equivalent to not serializing the key-value pair.

@oli-obk
Copy link
Member

oli-obk commented Jan 13, 2016

sounds good.

So we still have the open question on whether to use newtypes or free functions. I guess free function have less boilerplate while newtypes have a better transition strategy for people already using newtypes in their structs and they also are more concise.

Since it's already implemented for newtypes, I don't see why it should be changed without a strong argument for free functions.

@target-san
Copy link

@arcnmx We might possibly avoid COWs and have single "proxy" type, assuming that if we have struct Proxy(FieldType), then mem::transmute(&field) as &Proxy won't produce hiccups. What do you think?

@kstep
Copy link

kstep commented Jan 28, 2016

I often implement APIs to different JSON-based HTTP services, and I often have a situation when physical type doesn't match logical one. E.g. number (u8) can be used instead of boolean (real case), so I have to leave u8 fields in my structure (and it's not very ergonomic to work with it, but it's easy to implement (de)serialize traits for it with #[derive(Deserialize)]), or make it bool (in which case I have to implement Deserialize manually, just to decode a couple of integer fields into booleans, and it's tedious to do it).

This PR would help me very much.

@erickt
Copy link
Member

erickt commented Jan 28, 2016

@target-san: I'd strongly prefer to avoid using unsafe code if possible. I'd hate for serde to introduce potential memory bugs in difficult-to-audit code generated code.

@arcnmx
Copy link
Author

arcnmx commented Jan 28, 2016

@target-san what @erickt said. We could create a trait like:

trait Proxy<T> {
fn from(t: T) -> Self;
fn from_ref(t: &T) -> &Self;
fn into(self) -> T;
}

for it and whatever else is needed though.

I need to circle 'round and think on the best interface for all of this again soon... Need something solid and nice to use to propose.

@target-san
Copy link

@erickt Surely this looks like a hack. Though, if we know that newtype has the same memory layout as its wrapped type (and this is almost 100%, due to Rust's zero overhead policy), I personally don't see big problem here. First, only newtype (i.e. tuple struct with 1 element) can be used as proxy. Second, no one prohibits us from adding small utility function which will perform transmute under the hood. At least, it's worth trying.

@arcnmx Trait might be the solution, but it will hammer down ergonomics. And some proxies might have only one way to do the trick. Though, we can play on the fact that newtype declaration also creates x -> y function which creates newtype instance from nested type's value.

So I assume we have basically two ways here

  1. Free functions of specific signature

    • easy to write
    • cannot support additional things like Empty trait from another PR
  2. Proxy newtype

    • can support addiitonal helper traits, like Empty

    2a. Bare newtype

    • easy to write for client
    • might require reference transmutes to provide smooth support

    2b. Additional proxy trait

    • consistent, possibly no unsafe code
    • might require splitting in two traits to support only 'ser' or 'de' way
    • additional boilerplate

@erickt
Copy link
Member

erickt commented Feb 21, 2016

This was alternatively implemented in #238. Thanks though!

@erickt erickt closed this Feb 21, 2016
rubdos pushed a commit to rubdos/serde that referenced this pull request Jun 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants