Field serialization proxy via newtypes #198

arcnmx · 2016-01-01T00:03:54Z

This change adds two new attributes: serializer and deserializer. They can be used to specify a newtype that performs {de,}serialization functionality without actually making it part of your struct type definition.

Mainly looking for design and implementation feedback. I did this pretty quickly and haven't tested it much with real-life workloads yet. I may have missed some cases (enum variants? tuples?).

serializer

An expression that resolves to a function or newtype that, when called, produces a value to be serialized. The prototype is roughly for<T: serde::Serialize> FnOnce(&ValueType) -> T.

struct Serializer<'a>(&'a u8);

impl<'a> serde::Serialize for Serializer<'a> { ... }

#[derive(Serialize)]
struct S {
    #[serde(serializer = "Serializer")]
    v: u8,
}

deserializer

A name or path to a type that satisfies T: serde::Deserialize + Into<ValueType>

struct Deserializer(u8);

impl serde::Deserialize for Deserializer { ... }
impl Into<u8> for Deserializer { ... }

#[derive(Deserialize)]
struct S {
    #[serde(deserializer = "Deserializer")]
    v: u8,
}

oli-obk · 2016-01-02T13:58:56Z

Very interesting concept. Maybe something like this could be used to implement the various feature requests like skipping fields. But that's for a future PR. It would certainly stop the current attribute inflation and simplify the design of the current attributes. One potential flaw with your current code might be generics.

arcnmx · 2016-01-02T23:56:42Z

The main motivation here was that I'm constantly making two or three copies of structures:

real public-facing version without the implementation details of newtypes for serde
- This can be mitigated by modifying upstream libraries (or serde itself), but that doesn't work all the time, or when there are different domain-specific ways to serialize something, etc.
A version with the newtypes defined inside Deserialize::deserialize and then mapping between the two.
Sometimes another newtype-riddled version for Serialize with all members being references instead to avoid copies.
- This can be sort of remedied by making 2. CoW-like.

@oli-obk

Maybe something like this could be used to implement the various feature requests like skipping fields.

While the current serde design doesn't really allow for this, I saw a "pre-mapkey-serialize-check-hook" kind of proposal that would certainly make that work; you could then implement skip_serializing_if_empty in a more generic way with this! default already can be done, I believe?

One potential flaw with your current code might be generics.

Hm, what about generics? serializer and deserializer / Into<T> should work alright due to inference, and you can use deserializer = "Deserializer::<u8>" or whatever is necessary if the type itself is generic.

arcnmx · 2016-01-03T01:16:35Z

Fun fact: For serializer's value, since it's an expression, you can provide as complex logic as you want. You can even reference self* or other unrelated fields of the struct to determine how to serialize something.

* actually self.value due to the hidden visitor struct. If we want to actually officially support this, we should probably do an ident replacement over the provided expression to hide that implementation detail.

I mean just look at this fine and very readable exemplar that actually works:

#[derive(Serialize)]
struct S {
    a: u8,
    #[serde(serializer = "|b: &u8| self.value.a + *b")]
    b: u8,
}

Note: while I think this could be potentially useful in certain scenarios, it needs more design and consideration. I'd suggest documenting that only an ident or path is allowed in this field, and anything else is not guaranteed to be compatible with later versions until a design has been finalized.

erickt · 2016-01-06T20:42:08Z

Interesting! This is a neat idea. I've long wanted to have #[serde(default = $expr)] for those cases where there isn't a Default impl for a field. This would cut down a lot on all those excessive newtypes.

Another option would be to use functions instead of types, as in:

struct Foo {
    #[serde(serializer = "some_ser_fn")]
    #[serde(deserializer = "some_de_fn")]
    x: u8,
}

And we'd codegen calls out to functions that had this interface:

fn some_ser_fn<T>(ser: &mut T, value: u8) -> Result<(), T::Error>
where T: Serializer {
    ...
}

fn some_de_fn<T>(de: &mut T) -> Result<u8, T::Error>
where T: Deserializer {
    ...
}

Would that make thing simpler, or more complicated? That would mean that you wouldn't have to implement Serialize/Deserialize, but if you did, then you'd have to hop through one more hoop.

I'm not sure if this could implement "skip_if_empty" though. That check needs to happen before we serialize the field name, but the obvious implementation of this we would have already serialized the field name at this point.

arcnmx · 2016-01-06T21:54:14Z

@erickt

Would that make thing simpler, or more complicated? That would mean that you wouldn't have to implement Serialize/Deserialize, but if you did, then you'd have to hop through one more hoop.

Hard to say. I considered it, but don't remember why I went with this approach. Let's break it down...

serializer: I want to say this is complicated because you're passing a value along to serialize_struct_elt() that takes a T: Serialize? Is it possible to do some extra codegen magic to get this to work? It might be possible to make struct SomeContainer<S: Serializer, F: FnOnce(&ValueType, &mut S) -> etc>(F) impl Serialize or something? I feel like the generic S gets in the way of that...
deserializer: This one's not a problem. I think that might be the more flexible approach since it would allow you to provide additional data like #[serde(deserializer = "DefaultDeserializer(false).deserialize")]. We could maybe(?) use a new trait instead to avoid having to write .deserialize? If only we could manually impl Fn on stable...

This is all very confusing to think about...

I'm not sure if this could implement "skip_if_empty" though. That check needs to happen before we serialize the field name, but the obvious implementation of this we would have already serialized the field name at this point.

Yeah... I saw previous discussion about adjusting the traits to allow this in an issue somewhere, but that's a bit out of scope of this PR. It would be very nice though if possible! This may be one of those examples where using Serialize rather than a function is more future-proof if more methods were to be added to the trait.

I've long wanted to have #[serde(default = $expr)] for those cases where there isn't a Default impl for a field.

While I'd like to make this possible via this PR's deserializer attribute, this also wouldn't be very hard to add on its own. Personally though I've found the "serializing-things-that-don't-natively-impl-Serializer-properly" to be a much more common cause of newtypes than "wrong-Default".

oli-obk · 2016-01-11T14:46:31Z

I'm not sure if this could implement "skip_if_empty" though. That check needs to happen before we serialize the field name, but the obvious implementation of this we would have already serialized the field name at this point.

Well... if there's a method on the value that simply returns true or false depending on whether it should be serialized, then this method could be checked before serializing the field name.

While I'd like to make this possible via this PR's deserializer attribute, this also wouldn't be very hard to add on its own.

This might be a bit too magical, but the default could automatically be taken if the Deserializer type implements Default. Since the Deserializer type should only exist for the newtype deserialization scheme, and not for actual usage this wouldn't really be suprising though (it would be nice to have a check in the future checking if the Deserializer type is private).

erickt · 2016-01-11T17:53:20Z

@oli-obk: Another option would be when serializing fields, the Serializer passes an outer closure to the Serialize impl. If it decides to serialize the value, then it passes an inner closure to outer one. The outer one actually serializes the field name, then calls the inner one to do the actual serialization. Not calling the closure is equivalent to not serializing the key-value pair.

oli-obk · 2016-01-13T14:50:15Z

sounds good.

So we still have the open question on whether to use newtypes or free functions. I guess free function have less boilerplate while newtypes have a better transition strategy for people already using newtypes in their structs and they also are more concise.

Since it's already implemented for newtypes, I don't see why it should be changed without a strong argument for free functions.

target-san · 2016-01-28T12:49:04Z

@arcnmx We might possibly avoid COWs and have single "proxy" type, assuming that if we have struct Proxy(FieldType), then mem::transmute(&field) as &Proxy won't produce hiccups. What do you think?

kstep · 2016-01-28T13:19:34Z

I often implement APIs to different JSON-based HTTP services, and I often have a situation when physical type doesn't match logical one. E.g. number (u8) can be used instead of boolean (real case), so I have to leave u8 fields in my structure (and it's not very ergonomic to work with it, but it's easy to implement (de)serialize traits for it with #[derive(Deserialize)]), or make it bool (in which case I have to implement Deserialize manually, just to decode a couple of integer fields into booleans, and it's tedious to do it).

This PR would help me very much.

erickt · 2016-01-28T17:21:34Z

@target-san: I'd strongly prefer to avoid using unsafe code if possible. I'd hate for serde to introduce potential memory bugs in difficult-to-audit code generated code.

arcnmx · 2016-01-28T17:27:23Z

@target-san what @erickt said. We could create a trait like:

trait Proxy<T> {
fn from(t: T) -> Self;
fn from_ref(t: &T) -> &Self;
fn into(self) -> T;
}

for it and whatever else is needed though.

I need to circle 'round and think on the best interface for all of this again soon... Need something solid and nice to use to propose.

target-san · 2016-01-28T23:13:44Z

@erickt Surely this looks like a hack. Though, if we know that newtype has the same memory layout as its wrapped type (and this is almost 100%, due to Rust's zero overhead policy), I personally don't see big problem here. First, only newtype (i.e. tuple struct with 1 element) can be used as proxy. Second, no one prohibits us from adding small utility function which will perform transmute under the hood. At least, it's worth trying.

@arcnmx Trait might be the solution, but it will hammer down ergonomics. And some proxies might have only one way to do the trick. Though, we can play on the fact that newtype declaration also creates x -> y function which creates newtype instance from nested type's value.

So I assume we have basically two ways here

Free functions of specific signature
- easy to write
- cannot support additional things like Empty trait from another PR
Proxy newtype
- can support addiitonal helper traits, like Empty
2a. Bare newtype
- easy to write for client
- might require reference transmutes to provide smooth support
2b. Additional proxy trait
- consistent, possibly no unsafe code
- might require splitting in two traits to support only 'ser' or 'de' way
- additional boilerplate

erickt · 2016-02-21T23:39:20Z

This was alternatively implemented in #238. Thanks though!

Fixes serde-rs#198

arcnmx added 2 commits December 31, 2015 18:53

fix(cargo): Bump syntex versions

381c920

feat(codegen): Field serialization proxy newtypes

0652789

Refer to Into<> via absolute path

69f51ed

arcnmx force-pushed the newtype branch from 232c58c to 4651c95 Compare January 3, 2016 09:10

Support references to self in serializer attr

4651c95

erickt mentioned this pull request Jan 19, 2016

Consider adding support for #[serde(default = $expr)] #216

Closed

erickt added the RFC label Jan 19, 2016

JohnHeitmann mentioned this pull request Jan 24, 2016

Consider changing Ipv*Addr serialization to a tuple of integers from a string #181

Closed

erickt mentioned this pull request Jan 28, 2016

Path serialization can panic #57

Closed

kstep mentioned this pull request Feb 2, 2016

Some response fields should have more semantic types kstep/vkrs#37

Open

This was referenced Feb 16, 2016

Add codegen expr magic for default, skip serializing, and alternative serializers #237

Closed

Add codegen expr magic for default, skip serializing, and alternative serializers #238

Merged

erickt closed this Feb 21, 2016

rubdos pushed a commit to rubdos/serde that referenced this pull request Jun 20, 2017

Implement Default for Complex

c7c974e

Fixes serde-rs#198

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Field serialization proxy via newtypes #198

Field serialization proxy via newtypes #198

arcnmx commented Jan 1, 2016

oli-obk commented Jan 2, 2016

arcnmx commented Jan 2, 2016

arcnmx commented Jan 3, 2016

erickt commented Jan 6, 2016

arcnmx commented Jan 6, 2016

oli-obk commented Jan 11, 2016

erickt commented Jan 11, 2016

oli-obk commented Jan 13, 2016

target-san commented Jan 28, 2016

kstep commented Jan 28, 2016

erickt commented Jan 28, 2016

arcnmx commented Jan 28, 2016

target-san commented Jan 28, 2016

erickt commented Feb 21, 2016

Field serialization proxy via newtypes #198

Field serialization proxy via newtypes #198

Conversation

arcnmx commented Jan 1, 2016

serializer

deserializer

oli-obk commented Jan 2, 2016

arcnmx commented Jan 2, 2016

arcnmx commented Jan 3, 2016

erickt commented Jan 6, 2016

arcnmx commented Jan 6, 2016

oli-obk commented Jan 11, 2016

erickt commented Jan 11, 2016

oli-obk commented Jan 13, 2016

target-san commented Jan 28, 2016

kstep commented Jan 28, 2016

erickt commented Jan 28, 2016

arcnmx commented Jan 28, 2016

target-san commented Jan 28, 2016

erickt commented Feb 21, 2016