Add Definition::Enum::tag_width field #215

mina86 · 2023-09-08T18:40:48Z

The field allows specifying how many bytes the discriminant byte
takes. This in turn allows for better support of custom encoding
formats with more than 256 variants and untagged unions.

    Enum {
        /// Width in bytes of the discriminant tag.
        ///
        /// Zero indicates this is an untagged union.  In standard borsh
        /// encoding this is one however custom encoding formats may use larger
        /// width if they need to encode more than 256 variants.
        ///
        /// Note: This definition must not be used if the tag is not encoded
        /// using little-endian format.
        tag_width: u8,

        /// Possible variants of the enumeration.
        variants: Vec<(VariantName, Declaration)>,
    },

relates to: #181

mina86 · 2023-09-09T00:35:43Z

Another idea is to introduce a Width type:

pub enum Width {
    NoWidth,
    LE8 = 1,
    LE16 = 2,
    LE24 = 3,
    LE32 = 4,
    LE40 = 5,
    LE48 = 6,
    LE56 = 7,
    LE64 = 8,
}

which would specify how the tag is encoded. That would leave possibility for extensions in the future.

dj8yfo · 2023-09-09T06:51:25Z

Offtopic:

    /// A fixed-size tuple with the length known at the compile time and the
    /// elements of different types.
    Tuple {
        /// Types of elements of the tuple.
        elements: Vec<Declaration>,
    },
    /// A structure, structurally similar to a tuple.
    Struct {
        /// Named fields of the structure with their types.
        ///
        /// Tuple-like structures are modelled as a `Tuple`.
        fields: Vec<(FieldName, Declaration)>,
    },

will be better than

    Tuple {
       /// Types of elements of the tuple.
       elements: Vec<(Option<FieldName>, Declaration)>,
   },

as the latter allows a mix of named and unnamed fields.

dj8yfo · 2023-09-09T14:45:28Z

Please update branch with new changes in trunk as well as (periodically) other prs.
Occasionally compile or other errors occur despite a pr being able to be merged cleanly.

The field allows specifying how many bytes the discriminant byte takes. This in turn allows for better support of custom encoding formats with more than 256 variants and untagged unions. Issue: near#181

dj8yfo · 2023-09-10T10:38:04Z

@mina86 the change to Definition, proposed in comment is a little bit inconsistent with respect to mixing information about serialized form and the rust syntax forms that serialized representation originated from.

Let's take Enum first as an example.
Current version in master contains this:

    Enum {
        variants: Vec<(VariantName, Declaration)>,
    },

This pr adds some info about serialized form:

    Enum {
        tag_width: u8,
        variants: Vec<(VariantName, Declaration)>,
    },

But, some info was skipped. With the addition of enum discriminants the enum tag in serialized form for VariantName cannot be computed just by its index in containing Vec<...>, so the full definition,
containing both full info about serialized form and VariantName-s, which are relevant only for hinting, what the original enum in rust looked like, would look like:

    Enum {
        tag_width: u8,
        variants: Vec<(u64, VariantName, Declaration)>,
    },

If it were supposed to only convey information about serialized form, than it would look like:

    Enum {
        tag_width: u8,
        variants: BTreeMap<u64, Declaration>,    // the order of variants, as they were declared in rust enum,  
                                                                          // is irrelevant to serialization
    },

Similarly, this change for Structs, proposed in the same comment

    /// A fixed-size tuple with the length known at the compile time and the
    /// elements of different types.
    Tuple {
        /// Types of elements of the tuple.
        elements: Vec<Declaration>,
    },
    /// A structure, structurally similar to a tuple.
    Struct {
        /// Named fields of the structure with their types.
        ///
        /// Tuple-like structures are modelled as a `Tuple`.
        fields: Vec<(FieldName, Declaration)>,
    },

contains some loss of information about originating forms in rust, as some Tuple-s originated from tuples in rust, and some - from tuple structs, that info is lost, while structs with named fields don't lose that kind of information.

If it were supposed to only convey information about serialized form, than it would look like this both for tuples and structs on rust side:

    Tuple { elements: Vec<Declaration> },

as the names of the fields are irrelevant.

A variant with full info (both serialization & origin) is what it looks like now:

    Tuple { elements: Vec<Declaration> },
    Struct { fields: Fields },
    ...
pub enum Fields {
    NamedFields(Vec<(FieldName, Declaration)>),
    UnnamedFields(Vec<Declaration>),
    Empty,
}

thus, i believe, to keep up the same logic about the Definition as a whole, it should be either:

// max info
pub enum Definition {
    Sequence {    // extends prev `Sequence` and replaces `Array`
        length_width: u8,
        min_length: u64,
        max_length: u64,
        elements: Declaration,
    },
    Tuple {    // stays the same
        elements: Vec<Declaration>,
    },
    Enum {  // extended with `tag_width` and `u64` added to variants tuple
        tag_width: u8,
        variants: Vec<(u64, VariantName, Declaration)>,
    },
    Struct { fields: Fields },  // stays the same
}

pub enum Fields {
    NamedFields(Vec<(FieldName, Declaration)>),
    UnnamedFields(Vec<Declaration>),
    Empty,
}

or

// only serialized form
pub enum Definition {
    Sequence {  // changes
        length_width: u8,
        min_length: u64,
        max_length: u64,
        elements: Declaration,
    },
    Tuple {  // stays the same, replaces `Struct`
        elements: Vec<Declaration>,
    },
    Enum {    // changes
        tag_width: u8,
        variants: BTreeMap<u64, Declaration>,    // the order of variants, as they were declared in rust enum,  
    }
}

@frol what do you think?

As there's also process of deserialization, which turns serialized bytes back into rust forms, i vote for variant with max info,
where adding u64 to Vec<(u64, VariantName, Declaration)> would be the most complex change.

On the other hand, if BorshSchema's job is to only concisely describe the format of output data, and the task of implementing actual conversions is delegated to BorshSerialize and BorshDeserialize traits, then the minimal variant would be better.

mina86 · 2023-09-10T13:40:08Z

@mina86 the change to Definition, proposed in comment is a little bit inconsistent with respect to mixing information about serialized form and the rust syntax forms that serialized representation originated from.

Note that one more piece of information one has to go on is declaration. Tuples are called Tuple<...> which can be used to distinguish a tuple from a struct. (Though as a side note I’d argue for replacing the declaration by (...) to not clash with someone creating a generic struct called Tuple). Even if it’s important to keep information about the rust types, I don’t think Fields is necessary for it. (I’d be happy removing all information not related to encoding but I don’t have strong feelings).

Lastly, Enum::tag_width is being added in both situations so can we get this PR merged regardless? ;)

frol · 2023-09-10T16:20:58Z

@dj8yfo I don't have full context about Borsh Schema, and I still believe there are more problems with it than it solves, so anything that kind of works is fine with me. I am fine with iterative approach and this PR only adds tag_width, so we don't need to block it.

dj8yfo · 2023-09-10T18:01:02Z

@mina86 with all of that articulated,

so can we get this PR merged regardless?

yes
i'd be happy to see

    Sequence {    // extends prev `Sequence` and replaces `Array`
        length_width: u8,
        min_length: u64,
        max_length: u64,
        elements: Declaration,
    },

in another pr.

If you also considered adding info about enum discriminants' values (in this or another pr), that would be consistent with current pr's drive for adding details about data layout:

    Enum {  // extended with `tag_width` and `u128` added to variants tuple
        tag_width: u8,
        variants: Vec<(u128, VariantName, Declaration)>,
    },

The change of this pr might be useful not in the context of having enums defined with more than 256 variants, but having enums defined with reasonable number of variants which have discriminants outside of u8 range.

Tuples are called Tuple<...>

Due to the clash of names, i think there's no need to introduce any change to Definition::{Struct, Tuple} , it looks more like a change for the sake of a change, not adding anything new.

mina86 · 2023-09-10T19:59:06Z

If you also considered adding info about enum discriminants' values (in this or another pr), that would be consistent with current pr's drive for adding details about data layout:

I can look into it, but I’m essentially doing all this in my spare time so I give no promises that I’ll add discriminants to the definition.

dj8yfo · 2023-09-11T11:49:35Z

it might actually be

    Enum {  // extended with `tag_width` and `i64` added to variants tuple
        tag_width: u8,
        variants: Vec<(i64, VariantName, Declaration)>,
    },

as rust allows isize , if BorshSchema were to mirror what rust specifies for discriminants

I give no promises that I’ll add discriminants to the definition

no, i just meant it would be a more meaningful investment of time for this repo than churning Struct/Tuple again

…st (i+1)

to make `BorshSchema` more congruent with `BorshSerialize, BorshDeserialize`

mina86 · 2023-09-11T18:09:50Z

no, i just meant it would be a more meaningful investment of time for this repo than churning Struct/Tuple again

Difference is I already have Struct/Tuple mostly written. And adding discriminant to Enum seems like a huge can of worms.

dj8yfo · 2023-09-11T18:55:41Z

Difference is I already have Struct/Tuple mostly written

well, then you should be ready it gets rejected if it doesn't look absolutely stunning )

dj8yfo · 2023-09-11T19:07:52Z

seems like a huge can of worms.

adding #[borsh(tag_type = "i16")] for derives would be,
just adding the discriminants to variants' tuples shouldn't be too hard

mina86 · 2023-09-11T19:09:08Z

I think I don’t understand GitHub… I’m trying to comment on your changes but my comment don’t seem to appear.

Firstly, why is actually dynamically-sized sequence of zero-sized encoding forbidden?

Secondly, rather than calling is_zero_size it’s probably better to check sz == 0 not to duplicate the work.

dj8yfo · 2023-09-11T19:31:05Z

I’m trying to comment on your changes but my comment don’t seem to appear.

there must be this button in upper right corner Finish your review

mina86 requested review from frol and dj8yfo as code owners September 8, 2023 18:40

mina86 force-pushed the e branch from edaf46f to d1abc97 Compare September 9, 2023 15:28

dj8yfo marked this pull request as draft September 9, 2023 17:08

mina86 force-pushed the e branch 3 times, most recently from 9d72f84 to f7d9c47 Compare September 9, 2023 17:53

Add Definition::Enum::tag_width field

0967d54

The field allows specifying how many bytes the discriminant byte takes. This in turn allows for better support of custom encoding formats with more than 256 variants and untagged unions. Issue: near#181

mina86 force-pushed the e branch from f7d9c47 to 0967d54 Compare September 9, 2023 21:41

mina86 marked this pull request as ready for review September 9, 2023 21:41

dj8yf0μl added 2 commits September 11, 2023 17:05

build: revert externing alloc crate when std is available; fix te…

f237c55

…st (i+1)

chore: add MaxSizeError::ZSTSequenceNotArray

2adf4ca

to make `BorshSchema` more congruent with `BorshSerialize, BorshDeserialize`

dj8yfo force-pushed the e branch from 63ea369 to ea05b81 Compare September 11, 2023 18:04

dj8yfo force-pushed the e branch from ea05b81 to dd1e79e Compare September 11, 2023 18:33

dj8yfo force-pushed the e branch 2 times, most recently from 668a885 to 2adf4ca Compare September 11, 2023 19:29

dj8yfo approved these changes Sep 11, 2023

View reviewed changes

dj8yfo merged commit 655119a into near:master Sep 11, 2023
7 checks passed

frol mentioned this pull request Sep 11, 2023

chore: release #217

Merged

mina86 deleted the e branch September 12, 2023 09:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Definition::Enum::tag_width field #215

Add Definition::Enum::tag_width field #215

mina86 commented Sep 8, 2023 •

edited by dj8yfo

Loading

mina86 commented Sep 9, 2023 •

edited

Loading

dj8yfo commented Sep 9, 2023 •

edited

Loading

dj8yfo commented Sep 9, 2023

dj8yfo commented Sep 10, 2023 •

edited

Loading

mina86 commented Sep 10, 2023

frol commented Sep 10, 2023

dj8yfo commented Sep 10, 2023

mina86 commented Sep 10, 2023

dj8yfo commented Sep 11, 2023

mina86 commented Sep 11, 2023

dj8yfo commented Sep 11, 2023

dj8yfo commented Sep 11, 2023 •

edited

Loading

mina86 commented Sep 11, 2023

dj8yfo commented Sep 11, 2023

Add Definition::Enum::tag_width field #215

Add Definition::Enum::tag_width field #215

Conversation

mina86 commented Sep 8, 2023 • edited by dj8yfo Loading

mina86 commented Sep 9, 2023 • edited Loading

dj8yfo commented Sep 9, 2023 • edited Loading

dj8yfo commented Sep 9, 2023

dj8yfo commented Sep 10, 2023 • edited Loading

mina86 commented Sep 10, 2023

frol commented Sep 10, 2023

dj8yfo commented Sep 10, 2023

mina86 commented Sep 10, 2023

dj8yfo commented Sep 11, 2023

mina86 commented Sep 11, 2023

dj8yfo commented Sep 11, 2023

dj8yfo commented Sep 11, 2023 • edited Loading

mina86 commented Sep 11, 2023

dj8yfo commented Sep 11, 2023

mina86 commented Sep 8, 2023 •

edited by dj8yfo

Loading

mina86 commented Sep 9, 2023 •

edited

Loading

dj8yfo commented Sep 9, 2023 •

edited

Loading

dj8yfo commented Sep 10, 2023 •

edited

Loading

dj8yfo commented Sep 11, 2023 •

edited

Loading