Use a reference array, not a reference table #360
Closed
If we don’t have references, the format has the nice property that you
can actually treat record fields as abstract things: You have their
lengths, so you can, for example, copy them to another record you send
on without looking at the content. Even though the type system doesn’t
(yet?) allow such polymorphic things, the encoding already does (so we
are not painting ourselves into a corner), and even in the
monomorphic case it might be desirable to copy subexpressions without
deserialization. So this is nice.
The current encoding of references breaks that: You need to decode
subexpressions to know where table entries are, and possibly update them
as you copy the bytes. This makes the format non-compositional.
But it does not have to be that way: An alternative encoding produces
not an array of bytes and a table of references, but an array of bytes
and an array of references. This almost works: The only real change is
that the field length we introduce for subtyping/field omission now has
to be two numbers: one for the length of the byte array and one for
the length of the reference array of the subexpression.
The parser would just go through both arrays in parallel, pretty simple
actually.
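A minimal sketch of that parallel walk, to make the idea concrete. Everything here is an assumption for illustration, not the IDL's actual wire format: the names `encode_record` and `split_fields` are hypothetical, field ids and both lengths are fixed 4-byte little-endian integers, and references are modelled as opaque strings.

```python
from typing import List, Tuple

Field = Tuple[int, bytes, List[str]]   # (field id, data bytes, references)
Encoded = Tuple[bytes, List[str]]      # (byte array, reference array)

def _u32(n: int) -> bytes:
    # Fixed-width integer, chosen only to keep the sketch short.
    return n.to_bytes(4, "little")

def encode_record(fields: List[Field]) -> Encoded:
    """Concatenate fields; each header carries BOTH lengths."""
    data = bytearray()
    refs: List[str] = []
    for field_id, field_data, field_refs in fields:
        data += _u32(field_id) + _u32(len(field_data)) + _u32(len(field_refs))
        data += field_data
        refs += field_refs
    return bytes(data), refs

def split_fields(data: bytes, refs: List[str]) -> List[Field]:
    """Walk both arrays in parallel without decoding any field payload."""
    fields: List[Field] = []
    pos = 0    # cursor into the byte array
    rpos = 0   # cursor into the reference array
    while pos < len(data):
        field_id = int.from_bytes(data[pos:pos + 4], "little")
        data_len = int.from_bytes(data[pos + 4:pos + 8], "little")
        ref_len = int.from_bytes(data[pos + 8:pos + 12], "little")
        pos += 12
        fields.append((field_id,
                       data[pos:pos + data_len],
                       refs[rpos:rpos + ref_len]))
        pos += data_len
        rpos += ref_len
    return fields
```

Because a field is just a pair of slices, forwarding it into another record is `encode_record([split_fields(data, refs)[0]])`: the payload bytes and its references are copied as-is, even if the payload is itself a nested record.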
If we are worried about the extra byte (but you said you are not ;-)),
maybe consider using a slightly shorter hash (62 bits? 60 bits?) and
maybe use two bits of the record field hash field to indicate a
zero-length data array or reference array, respectively.
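One way the flag bits could be packed, as a rough sketch only. The 62-bit hash width, the bit positions, and the names `pack_field_tag` / `unpack_field_tag` are all assumptions made for illustration; nothing here is decided.

```python
HASH_BITS = 62     # assumed width of the shortened field hash
NO_DATA = 0b01     # flag: the field's byte array is empty
NO_REFS = 0b10     # flag: the field's reference array is empty

def pack_field_tag(field_hash: int, data_len: int, ref_len: int) -> int:
    """Pack the hash and the two emptiness flags into one 64-bit word."""
    if not 0 <= field_hash < (1 << HASH_BITS):
        raise ValueError("hash does not fit in HASH_BITS")
    flags = (NO_DATA if data_len == 0 else 0) | (NO_REFS if ref_len == 0 else 0)
    return (field_hash << 2) | flags

def unpack_field_tag(tag: int):
    """Return (hash, data_is_empty, refs_is_empty)."""
    return tag >> 2, bool(tag & NO_DATA), bool(tag & NO_REFS)
```

A decoder would then read a length from the stream only for the arrays whose flag is clear, so a field with no references pays nothing extra for the second length.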
Also note that a long vector of references would be represented more
compactly, and have a fixed size in M (just the length).
The other slight downside is that you can no longer share references: If
the same reference is used twice, it will appear twice in the reference
array. I am not worried about this: Our format does not support sharing
of values, so why should it support sharing references?
This is actually based on my experience with implementing two different
binary formats for ActorScript already: The first one was using a byte
array and a table, the second one uses two arrays. I think the latter is
neater.
BTW, my diff to IDL is optimized for least changes to existing lines,
not for most elegant presentation. If we agree that the meaning of this
change is good, I will try to phrase it more clearly, without repeating
the rules for the recursive types.