Use a reference array, not a reference table#360

Closed
nomeata wants to merge 1 commit into master from joachim/idl-refarray

nomeata commented May 3, 2019

If we don’t have references, the format has the nice property that you
can actually treat record fields as abstract things: You have their
lengths, so you can, for example, copy them to another record you send
on without looking at the content. Even though the type system doesn’t
(yet?) allow such polymorphic things, the encoding already does (so we
are not painting ourselves into a corner), and even in the
monomorphic case it might be desirable to copy subexpressions without
deserialization. So this is nice.

The current encoding of references breaks that: You need to decode
subexpressions to know where table entries are, and possibly update them
as you copy the bytes. This makes the format non-compositional.

But it does not have to be that way: An alternative encoding produces
not an array of bytes and a table of references, but an array of bytes
and an array of references. This almost works: The only real change is
that the field length we introduce for subtyping/field omission now has
to be two numbers: one for the length of the byte array and one for
the length of the reference array of the subexpression.

The parser would just go through both arrays in parallel, pretty simple
actually.
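
To make the two-array idea concrete, here is a minimal sketch in Python. It is not the IDL's actual wire format: the names `Encoded`, `encode_field`, `concat` and `split_fields` are made up for illustration, references are modeled as opaque strings, and each length is stored in a single byte purely to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Encoded:
    """A serialized value: raw bytes plus a parallel array of references."""
    data: bytes = b""
    # References are modeled as opaque strings (an assumption for this sketch).
    refs: List[str] = field(default_factory=list)


def encode_field(value: Encoded) -> Encoded:
    """Prefix a field with two lengths: its byte count and its reference count."""
    # One byte per length is a simplification; it limits both counts to 255.
    header = bytes([len(value.data), len(value.refs)])
    return Encoded(header + value.data, list(value.refs))


def concat(parts: Iterable[Encoded]) -> Encoded:
    """Concatenate encodings by appending both arrays independently."""
    out = Encoded()
    for part in parts:
        out = Encoded(out.data + part.data, out.refs + part.refs)
    return out


def split_fields(record: Encoded) -> List[Encoded]:
    """Walk both arrays in parallel, slicing out each field without decoding it."""
    fields, byte_pos, ref_pos = [], 0, 0
    while byte_pos < len(record.data):
        n_bytes = record.data[byte_pos]
        n_refs = record.data[byte_pos + 1]
        byte_pos += 2
        fields.append(Encoded(record.data[byte_pos:byte_pos + n_bytes],
                              record.refs[ref_pos:ref_pos + n_refs]))
        byte_pos += n_bytes
        ref_pos += n_refs
    return fields
```

Because each field carries both lengths, `split_fields` never inspects a field's contents, and a slice it returns can be passed to `encode_field` again and placed into another record unchanged.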

If we are worried about the extra byte (but you said you are not ;-)),
maybe consider using a slightly shorter hash (62 bits? 60 bits?) and
maybe use two bits of the record field's hash to indicate a zero-length
data array and a zero-length reference array, respectively.
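
A rough sketch of that bit-packing idea, assuming a 62-bit hash stored in a 64-bit word with the two top bits used as flags. The bit positions and the names `pack_tag` and `unpack_tag` are hypothetical; nothing here is fixed by the proposal.

```python
HASH_BITS = 62                      # assumed hash width; 60 would work the same way
HASH_MASK = (1 << HASH_BITS) - 1
NO_DATA = 1 << 62                   # flag: the field's byte array is empty
NO_REFS = 1 << 63                   # flag: the field's reference array is empty


def pack_tag(field_hash: int, n_bytes: int, n_refs: int) -> int:
    """Combine a truncated field hash with the two zero-length flags."""
    tag = field_hash & HASH_MASK
    if n_bytes == 0:
        tag |= NO_DATA
    if n_refs == 0:
        tag |= NO_REFS
    return tag


def unpack_tag(tag: int):
    """Return (hash, data_is_empty, refs_is_empty) for a packed 64-bit tag."""
    return tag & HASH_MASK, bool(tag & NO_DATA), bool(tag & NO_REFS)
```

When a flag is set, the corresponding length would simply be omitted from the encoding, so a field with no references costs no extra byte.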

Also note that a long vector of references would be represented more
compactly, and have a fixed size in M (just the length).

The other slight downside is that you can no longer share references: If
the same reference is used twice, it will appear twice in the reference
array. I am not worried about this: Our format does not support sharing
of values, so why should it support sharing references?

This is actually based on my experience with implementing two different
binary formats for ActorScript already: The first one was using a byte
array and a table, the second one uses two arrays. I think the latter is
neater.

BTW, my diff to IDL is optimized for least changes to existing lines,
not for most elegant presentation. If we agree that the meaning of this
change is good I will try to phrase it more clearly, without repeating
the rules for the recursive types.

3 participants