one thing I'll note since it comes up whenever "binary" and "bsn" are in the same sentence is that there are two kinds of binary data:
We currently don't have an answer for item 2 (embedded or separate), and this issue is discussing item 1, but I bring it up because we also don't know what a ".bsn" text format file looks like yet either which would still need the |
I wanted to throw out some ideas on possible binary encodings for BSN. I expect to get a ton of objections, but at least it will start a discussion.
Suggested file extension is ".bsb" for "Bevy Scene Binary". I checked several websites that list common file extensions, and "bsb" did not appear in any of them.
Goals
There is a trade-off between compactness and ease of decoding: the smallest formats are more heavily compressed and require more complex decoding of things like integer arrays. From a performance standpoint, a smaller format means you save on I/O bandwidth but spend more CPU unpacking.
In this proposal I am going to prioritize compactness, but I expect that some readers will have a different opinion. I'm open to discussion on this.
Schema evolution: this simply means that we tolerate missing fields or extra names during deserialization. If I delete the `bar` field in my code and then attempt to read an older file that still has `bar`, it will deserialize OK. Similarly, if I add a `bar` field where none existed before, it will be set to the default. (Serde supports both: unknown fields are ignored by default, and `#[serde(default)]` fills in missing ones.)
One other goal I want to mention, which is not about the format itself but about the implementation: it should be possible to deserialize into different in-memory representations. If I want to write a Python script that manipulates .bsb files, I should be able to deserialize them into Python objects. I don't think this will be too hard to do. The reason I mention this is that the data representation in the Bevy editor might be different from the data representation in the final game.
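To make the schema evolution goal concrete, here is a minimal sketch of a tolerant decoder, written against plain (name, value) pairs rather than any real BSN or serde API. The `Foo` type, its fields, and `foo_from_fields` are all hypothetical names chosen for illustration.

```rust
/// Hypothetical component type used only for this illustration.
#[derive(Debug, Default, PartialEq)]
struct Foo {
    bar: i32,
    baz: i32,
}

/// Builds a `Foo` from already-decoded (field name, value) pairs.
/// Unknown names are skipped; fields that never appear keep their default.
fn foo_from_fields(fields: &[(&str, i32)]) -> Foo {
    let mut foo = Foo::default();
    for (name, value) in fields {
        match *name {
            "bar" => foo.bar = *value,
            "baz" => foo.baz = *value,
            _ => {} // written by a different schema version: ignore
        }
    }
    foo
}
```

Reading a file that still carries a removed field, or that predates a newly added one, succeeds either way; the only difference is which fields end up at their default.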
Prior Art
None of these formats are ideal, although they may be "good enough". Here are some potential weaknesses:
If there are 10 instances of a `Foo` type with a field `bar`, then the string `bar` occurs 10 times within the stream.
Name Tables
Instead of repeating the field names inline, I propose that the file have a prologue section containing all the names. There will be separate tables for type names and field names. The reason for making them separate is that we can then do all the reflection lookups in a batch: that is, when we load the type names table we can look up each type name in the reflection registry and assign an index to each reflected type info. These types will then be referred to by index for the rest of the file. This saves us from having to look up the name for every instance of the struct.
The order of the names will be arranged so that the most frequently-used names will have the lowest numbered indices. This is because in our encoding scheme, smaller integers take less space.
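A minimal sketch of the two steps described above: ordering names by frequency when writing, and resolving the type table in one batch when loading. The function names are hypothetical, and the `registry` map is only a stand-in for a real reflection registry lookup, not Bevy's actual API.

```rust
use std::collections::HashMap;

/// Orders names so the most frequently used ones get the smallest indices.
/// Ties are broken alphabetically so the output is deterministic.
fn build_name_table(uses: &[&str]) -> Vec<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for name in uses {
        *counts.entry(*name).or_insert(0) += 1;
    }
    let mut entries: Vec<(&str, usize)> = counts.into_iter().collect();
    entries.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    entries.into_iter().map(|(name, _)| name.to_string()).collect()
}

/// Resolves every type name once at load time. `registry` stands in for a
/// reflection registry; `None` marks a type this build does not know about.
fn resolve_types(table: &[String], registry: &HashMap<String, u32>) -> Vec<Option<u32>> {
    table.iter().map(|name| registry.get(name).copied()).collect()
}
```

For the uses `bar, translation, bar, scale, bar, translation`, the table comes out as `bar`, `translation`, `scale`, so `bar` gets index 0 and the body of the file only ever stores that index.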
This name table scheme I've outlined is the main reason for not using something like MessagePack or CBOR; if we decide that repeating names is fine, then there's little reason not to adopt one of the pre-existing formats. (I've used MessagePack + Serde with Bevy before, it works fine.)
Varints
For integers, I propose an encoding similar to either MessagePack or Thrift (using ZigZag encoding):
For struct fields, we can also store some fields in an even more compact form, where both the field type and value can be packed into a single byte.
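As a sketch of the Thrift-style option, the following shows ZigZag mapping combined with a base-128 varint. The function names are hypothetical, and the exact integer layout for BSN is still open; this only illustrates why small magnitudes end up in fewer bytes.

```rust
/// ZigZag-maps a signed integer so small magnitudes become small unsigned
/// values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Appends `value` as a base-128 varint: 7 payload bits per byte, least
/// significant group first, high bit set on every byte except the last.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Reads one varint from the front of `input`, returning the value and the
/// number of bytes consumed, or `None` if the input is truncated or longer
/// than the 10 bytes a u64 can need.
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

With this scheme the unsigned value 300 takes two bytes (`0xAC 0x02`), and the signed value -1 takes a single byte after ZigZag mapping.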
Floats
Unfortunately, variable-length encoding does not work for floats. Instead we just store them in the standard IEEE 754 format.
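A minimal sketch of fixed-width float storage. Little-endian byte order is an assumption here, since the proposal does not pick one, and `write_f32` / `read_f32` are hypothetical helper names.

```rust
/// Appends an `f32` as its 4-byte IEEE 754 bit pattern.
/// Little-endian order is an assumption, not part of the proposal.
fn write_f32(value: f32, out: &mut Vec<u8>) {
    out.extend_from_slice(&value.to_le_bytes());
}

/// Reads an `f32` from the first 4 bytes of `input`, or returns `None`
/// if fewer than 4 bytes are available.
fn read_f32(input: &[u8]) -> Option<f32> {
    let b = input.get(..4)?;
    Some(f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

Because the bit pattern is copied as-is, every value round-trips exactly, at a fixed cost of 4 bytes per `f32`.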
Structs
Structs consist of a type id followed by a variable number of fields. Each field consists of (field index, type, value); however, for some things (like boolean true and false, or integers that need fewer than 3 bits) the value can be included in the type.
Look at the Thrift encoding spec for ideas here: https://github.com/apache/thrift/blob/master/doc/specs/thrift-compact-protocol.md
Note that deserialization of a struct does not directly produce an instance of that type, instead it produces a template or patch (refer to BSN semantics) for that type.
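To illustrate how a value can ride inside the type, here is a sketch of a single "type byte" in the spirit of the Thrift compact protocol, where boolean true and false are distinct type codes. The tag numbers, the bit layout (low 4 bits for the tag, 3 bits above them for an inline payload), and all names are assumptions made up for this example.

```rust
/// Hypothetical field values; only the cases relevant to inlining are shown.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldValue {
    Bool(bool),
    /// Small integer carried inside the type byte itself.
    SmallInt(u8),
    /// Larger integer: the type byte is followed by a varint payload (not shown).
    VarInt,
}

// Illustrative tag numbers, not part of any spec.
const TAG_FALSE: u8 = 0;
const TAG_TRUE: u8 = 1;
const TAG_SMALL_INT: u8 = 2;
const TAG_VARINT: u8 = 3;

/// Packs the tag into the low 4 bits and any inline payload into bits 4..=6.
/// Payloads wider than 3 bits are masked, so callers must range-check first.
fn encode_type_byte(v: FieldValue) -> u8 {
    match v {
        FieldValue::Bool(false) => TAG_FALSE,
        FieldValue::Bool(true) => TAG_TRUE,
        FieldValue::SmallInt(n) => TAG_SMALL_INT | ((n & 0x07) << 4),
        FieldValue::VarInt => TAG_VARINT,
    }
}

/// Unpacks a type byte, returning `None` for tags this sketch does not define.
fn decode_type_byte(byte: u8) -> Option<FieldValue> {
    match byte & 0x0f {
        TAG_FALSE => Some(FieldValue::Bool(false)),
        TAG_TRUE => Some(FieldValue::Bool(true)),
        TAG_SMALL_INT => Some(FieldValue::SmallInt((byte >> 4) & 0x07)),
        TAG_VARINT => Some(FieldValue::VarInt),
        _ => None,
    }
}
```

A boolean or small-integer field then costs its field index plus one byte, with no separate value payload.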
Arrays
Because we're going to be storing meshes, we'll need lots of arrays.
Entities
Entities are stored as an array of structs, where each struct represents a component.
Relations
Relations can be encoded as components: for example, `children` will be encoded as the `Children` struct. The `ChildOf` component will not be serialized.
Other BSN-specific features like `#name` and so on.
TBD
Embedded expressions / interpolations
This is not a feature of BSN today, but it's something we might want to have eventually. This can be serialized as an expression tree (unless we want to parse expressions at runtime, which might not be the worst thing in the world.)
@cart
@alice-i-cecile
@ChristopherBiscardi
@andriyDev