Thoughts on a binary scene format #21233

viridia · 2025-09-26T20:43:16Z

viridia
Sep 26, 2025
Collaborator

I wanted to throw out some ideas on possible binary encodings for BSN. I expect to get a ton of objections, but at least it will start a discussion.

Suggested file extension is ".bsb" for "Bevy Scene Binary". I checked several websites that list common file extensions, and "bsb" did not appear in any of them.

Goals

A compact binary format
Decoding should be relatively easy and fast
Should be able to represent the full BSN data model, with the exception of embedded Rust code (although we may add interpolated expressions later).
Supports schema evolution: adding or removing a field from a struct shouldn't require re-encoding all your assets.

There is a trade-off between compactness and ease of decoding: the smallest formats are more heavily compressed and require more complex decoding of things like integer arrays. From a performance standpoint, a smaller format saves I/O bandwidth but spends more CPU time unpacking.

In this proposal I am going to prioritize compactness, but I expect that some readers will have a different opinion. I'm open to discussion on this.

Schema evolution: this simply means that we tolerate missing fields or extra fields during deserialization. If I delete the bar field in my code and then attempt to read an older file that still has bar, it will deserialize OK. Similarly, if I add a bar field where none existed before, it will be set to its default value. (Serde supports this: unknown fields are ignored by default, and missing fields can fall back to their defaults via `#[serde(default)]`.)
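A minimal sketch of that tolerance, assuming the decoder yields (field name, value) pairs for each struct. `Foo`, its fields `baz` and `bar`, the removed field `legacy`, and `foo_from_fields` are all hypothetical:

```rust
/// Current version of a hypothetical component: `bar` was added recently,
/// and an older field called `legacy` has been removed from the code.
#[derive(Debug, Default, PartialEq)]
struct Foo {
    baz: i64,
    bar: i64,
}

/// Apply decoded (field name, value) pairs to a default-initialized `Foo`.
/// Names the struct no longer has are skipped; fields absent from the file
/// keep their `Default` value.
fn foo_from_fields(fields: &[(&str, i64)]) -> Foo {
    let mut foo = Foo::default();
    for &(name, value) in fields {
        match name {
            "baz" => foo.baz = value,
            "bar" => foo.bar = value,
            _ => {} // e.g. "legacy" from an older file: ignore it
        }
    }
    foo
}
```

Starting from `Default` and overwriting only what the file provides is what makes both directions of the change safe.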

One other goal I want to mention, which is about the implementation rather than the format itself: it should be possible to deserialize into different in-memory representations. If I want to write a Python script that manipulates .bsb files, I should be able to deserialize them into Python objects. I don't think this will be too hard to do. I mention this because the data representation in the Bevy editor might be different from the data representation in the final game.
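One way to keep the decoder independent of the in-memory representation is an event-style interface, similar in spirit to serde's visitors: the decoder calls methods on a sink, and each consumer builds whatever model it wants. This is a minimal sketch with hypothetical names (`SceneSink`, `TreeBuilder`, `TextDump`, `decode_example`), handling only flat structs with integer fields; `decode_example` stands in for a real decoder:

```rust
/// Hypothetical event interface that a .bsb decoder would drive.
trait SceneSink {
    fn begin_struct(&mut self, type_name: &str);
    fn int_field(&mut self, field_name: &str, value: i64);
    fn end_struct(&mut self);
}

/// One possible in-memory representation: a generic, untyped tree.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Struct { type_name: String, fields: Vec<(String, Value)> },
}

#[derive(Default)]
struct TreeBuilder {
    current: Option<(String, Vec<(String, Value)>)>,
    finished: Vec<Value>,
}

impl SceneSink for TreeBuilder {
    fn begin_struct(&mut self, type_name: &str) {
        self.current = Some((type_name.to_string(), Vec::new()));
    }
    fn int_field(&mut self, field_name: &str, value: i64) {
        if let Some((_, fields)) = self.current.as_mut() {
            fields.push((field_name.to_string(), Value::Int(value)));
        }
    }
    fn end_struct(&mut self) {
        if let Some((type_name, fields)) = self.current.take() {
            self.finished.push(Value::Struct { type_name, fields });
        }
    }
}

/// A different consumer of the same events: a flat text dump.
#[derive(Default)]
struct TextDump(String);

impl SceneSink for TextDump {
    fn begin_struct(&mut self, type_name: &str) {
        self.0.push_str(type_name);
        self.0.push_str(" {");
    }
    fn int_field(&mut self, field_name: &str, value: i64) {
        self.0.push_str(&format!(" {field_name}: {value}"));
    }
    fn end_struct(&mut self) {
        self.0.push_str(" }");
    }
}

/// Stand-in for the real decoder: replays the events for `Foo { bar: 3 }`.
fn decode_example(sink: &mut impl SceneSink) {
    sink.begin_struct("Foo");
    sink.int_field("bar", 3);
    sink.end_struct();
}
```

A Python binding would be one more `SceneSink` implementation that creates Python objects instead of Rust ones.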

Prior Art

MessagePack
CBOR
Thrift
Google Protocol Buffers
Cap'n Proto
Flatbuffers

None of these formats are ideal, although they may be "good enough". Here are some potential weaknesses:

Protobufs require an IDL (Interface Definition Language) file that assigns indices to individual fields. This is a non-starter for us. Also, the semantics of protobufs don't really match ECS all that well.
Other formats like MessagePack store field names as strings, meaning that there's a lot of duplication of the same names within the stream. So for example if I have 10 instances of a Foo type with a field bar, then the string bar occurs 10 times within the stream.

Name Tables

Instead of repeating the field names inline, I propose that the file have a prologue section containing all the names. There will be separate tables for type names and field names. The reason for making them separate is that we can then do all the reflection lookups in a batch: that is, when we load the type names table we can look up each type name in the reflection registry and assign an index to each reflected type info. These types will then be referred to by index for the rest of the file. This saves us from having to look up the name for every instance of the struct.

The order of the names will be arranged so that the most frequently-used names will have the lowest numbered indices. This is because in our encoding scheme, smaller integers take less space.
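This is a minimal sketch of building such a table. It assumes a first pass over the scene that records every use of a name, and `build_name_table` is a hypothetical helper. Following the separation described above, the same routine would run once for type names and once for field names. Ties are broken alphabetically so the output is deterministic.

```rust
use std::collections::HashMap;

/// Build a name table in which the most frequently used names get the lowest
/// indices, plus a reverse map from name to index for the encoder.
fn build_name_table<'a>(uses: &[&'a str]) -> (Vec<&'a str>, HashMap<&'a str, usize>) {
    // Count how often each name is used in the scene.
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for &name in uses {
        *counts.entry(name).or_insert(0) += 1;
    }
    // Sort by descending count, then alphabetically for determinism.
    let mut table: Vec<&'a str> = counts.keys().copied().collect();
    table.sort_by(|a, b| counts[b].cmp(&counts[a]).then(a.cmp(b)));
    // Index 0 is the most common name, so it gets the shortest varint.
    let index: HashMap<&'a str, usize> =
        table.iter().enumerate().map(|(i, &name)| (name, i)).collect();
    (table, index)
}
```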

This name table scheme I've outlined is the main reason for not using something like MessagePack or CBOR; if we decide that repeating names is fine, then there's little reason not to adopt one of the pre-existing formats. (I've used MessagePack + Serde with Bevy before, and it works fine.)

Varints

For integers, I propose an encoding similar to either MessagePack's or Thrift's (Thrift's compact protocol uses ZigZag encoding):

Positive integers in the range 0..127 are stored as one byte
Negative integers in the range -32..-1 are also stored as one byte
Integers of larger magnitude are stored in a variable number of bytes
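For the Thrift-style option, this is a minimal sketch of ZigZag mapping plus an unsigned LEB128-style varint (7 payload bits per byte, high bit set on every byte except the last). With this scheme, one byte covers -64..=63, which is a slightly different split than MessagePack's fixint ranges. The function names are illustrative, not an existing API.

```rust
/// ZigZag-map a signed integer so that small magnitudes become small
/// unsigned values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Append `value` as a varint: 7 bits per byte, continuation bit 0x80.
fn write_varint(out: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        out.push((value as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Read a varint starting at `*pos`, advancing `*pos` past it.
/// Returns None on truncated input or an over-long (more than 10 byte) encoding.
fn read_varint(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        result |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return Some(result);
        }
        shift += 7;
    }
}
```

For example, 63 encodes to the single byte 126, while 64 needs two bytes.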

For struct fields, we can also store some fields in an even more compact form, where both the field type and value can be packed into a single byte.

Floats

Unfortunately, variable-length encoding does not work well for floats. Instead, we just store them in the canonical IEEE 754 format.

Structs

Structs consist of a type ID followed by a variable number of fields. Each field consists of (field index, type, value); however, for some values (like boolean true and false, or integers that fit in 3 bits) the value can be folded into the type.
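A minimal sketch of a one-byte field header, loosely modelled on the Thrift compact protocol. In that protocol, the high nibble holds a field-id delta (1..=15) and the low nibble holds the type, with booleans encoded directly in the type. Here the delta is taken over field-name-table indices instead. The specific type codes, including 8..=15 for inline integers 0..=7, are a hypothetical layout for illustration and not part of the proposal:

```rust
// Hypothetical low-nibble type codes.
const TYPE_TRUE: u8 = 1;
const TYPE_FALSE: u8 = 2;
const TYPE_VARINT: u8 = 3; // ZigZag varint follows the header
const TYPE_SMALL_INT: u8 = 8; // codes 8..=15 carry an inline value 0..=7

enum FieldValue {
    Bool(bool),
    Int(i64),
}

fn write_varint(out: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        out.push((v as u8) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Encode one field. `prev` is the previous field's name-table index (fields
/// are written in ascending index order), so the header can store a small
/// delta instead of the full index.
fn encode_field(out: &mut Vec<u8>, prev: u64, index: u64, value: &FieldValue) {
    let delta = index - prev;
    let type_code = match value {
        FieldValue::Bool(true) => TYPE_TRUE,
        FieldValue::Bool(false) => TYPE_FALSE,
        FieldValue::Int(n) if (0..=7).contains(n) => TYPE_SMALL_INT + *n as u8,
        FieldValue::Int(_) => TYPE_VARINT,
    };
    if (1..=15).contains(&delta) {
        // Delta and type share a single byte.
        out.push(((delta as u8) << 4) | type_code);
    } else {
        // Delta does not fit in the nibble: high nibble 0, full index follows.
        out.push(type_code);
        write_varint(out, index);
    }
    if let FieldValue::Int(n) = value {
        if !(0..=7).contains(n) {
            write_varint(out, ((*n << 1) ^ (*n >> 63)) as u64); // ZigZag
        }
    }
}
```

In this layout, a boolean or a small integer costs exactly one byte, including its field index.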

Look at the Thrift encoding spec for ideas here: https://github.com/apache/thrift/blob/master/doc/specs/thrift-compact-protocol.md

Note that deserialization of a struct does not directly produce an instance of that type, instead it produces a template or patch (refer to BSN semantics) for that type.

Arrays

Because we're going to be storing meshes, we'll need lots of arrays.

Arrays of floats are a varint length N followed by N floats (either f32 or f64).
There are several types of integer arrays: varint, i8, u8, i16, u16, etc.
Arrays of structs, etc.
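A minimal sketch of the float-array case: a varint element count followed by the raw IEEE 754 bytes of each element. Little-endian byte order is an assumption here, since the proposal only says "canonical IEEE format". The function names are hypothetical.

```rust
fn write_varint(out: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        out.push((v as u8) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

fn read_varint(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let (mut result, mut shift) = (0u64, 0u32);
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        result |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Encode `values` as: varint count, then 4 little-endian bytes per f32.
fn write_f32_array(out: &mut Vec<u8>, values: &[f32]) {
    write_varint(out, values.len() as u64);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
}

/// Decode an f32 array written by `write_f32_array`; None if truncated.
fn read_f32_array(bytes: &[u8], pos: &mut usize) -> Option<Vec<f32>> {
    let len = read_varint(bytes, pos)? as usize;
    let end = (*pos).checked_add(len.checked_mul(4)?)?;
    let body = bytes.get(*pos..end)?;
    *pos = end;
    Some(
        body.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Because the element size is fixed, a reader can validate the whole array length before touching any elements, which keeps decoding of large vertex buffers cheap.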

Entities

Entities are stored as an array of structs, where each struct represents a component.
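A minimal sketch of an entity as a counted array of components, each identified by its type-name-table index. The per-component byte-length prefix is an assumption, not part of the proposal. It is included so that a reader can skip components whose type is not in its registry, which fits the schema-evolution goal. `EncodedComponent`, `write_entity` and `read_entity` are hypothetical.

```rust
/// Hypothetical component in encoded form: its index in the type-name
/// table plus its already-encoded field bytes.
struct EncodedComponent {
    type_index: u64,
    fields: Vec<u8>,
}

fn write_varint(out: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        out.push((v as u8) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

fn read_varint(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let (mut result, mut shift) = (0u64, 0u32);
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        result |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Entity layout: varint component count, then for each component a varint
/// type index, a varint byte length, and the field bytes.
fn write_entity(out: &mut Vec<u8>, components: &[EncodedComponent]) {
    write_varint(out, components.len() as u64);
    for c in components {
        write_varint(out, c.type_index);
        write_varint(out, c.fields.len() as u64);
        out.extend_from_slice(&c.fields);
    }
}

/// Read an entity, keeping only components whose type `is_known` accepts.
fn read_entity(
    bytes: &[u8],
    pos: &mut usize,
    is_known: impl Fn(u64) -> bool,
) -> Option<Vec<EncodedComponent>> {
    let count = read_varint(bytes, pos)?;
    let mut kept = Vec::new();
    for _ in 0..count {
        let type_index = read_varint(bytes, pos)?;
        let len = read_varint(bytes, pos)? as usize;
        let end = (*pos).checked_add(len)?;
        let fields = bytes.get(*pos..end)?.to_vec();
        *pos = end; // skip the payload either way
        if is_known(type_index) {
            kept.push(EncodedComponent { type_index, fields });
        }
    }
    Some(kept)
}
```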

Relations

Relations can be encoded as components: for example, children will be encoded as the Children struct. The ChildOf component will not be serialized.
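Dropping ChildOf works because it is the inverse of Children, so a loader can reconstruct it. This minimal sketch does so with plain indices into the file's entity list rather than Bevy's `Entity` type. `DecodedEntity` and `derive_parents` are hypothetical.

```rust
/// Hypothetical decoded entity: only its serialized children list, given as
/// indices of other entities in the same file.
struct DecodedEntity {
    children: Vec<usize>,
}

/// Rebuild each entity's parent (the information ChildOf would carry)
/// from the serialized children lists.
fn derive_parents(entities: &[DecodedEntity]) -> Vec<Option<usize>> {
    let mut parents = vec![None; entities.len()];
    for (parent, entity) in entities.iter().enumerate() {
        for &child in &entity.children {
            parents[child] = Some(parent);
        }
    }
    parents
}
```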

Other BSN-specific features like `#name` and so on.

TBD

Embedded expressions / interpolations

This is not a feature of BSN today, but it's something we might want to have eventually. This can be serialized as an expression tree (unless we want to parse expressions at runtime, which might not be the worst thing in the world).
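A minimal sketch of serializing an expression tree in prefix (pre-order) form: a tag byte per node, followed by its operands. The node set, tag values, and fixed 8-byte integer literals are all hypothetical; a real encoding would likely use varints. Variables refer to name-table entries by index, like type and field names.

```rust
/// Hypothetical expression tree for interpolated values.
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Var(u8), // index into a name table
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

const TAG_INT: u8 = 0;
const TAG_VAR: u8 = 1;
const TAG_ADD: u8 = 2;
const TAG_MUL: u8 = 3;

/// Pre-order encoding: tag byte, then the node's payload or operands.
fn write_expr(out: &mut Vec<u8>, e: &Expr) {
    match e {
        Expr::Int(n) => {
            out.push(TAG_INT);
            out.extend_from_slice(&n.to_le_bytes());
        }
        Expr::Var(i) => {
            out.push(TAG_VAR);
            out.push(*i);
        }
        Expr::Add(a, b) | Expr::Mul(a, b) => {
            out.push(if matches!(e, Expr::Add(..)) { TAG_ADD } else { TAG_MUL });
            write_expr(out, a);
            write_expr(out, b);
        }
    }
}

/// Decode one expression starting at `*pos`; None on malformed input.
fn read_expr(bytes: &[u8], pos: &mut usize) -> Option<Expr> {
    let tag = *bytes.get(*pos)?;
    *pos += 1;
    Some(match tag {
        TAG_INT => {
            let slice = bytes.get(*pos..*pos + 8)?;
            let mut raw = [0u8; 8];
            raw.copy_from_slice(slice);
            *pos += 8;
            Expr::Int(i64::from_le_bytes(raw))
        }
        TAG_VAR => {
            let i = *bytes.get(*pos)?;
            *pos += 1;
            Expr::Var(i)
        }
        TAG_ADD | TAG_MUL => {
            let a = Box::new(read_expr(bytes, pos)?);
            let b = Box::new(read_expr(bytes, pos)?);
            if tag == TAG_ADD { Expr::Add(a, b) } else { Expr::Mul(a, b) }
        }
        _ => return None,
    })
}
```

Pre-order encoding needs no explicit lengths or parentheses, because each tag determines how many operands follow.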

@cart
@alice-i-cecile
@ChristopherBiscardi
@andriyDev

ChristopherBiscardi · 2025-09-26T20:51:12Z

ChristopherBiscardi
Sep 26, 2025
Collaborator

One thing I'll note, since it comes up whenever "binary" and "BSN" are in the same sentence, is that there are two kinds of binary data:

1. BSN as a binary format
2. Mesh vertex data (or other binary data)

We currently don't have an answer for item 2 (embedded or separate), and this discussion is about item 1, but I bring it up because we also don't yet know what a ".bsn" text-format file looks like, and that format would need an answer to item 2 as well.


viridia Sep 27, 2025
Collaborator Author

There are a few obvious choices:

Textual representations of floating-point values: [1.394834, 2.09284, ... etc.]. Not particularly efficient (although if you are willing to live with some precision loss you can make the number of digits smaller), but easy to read and edit (for some value of "easy").
Some sort of binhex encoding, which is effectively impossible to edit without special tooling.
Dedicated sibling assets that just contain arrays of numbers, accessed using relative paths: "./mesh0.msgpack". Again, hard to edit without special tooling.
A hybrid system in which meshes are stored in some native-ish file format (.gltf, .fbx, etc.) and patched in to the BSN hierarchy.
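The precision trade-off in the first option can be seen directly in Rust. Plain `{}` formatting of a float prints the shortest decimal string that parses back to exactly the same value. A fixed number of decimals (`{:.*}`) is shorter but lossy. This is a minimal sketch; `to_text` and `round_trips` are hypothetical helpers.

```rust
/// Format with a fixed number of decimal places, trading precision for size.
fn to_text(v: f32, decimals: usize) -> String {
    format!("{:.*}", decimals, v)
}

/// True if `text` parses back to exactly the same f32.
fn round_trips(v: f32, text: &str) -> bool {
    text.parse::<f32>().map_or(false, |parsed| parsed == v)
}
```

For a value like 2.09284, the default formatting round-trips, while two decimals ("2.09") saves a few characters per number at the cost of exactness.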

The first choice seems like the path of least resistance; if people are concerned about the cost of parsing, they can switch to the all-binary format (ideally there'll be a command-line tool and libraries for conversion).
