-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement support for nullable unions using a "virtual arm" approach #2684
Labels
Comments
teh-cmc
changed the title
Implement support for nullable unions using a "virtual branch" approach
Implement support for nullable unions using a "virtual arm" approach
Jul 17, 2023
3 tasks
The C++ implementation of unions (added in #2707) already has an |
teh-cmc
added a commit
that referenced
this issue
Jul 19, 2023
**Commit by commit!** This PR implements support for top-level nullable Arrow unions using the "virtual arm" strategy. Specifically, all unions derived from Flatbuffers definitions now have an auto-generated arm at type index `0` of type `DataType::Null`. The data values in this arm are irrelevant: the deserialization code will treat the union as a whole as `null` any time the type buffer indicates a value of `0`. I've decided to make the auto-generated arm visible in the auto-generated `DataType` definition because... why wouldn't we? It makes it all much less magical and acts as a reminder that this thing is indeed here. The choice of using type index `0` rather than e.g. `nb_arms` is to make things A) more stable and B) forwards compatible. It cannot surprise end users in any way since user-defined types defined in Flatbuffers would just do the right thing, while user-defined types in raw Arrow don't even go through our (de)serialization logic but rather through theirs. In addition, this PR greatly improves error handling in the auto-generated code, giving the end-user very specific context and a context-correct backtrace when something goes wrong. Fixes #2684 ### What ### Checklist * [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md) * [x] I've included a screenshot or gif (if applicable) * [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/2708) (if applicable) - [PR Build Summary](https://build.rerun.io/pr/2708) - [Docs preview](https://rerun.io/preview/pr%3Acmc%2Frust_virtual_union_arms/docs) - [Examples preview](https://rerun.io/preview/pr%3Acmc%2Frust_virtual_union_arms/examples)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Recap
For some reason unknown to me, contrary to every other Arrow types, Arrow unions don't have top-level bitmaps.
This means that they cannot express their own nullability, as opposed to the respective nullabilities of their children fields.
I.e. you can express this:
but not this:
To work around the issue (again, I have no clue as to why it has to be an issue to begin with), the spec suggest piggybacking on the nullability on an arbitrary picked children to determine the nullability of the enum as a whole.
See e.g. this example taken from the spec:
Not only is this inconsistent and really annoying to deal with on the (de)serialization paths, this is also ambiguous (how do I differentiate between a null enum vs. a null float in the first array in the example above?).
Worse: how could you piggyback on one of your children's nullability if none of your children are nullable in the first place? A situation that arises all over the place in our component definitions.
arrow2-convert
's answer to the issue is, as far as I can tell, to put nullable data into something that is otherwise advertised as non-nullable and patch things up as they go in and out.E.g. here's how a nullable
Rotation3D
is serialized today:Here we can see a
Quaternion
, which is really a non-nullableFixedSizeList(f32, 4)
, being serialized as a null value anyhow so the outer union can be tracked as null in and of itself.This is very much related to #795.
Solution
Emil proposed another approach: use a virtual branch instead in contexts where the union is supposed to be nullable.
That virtual branch would only be used (and visible) when serializing data in and out, and its values would always represent nulls.
The text was updated successfully, but these errors were encountered: