
[Java] Type-ids in UnionVector are erroneously coupled to the Arrow types of the underlying vectors #108

@jarohen


Describe the bug, including details regarding any error messages, version, and platform.

re: https://lists.apache.org/thread/z89xlvw7v1rwq89gknflhsj3c65x20kd

It seems that the UnionVector implementation (particularly initializeChildrenFromFields (#29848), getVector, getVectorByType, setSafe, etc.) assumes that the type-id is always derived from the ArrowType, but the Schema.fbs spec is more lenient: users can choose whatever type-ids they require.

For example, in XTDB, we're trying to represent an algebraic data type (ADT) of 'put', 'delete' and 'erase' events as a sparse union. Delete and erase have the same Arrow type, so UnionVector currently expects them to share a type-id (whereas, in DenseUnionVector, we can use type-ids 0, 1 and 2).
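
To illustrate the collision, here is a minimal plain-Java sketch. It is a model only, not Arrow's API: the class `UnionTypeIdSketch`, its methods, and the placeholder type names "A" and "B" are all hypothetical, and the only thing taken from the description above is that delete and erase share a type. It contrasts type-ids keyed on the child's type with type-ids assigned per child.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the problem; none of these names exist in Arrow.
class UnionTypeIdSketch {

    // Each entry is {childName, typeName}. "A" and "B" are placeholder
    // type names; the assumption is only that delete and erase share one.
    static List<String[]> events() {
        return Arrays.asList(
            new String[] {"put", "A"},
            new String[] {"delete", "B"},
            new String[] {"erase", "B"});
    }

    // Type-id keyed on the child's type: children of the same type collide.
    static Map<String, Integer> idsByArrowType(List<String[]> children) {
        Map<String, Integer> idForType = new LinkedHashMap<>();
        Map<String, Integer> idForChild = new LinkedHashMap<>();
        for (String[] child : children) {
            Integer id = idForType.get(child[1]);
            if (id == null) {
                id = idForType.size();
                idForType.put(child[1], id);
            }
            idForChild.put(child[0], id);
        }
        return idForChild;
    }

    // Type-id assigned per child, independent of its type.
    static Map<String, Integer> idsByPosition(List<String[]> children) {
        Map<String, Integer> idForChild = new LinkedHashMap<>();
        for (int i = 0; i < children.size(); i++) {
            idForChild.put(children.get(i)[0], i);
        }
        return idForChild;
    }

    public static void main(String[] args) {
        System.out.println(idsByArrowType(events())); // {put=0, delete=1, erase=1}
        System.out.println(idsByPosition(events()));  // {put=0, delete=1, erase=2}
    }
}
```

With type-keyed ids, 'delete' and 'erase' can no longer be told apart by type-id alone; with per-child ids all three variants stay distinct, which is the behaviour we're after.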

Would there be an appetite for (potentially relatively significant) changes to UnionVector to make it behave this way? We could perhaps consider bringing it more in line with DenseUnionVector which seems closer to the spec. Would be happy to work on it if so.

Cheers,

James/Finn (@FiV0)

Component(s)

Java

Metadata

Labels

help wanted
