Description
Describe the bug, including details regarding any error messages, version, and platform.
We hit an issue when using VectorLoader to load some Arrow vectors:
java.util.NoSuchElementException
at java.base/java.util.ArrayList$Itr.next(ArrayList.java:970)
at org.apache.arrow.vector.VectorLoader.loadBuffers(VectorLoader.java:104)
at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:84)
The schema of the VectorSchemaRoot is Schema<_0: Utf8 not null>.
The field vector in the root is of type Utf8 and is not nullable. Because it is Utf8, TypeLayout.getTypeBufferCount reports a buffer count of 3 for it (validity, offsets, data).
The IPC ArrowRecordBatch message to load has one node: ArrowFieldNode [length=1500, nullCount=0], and two buffers:
buffer: ArrowBuf[...], address:....., capacity:..., ArrowBuf
buffer: ArrowBuf[...], address:....., capacity:..., ArrowBuf
So when VectorLoader.loadBuffers iterates over the buffer list, it assumes there are 3 buffers, but there are actually only 2 (the validity buffer doesn't exist). That is why it throws NoSuchElementException.
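The failure mode can be reproduced without Arrow on the classpath. This is only a minimal sketch of the mechanism, not the actual VectorLoader code: the class `BufferCountDemo` and the method `takeBuffers` are hypothetical, and plain strings stand in for ArrowBuf instances.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the loader loop; this is not Arrow code.
class BufferCountDemo {

    // Pulls exactly expectedCount buffers from the iterator, the way a loader
    // that trusts the type layout's buffer count would.
    static List<String> takeBuffers(Iterator<String> buffers, int expectedCount) {
        List<String> own = new ArrayList<>(expectedCount);
        for (int i = 0; i < expectedCount; i++) {
            // Throws NoSuchElementException once the iterator is exhausted.
            own.add(buffers.next());
        }
        return own;
    }

    public static void main(String[] args) {
        // The record batch carries only offsets and data; validity is missing.
        List<String> batchBuffers = new ArrayList<>(List.of("offsets", "data"));
        try {
            // The Utf8 layout expects 3 buffers, so the third next() fails.
            takeBuffers(batchBuffers.iterator(), 3);
        } catch (NoSuchElementException e) {
            System.out.println("Ran out of buffers: " + e);
        }
    }
}
```

With three buffers in the list the same call returns normally, so the exception depends only on the mismatch between the expected count and the number of buffers actually present.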
I think that an array whose type can contain a null bitmap per the spec may choose not to allocate the validity buffer (also see the doc). So a Utf8 array with 2 buffers is correct by the spec. The issue appears to be that VectorLoader doesn't consider field nullability when loading buffers.
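To illustrate what "considering nullability" might look like, here is a rough sketch in plain Java. It is not a proposed patch and does not use Arrow classes: `ValidityAwareLoadSketch`, `takeBuffers`, and the `validityOmitted` flag are all hypothetical, and the rule used to decide that the validity buffer was omitted is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; this is not the actual VectorLoader logic.
class ValidityAwareLoadSketch {

    // Returns the buffers for one field in layout order (validity first).
    // A null entry marks a validity buffer that the producer did not send.
    static List<String> takeBuffers(Iterator<String> buffers, int layoutCount,
                                    boolean validityOmitted) {
        List<String> own = new ArrayList<>(layoutCount);
        int start = 0;
        if (validityOmitted) {
            own.add(null); // placeholder: every slot is treated as valid
            start = 1;
        }
        for (int i = start; i < layoutCount; i++) {
            own.add(buffers.next());
        }
        return own;
    }

    public static void main(String[] args) {
        List<String> batchBuffers = List.of("offsets", "data");
        // Assumption for this sketch: a non-nullable field with nullCount == 0
        // and one buffer fewer than the layout means validity was omitted.
        List<String> loaded = takeBuffers(batchBuffers.iterator(), 3, true);
        System.out.println(loaded);
    }
}
```

The point is only that the number of buffers consumed per field would depend on whether a validity buffer is present, rather than on the type layout alone.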
We use Arrow Java 15.0.2. From a quick look at the current code in this repo, it looks like the current TypeLayout still has this issue.