Nullability in schema is not considered in VectorLoader #648

@viirya

Description

Describe the bug, including details regarding any error messages, version, and platform.

We hit an issue when using VectorLoader to load some Arrow vectors:

java.util.NoSuchElementException
        at java.base/java.util.ArrayList$Itr.next(ArrayList.java:970)
        at org.apache.arrow.vector.VectorLoader.loadBuffers(VectorLoader.java:104)
        at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:84)

The schema of the VectorSchemaRoot is Schema<_0: Utf8 not null>.
The field vector in the root is of Utf8 type and is not nullable. Because it is Utf8, TypeLayout.getTypeBufferCount reports a buffer count of 3 for it (validity, offsets, and data).
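
For reference, the three buffers counted for Utf8 are the validity bitmap, the int32 offsets buffer, and the UTF-8 data buffer. The plain-Java sketch below builds them for a tiny array to show what each one holds. Utf8LayoutSketch and its methods are hypothetical helpers for illustration only, not part of Arrow Java.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Arrow Java): builds the three Utf8 buffers
// (validity bitmap, int32 offsets, UTF-8 data) for a list of Java strings.
final class Utf8LayoutSketch {
    // Buffer 0: validity bitmap, one bit per slot, least-significant bit first;
    // a set bit means the slot is non-null.
    static byte[] validity(List<String> values) {
        byte[] bitmap = new byte[(values.size() + 7) / 8];
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i) != null) {
                bitmap[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return bitmap;
    }

    // Buffer 1: offsets, length + 1 entries; a null slot contributes zero bytes.
    static int[] offsets(List<String> values) {
        int[] offsets = new int[values.size() + 1];
        for (int i = 0; i < values.size(); i++) {
            String v = values.get(i);
            int len = (v == null) ? 0 : v.getBytes(StandardCharsets.UTF_8).length;
            offsets[i + 1] = offsets[i] + len;
        }
        return offsets;
    }

    // Buffer 2: the concatenated UTF-8 bytes of all non-null values.
    static byte[] data(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            if (v != null) {
                sb.append(v);
            }
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "bc", null);
        System.out.println(Arrays.toString(validity(values))); // [3]
        System.out.println(Arrays.toString(offsets(values)));  // [0, 1, 3, 3]
        System.out.println(new String(data(values), StandardCharsets.UTF_8)); // abc
    }
}
```

For ["a", "bc", null] this yields a validity byte of 3 (bits 0 and 1 set), offsets [0, 1, 3, 3], and data "abc". A non-nullable array with no nulls would have every validity bit set, which is why omitting that buffer is tempting.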

The IPC ArrowRecordBatch message to load has one node: ArrowFieldNode [length=1500, nullCount=0], and two buffers:

buffer: ArrowBuf[...], address:....., capacity:..., ArrowBuf
buffer: ArrowBuf[...], address:....., capacity:..., ArrowBuf
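
For scale, the minimum (unpadded) sizes of the Utf8 buffers for this node follow from its length alone: the validity bitmap needs one bit per slot, and the offsets buffer needs length + 1 int32 entries. The Utf8BufferSizes class below is a hypothetical helper that only does this arithmetic. Actual ArrowBuf capacities may be larger because of padding or over-allocation, and the data buffer size depends on the string contents.

```java
// Hypothetical arithmetic helper: minimum (unpadded) byte sizes of the
// Utf8 validity and offsets buffers for an array with the given length.
final class Utf8BufferSizes {
    // Validity bitmap: one bit per slot, rounded up to whole bytes.
    static long validityBytes(long length) {
        return (length + 7) / 8;
    }

    // Offsets: (length + 1) int32 entries of 4 bytes each.
    static long offsetBytes(long length) {
        return (length + 1) * Integer.BYTES;
    }

    public static void main(String[] args) {
        long length = 1500; // from ArrowFieldNode [length=1500, nullCount=0]
        System.out.println("validity >= " + validityBytes(length) + " bytes"); // 188
        System.out.println("offsets  >= " + offsetBytes(length) + " bytes");  // 6004
    }
}
```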

So when VectorLoader.loadBuffers loads buffers by iterating over the buffer list, it expects 3 buffers, but there are only 2 (the validity buffer is absent). That is why it throws NoSuchElementException.
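
The failure mode can be reproduced in isolation with a small model: a loop that calls next() a fixed number of times on an iterator over a shorter list. BufferIteratorModel is a hypothetical stand-in for the loading loop, not Arrow's implementation; it uses an ArrayList iterator so the exception type matches the one in the stack trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical model (not Arrow's VectorLoader): a loader that pulls a fixed
// number of buffers per field from one shared iterator over the message's buffers.
final class BufferIteratorModel {
    static List<String> take(Iterator<String> buffers, int expected) {
        List<String> taken = new ArrayList<>();
        for (int i = 0; i < expected; i++) {
            // ArrayList's iterator throws NoSuchElementException once exhausted.
            taken.add(buffers.next());
        }
        return taken;
    }

    public static void main(String[] args) {
        // The message carries two buffers (offsets and data) and no validity buffer.
        List<String> message = new ArrayList<String>(Arrays.asList("offsets", "data"));
        int expectedForUtf8 = 3; // validity + offsets + data
        try {
            take(message.iterator(), expectedForUtf8);
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException: 3 buffers expected, 2 present");
        }
    }
}
```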

As I understand the spec, an array whose type can carry a null bitmap may choose not to allocate the validity buffer (see also the doc). So a Utf8 array with 2 buffers is valid according to the spec. The problem appears to be that VectorLoader does not take field nullability into account when loading buffers.
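
One possible direction, sketched under stated assumptions: when the schema marks a field as non-nullable and the message carries exactly one buffer fewer than the type layout, the loader could treat the validity buffer as omitted instead of consuming the next buffer in its place. NullabilityAwareLoader and load are hypothetical names; this is not Arrow's VectorLoader code or an agreed fix. It also assumes the per-field buffer count is known, which holds for the single-field schema in this report but is not directly available from the flat buffer list of a multi-field batch. How a vector should then represent the missing bitmap (for example, as all-valid) is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch (not Arrow's API or an accepted fix): assign buffers for one
// field while taking the field's nullability from the schema into account.
final class NullabilityAwareLoader {
    // layoutCount: buffer count from the type layout (3 for Utf8).
    // buffersForField: buffers the message actually carries for this field
    //   (assumed known; true here because the schema has a single field).
    // Returns buffers in layout order; a null first entry marks an omitted validity buffer.
    static List<String> load(Iterator<String> buffers, int layoutCount,
                             boolean nullable, int buffersForField) {
        List<String> result = new ArrayList<>();
        int first = 0;
        if (!nullable && buffersForField == layoutCount - 1) {
            // Non-nullable field without a validity buffer: every slot is treated as valid.
            result.add(null);
            first = 1;
        }
        for (int i = first; i < layoutCount; i++) {
            result.add(buffers.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> message = Arrays.asList("offsets", "data");
        // Schema<_0: Utf8 not null> with two buffers, as in this report.
        System.out.println(load(message.iterator(), 3, false, message.size()));
        // prints [null, offsets, data]
    }
}
```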

We use Arrow Java 15.0.2. Looking at the current code in this repo, TypeLayout appears to still have this issue.

Labels: Type: bug (Something isn't working)