Use arrow::MakeBuilder to create Arrow array builders that exactly match the desired data type #4280
What
Previously, not knowing about arrow::MakeBuilder, we created the hierarchy of Arrow array builders manually in a utility method called new_arrow_array_builder. While working on the C FFI, it turned out that this approach produces an Arrow array whose data type does not have the correct nullability settings. In other words, we would have needed to pass the exact data types when creating the Arrow array builders. That is what I tried first, and it worked, but then I noticed that arrow::MakeBuilder already exists; internally it uses a bunch of rolled-out switch/case statements to create the hierarchy programmatically.

In combination with the work done on switching from IPC to the C FFI, this change gives another significant speed-up. When tested in isolation, however (i.e. still on IPC), the gains seem rather modest and within the (large!) measurement noise.
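To illustrate why the builder hierarchy has to be created from the exact data type, here is a minimal, self-contained sketch. It is a toy model and does not use the Arrow library: `DataType`, `Builder`, `MakeBuilderManually` and `MakeBuilderFromType` are hypothetical names made up for this example, and the way the manual variant loses nullability is an assumption about what can go wrong, not a copy of new_arrow_array_builder. With Arrow itself, the corresponding call would be along the lines of `arrow::MakeBuilder(pool, type, &builder)`.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-ins for a nested type system (hypothetical, NOT the Arrow API).
enum class TypeId { Int64, List };

struct DataType {
  TypeId id;
  std::shared_ptr<DataType> item;  // element type, only set for List
  bool item_nullable;              // nullability of the list's elements

  DataType(TypeId id, std::shared_ptr<DataType> item, bool item_nullable)
      : id(id), item(std::move(item)), item_nullable(item_nullable) {}

  std::string ToString() const {
    if (id == TypeId::Int64) return "int64";
    return "list<item: " + item->ToString() +
           (item_nullable ? "" : " not null") + ">";
  }
};

std::shared_ptr<DataType> Int64() {
  return std::make_shared<DataType>(TypeId::Int64, nullptr, true);
}

std::shared_ptr<DataType> List(std::shared_ptr<DataType> item, bool nullable) {
  return std::make_shared<DataType>(TypeId::List, std::move(item), nullable);
}

// A builder reports the data type of the array it would produce.
struct Builder {
  virtual ~Builder() = default;
  virtual std::shared_ptr<DataType> type() const = 0;
};

struct Int64Builder : Builder {
  std::shared_ptr<DataType> type() const override { return Int64(); }
};

struct ListBuilder : Builder {
  std::unique_ptr<Builder> child;
  bool item_nullable;

  ListBuilder(std::unique_ptr<Builder> child, bool item_nullable)
      : child(std::move(child)), item_nullable(item_nullable) {}

  std::shared_ptr<DataType> type() const override {
    return List(child->type(), item_nullable);
  }
};

// Manual construction: only the shape of the type is followed, so every
// list element silently falls back to the default (nullable).
std::unique_ptr<Builder> MakeBuilderManually(const DataType& t) {
  switch (t.id) {
    case TypeId::Int64:
      return std::unique_ptr<Builder>(new Int64Builder());
    case TypeId::List:
      return std::unique_ptr<Builder>(
          new ListBuilder(MakeBuilderManually(*t.item), true));
  }
  throw std::logic_error("unknown type id");
}

// Type-driven construction: the requested type, including nullability,
// is handed down at every level of the recursion.
std::unique_ptr<Builder> MakeBuilderFromType(const DataType& t) {
  switch (t.id) {
    case TypeId::Int64:
      return std::unique_ptr<Builder>(new Int64Builder());
    case TypeId::List:
      return std::unique_ptr<Builder>(
          new ListBuilder(MakeBuilderFromType(*t.item), t.item_nullable));
  }
  throw std::logic_error("unknown type id");
}
```

For a requested type `List(Int64(), false)`, the type-driven factory reports `list<item: int64 not null>`, whereas the manual one reports `list<item: int64>`. That kind of mismatch only matters once the consumer checks the data type strictly, as happens when arrays are handed over via the C FFI.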
Checklist