improve: reuse Arc<dyn Array> in parquet record batch reader. #4864

@RinChanNOWWW

Description

In both arrow_reader and async_reader, when predicates are present, the reader first decodes the predicate columns into arrays (wrapped in a RecordBatch), evaluates the predicates, and produces a row selection. It then uses that row selection to read the arrays needed for the final output.

If a column in the final output was already decoded for predicate evaluation, it is still deserialized a second time. This is quite wasteful.

We should have a reasonable way to reuse the arrays.
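
To make the idea concrete, here is a minimal, std-only sketch of one possible shape for such reuse. Everything in it is hypothetical and for illustration only: the `Array` trait, `Int64Column`, and `CachingReader` are stand-ins, not the arrow or parquet crate APIs, and plain `Vec<i64>` values stand in for encoded column chunks. The point is only that an array decoded for predicate evaluation can be kept behind an `Arc<dyn Array>` and filtered by the row selection, instead of being decoded again.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for an array trait; NOT arrow's `Array`.
trait Array {
    /// Return a new array holding only the rows where `selection` is true.
    fn filter(&self, selection: &[bool]) -> Arc<dyn Array>;
    /// Copy the values out, for inspection in this sketch.
    fn values(&self) -> Vec<i64>;
}

struct Int64Column(Vec<i64>);

impl Array for Int64Column {
    fn filter(&self, selection: &[bool]) -> Arc<dyn Array> {
        let kept: Vec<i64> = self
            .0
            .iter()
            .zip(selection)
            .filter(|(_, keep)| **keep)
            .map(|(v, _)| *v)
            .collect();
        Arc::new(Int64Column(kept))
    }

    fn values(&self) -> Vec<i64> {
        self.0.clone()
    }
}

/// Hypothetical reader that caches decoded columns and counts decodes.
struct CachingReader {
    encoded: Vec<Vec<i64>>, // stand-in for encoded column chunks
    cache: HashMap<usize, Arc<dyn Array>>,
    decode_count: usize,
}

impl CachingReader {
    fn new(encoded: Vec<Vec<i64>>) -> Self {
        Self { encoded, cache: HashMap::new(), decode_count: 0 }
    }

    /// Decode a column once; later calls hand out a clone of the same Arc.
    fn decode(&mut self, column: usize) -> Arc<dyn Array> {
        if let Some(array) = self.cache.get(&column) {
            return Arc::clone(array);
        }
        self.decode_count += 1;
        let array: Arc<dyn Array> = Arc::new(Int64Column(self.encoded[column].clone()));
        self.cache.insert(column, Arc::clone(&array));
        array
    }

    /// Produce an output column by filtering the (possibly cached) array.
    fn read_selected(&mut self, column: usize, selection: &[bool]) -> Arc<dyn Array> {
        self.decode(column).filter(selection)
    }
}

fn run_example() -> (Vec<i64>, Vec<i64>, usize) {
    let mut reader =
        CachingReader::new(vec![vec![1, 5, 10, 20], vec![100, 200, 300, 400]]);

    // Phase 1: decode the predicate column and evaluate `value > 4`.
    let predicate_column = reader.decode(0);
    let selection: Vec<bool> = predicate_column.values().iter().map(|v| *v > 4).collect();

    // Phase 2: build the output; column 0 comes from the cache.
    let out0 = reader.read_selected(0, &selection).values();
    let out1 = reader.read_selected(1, &selection).values();
    (out0, out1, reader.decode_count)
}

fn main() {
    let (out0, out1, decodes) = run_example();
    println!("{:?} {:?} decodes={}", out0, out1, decodes);
}
```

In this sketch column 0 is decoded once even though it is used for both the predicate and the output. A real implementation would also have to weigh the memory cost of holding the prefetched arrays against the cost of decoding them again.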

Labels: enhancement (Any new improvement worthy of an entry in the changelog)
