Read delete files in parallel. #3118

@Reo-LEI

Description

In upsert/CDC workloads, a table usually accumulates a lot of pos-delete and eq-delete files. When we read or rewrite data from a v2 table, DeleteFilter opens every pos-delete and eq-delete file referenced by each data file to construct the posDeleteSet and eqDeleteSet.

Currently, all of that work is handled by a single thread per CombinedScanTask, and the delete files are read serially. That means Iceberg cannot start reading a delete file until the previous one has been read, so DeleteFilter spends a lot of time opening and reading delete files.
I think DeleteFilter should read delete files in parallel when constructing the posDeleteSet and eqDeleteSet, to speed up reading v2 tables.
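
To make the idea concrete, here is a rough sketch of the pattern I have in mind, in plain Java. It is not Iceberg code: `ParallelDeleteLoader`, `loadAll`, and the `readFile` function are hypothetical names, and `readFile` just stands in for whatever opens a single delete file and returns its deleted positions or keys. A real change would probably reuse a shared worker pool instead of creating one per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Iceberg: reads delete files concurrently
// and merges what each one returns into a single set.
final class ParallelDeleteLoader {

  private ParallelDeleteLoader() {}

  // deleteFiles: the delete files referenced by one data file (any type F).
  // readFile:    assumed reader for one delete file; returns its entries.
  // threads:     size of the pool used for this call.
  static <F, R> Set<R> loadAll(
      List<F> deleteFiles, Function<F, List<R>> readFile, int threads) {
    // Concurrent set, so tasks can add results without extra locking.
    Set<R> merged = ConcurrentHashMap.newKeySet();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Boolean>> pending = new ArrayList<>();
      for (F file : deleteFiles) {
        // One task per delete file, so a slow file does not block the others.
        Callable<Boolean> task = () -> merged.addAll(readFile.apply(file));
        pending.add(pool.submit(task));
      }
      for (Future<Boolean> result : pending) {
        result.get(); // wait for completion and surface any read failure
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while reading delete files", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Failed to read a delete file", e.getCause());
    } finally {
      pool.shutdown();
    }
    return merged;
  }
}
```

With this shape, the wall-clock time to build the delete set is bounded by the slowest file (given enough threads) rather than by the sum of all files, which is the speedup this issue is asking for.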
