
Optimize the performance of MOR on Trino #5245

@shidayang

I ran chbenchmark against Iceberg on Trino and found that MOR (merge-on-read) performance is very poor when there are many delete files. The data scale is 10 warehouses. With no delete files, the average query duration is under 10 seconds, but after I added some delete files to every table, some queries took over an hour.

Reasons:

  1. Trino calls DeleteFilter#filter for every page, and every call to DeleteFilter#filter initializes the delete files again. Fix: Make predicates of delete only initialize once #5195.
  2. We found that the cost of creating StructLikeWrapper and InternalRecordWrapper is high. Fixes: Add StructLikeWrapperFactory to generate StructLikeWrapper #5244 and Add InternalRecordWrapperFactory to generate InternalRecordWrapper #5242.
    Flame graph:

image
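
To illustrate reason 1, here is a minimal sketch of the "initialize once" idea, written in Python for brevity. It is not the Iceberg `DeleteFilter` API or the change in #5195: `LazyDeleteFilter`, `CountingLoader`, and the integer row keys are hypothetical names and simplifications chosen only for this example. The point is that the expensive step (reading delete files) runs on the first `filter` call and its result is reused for every later page.

```python
from typing import Iterable, List, Optional, Set


class CountingLoader:
    """Hypothetical stand-in for reading delete files; counts how often it runs."""

    def __init__(self, deleted_keys: Set[int]):
        self._deleted_keys = deleted_keys
        self.calls = 0

    def __call__(self) -> Set[int]:
        self.calls += 1  # in a real engine this would be expensive file I/O
        return set(self._deleted_keys)


class LazyDeleteFilter:
    """Hypothetical filter that builds its delete set once, on first use."""

    def __init__(self, loader: CountingLoader):
        self._loader = loader
        self._deleted: Optional[Set[int]] = None  # not loaded yet

    def _deleted_keys(self) -> Set[int]:
        # Load on the first call only; later calls reuse the cached set.
        if self._deleted is None:
            self._deleted = self._loader()
        return self._deleted

    def filter(self, page: Iterable[int]) -> List[int]:
        deleted = self._deleted_keys()
        return [row for row in page if row not in deleted]


loader = CountingLoader({2, 5})
delete_filter = LazyDeleteFilter(loader)

# The engine hands rows to the filter one page at a time.
pages = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
live_rows = [delete_filter.filter(page) for page in pages]
```

With three pages the loader runs once instead of three times; the saving grows with the number of pages per split and the number of delete files.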

Query performance improved after we made these optimizations. For example, the query "select count(*) from stock" took 8 minutes before the optimizations and only 20 seconds after.
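
The wrapper-factory optimization (reason 2) can be sketched in the same spirit. This is a Python illustration under stated assumptions, not the code in #5244 or #5242: `RowWrapperFactory`, `RowWrapper`, and the tuple-based rows are hypothetical, and the assumption that wrapper creation is costly because schema-derived setup is repeated per wrapper is mine, not something the issue states. The factory does the schema-dependent work once and then hands out lightweight wrappers that share it.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[object, ...]


class RowWrapper:
    """Hypothetical wrapper that gives a row hash/equality on its key columns."""

    __slots__ = ("_key_fn", "_row")

    def __init__(self, key_fn: Callable[[Row], Row], row: Row):
        self._key_fn = key_fn  # shared across all wrappers from one factory
        self._row = row

    def __hash__(self) -> int:
        return hash(self._key_fn(self._row))

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, RowWrapper)
            and self._key_fn(self._row) == other._key_fn(other._row)
        )


class RowWrapperFactory:
    """Hypothetical factory: resolves key column positions once per schema."""

    def __init__(self, schema: Sequence[str], key_columns: Sequence[str]):
        # Schema-dependent setup happens here, once, not once per row.
        positions = tuple(schema.index(name) for name in key_columns)
        self._key_fn = lambda row: tuple(row[i] for i in positions)

    def wrap(self, row: Row) -> RowWrapper:
        # Cheap: only stores references to the shared key function and the row.
        return RowWrapper(self._key_fn, row)


factory = RowWrapperFactory(schema=["id", "name", "qty"], key_columns=["id"])

# An equality delete on id = 2; non-key columns are ignored when comparing.
deleted = {factory.wrap((2, "ignored", 0))}

rows = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]
live = [row for row in rows if factory.wrap(row) not in deleted]
```

The design choice is to separate per-schema cost from per-row cost, so that wrapping each row during delete matching stays cheap.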
