Improve performance for Equality Delete files in Iceberg connector#18397
Improve performance for Equality Delete files in Iceberg connector#18397jasonf20 wants to merge 3 commits intotrinodb:masterfrom
Conversation
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
eb8be8c to
2de67b7
Compare
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
2de67b7 to
7d32723
Compare
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
7d32723 to
65c40fd
Compare
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
|
@findinpath This PR is the fix for the equality deletes issue we discussed. We have another PR coming that should further improve for situations where there are a lot of deletes (>~200M). |
|
@jasonf20 does your contribution overlap with #17115 ? Note that there is in the above mentioned PR scaffolding for testing. Please use it here as well to ensure the validity of your changes.
Is this something we can verify as part of this PR (before & after your change) ? |
65c40fd to
0472932
Compare
|
Hi @findinpath It does seem like this PR overlaps with #17115. This PR should do the same map compaction that is done in that PR. However, it includes more optimizations on top of that including:
I think that it should suffice to merge only one of these PRs. If the other one is merged I will need to rebase and add the above improvements on top of the previous PR. I have added two tests based on the templates/suggestion I saw in the other PR. I was able to use the |
89ee06a to
16d859a
Compare
16d859a to
f230f95
Compare
|
@jasonf20 let's concentrate on landing first the PR #17115 which comes with less changes. |
|
@findinpath Happy to assist with the merging of #17115. Is there anything specifically there? Looking at the PR it seems most action items have been handled. Once that's merged I'll rebase on top of that with the remaining changes. Keep in mind that as it stands the other PR isn't enough for the use case of a large table with constant upserts to actually work since each split loads all the delete files sequentially. |
We stumbled on a situation related to using nested fields for equality deletes. The PR will be adding proper handling for such situations. |
2afae2c to
1d5e8da
Compare
|
@findinpath This PR is ready for review |
1d5e8da to
ff5861e
Compare
45f388f to
a6b9841
Compare
|
@jasonf20 IIUC this PR is caching deletion files. However, it seems that these can be cached per split rather than per query. |
The delete files are already read only once per split. The issue is that caching per task/query is required otherwise the delete files are read in each split leading to unusable performance. This approach was designed with @pettyjamesm as the cleanest way to cache at the right level. |
02122ea to
d3d0597
Compare
|
@jasonf20 Would it make sense to cache deletion files across queries too? |
|
It could be useful for performance when querying the same tables but the same can be said for data files. Perhaps a generic cache of this sort would be implemented in Trino at some point but it's probably a much more involved task, considering memory usage and such. For this PR specifically it would be interesting to consider caching at the query level (this PR does it at the Task Level). But for most queries caching at the task level should produce the same performance. We discussed query level caches and decided it can probably be built on top of this Task Level caching with reference counting or something like that in the future if needed. So the first step is to solve this at this level which should have the largest impact and potentially expand the caching scope later. The main motivation here is to make queries on tables with equality deletes complete in a reasonable amount of time. The current implementation doesn't really work with more than a couple equality delete files |
165bad5 to
5158db7
Compare
|
@dain @pettyjamesm Should the I don't think the use case here really has anything to close, it should all be cleaned up by GC, but if we are creating a stateful instance we might want it as part of the interface. |
5158db7 to
0bef97e
Compare
…ovider This abstraction enables connectors to implement page source provider factories that provide stateful instances of PageSourceProvider at the task level. This allows for things like task level caching across splits for example.
0bef97e to
b28035e
Compare
The goal of this commit is to improve the performance of queries that contain Eqaulity Delete files. Before this commit equality delete files were re-loaded for every split that needed them. Now, each delete file is loaded once per execution node. In addition, the equality deletes are stored in a single map (per delete schema) with the data sequence number in which the row was deleted. This allows us to merge rows that were deleted multiple times (common in upsert use cases) into a single entry instead of holding an entry per delete file. Changes in this commit: * Create a DeleteManager class to manage the above logic * Read every delete file only once * Merge deletes into single map * Opportunistic delete file load parallelization State Management: The `IcebergPageSourceProviderFactory` creates an `IcebergPageSourceProvider` per task and allows for reusing delete file information within the task. The `DeleteManager` is created only once and then loaded equality deletes for the task are cached until the end of the task. Tasks are per partition and typically most equality delete files in a partition apply to most data files so caching all the equality delete files is efficient.
Refactored some existing tests to extract shared code to util classes. This allows adding a new test suite for testing data file access opeartions, in addition to the existing suite testing metadata file access operations.
b28035e to
a8725ec
Compare
|
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
|
@findinpath we'd love to get this PR merged. Are there any outstanding action items here? |
|
Replaced by #21441 |
Description
The goal of this PR is to improve the performance of queries that contain Equality Delete files.
Before this commit equality delete files were re-loaded for every split that needed them. Now, each delete file is loaded once per execution node.
In addition, the equality deletes are stored in a single map (per delete schema) with the data sequence number in which the row was deleted. This allows us to merge rows that were deleted multiple times (common in upsert use cases) into a single entry instead of holding an entry per delete file.
Additional context and related issues
Fixes #18396
Changes in this commit:
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: