Skip to content

Compact deserialized page in SingleStreamSpiller#16338

Merged
rschlussel merged 1 commit intoprestodb:masterfrom
viczhang861:spiller_compact
Jun 29, 2021
Merged

Compact deserialized page in SingleStreamSpiller#16338
rschlussel merged 1 commit intoprestodb:masterfrom
viczhang861:spiller_compact

Conversation

@viczhang861
Copy link
Contributor

@viczhang861 viczhang861 commented Jun 25, 2021

Deserialized pages may contain block that is not
compact (e.g.,VariableWidthBlock might contain
a slice that is a view of all columns), which
results in incorrect retained size and memory
accounting issue.

When a block holds a slice of the entire page,
depends on number of affected blocks in a page,
memory could be off by N times (N >= count of
VariableWidthBlock).

== RELEASE NOTES ==

General Changes
* Fix spilled query exceeding local memory limit due to incorrect memory size calculation when spilled data is read by SingleStreamSpiller.

@viczhang861 viczhang861 requested a review from a team June 25, 2021 14:46
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

 Iterator<Page> compactedPages = transform(deserializedPages, page -> {
                page.compact();
                return page;
            });

or simply Iterator<Page> compactedPages = transform(deserializedPages, Page::compact) if you change the Page::compact implementation to return this as in other methods in the Page interface.

Deserialized pages may contain block that is not
compact (e.g.,VariableWidthBlock might contain
a slice that is a view of all columns), which
results in incorrect retained size and memory
accounting issue.

When a block holds a slice of the entire page,
depends on number of affected blocks in a page,
memory could be off by N times (N >= count of
VariableWidthBlock).
@rschlussel
Copy link
Contributor

Can you explain why the retainedSizeInBytes would be incorrect for non-compacted pages?

@viczhang861
Copy link
Contributor Author

@rschlussel See commit message, VariableWidthBlock holds a slice, when it is compact, this slice is the block itself, when it is not compact, this slice is a reference of the whole deserialized page. How the retained size of a page is calculated? it adds up the retained size of each block, thus the slice reference could be added multiple times by multiple VariableWidthBlock.

@rschlussel
Copy link
Contributor

@rschlussel See commit message, VariableWidthBlock holds a slice, when it is compact, this slice is the block itself, when it is not compact, this slice is a reference of the whole deserialized page. How the retained size of a page is calculated? it adds up the retained size of each block, thus the slice reference could be added multiple times by multiple VariableWidthBlock.

Got it, thanks. So without this, we would allocate more memory than needed, and the query would fail with oom when it didn't have to.

@rschlussel rschlussel merged commit 3ef9184 into prestodb:master Jun 29, 2021
@viczhang861 viczhang861 deleted the spiller_compact branch June 29, 2021 15:14
@viczhang861
Copy link
Contributor Author

Got it, thanks. So without this, we would allocate more memory than needed, and the query would fail with oom when it didn't have to.

Yes, that is why I started to investigate why it reserved so much memory. 10 references means 10X than expected size.

@ajaygeorge ajaygeorge mentioned this pull request Jul 7, 2021
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants