Revert "Unwrap lazy blocks before expensive remote and local exchange operations" PR#16773
Conversation
pettyjamesm
left a comment
The changes in the revert look fine and safe to me, approved.
That said, I’m curious to know any details you can provide about the problems you observed that were resolved by reverting these changes. I haven’t seen similar issues in our test environments, and nothing in the PR being reverted should fundamentally require any more or less query memory, so this seems like an edge case that needs to be chased down.
    }
    Page pageSplit;
    if (partitionSize == page.getPositionCount()) {
        pageSplit = page; // entire page will be sent to this partition, no copies necessary
Is it possible that the issue is that these pages were retaining more memory as a result of not having all of their positions explicitly copied out? If this partitioning exchanger was used to do a local exchange after a remote exchange (fairly common), then the issue with VariableWidthBlocks retaining their entire SerializedPage slice could indeed yield a much higher retained size than before.
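The retention issue described above can be illustrated with a minimal sketch. This is not Presto's actual `VariableWidthBlock` or `Page` API; `SlicedBlock`, `retainedSizeInBytes`, and `copyPositions` here are simplified stand-ins showing how a block that is a view over a shared deserialized buffer pins the whole buffer until its positions are copied out:

```java
// Minimal sketch (hypothetical classes, not Presto's) of why an uncopied
// block view can retain far more memory than the data it logically holds.
public class RetainedSizeSketch {
    // Stand-in for a VariableWidthBlock view over a shared SerializedPage buffer.
    static final class SlicedBlock {
        final byte[] buffer; // the entire deserialized page data
        final int offset;
        final int length;

        SlicedBlock(byte[] buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        // Retained size counts the full backing buffer, not just this slice.
        long retainedSizeInBytes() {
            return buffer.length;
        }

        // Copying the positions out drops the reference to the big buffer.
        SlicedBlock copyPositions() {
            byte[] copy = new byte[length];
            System.arraycopy(buffer, offset, copy, 0, length);
            return new SlicedBlock(copy, 0, length);
        }
    }

    public static void main(String[] args) {
        byte[] serializedPage = new byte[1 << 20]; // 1 MiB of deserialized page data
        // A partition's split only needs 100 bytes of it...
        SlicedBlock view = new SlicedBlock(serializedPage, 0, 100);
        // ...but while uncopied, it pins the entire 1 MiB buffer:
        System.out.println("view retains: " + view.retainedSizeInBytes());   // 1048576
        SlicedBlock copied = view.copyPositions();
        System.out.println("copy retains: " + copied.retainedSizeInBytes()); // 100
    }
}
```

If the reverted change let whole pages flow through the exchange without copying, many such views could outlive the exchange and keep their source buffers alive, which is consistent with the `EXCEEDED_LOCAL_MEMORY_LIMIT` failures.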
Sorry about the delayed reply. @aweisberg do you have more context on this since you mentioned this PR in our discussion about the local memory issues?
I think I talked about this with James in Slack, but we never settled on the right way to follow up and keep this performance improvement without also sometimes regressing on memory.
Since I had also made the regressing change to Trino, I followed up there with a fix and a longer description of the problem / alternatives. See: trinodb/trino#9327 (with a follow up PR in trinodb/trino#9379)
As for what to do in PrestoDB, there are some unknowns about reintroducing this change with a fix similar to the Trino one, as I do not have a comparable testing environment to verify that calling Page#compact() explicitly will perform comparably to Page#copyPositions in terms of throughput or reported memory usage across all workloads. Trino has made a variety of other localized changes to the related classes (adding explicit calls to Page#compact() in other places, revised operator and memory tracking implementations, etc.). On the Athena side, we chose to take the performance hit of eagerly copying VariableWidthBlock instances out of their input slice during deserialization, which means we didn't have the same issue, but that's an expensive hit to throughput.
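The compaction mitigation can be sketched as follows. This is a hypothetical illustration, not Presto's or Trino's actual implementation: the `Block` class and the `MAX_WASTE_RATIO` threshold are assumptions. The idea is to copy a block out of its shared buffer only when the retained size exceeds the logical size by enough to be worth the copy, which avoids copyPositions' unconditional throughput cost:

```java
// Hypothetical sketch of "compact before buffering": copy a block out of its
// shared buffer only when it wastes enough memory to justify the copy.
public class CompactSketch {
    static final double MAX_WASTE_RATIO = 2.0; // assumed threshold, not a real constant

    static final class Block {
        final byte[] buffer;
        final int offset;
        final int length;

        Block(byte[] buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        long retainedSizeInBytes() { return buffer.length; }
        long sizeInBytes() { return length; }

        Block compact() {
            if (retainedSizeInBytes() <= sizeInBytes() * MAX_WASTE_RATIO) {
                return this; // already tight enough; skip the copy
            }
            byte[] copy = new byte[length];
            System.arraycopy(buffer, offset, copy, 0, length);
            return new Block(copy, 0, length);
        }
    }

    public static void main(String[] args) {
        Block tight = new Block(new byte[128], 0, 100);
        Block loose = new Block(new byte[1 << 20], 0, 100);
        System.out.println(tight.compact() == tight);              // true: no copy made
        System.out.println(loose.compact().retainedSizeInBytes()); // 100
    }
}
```

The open question above is exactly whether this conditional copy is cheap enough in practice; that needs validation against realistic production workloads.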
So as I see it, we can either:
- Take a harsh performance hit to VariableWidthBlock deserialization throughput, but address this basic memory tracking bug more thoroughly (i.e., right now the memory tracking bug isn't typically reported because there is usually a partitioning local exchange after a remote exchange that forces the copy, but not always)
- I can put together a PR with this change reintroduced, but with fixes comparable to the ones I made on the Trino side to mitigate the potential issue, and let someone with time and realistic production workloads test them out
- Leave the change reverted (I have no objection to this option)
This reverts #16617
The commits were causing previously successful queries to fail with EXCEEDED_LOCAL_MEMORY_LIMIT errors.