HashJoinStream memory tracking insufficient #7848
Comments
cc @korowa, who I think did the initial work on join memory tracking, in case they have ideas.
Can't remember anything "leaky" except for build-side data and visited build-side indices 🤔 -- I'll check it out and try to submit a PR.
I was able to reproduce it with a cross join (all records on both the build and probe sides have the same key, and the build side is small enough to fit within the memory limit) -- it's true, the "working" batch is not tracked at all. I guess the most proper solution would be to produce partially joined batches (in the same fashion as MergeJoin does), rather than accumulating output in memory until the memory limit is violated. Any thoughts on this option? UPD: still, it's worth having memory management as precise as possible -- I'll take a stab at additional memory tracking for the working batch, if there are no objections.
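To illustrate the "partially joined batch" idea from the comment above, here is a minimal sketch using only the Rust standard library. It is not DataFusion code: `ChunkedMatches` and its fields are hypothetical names, and it models the cross-join reproduction case, where every probe row matches every build row. Instead of collecting all matches, it hands out at most `batch_size` index pairs per call.

```rust
/// Hypothetical stand-in for a join stream that yields (build_idx, probe_idx)
/// pairs in chunks of at most `batch_size`, instead of materializing every
/// match before emitting anything.
struct ChunkedMatches {
    build_rows: usize,
    probe_rows: usize,
    pos: usize, // linear position in the build x probe match space
    batch_size: usize,
}

impl ChunkedMatches {
    fn new(build_rows: usize, probe_rows: usize, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be non-zero");
        Self { build_rows, probe_rows, pos: 0, batch_size }
    }
}

impl Iterator for ChunkedMatches {
    type Item = Vec<(usize, usize)>;

    fn next(&mut self) -> Option<Self::Item> {
        let total = self.build_rows * self.probe_rows;
        if self.pos >= total {
            return None;
        }
        // Emit only the next slice of matches; the rest is produced lazily.
        let end = (self.pos + self.batch_size).min(total);
        let chunk = (self.pos..end)
            .map(|i| (i % self.build_rows, i / self.build_rows))
            .collect();
        self.pos = end;
        Some(chunk)
    }
}
```

With this shape, peak memory for the working output is bounded by `batch_size` pairs rather than by the full number of matches, which is the property the comment is after.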
@korowa I think adding support for the working batch sounds like a good idea, as long as it doesn't get too complex. Thank you for looking into this.
@korowa is there any update on this? If not, I may try and work on it this week.
Unfortunately, I only started a day ago on the first option (emitting batches from hash join) -- so I hope there will be at least a draft for this during the week. But, ofc, feel free to pick up this issue!
Thanks @korowa -- I'll let you know.
Related ticket: #8130, which I believe is blocking this code.
Describe the bug
Using a hash join may lead to OOM kills / very large memory consumption even when a memory limit is set.
To Reproduce
No reproduction steps yet.
We have a flame graph from a prod environment though:
Expected behavior
The memory manager reports "out of memory" and the query fails.
Additional context
I think the code only tracks HashJoinStream::visited_left_side, but build_equal_condition_join_indices is completely untracked.
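As a rough illustration of what tracking the join indices could look like, the sketch below reserves memory from a pool before allocating the index buffer and fails the call if the limit would be exceeded. Everything here is an assumption made for the example: `ToyPool` and `tracked_join_indices` are hypothetical, DataFusion's own memory pool API is not reproduced, and the "every row matches" loop only mimics the worst case described in this issue.

```rust
use std::cell::Cell;
use std::mem::size_of;

/// Hypothetical fixed-size pool, a stand-in for a real memory manager.
struct ToyPool {
    limit: usize,
    used: Cell<usize>,
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: Cell::new(0) }
    }

    /// Reserve `bytes`, or report an error without changing the accounting.
    fn try_grow(&self, bytes: usize) -> Result<(), String> {
        let new_used = self.used.get() + bytes;
        if new_used > self.limit {
            return Err(format!(
                "out of memory: requested {bytes} bytes, {} of {} in use",
                self.used.get(),
                self.limit
            ));
        }
        self.used.set(new_used);
        Ok(())
    }

    /// Release `bytes` once the buffer they covered has been dropped.
    fn shrink(&self, bytes: usize) {
        self.used.set(self.used.get().saturating_sub(bytes));
    }
}

/// Build (build_idx, probe_idx) pairs for a probe batch in which every row
/// matches every build row, reserving the buffer size before allocating it.
fn tracked_join_indices(
    pool: &ToyPool,
    build_rows: usize,
    probe_rows: usize,
) -> Result<Vec<(u64, u32)>, String> {
    let matches = build_rows * probe_rows;
    pool.try_grow(matches * size_of::<(u64, u32)>())?;
    let mut out = Vec::with_capacity(matches);
    for p in 0..probe_rows {
        for b in 0..build_rows {
            out.push((b as u64, p as u32));
        }
    }
    Ok(out)
}
```

In this sketch the caller is responsible for calling `shrink` after the returned buffer is dropped; a real implementation would more likely tie the release to a guard object so it cannot be forgotten.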