How to avoid Nebula being OOM-killed by the OS? #5727
Comments
In Nebula v3.6.0 the situation is much worse. I wish someone on the product team would respond.
Everything else is exactly the same, right? Only the memory limit is hit more often?
Yes, everything is the same. My cluster info: our space has 2.8 billion vertices and 1 billion edges in total. We don't have memory_tracker_limit_ratio set; how would it help the situation?
I found the OOM log in
By default the memory tracker is enabled with a default ratio of 80%; refer to https://www.nebula-graph.io/posts/memory-tracker-practices for more details. What version were you running before 3.6.0, please? Also, was the memory limit on the node overcommitted?
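For reference, a minimal sketch of how that ratio is configured, assuming the flag is set in nebula-graphd.conf (and its storaged equivalent) as the linked blog post describes; the value shown is just the default mentioned above:

```
# nebula-graphd.conf (the same flag also applies to nebula-storaged.conf)
# Fraction of available memory (total minus untracked) that the memory tracker may hand out.
--memory_tracker_limit_ratio=0.8
```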
Referring to https://www.nebula-graph.io/posts/memory-tracker-practices , the memory tracker will stop queries from requesting more memory, but the ratio is applied to total memory minus untracked memory (the OS and other processes). It is indeed strange that the OOM killer reported usage well above the 0.3 ratio.
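To make the "total minus untracked" point concrete, here is a rough sketch of the arithmetic with made-up numbers (not taken from this cluster):

```
# tracker limit = (total node memory - untracked memory) * memory_tracker_limit_ratio
# e.g. a 64 GB node with ~4 GB held by the OS and other processes, at ratio 0.3:
#      (64 - 4) * 0.3 = 18 GB available to tracked queries before the limit is hit
```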
Based on this discussion we added memory_tracker_limit_ratio=0.3. After this we are no longer seeing OOM kills, but queries are failing with a 'GraphMemoryExceeded: (-2600)' error. Below is what we have for graphd.
It's the config:
And these are the memory items we get from
@coldgust it turns out
The memory tracker limits based on the untracked memory, the ratio, and the k8s limit; it should be hitting the ratio that triggers this exceeded error. We could enlarge the ratio to something larger than 0.3, maybe starting from 0.6? Also, for the limit, was that an overcommitted value when accumulating all workloads?
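A minimal sketch of that change, assuming the flag is raised in both configs as suggested above:

```
# nebula-graphd.conf and nebula-storaged.conf
--memory_tracker_limit_ratio=0.6
```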
Hi, I have noticed that the issue you created hasn’t been updated for nearly a month, so I have to close it for now. If you have any new updates, you are welcome to reopen this issue anytime. Thanks a lot for your contribution anyway 😊
I have tried setting memory_tracker_limit_ratio=0.3 for graphd and storaged, and graphd and storaged are deployed on the same node. When I execute a large query, it still gets killed due to OOM. Is there a method to kill the query rather than having it OOM-killed by the OS? And when I execute a query, is there a way to set a timeout parameter to kill the slow query?
Thanks!
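For the "kill the query instead of being OOM-killed" part, one option (syntax per the NebulaGraph 3.x docs as I understand them; the IDs below are placeholders, not from this cluster) is to locate and terminate the query by hand in nGQL:

```
# List running queries together with their session and execution plan IDs
SHOW QUERIES;

# Kill one query by its session ID and plan ID (placeholder values)
KILL QUERY (session=1625553545984255, plan=163);
```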