-
Notifications
You must be signed in to change notification settings - Fork 2.3k
[Enhancement] Add mem_pool_use_ratio metric to be #67701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Add mem_pool_use_ratio metric to be #67701
Conversation
1e84cc9 to
3c958c1
Compare
Signed-off-by: arin-mirza <a.mirza@celonis.com>
3c958c1 to
c1de967
Compare
This comment was marked as resolved.
This comment was marked as resolved.
Signed-off-by: arin-mirza <a.mirza@celonis.com>
|
@cursor review |
be/src/exec/workgroup/work_group.cpp
Outdated
| if (mem_pool_use_ratio_registered) | ||
| wg_metrics->mem_pool_use_ratio = std::move(resource_group_mem_pool_use_ratio); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this is only one statement, we prefer to have a { } to close for safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I applied this to the existing code as well. See commit a895854
Signed-off-by: arin-mirza <a.mirza@celonis.com>
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 31 / 31 (100.00%) file detail
|
|
@cursor review |
| bool mem_pool_use_ratio_registered = StarRocksMetrics::instance()->metrics()->register_metric( | ||
| "resource_group_mem_pool_use_ratio", | ||
| MetricLabels().add("name", wg->name()).add("mem_pool", wg->mem_pool()), | ||
| resource_group_mem_pool_use_ratio.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metric label becomes stale after workgroup ALTER
Low Severity
The new metric resource_group_mem_pool_use_ratio includes a mem_pool label that is set at initial registration time. When a workgroup is altered via alter_workgroup_unlocked, the code at line 320 skips metric re-registration because _wg_metrics.count(wg->name()) != 0. If the workgroup's mem_pool value changes during ALTER, the metric label becomes stale - it continues showing the old mem_pool value while the metric value is calculated from the new pool. Unlike other metrics which only use the immutable name label, this new metric's label can drift from the actual state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to think about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered different workarounds to deal with this staleness problem.
- Once a metric is registered with some labels, these labels are not supposed to change. If the
mem_poolname changes and we want to reflect that, we need to deregister and reregister the metric. This is not the intended use of metrics. - Instead of using a label, we can add
mem_poolas a separate metric instead. This is a dirty workaround which I am not willing to implement.
Neither of these is optimal, and the cleanest approach seems to be implementing separate metric registration inside MemTrackerManager. This way we can report metrics for each memory pool individually.
I will close this pull request soon and open a new one which refactors the MemTrackerManager.
|
Closing this one in favor of the new PR which reports the metrics in |
Why I'm doing:
I recently extended the
femetrics to include mem_pool related information:These are not reported as
bemetrics.Memory pool usage percentage per resource group is a useful information that we can add as a backend metric and display in dashboards.
What I'm doing:
This pull request adds the backend metric
resource_group_mem_pool_usage_ratio.Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
Note
Introduces a per-workgroup memory pool usage ratio to backend metrics.
DoubleGaugemetricresource_group_mem_pool_use_ratio(labels:name,mem_pool) inwork_group.cppmem_consumption_bytes / parent_memory_limit_bytesinupdate_metrics_unlocked, and resets to0when a workgroup is absent_calculate_ratiomoved into anonymous namespace with safer casts; conditional assignments expanded for clarityWritten by Cursor Bugbot for commit a895854. This will update automatically on new commits. Configure here.