@arin-mirza
Contributor

@arin-mirza arin-mirza commented Jan 9, 2026

Why I'm doing:

I recently extended the FE metrics to include mem_pool-related information:

These metrics are not reported by the BE.

The memory pool usage percentage per resource group is useful information that we can add as a backend metric and display in dashboards.

What I'm doing:

This pull request adds the backend metric resource_group_mem_pool_use_ratio.

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This PR needs auto-generated documentation
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

Note

Introduces a per-workgroup memory pool usage ratio to backend metrics.

  • Registers a new DoubleGauge metric resource_group_mem_pool_use_ratio (labels: name, mem_pool) in work_group.cpp
  • Computes and updates the ratio as mem_consumption_bytes / parent_memory_limit_bytes in update_metrics_unlocked, and resets it to 0 when a workgroup is absent
  • Minor cleanups: _calculate_ratio moved into anonymous namespace with safer casts; conditional assignments expanded for clarity
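The ratio computation summarized above can be sketched as follows. This is a minimal, self-contained illustration, not the code in work_group.cpp: FakeDoubleGauge, WorkGroupSnapshot, calculate_ratio, and update_mem_pool_use_ratio are hypothetical stand-ins, and the guard that returns 0 for a non-positive limit is an assumption about what the "safer" helper protects against.

```cpp
#include <cassert>
#include <cstdint>

namespace {
// Hypothetical helper in the spirit of _calculate_ratio: divide in double
// precision and avoid dividing by a zero or negative limit.
double calculate_ratio(int64_t numerator, int64_t denominator) {
    if (denominator <= 0) {
        return 0.0;  // assumption: report 0 instead of inf/NaN
    }
    return static_cast<double>(numerator) / static_cast<double>(denominator);
}
}  // namespace

// Hypothetical stand-in for the real DoubleGauge metric.
struct FakeDoubleGauge {
    double value = 0.0;
    void set_value(double v) { value = v; }
};

// Hypothetical snapshot of the two inputs named in the summary.
struct WorkGroupSnapshot {
    int64_t mem_consumption_bytes;
    int64_t parent_memory_limit_bytes;
};

// Update the ratio when the workgroup exists; reset it to 0 when it is absent.
void update_mem_pool_use_ratio(FakeDoubleGauge& gauge, const WorkGroupSnapshot* wg) {
    if (wg == nullptr) {
        gauge.set_value(0.0);
        return;
    }
    gauge.set_value(calculate_ratio(wg->mem_consumption_bytes, wg->parent_memory_limit_bytes));
}
```

For example, a workgroup using 512 bytes of a 1024-byte parent limit yields 0.5, and passing nullptr resets the gauge to 0.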

Written by Cursor Bugbot for commit a895854. This will update automatically on new commits.

@arin-mirza arin-mirza requested a review from a team as a code owner January 9, 2026 14:30
@github-actions github-actions bot added the 4.1 label Jan 9, 2026
@arin-mirza arin-mirza force-pushed the add-be-metric-mem-pool-usage-ratio branch from 1e84cc9 to 3c958c1 Compare January 9, 2026 14:35
Signed-off-by: arin-mirza <a.mirza@celonis.com>
@arin-mirza arin-mirza force-pushed the add-be-metric-mem-pool-usage-ratio branch from 3c958c1 to c1de967 Compare January 9, 2026 15:50
@arin-mirza

This comment was marked as resolved.

Signed-off-by: arin-mirza <a.mirza@celonis.com>
@alvin-celerdata
Contributor

@cursor review

Comment on lines 418 to 419
if (mem_pool_use_ratio_registered)
    wg_metrics->mem_pool_use_ratio = std::move(resource_group_mem_pool_use_ratio);
Contributor

Even though this is only a single statement, we prefer to enclose it in { } for safety.

Contributor Author

Sure, I applied this to the existing code as well. See commit a895854.
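The braced form the reviewer asked for is shown below in a minimal, self-contained sketch. FakeDoubleGauge, WorkGroupMetricsSketch, and attach_gauge are hypothetical stand-ins, and modeling the gauge as a std::unique_ptr is an assumption suggested by the .get() call used at registration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the real DoubleGauge.
struct FakeDoubleGauge {
    double value = 0.0;
};

// Hypothetical stand-in for the per-workgroup metrics struct.
struct WorkGroupMetricsSketch {
    std::unique_ptr<FakeDoubleGauge> mem_pool_use_ratio;
};

// Transfer ownership of the gauge only if registration succeeded.
// The braces keep the body correctly scoped even if statements are added later.
void attach_gauge(WorkGroupMetricsSketch* wg_metrics, std::unique_ptr<FakeDoubleGauge> gauge, bool registered) {
    if (registered) {
        wg_metrics->mem_pool_use_ratio = std::move(gauge);
    }
}
```

If registration fails, the gauge passed in is simply destroyed when attach_gauge returns and the struct keeps its previous value.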

Signed-off-by: arin-mirza <a.mirza@celonis.com>
@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[BE Incremental Coverage Report]

pass : 31 / 31 (100.00%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/exec/workgroup/work_group.cpp 31 31 100.00% []

@alvin-celerdata
Contributor

@cursor review

bool mem_pool_use_ratio_registered = StarRocksMetrics::instance()->metrics()->register_metric(
        "resource_group_mem_pool_use_ratio",
        MetricLabels().add("name", wg->name()).add("mem_pool", wg->mem_pool()),
        resource_group_mem_pool_use_ratio.get());

Metric label becomes stale after workgroup ALTER

Low Severity

The new metric resource_group_mem_pool_use_ratio includes a mem_pool label that is set at initial registration time. When a workgroup is altered via alter_workgroup_unlocked, the code at line 320 skips metric re-registration because _wg_metrics.count(wg->name()) != 0. If the workgroup's mem_pool value changes during ALTER, the metric label becomes stale: it continues showing the old mem_pool value while the metric value is calculated from the new pool. Unlike the other metrics, which use only the immutable name label, this new metric's label can drift from the actual state.
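The staleness can be reproduced with a simplified model of the registration flow. This is a hypothetical sketch, not StarRocks code: MetricRegistrySketch stands in for the metric registry combined with the "skip if already registered" check, and the label is modeled as a plain string captured once at registration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registered metric: the label is frozen at registration time.
struct RegisteredMetric {
    std::string mem_pool_label;
    double value = 0.0;
};

// Hypothetical workgroup state; mem_pool can change on ALTER.
struct WorkGroupState {
    std::string name;
    std::string mem_pool;
};

class MetricRegistrySketch {
public:
    // Mirrors "skip re-registration if a metric for this name already exists".
    void register_if_absent(const WorkGroupState& wg) {
        if (_metrics.count(wg.name) != 0) {
            return;
        }
        _metrics[wg.name] = RegisteredMetric{wg.mem_pool, 0.0};
    }

    // The value tracks the current pool, but the stored label is never refreshed.
    void update(const WorkGroupState& wg, double ratio) { _metrics.at(wg.name).value = ratio; }

    const RegisteredMetric& get(const std::string& name) const { return _metrics.at(name); }

private:
    std::map<std::string, RegisteredMetric> _metrics;
};
```

After registering "wg1" under pool_a, changing its mem_pool to pool_b, and updating the value, the metric still carries the pool_a label.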


Contributor Author

I need to think about this.

Contributor Author

@arin-mirza arin-mirza Jan 16, 2026

I considered different workarounds to deal with this staleness problem.

  • Once a metric is registered with some labels, these labels are not supposed to change. If the mem_pool name changes and we want to reflect that, we need to deregister and re-register the metric. This is not the intended use of metrics.
  • Instead of using a label, we could report mem_pool as a separate metric. This is a dirty workaround that I am not willing to implement.

Neither of these is ideal, and the cleanest approach seems to be implementing separate metric registration inside MemTrackerManager. This way we can report metrics for each memory pool individually.
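A per-pool design along the lines proposed above might look like the following. The class and method names are hypothetical and do not reflect the actual MemTrackerManager API; the sketch only illustrates that keying each metric by the pool itself fixes the mem_pool label for the metric's lifetime, so it cannot drift when a workgroup is altered.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-pool state; the pool name is the only label, so it never changes.
struct PoolMetricSketch {
    int64_t consumption_bytes = 0;
    int64_t limit_bytes = 0;
    double use_ratio = 0.0;
};

// Hypothetical manager owning one entry (and one metric) per memory pool.
class PoolMetricsManagerSketch {
public:
    // Creates the entry on first use and records the pool's limit.
    void register_pool(const std::string& pool, int64_t limit_bytes) { _pools[pool].limit_bytes = limit_bytes; }

    void set_consumption(const std::string& pool, int64_t bytes) { _pools.at(pool).consumption_bytes = bytes; }

    // Recompute every pool's ratio; a non-positive limit reports 0.
    void update_metrics() {
        for (auto& entry : _pools) {
            PoolMetricSketch& m = entry.second;
            m.use_ratio = m.limit_bytes > 0
                                  ? static_cast<double>(m.consumption_bytes) / static_cast<double>(m.limit_bytes)
                                  : 0.0;
        }
    }

    double use_ratio(const std::string& pool) const { return _pools.at(pool).use_ratio; }

private:
    std::map<std::string, PoolMetricSketch> _pools;
};
```

In this model a workgroup moving to another pool only changes which pool's consumption it contributes to; no metric has to be deregistered or relabeled.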

I will close this pull request soon and open a new one which refactors the MemTrackerManager.

@arin-mirza
Contributor Author

Closing this one in favor of the new PR which reports the metrics in MemTrackerManager:

@arin-mirza arin-mirza closed this Jan 20, 2026