
mergify bot commented Jan 12, 2026

Why I'm doing:

The mem_pool property for resource groups was introduced to StarRocks in the following pull requests:

The combined memory usage of resource groups that belong to the same memory pool is tracked by a shared memory tracker. These shared memory trackers are managed by the mem_tracker_manager instance, which owns an unordered map from mem_pool names to their shared memory trackers.

Currently, memory pools (and their corresponding shared memory trackers) can only be inserted into this map, never removed. A memory pool becomes unused, and its entry should be deleted, once all the resource groups that belong to it have been deleted.
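To make the leak concrete, here is a minimal C++ sketch of an insert-only map of the kind described above. SharedTracker, InsertOnlyTrackerMap and get_or_create are hypothetical names chosen for illustration; they are not StarRocks' actual MemTrackerManager interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a shared memory tracker: just a label and a byte limit.
struct SharedTracker {
    std::string label;
    int64_t limit;
};

// Illustrative model of an insert-only map (not the real MemTrackerManager):
// a tracker is created the first time a mem_pool name is seen and is never erased.
class InsertOnlyTrackerMap {
public:
    SharedTracker* get_or_create(const std::string& mem_pool, int64_t limit) {
        assert(!mem_pool.empty());  // sketch precondition: pools are named
        auto it = _trackers.find(mem_pool);
        if (it == _trackers.end()) {
            auto tracker = std::make_unique<SharedTracker>(SharedTracker{mem_pool, limit});
            it = _trackers.emplace(mem_pool, std::move(tracker)).first;
        }
        return it->second.get();
    }

    // Grows monotonically: pools whose workgroups are all gone are still counted.
    std::size_t size() const { return _trackers.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<SharedTracker>> _trackers;
};
```

Because nothing in this model ever calls erase, size() only grows, even after every workgroup of a pool has been deleted.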

This pull request adds the necessary functionality to automatically and safely clean up such unused mem_pool entries.

What I'm doing:

  • MemTrackerManager now also tracks, for each mem_pool, the number of its children (i.e. the number of workgroups that belong to it). When this number reaches 0, the corresponding entry is deleted from the unordered map.
  • The counter is maintained by the register_workgroup and deregister_workgroup methods, which should be called whenever a workgroup is constructed and destructed, respectively.
    • When register_workgroup is called with a workgroup that has a new mem_pool, it constructs a new shared memory tracker and inserts the memory pool name, the memory tracker pointer, and the child count into the unordered map. If it is called with a workgroup whose mem_pool already exists, it increments that mem_pool's child count by one.
    • When deregister_workgroup is called, it decrements the mem_pool's child count by one and erases the mem_pool entry from the unordered map once the count reaches 0.
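The register/deregister scheme can be sketched roughly as follows. This is a hedged illustration, not the code in this PR: register_workgroup and deregister_workgroup are named after the description, while MemPoolRegistry, SharedTracker, the mutex, and shared_ptr ownership of trackers are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a shared memory tracker: just a label and a byte limit.
struct SharedTracker {
    std::string label;
    int64_t limit;
};

// Sketch of per-mem_pool child counting (signatures and locking are assumptions).
class MemPoolRegistry {
public:
    // Call when a workgroup is constructed. A new mem_pool gets a fresh tracker,
    // an existing one is reused; either way the child count goes up by one.
    std::shared_ptr<SharedTracker> register_workgroup(const std::string& mem_pool, int64_t limit) {
        std::lock_guard<std::mutex> guard(_mutex);
        Entry& entry = _pools[mem_pool];  // default-constructed (no tracker, 0 children) if absent
        if (entry.tracker == nullptr) {
            entry.tracker = std::make_shared<SharedTracker>(SharedTracker{mem_pool, limit});
        }
        ++entry.child_count;
        return entry.tracker;
    }

    // Call when a workgroup is destructed; the entry is erased with its last child.
    void deregister_workgroup(const std::string& mem_pool) {
        std::lock_guard<std::mutex> guard(_mutex);
        auto it = _pools.find(mem_pool);
        if (it == _pools.end()) {
            return;  // unknown pool: nothing to release in this sketch
        }
        assert(it->second.child_count > 0);
        if (--it->second.child_count == 0) {
            _pools.erase(it);
        }
    }

    std::size_t child_count(const std::string& mem_pool) const {
        std::lock_guard<std::mutex> guard(_mutex);
        auto it = _pools.find(mem_pool);
        return it == _pools.end() ? 0 : it->second.child_count;
    }

    std::size_t pool_count() const {
        std::lock_guard<std::mutex> guard(_mutex);
        return _pools.size();
    }

private:
    struct Entry {
        std::shared_ptr<SharedTracker> tracker;
        std::size_t child_count = 0;
    };

    mutable std::mutex _mutex;
    std::unordered_map<std::string, Entry> _pools;
};
```

In this sketch the limit argument is simply ignored when the pool already exists; according to the Cursor Bugbot summary of this PR, the actual change also verifies the cached tracker's limit.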

The change also handles a tricky edge case:

  • When the user deletes a workgroup, it is deleted immediately at the frontend (FE) but only marked for deletion at the backend (BE). The workgroup instance is destructed after an expiration time (120 seconds by default).
  • This means that a new workgroup can be created with the same name right after the old one is deleted. This is not a problem for workgroups, because they are immutable and uniquely versioned. Memory pools, however, are identified only by name and are not versioned.
  • A memory pool entry is cleaned up only after all of its children have been destructed. There is therefore a time window in which the mem_pool no longer exists at the frontend but its entry is still alive at the backend, because all of its children are marked for deletion but not yet destructed. If the user creates a new workgroup under a mem_pool with the same name during this window, register_workgroup cannot simply insert a fresh entry and ignore the children waiting for deletion; it must take their number into account.
    • If the child count were reset, ignoring the expiring children, those children would still call deregister_workgroup when they are finally destructed and cause the counter to underflow.
    • This is prevented by never resetting the child count and always incrementing it inside register_workgroup. For fresh mem_pools, it will be 0-initialized and increased to 1. For reused mem_pools, it will keep the old count (expiring child count) and increase it by 1.
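The difference between resetting the counter and always incrementing it can be shown with a counter-only model. ChildCounter and its methods are hypothetical; register_child_with_reset exists only to demonstrate the rejected policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Counter-only model of the child bookkeeping (hypothetical; trackers omitted).
class ChildCounter {
public:
    // Policy described in the PR: never reset, always increment. operator[]
    // zero-initialises a fresh pool (0 -> 1), while a reused pool keeps its
    // expiring children and adds one.
    void register_child(const std::string& pool) { ++_counts[pool]; }

    // Rejected policy, for comparison only: a re-created pool restarts at 1 and
    // forgets children that are marked for deletion but not yet destructed.
    void register_child_with_reset(const std::string& pool) { _counts[pool] = 1; }

    // Returns false instead of underflowing when no child is left to remove.
    bool deregister_child(const std::string& pool) {
        auto it = _counts.find(pool);
        if (it == _counts.end()) {
            return false;
        }
        if (--it->second == 0) {
            _counts.erase(it);
        }
        check_invariants();
        return true;
    }

    std::size_t count(const std::string& pool) const {
        auto it = _counts.find(pool);
        return it == _counts.end() ? 0 : it->second;
    }

private:
    // Entries with zero children are erased, so every stored count is positive.
    void check_invariants() const {
        for (const auto& kv : _counts) {
            assert(kv.second > 0);
        }
    }

    std::unordered_map<std::string, std::size_t> _counts;
};
```

With the reset policy, the first expiring child drops the count to 0 and erases the entry while the new workgroup is still alive, and the second expiring child has nothing left to decrement. With a plain unsigned counter and no erase, that last decrement would wrap around instead.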

This pull request also changes WorkGroup::mark_del() to take an expiration time as a parameter. The expiration time is owned by WorkGroupManager, defaults to 120 seconds, and can be changed via set_workgroup_expiration_time. This change was necessary to test the correct behavior of MemTrackerManager.
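A minimal sketch of moving the expiration time out of the workgroup and into its manager, assuming std::chrono::steady_clock and millisecond granularity. WorkGroupSketch and WorkGroupManagerSketch are hypothetical; only mark_del, set_workgroup_expiration_time and the 120-second default come from this PR's description.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: the workgroup only records when it becomes eligible for
// vacuuming; the expiration itself is supplied by the caller.
class WorkGroupSketch {
public:
    void mark_del(std::chrono::milliseconds expiration, Clock::time_point now = Clock::now()) {
        assert(expiration.count() >= 0);
        _is_marked_del = true;
        _vacuum_time = now + expiration;
    }

    bool is_marked_del() const { return _is_marked_del; }

    // A marked workgroup may be destructed once its vacuum time has passed.
    bool is_expired(Clock::time_point now = Clock::now()) const {
        return _is_marked_del && now >= _vacuum_time;
    }

private:
    bool _is_marked_del = false;
    Clock::time_point _vacuum_time{};
};

// Hypothetical manager side: owns the expiration value (120 s by default).
class WorkGroupManagerSketch {
public:
    void set_workgroup_expiration_time(std::chrono::milliseconds expiration) { _expiration = expiration; }
    std::chrono::milliseconds workgroup_expiration_time() const { return _expiration; }

private:
    std::chrono::milliseconds _expiration = std::chrono::seconds(120);
};
```

Passing `now` explicitly lets a test check expiry without sleeping for two minutes; the actual tests in this PR may work differently.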

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Implements automatic cleanup of unused mem_pool shared memory trackers and integrates lifecycle management with workgroups.

  • Add MemTrackerManager APIs: register_workgroup/deregister_workgroup with child counting; verify cached tracker limit to handle FE deletion/recreate edge case; expose list_mem_trackers
  • Refactor WorkGroupManager to use register_workgroup, call deregister_workgroup on cleanup, add list_memory_pools, and make mark_del expiration configurable via set_workgroup_expiration_time
  • Update WorkGroup::mark_del to accept expiration; adjust vacuum time precision and defaults
  • Add tests for shared tracker reuse/overwrite and for cleanup of unused memory pools

Written by Cursor Bugbot for commit 0cce91e. This will update automatically on new commits.


This is an automatic backport of pull request #67347 done by [Mergify](https://mergify.com).

mergify bot commented Jan 12, 2026

🧪 CI Insights

Here's what we observed from your CI run for 8bbc648d.

🟢 All jobs passed!

But CI Insights is watching 👀

@wanpengfei-git merged commit 0232f28 into branch-4.0 Jan 12, 2026
33 of 34 checks passed
@wanpengfei-git deleted the mergify/bp/branch-4.0/pr-67347 branch January 12, 2026 18:11