@arin-mirza
Contributor

@arin-mirza arin-mirza commented Dec 12, 2025

Why I'm doing:

The mem_pool property for resource groups was recently introduced. See related pull requests below:

Currently, there is no way to display the usage metrics of the memory pools.

What I'm doing:

This pull request extends the output of the SHOW USAGE RESOURCE GROUPS; command with the following columns (new columns are marked with +):

Column                   | Description
Id                       |
Backend                  |
BEInUseCpuCores          |
BEInUseMemBytes          |
BERunningQueries         |
+ BEMemLimitBytes        | Memory limit of the resource group.
+ BEMemPool              | Name of the memory pool the resource group belongs to.
+ BEMemPoolInUseMemBytes | Current total memory usage of the mem_pool.
+ BEMemPoolMemLimitBytes | Specified memory limit of the mem_pool.
  • The current implementation of memory pools requires all resource groups under the same memory pool to be configured with the same memory limit (BEMemLimitBytes). This value is also the memory limit of the memory pool itself (BEMemPoolMemLimitBytes). Because of this restriction, the two columns currently show the same value, but this might change in the future.
  • The value of BEMemPoolInUseMemBytes is the sum of the memory usage of all resource groups that belong to that memory pool. See the screenshots below for an example scenario.
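
The aggregation behind BEMemPoolInUseMemBytes can be sketched as follows. This is only an illustration in Java, not the backend code (which is C++): GroupUsage, MemPoolUsageSketch, poolUsage, the pool names, and the byte values are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one resource group's usage on a single backend.
final class GroupUsage {
    final String group;
    final String memPool;
    final long inUseMemBytes;

    GroupUsage(String group, String memPool, long inUseMemBytes) {
        this.group = group;
        this.memPool = memPool;
        this.inUseMemBytes = inUseMemBytes;
    }
}

final class MemPoolUsageSketch {
    // Sums the in-use memory of all resource groups per memory pool.
    static Map<String, Long> poolUsage(List<GroupUsage> usages) {
        Map<String, Long> byPool = new LinkedHashMap<>();
        for (GroupUsage u : usages) {
            byPool.merge(u.memPool, u.inUseMemBytes, Long::sum);
        }
        return byPool;
    }

    public static void main(String[] args) {
        // Made-up values: two groups share pool_a, one group uses pool_b.
        List<GroupUsage> usages = Arrays.asList(
                new GroupUsage("wg0", "pool_a", 9L),
                new GroupUsage("wg1", "pool_a", 19L),
                new GroupUsage("wg2", "pool_b", 7L));
        System.out.println(poolUsage(usages)); // prints {pool_a=28, pool_b=7}
    }
}
```

In this sketch every group in pool_a would report 28 as its pool usage, which matches the description that the column shows the pool total rather than the group's own usage.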

Notes/Discussion

  • If two workgroups with the same id but different versions exist at the same time, the metrics reported for memory pools must belong to the latest version of that workgroup. This requires knowing which group_version a TResourceGroupUsage object belongs to. I tried two different approaches:
    • In resource_group_usage_recorder.cpp, I added an auxiliary VersionedResourceGroupUsage struct that paired a group_version with a TResourceGroupUsage object. This solution worked, but it is too messy for something this simple. See b9c4141
    • (Preferred) I added a group_version field to the ResourceUsage.thrift file and use it directly. This is much cleaner. See bba2165
      • The group_version is only used by resource_group_usage_recorder.cpp in the backend, and is not read from the thrift file on the frontend side.
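
The idea of preferring the latest group_version when two versions of a workgroup coexist can be sketched as below. This is a simplified Java illustration under stated assumptions, not the C++ logic in the recorder: VersionedUsage, LatestVersionSketch, and latestPerGroup are hypothetical names, and the sketch simply keeps the newest entry per group id.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a usage entry tagged with its group_version.
final class VersionedUsage {
    final long groupId;
    final long groupVersion;
    final String memPool;

    VersionedUsage(long groupId, long groupVersion, String memPool) {
        this.groupId = groupId;
        this.groupVersion = groupVersion;
        this.memPool = memPool;
    }
}

final class LatestVersionSketch {
    // For each group id, keeps the entry with the highest group_version.
    static Map<Long, VersionedUsage> latestPerGroup(List<VersionedUsage> entries) {
        Map<Long, VersionedUsage> latest = new TreeMap<>();
        for (VersionedUsage e : entries) {
            latest.merge(e.groupId, e,
                    (old, cur) -> cur.groupVersion > old.groupVersion ? cur : old);
        }
        return latest;
    }

    public static void main(String[] args) {
        // Made-up data: group 10 exists in versions 1 and 2 at the same time.
        List<VersionedUsage> entries = Arrays.asList(
                new VersionedUsage(10L, 1L, "pool_old"),
                new VersionedUsage(10L, 2L, "pool_new"),
                new VersionedUsage(11L, 5L, "pool_x"));
        System.out.println(latestPerGroup(entries).get(10L).memPool); // prints pool_new
    }
}
```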

Demo

[Screenshots: demo-0, demo-1, demo-2, demo-3]

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Enhances resource group usage reporting with memory‑pool metrics end-to-end.

  • FE: Extends SHOW USAGE RESOURCE GROUPS with BEMemLimitBytes, BEMemPool, BEMemPoolInUseMemBytes, BEMemPoolMemLimitBytes; updates metadata, rendering, and tests
  • BE: ResourceGroupUsageRecorder now includes group_version, mem_limit_bytes, mem_pool, and mem‑pool limit/usage; merges per‑group stats preferring latest group_version
  • BE: WorkGroup exposes parent memory limit/usage helpers used for mem‑pool values
  • FE runtime: ComputeNode.ResourceGroupUsage extended to carry new fields and map from thrift
  • Thrift: adds fields to TResourceGroupUsage for mem limits/pool and group_version

Written by Cursor Bugbot for commit a07cc5e. This will update automatically on new commits.

@mergify
Contributor

mergify bot commented Dec 12, 2025

🧪 CI Insights

Here's what we observed from your CI run for b9c4141.

🟢 All jobs passed!


@github-actions github-actions bot added the 4.0 label Dec 12, 2025
@arin-mirza arin-mirza force-pushed the extend-show-usage-resource-groups-with-mem-pool-metrics branch from bf65e1c to b9c4141 Compare December 12, 2025 16:28
@alvin-celerdata
Contributor

@cursor review

@alvin-celerdata
Contributor

@cursor review

@cursor cursor bot left a comment

Bug: Test assertions missing new columns in expected output

The test assertions at lines 1592-1602 and 1620-1624 expect the old output format with only 6 columns, but the code now produces 10 columns per row (including BEMemLimitBytes, BEMemPool, BEMemPoolInUseMemBytes, and BEMemPoolMemLimitBytes). The header assertion at lines 1562-1563 was correctly updated to include the new columns, but the data row assertions still expect the old format. These tests will fail because the actual output contains 4 additional column values that are not included in the expected strings.

fe/fe-core/src/test/java/com/starrocks/qe/scheduler/QueryQueueManagerTest.java#L1591-L1602

String res = starRocksAssert.executeShowResourceUsageSql("SHOW USAGE RESOURCE GROUPS;");
assertThat(res).isEqualTo("Name|Id|Backend|BEInUseCpuCores|BEInUseMemBytes|BERunningQueries\n" +
"default_wg|2|be0-host|3.112|39|38\n" +
"default_mv_wg|3|be1-host|4.11|49|48\n" +
"wg0|10|be0-host|0.112|9|8\n" +
"wg0|10|be1-host|1.11|19|18\n" +
"wg1|11|be0-host|0.1|0|0\n" +
"wg1|11|be1-host|1.1|0|0\n" +
"wg2|12|be0-host|0.12|7|6\n" +
"wg2|12|be1-host|1.12|17|16\n" +
"wg3|13|be0-host|0.03|0|0\n" +
"wg3|13|be1-host|0.13|0|0");

fe/fe-core/src/test/java/com/starrocks/qe/scheduler/QueryQueueManagerTest.java#L1619-L1624

String res = starRocksAssert.executeShowResourceUsageSql("SHOW USAGE RESOURCE GROUPS;");
assertThat(res).isEqualTo("Name|Id|Backend|BEInUseCpuCores|BEInUseMemBytes|BERunningQueries\n" +
"wg0|10|be0-host|0.21|29|28\n" +
"wg1|11|be0-host|0.2|0|0\n" +
"wg2|12|be1-host|1.22|27|26\n" +
"wg3|13|be1-host|0.23|0|0");
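
A mismatch of this kind can be caught mechanically by checking that every data row has as many |-separated fields as the header. The helper below is a hypothetical sketch and not part of QueryQueueManagerTest; ColumnCountSketch and mismatchedRows are invented names, and the sample input combines the 10-column header with one of the old-format rows quoted above.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnCountSketch {
    // Returns the data rows whose field count differs from the header's.
    static List<String> mismatchedRows(String output) {
        String[] lines = output.split("\n");
        // Limit -1 keeps trailing empty fields, e.g. an empty last column.
        int expected = lines[0].split("\\|", -1).length;
        List<String> bad = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].split("\\|", -1).length != expected) {
                bad.add(lines[i]);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String header = "Name|Id|Backend|BEInUseCpuCores|BEInUseMemBytes|BERunningQueries"
                + "|BEMemLimitBytes|BEMemPool|BEMemPoolInUseMemBytes|BEMemPoolMemLimitBytes";
        String oldRow = "wg0|10|be0-host|0.21|29|28";
        // The old-format row has 6 fields against a 10-field header.
        System.out.println(mismatchedRows(header + "\n" + oldRow)); // prints [wg0|10|be0-host|0.21|29|28]
    }
}
```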



@arin-mirza
Contributor Author

I updated the testShowResourceGroupUsage test case in 9fc7afa and fixed the clang formatting issue in 4708184.

@alvin-celerdata
Contributor

@cursor review

@arin-mirza
Contributor Author

arin-mirza commented Dec 19, 2025

The CI PIPELINE / FE UT check is failing because the testShowResourceGroupUsage test fails with:

2025-12-19T18:29:56.5650451Z [INFO] 
2025-12-19T18:29:56.5650846Z [INFO] Results:
2025-12-19T18:29:56.5651234Z [INFO] 
2025-12-19T18:29:56.5651646Z [ERROR] Failures: 
2025-12-19T18:29:56.5652459Z [ERROR]   QueryQueueManagerTest.testShowResourceGroupUsage:1562 
2025-12-19T18:29:56.5653107Z expected: 
2025-12-19T18:29:56.5654336Z   "Name|Id|Backend|BEInUseCpuCores|BEInUseMemBytes|BERunningQueries|BEMemLimitBytes|BEMemPool|BEMemPoolInUseMemBytes|BEMemPoolMemLimitBytes
2025-12-19T18:29:56.5655474Z   "
2025-12-19T18:29:56.5655750Z  but was: 
2025-12-19T18:29:56.5656196Z   "Name|Id|Backend|BEInUseCpuCores|BEInUseMemBytes|BERunningQueries
2025-12-19T18:29:56.5656801Z   "
2025-12-19T18:29:56.5657137Z [INFO] 
2025-12-19T18:29:56.5657866Z [ERROR] Tests run: 14661, Failures: 1, Errors: 0, Skipped: 81
2025-12-19T18:29:56.5658570Z [INFO] 

This is surprising, because the test passes for me locally, and I can see the new columns when executing SHOW USAGE RESOURCE GROUPS; on locally built StarRocks FE/BE instances, which is how I took the screenshots in the PR description.

The columns in the FE response are also clearly being extended in ShowResourceGroupUsageStmt.java.

Any idea why the CI run might not be getting the expected result?

@alvin-celerdata
Contributor

@cursor review

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!


murphyatwork previously approved these changes Dec 25, 2025
HangyuanLiu previously approved these changes Dec 25, 2025
satanson previously approved these changes Dec 25, 2025
@alvin-celerdata
Contributor

@mergify rebase

@mergify
Contributor

mergify bot commented Dec 27, 2025

rebase

✅ Branch has been successfully rebased

@alvin-celerdata alvin-celerdata force-pushed the extend-show-usage-resource-groups-with-mem-pool-metrics branch from 080a7ca to c1421a2 Compare December 27, 2025 18:51
@alvin-celerdata
Contributor

@arin-mirza
Could you fix the failed FE UT? After this passes, I will merge it.

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!

@arin-mirza
Contributor Author

Regarding the failing BE UT check:

FAILED TESTS (2/2145):
   37396 ms: /root/starrocks/be/ut_build_Release/test/starrocks_dw_test LakePrimaryKeyConsistencyTest/LakePrimaryKeyConsistencyTest.test_local_pk_consistency/0
   36347 ms: /root/starrocks/be/ut_build_Release/test/starrocks_dw_test LakePrimaryKeyConsistencyTest/LakePrimaryKeyConsistencyTest.test_random_seed_pk_consistency/0

Seems unrelated. Flaky?

@github-actions
Contributor

[FE Incremental Coverage Report]

pass : 24 / 24 (100.00%)

file detail

path | covered_line | new_line | coverage | not_covered_line_detail
🔵 com/starrocks/sql/ast/ShowResourceGroupUsageStmt.java | 9 | 9 | 100.00% | []
🔵 com/starrocks/qe/ShowResultMetaFactory.java | 4 | 4 | 100.00% | []
🔵 com/starrocks/system/ComputeNode.java | 11 | 11 | 100.00% | []

@alvin-celerdata
Contributor

@mergify rebase

…age metrics

Signed-off-by: arin-mirza <a.mirza@celonis.com>
…olumns

Signed-off-by: arin-mirza <a.mirza@celonis.com>
@mergify
Contributor

mergify bot commented Dec 31, 2025

rebase

✅ Branch has been successfully rebased

@alvin-celerdata alvin-celerdata force-pushed the extend-show-usage-resource-groups-with-mem-pool-metrics branch from 8e750b3 to a07cc5e Compare December 31, 2025 15:05
@github-actions
Contributor

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@alvin-celerdata alvin-celerdata merged commit f4a8df5 into StarRocks:main Dec 31, 2025
59 of 62 checks passed
@github-actions
Contributor

@Mergifyio backport branch-4.0

@github-actions github-actions bot removed the 4.0 label Dec 31, 2025
@mergify
Contributor

mergify bot commented Dec 31, 2025

backport branch-4.0

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 31, 2025
…ics (#66690)

Signed-off-by: arin-mirza <a.mirza@celonis.com>
(cherry picked from commit f4a8df5)

# Conflicts:
#	fe/fe-core/src/main/java/com/starrocks/qe/ShowResultMetaFactory.java
#	fe/fe-core/src/main/java/com/starrocks/sql/ast/ShowResourceGroupUsageStmt.java
@github-actions
Contributor

[BE Incremental Coverage Report]

fail : 13 / 18 (72.22%)

file detail

path | covered_line | new_line | coverage | not_covered_line_detail
🔵 be/src/agent/resource_group_usage_recorder.cpp | 6 | 11 | 54.55% | [55, 56, 57, 58, 59]
🔵 be/src/exec/workgroup/work_group.h | 7 | 7 | 100.00% | []

farhad-celo pushed a commit to farhad-celo/starrocks that referenced this pull request Jan 20, 2026
…ics (StarRocks#66690)

Signed-off-by: arin-mirza <a.mirza@celonis.com>
Signed-off-by: Farhad Shahmohammadi <f.shahmohammadi@celonis.com>
farhad-celo pushed a commit to farhad-celo/starrocks that referenced this pull request Jan 22, 2026