Expose Pressure Stall Information as metrics #3052
Comments
This is something @bobbypage and I talked about adding recently. Also, adding @kolyshkin here.
I added support for runc here opencontainers/runc#3358, so once that goes in we can update libcontainer in cadvisor and expose the metrics here.
That would be awesome to have support in libcontainer and use it in cAdvisor. Thanks @dqminh!
xinau added a commit to xinau/cadvisor that referenced this issue, Jan 26, 2025
issues: google#3052, google#3083, kubernetes/enhancements#4205

This change adds metrics for pressure stall information, which indicate why some or all tasks of a cgroupv2 have waited due to resource congestion (cpu, memory, io). The change exposes this information by including the _PSIStats_ of each controller in its stats, i.e. _CPUStats.PSI_, _MemoryStats.PSI_ and _DiskStats.PSI_.

The information is additionally exposed as Prometheus metrics. The metrics follow the naming outlined by the prometheus/node-exporter, where stalled corresponds to full and waiting corresponds to some.

```
container_pressure_cpu_stalled_seconds_total
container_pressure_cpu_waiting_seconds_total
container_pressure_memory_stalled_seconds_total
container_pressure_memory_waiting_seconds_total
container_pressure_io_stalled_seconds_total
container_pressure_io_waiting_seconds_total
```

Signed-off-by: Felix Ehrenpfort <[email protected]>
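The naming scheme in the commit message can be illustrated with a small Go sketch. This is not the code from the referenced commit: the function names `metricName` and `microsToSeconds` are hypothetical, and the conversion assumes the kernel reports PSI `total` values in microseconds.

```go
package main

import "fmt"

// metricName builds a metric name following the scheme quoted in the commit
// message: "full" maps to "stalled" and "some" maps to "waiting".
// Hypothetical helper, not part of cAdvisor.
func metricName(resource, kind string) (string, error) {
	switch resource {
	case "cpu", "memory", "io":
	default:
		return "", fmt.Errorf("unknown resource %q", resource)
	}
	var suffix string
	switch kind {
	case "full":
		suffix = "stalled"
	case "some":
		suffix = "waiting"
	default:
		return "", fmt.Errorf("unknown PSI kind %q", kind)
	}
	return fmt.Sprintf("container_pressure_%s_%s_seconds_total", resource, suffix), nil
}

// microsToSeconds converts a cumulative PSI total (assumed to be in
// microseconds) to seconds, the unit implied by the "_seconds_total" suffix.
func microsToSeconds(totalMicros uint64) float64 {
	return float64(totalMicros) / 1e6
}

func main() {
	for _, resource := range []string{"cpu", "memory", "io"} {
		for _, kind := range []string{"full", "some"} {
			name, err := metricName(resource, kind)
			if err != nil {
				panic(err)
			}
			fmt.Println(name)
		}
	}
	fmt.Println(microsToSeconds(1500000))
}
```

Because the totals are cumulative, they fit a Prometheus counter, and a stall rate can then be derived on the query side rather than in the exporter.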
Pressure Stall Information (PSI) is exposed per cgroup in cgroup v2. It's a good way to understand contention caused by a lack of resources (CPU, memory, I/O). For example, each cgroup directory contains cpu.pressure, memory.pressure, and io.pressure files.
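A minimal sketch of reading this data in Go, assuming the documented kernel format in which each pressure file contains a `some` line and usually a `full` line with `avg10`, `avg60`, `avg300`, and `total` fields. The `PSILine` type and `ParsePSI` function are hypothetical names for illustration, not the runc or cAdvisor API, and the sample content is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// PSILine holds one parsed line ("some" or "full") of a pressure file.
// Hypothetical type for illustration only.
type PSILine struct {
	Avg10, Avg60, Avg300 float64 // share of time stalled over each window, in percent
	TotalMicros          uint64  // cumulative stall time, assumed to be microseconds
}

// ParsePSI parses the text of a pressure file into a map keyed by the
// first word of each line ("some" or "full"). Unknown keys are ignored.
func ParsePSI(content string) (map[string]PSILine, error) {
	out := make(map[string]PSILine)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		var line PSILine
		for _, kv := range fields[1:] {
			key, val, ok := strings.Cut(kv, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field %q", kv)
			}
			var err error
			switch key {
			case "avg10":
				line.Avg10, err = strconv.ParseFloat(val, 64)
			case "avg60":
				line.Avg60, err = strconv.ParseFloat(val, 64)
			case "avg300":
				line.Avg300, err = strconv.ParseFloat(val, 64)
			case "total":
				line.TotalMicros, err = strconv.ParseUint(val, 10, 64)
			}
			if err != nil {
				return nil, fmt.Errorf("parsing %q: %w", kv, err)
			}
		}
		out[fields[0]] = line
	}
	return out, sc.Err()
}

func main() {
	// Made-up sample in the pressure file format; real content would be
	// read from a file such as cpu.pressure inside a cgroup directory.
	sample := "some avg10=0.12 avg60=0.05 avg300=0.01 total=1500000\n" +
		"full avg10=0.00 avg60=0.00 avg300=0.00 total=250000\n"
	stats, err := ParsePSI(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("some total: %d us\n", stats["some"].TotalMicros)
	fmt.Printf("full total: %d us\n", stats["full"].TotalMicros)
}
```

The `some` line covers time in which at least one task was stalled on the resource, while `full` covers time in which all non-idle tasks were stalled at once.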
It would be great to expose this data source in cadvisor as metrics.