Expose Pressure Stall Information as metrics #3052

Open
dqminh opened this issue Jan 27, 2022 · 3 comments · May be fixed by #3083

Comments

@dqminh (Contributor) commented Jan 27, 2022

Pressure Stall Information is exposed per cgroup in cgroup v2. It's a good way to understand contention due to a lack of resources (CPU, memory, IO). For example:

```
# /sys/fs/cgroup/system.slice/cpu.pressure
some avg10=0.00 avg60=0.00 avg300=0.00 total=306212315
full avg10=0.00 avg60=0.00 avg300=0.00 total=246733962
```

It would be great to expose this data source in cAdvisor as metrics.
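
For illustration, here is a minimal, self-contained Go sketch of parsing such a `*.pressure` file. It is not cAdvisor's implementation; the `psiLine` and `parsePressure` names are made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// psiLine holds the values of one "some" or "full" line in a *.pressure file.
type psiLine struct {
	Avg10, Avg60, Avg300 float64
	Total                uint64 // cumulative stall time in microseconds
}

// parsePressure reads a cgroup v2 pressure file and returns its "some" and
// "full" lines keyed by that first field.
func parsePressure(path string) (map[string]psiLine, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	out := make(map[string]psiLine)
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		// e.g. ["some", "avg10=0.00", "avg60=0.00", "avg300=0.00", "total=306212315"]
		fields := strings.Fields(line)
		if len(fields) != 5 {
			continue
		}
		var p psiLine
		for _, kv := range fields[1:] {
			k, v, ok := strings.Cut(kv, "=")
			if !ok {
				continue
			}
			switch k {
			case "avg10":
				p.Avg10, _ = strconv.ParseFloat(v, 64)
			case "avg60":
				p.Avg60, _ = strconv.ParseFloat(v, 64)
			case "avg300":
				p.Avg300, _ = strconv.ParseFloat(v, 64)
			case "total":
				p.Total, _ = strconv.ParseUint(v, 10, 64)
			}
		}
		out[fields[0]] = p
	}
	return out, nil
}

func main() {
	stats, err := parsePressure("/sys/fs/cgroup/system.slice/cpu.pressure")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("some: %+v\nfull: %+v\n", stats["some"], stats["full"])
}
```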

@dqminh changed the title from "Exopse Pressure Stall Information as metrics" to "Expose Pressure Stall Information as metrics" on Jan 27, 2022
@mrunalp (Collaborator) commented Jan 28, 2022

This is something @bobbypage and I talked about adding recently. Also, adding @kolyshkin here.

@dqminh (Contributor, Author) commented Jan 28, 2022

I added support to runc in opencontainers/runc#3358; once that goes in, we can update libcontainer in cAdvisor and expose the metrics here.

@bobbypage (Collaborator) commented:

That would be awesome to have support in libcontainer and use it in cAdvisor. Thanks @dqminh !

@dqminh linked a pull request on Mar 23, 2022 that will close this issue
xinau added a commit to xinau/cadvisor that referenced this issue Jan 26, 2025
issues: google#3052, google#3083, kubernetes/enhancements#4205

This change adds metrics for pressure stall information, which indicate
how long some or all tasks of a cgroup v2 have stalled due to resource
congestion (CPU, memory, IO). The change exposes this information by
including the _PSIStats_ of each controller in its stats, i.e.
_CPUStats.PSI_, _MemoryStats.PSI_ and _DiskStats.PSI_.

The information is additionally exposed as Prometheus metrics. The
metrics follow the naming outlined by the prometheus/node-exporter,
where stalled corresponds to full and waiting corresponds to some.

```
container_pressure_cpu_stalled_seconds_total
container_pressure_cpu_waiting_seconds_total
container_pressure_memory_stalled_seconds_total
container_pressure_memory_waiting_seconds_total
container_pressure_io_stalled_seconds_total
container_pressure_io_waiting_seconds_total
```

Signed-off-by: Felix Ehrenpfort <[email protected]>
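
For illustration, a minimal sketch (not the actual cAdvisor collector) of how a PSI "total" value, which the kernel reports in microseconds, could be converted into one of the node-exporter-style seconds counters listed above; the `psiTotalToMetric` helper and the `id` label are assumptions for this example.

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// Descriptor for one of the counters named in the commit message above.
var cpuStalledDesc = prometheus.NewDesc(
	"container_pressure_cpu_stalled_seconds_total",
	"Total time in seconds during which all tasks of the container were stalled on CPU.",
	[]string{"id"}, nil,
)

// psiTotalToMetric converts a PSI "total" value (microseconds of stall time)
// into a Prometheus counter expressed in seconds. Hypothetical helper for
// illustration only.
func psiTotalToMetric(containerID string, totalUsec uint64) prometheus.Metric {
	return prometheus.MustNewConstMetric(
		cpuStalledDesc,
		prometheus.CounterValue,
		float64(totalUsec)/1e6, // microseconds -> seconds
		containerID,
	)
}

func main() {
	// "full" total from the cpu.pressure example in the issue description.
	m := psiTotalToMetric("/system.slice", 246733962)
	fmt.Println(m.Desc())
}
```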