
[cgv2] Add memory PSI metrics#274

Merged
orestisfl merged 4 commits into elastic:main from orestisfl:cgroup-memory-psi
Dec 15, 2025


@orestisfl

What does this PR do?

This adds memory pressure metrics to the cgroup v2 memory subsystem, complementing the existing CPU and IO pressure metrics.

New metrics exposed:

  • system.process.cgroup.memory.pressure.some.{10,60,300}.pct
  • system.process.cgroup.memory.pressure.some.total
  • system.process.cgroup.memory.pressure.full.{10,60,300}.pct
  • system.process.cgroup.memory.pressure.full.total

The implementation:

  • Adds Pressure field to MemorySubsystem struct
  • Reads from memory.pressure file using existing GetPressure helper
  • Gracefully handles systems without PSI support
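For readers unfamiliar with the file format, here is a minimal sketch of parsing the contents of a PSI file such as memory.pressure. It is not the library's GetPressure helper; `PressureLine` and `parsePressure` are hypothetical names used only for illustration. The input layout (`some`/`full` lines with `avg10`, `avg60`, `avg300`, and `total` fields) is the kernel's documented PSI format.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// PressureLine holds one line ("some" or "full") of a PSI file.
// Hypothetical type for this sketch, not the library's struct.
type PressureLine struct {
	Avg10, Avg60, Avg300 float64 // share of time stalled over each window, in percent
	Total                uint64  // cumulative stall time in microseconds
}

// parsePressure parses the text of a PSI file such as memory.pressure
// and returns a map keyed by the line name ("some" or "full").
func parsePressure(content string) (map[string]PressureLine, error) {
	out := make(map[string]PressureLine)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		var pl PressureLine
		for _, kv := range fields[1:] {
			key, val, ok := strings.Cut(kv, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field %q", kv)
			}
			var err error
			switch key {
			case "avg10":
				pl.Avg10, err = strconv.ParseFloat(val, 64)
			case "avg60":
				pl.Avg60, err = strconv.ParseFloat(val, 64)
			case "avg300":
				pl.Avg300, err = strconv.ParseFloat(val, 64)
			case "total":
				pl.Total, err = strconv.ParseUint(val, 10, 64)
			}
			if err != nil {
				return nil, fmt.Errorf("parsing %q: %w", kv, err)
			}
		}
		out[fields[0]] = pl
	}
	return out, sc.Err()
}

func main() {
	sample := "some avg10=0.12 avg60=0.05 avg300=0.01 total=12345\n" +
		"full avg10=0.00 avg60=0.00 avg300=0.00 total=678\n"
	stats, err := parsePressure(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("some.10=%.2f full.total=%d\n", stats["some"].Avg10, stats["full"].Total)
}
```

The `some` and `full` keys line up with the `pressure.some.*` and `pressure.full.*` metric names listed above.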

PSI can be disabled on Linux in several ways:

  1. Compile-time: Controlled by CONFIG_PSI in init/Kconfig.
  2. Boot-time disabled by default: With CONFIG_PSI_DEFAULT_DISABLED=y, PSI is off unless psi=1 is passed.
  3. Boot parameter: Can be disabled with cgroup_disable=pressure.

Why is it important?

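Memory pressure stall information (PSI) is an important statistic for tracking Linux performance: it reports how much time tasks in a cgroup spend stalled waiting on memory.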

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.md

Related issues

@orestisfl orestisfl self-assigned this Dec 11, 2025
@orestisfl orestisfl requested a review from a team as a code owner December 11, 2025 10:39
@orestisfl orestisfl requested review from AndersonQ and belimawr and removed request for a team December 11, 2025 10:39
@orestisfl orestisfl added the enhancement and Team:Elastic-Agent-Data-Plane labels Dec 11, 2025

@mauri870 mauri870 left a comment


Codewise LGTM.

@orestisfl orestisfl merged commit d9d092b into elastic:main Dec 15, 2025
5 checks passed
@orestisfl orestisfl deleted the cgroup-memory-psi branch December 15, 2025 07:10
orestisfl added a commit to elastic/beats that referenced this pull request Dec 16, 2025
## Proposed commit message
Add memory pressure PSI metrics to the system.process.cgroup.memory
metricset, complementing the existing CPU and IO pressure metrics.

New fields added under system.process.cgroup.memory.pressure:
- pressure.some.{10,60,300}.pct - Share of time with some tasks stalled
- pressure.some.total - Total some pressure time
- pressure.full.{10,60,300}.pct - Share of time with all tasks stalled
- pressure.full.total - Total full pressure time

Closes #47604

## How to test this PR locally

### 1. Build Metricbeat

```bash
cd metricbeat
go build .
```

### 2. Create Test Configuration

Save the following as `/tmp/metricbeat-psi-test.yml`:

```yaml
metricbeat.modules:
- module: system
  period: 5s
  metricsets:
    - process
  processes: ['.*']
  process.cgroups.enabled: true

output.console:
  pretty: true
```

### 3. Run Metricbeat

```bash
./metricbeat -e -c /tmp/metricbeat-psi-test.yml
```

### 4. Verify Memory Pressure Fields

Look for `system.process.cgroup.memory.pressure` in the output:
```json
"memory": {
  "pressure": {
    "some": {
      "10": { "pct": 0 },
      "60": { "pct": 0 },
      "300": { "pct": 0 },
      "total": 0
    },
    "full": {
      "10": { "pct": 0 },
      "60": { "pct": 0 },
      "300": { "pct": 0 },
      "total": 0
    }
  }
}
```
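
To check captured events programmatically rather than by eye, something like the following sketch could be used. It assumes each event is a JSON object that nests the fields as `system` → `process` → `cgroup` → `memory` → `pressure`, matching the fragment above; `lookup` and `hasMemoryPressure` are hypothetical helpers, not part of Metricbeat.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookup walks nested JSON objects following the given keys.
func lookup(doc map[string]any, keys ...string) (any, bool) {
	var cur any = doc
	for _, k := range keys {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[k]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// hasMemoryPressure reports whether one event contains all eight
// memory pressure fields (some/full x 10/60/300 pct, plus total).
func hasMemoryPressure(raw []byte) (bool, error) {
	var event map[string]any
	if err := json.Unmarshal(raw, &event); err != nil {
		return false, err
	}
	base := []string{"system", "process", "cgroup", "memory", "pressure"}
	leaves := [][]string{{"10", "pct"}, {"60", "pct"}, {"300", "pct"}, {"total"}}
	for _, kind := range []string{"some", "full"} {
		for _, leaf := range leaves {
			// Copy base so each path gets its own backing array.
			path := append(append([]string{}, base...), kind)
			path = append(path, leaf...)
			if _, ok := lookup(event, path...); !ok {
				return false, nil
			}
		}
	}
	return true, nil
}

func main() {
	event := `{"system":{"process":{"cgroup":{"memory":{"pressure":{
		"some":{"10":{"pct":0},"60":{"pct":0},"300":{"pct":0},"total":0},
		"full":{"10":{"pct":0},"60":{"pct":0},"300":{"pct":0},"total":0}}}}}}}`
	ok, err := hasMemoryPressure([]byte(event))
	fmt.Println("has memory pressure:", ok, "err:", err)
}
```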

### 5. Compare Before/After (Optional)
Use the comparison script [compare-psi-metrics.sh](https://github.com/user-attachments/files/24191696/compare-psi-metrics.sh) to compare output from main against this PR:

```
compare-psi-metrics.sh
Usage: ./compare-psi-metrics.sh <main-output.ndjson> <pr-output.ndjson>
```

## Related issues

- Requires elastic/elastic-agent-system-metrics#274
- Closes #47604
orestisfl added a commit that referenced this pull request Jan 16, 2026
## What does this PR do?

Fixes a bug in the cgroup v2 CPU subsystem where the `Get` method
returned early when the `cpu.pressure` file did not exist.

The change replaces `return nil` with `err = nil`, allowing the function
to continue execution and fetch the remaining CPU stats (usage, system
time, etc.) even when pressure stats are unavailable.

Also adds a new test `TestGetCPUEmpty` that verifies the CPU subsystem
correctly handles empty directories without errors.

## Why is it important?

On systems where `cpu.pressure` is not available, the previous code
would silently return without populating any CPU statistics. This meant
that valid CPU metrics like usage and system time were being lost even
though they were available in other files in the cgroup path.

PSI can be disabled on Linux in several ways:
1. **Compile-time**: Controlled by `CONFIG_PSI` in `init/Kconfig`.
2. **Boot-time disabled by default**: With
`CONFIG_PSI_DEFAULT_DISABLED=y`, PSI is off unless `psi=1` is passed.
3. **Boot parameter**: Can be disabled with `cgroup_disable=pressure`.

## Related issues

- Relates #274


Development

Successfully merging this pull request may close these issues.

[Metricbeat][System module] Report memory pressure stall information in metrics (Linux)

2 participants