[Metricbeat] Add memory PSI metrics for cgroupv2#48054

Merged
orestisfl merged 13 commits into elastic:main from orestisfl:metricbeat-cgroup-memory-pressure
Dec 16, 2025

@orestisfl orestisfl commented Dec 11, 2025

Proposed commit message

Add memory pressure PSI metrics to the system.process.cgroup.memory
metricset, complementing the existing CPU and IO pressure metrics.

New fields added under system.process.cgroup.memory.pressure:

  • pressure.some.{10,60,300}.pct - Share of time with some tasks stalled
  • pressure.some.total - Total some pressure time
  • pressure.full.{10,60,300}.pct - Share of time with all tasks stalled
  • pressure.full.total - Total full pressure time
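
As a rough illustration of where these values come from: on cgroup v2 the kernel exposes a `memory.pressure` file with one `some` line and one `full` line, each carrying `avg10`, `avg60`, `avg300` and `total`. The sketch below maps that text onto the nested layout of the new fields. It is not the Metricbeat implementation (which is written in Go); the function name `parse_memory_pressure` is hypothetical, and passing the averages through unscaled is an assumption.

```python
import re

# One PSI line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
_PSI_LINE = re.compile(
    r"^(some|full) avg10=(\d+\.\d+) avg60=(\d+\.\d+) avg300=(\d+\.\d+) total=(\d+)$"
)


def parse_memory_pressure(text):
    """Map the contents of a memory.pressure file to the nested field layout
    (some/full -> 10/60/300 -> pct, plus total). Hypothetical helper."""
    pressure = {}
    for line in text.splitlines():
        match = _PSI_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        kind, avg10, avg60, avg300, total = match.groups()
        pressure[kind] = {
            # Assumption: averages are passed through exactly as the kernel reports them.
            "10": {"pct": float(avg10)},
            "60": {"pct": float(avg60)},
            "300": {"pct": float(avg300)},
            # Cumulative stall time; the kernel reports it in microseconds.
            "total": int(total),
        }
    return pressure


sample = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=42\n"
)
print(parse_memory_pressure(sample))
```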

Closes #47604

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

How to test this PR locally

1. Build Metricbeat

cd metricbeat
go build .

2. Create Test Configuration (save as /tmp/metricbeat-psi-test.yml)

metricbeat.modules:
- module: system
  period: 5s
  metricsets:
    - process
  processes: ['.*']
  process.cgroups.enabled: true

output.console:
  pretty: true

3. Run Metricbeat

./metricbeat -e -c /tmp/metricbeat-psi-test.yml

4. Verify Memory Pressure Fields

Look for system.process.cgroup.memory.pressure in the output:

"memory": {
  "pressure": {
    "some": {
      "10": { "pct": 0 },
      "60": { "pct": 0 },
      "300": { "pct": 0 },
      "total": 0
    },
    "full": {
      "10": { "pct": 0 },
      "60": { "pct": 0 },
      "300": { "pct": 0 },
      "total": 0
    }
  }
}
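
To check an event programmatically rather than by eye, something like the following sketch could be used. It assumes the event has already been parsed into a dict and that the pressure object is nested under `system.process.cgroup.memory.pressure`, as the field path above suggests. The helper names `memory_pressure` and `missing_fields` are hypothetical.

```python
def memory_pressure(event):
    """Return the memory pressure object from one parsed event, or None if absent."""
    node = event
    for key in ("system", "process", "cgroup", "memory", "pressure"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_fields(pressure):
    """List the expected pressure fields that are not present."""
    missing = []
    for kind in ("some", "full"):
        section = pressure.get(kind, {})
        for window in ("10", "60", "300"):
            if "pct" not in section.get(window, {}):
                missing.append(f"{kind}.{window}.pct")
        if "total" not in section:
            missing.append(f"{kind}.total")
    return missing


# Build an event shaped like the output shown above.
windows = {"10": {"pct": 0}, "60": {"pct": 0}, "300": {"pct": 0}, "total": 0}
event = {
    "system": {
        "process": {
            "cgroup": {"memory": {"pressure": {"some": dict(windows), "full": dict(windows)}}}
        }
    }
}
print(missing_fields(memory_pressure(event)))
```

An empty list means all eight fields are present. On a cgroup v1 host, or on a kernel without PSI enabled, the pressure object would presumably be absent and `memory_pressure` would return `None`.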

5. Compare Before/After (Optional)

Use the comparison script (compare-psi-metrics.sh) to compare output from main vs this PR:

Usage: ./compare-psi-metrics.sh <main-output.ndjson> <pr-output.ndjson>
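
The contents of the attached script are not shown here. As an independent sketch of what such a comparison could look like, the code below collects the dotted field paths seen in each NDJSON capture and reports the ones that appear only in the PR output. It assumes one JSON event per line (so console output captured without `pretty: true`); all function names are hypothetical.

```python
import json


def flatten(obj, prefix=""):
    """Yield the dotted key path of every leaf value in a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path


def field_paths(lines):
    """Collect the set of field paths across all events (one JSON object per line)."""
    paths = set()
    for line in lines:
        line = line.strip()
        if line:
            paths.update(flatten(json.loads(line)))
    return paths


def added_fields(main_lines, pr_lines):
    """Return the field paths present in the PR output but not in main."""
    return sorted(field_paths(pr_lines) - field_paths(main_lines))


main_lines = ['{"system": {"process": {"cgroup": {"memory": {"id": "x"}}}}}']
pr_lines = [
    '{"system": {"process": {"cgroup": {"memory": {"id": "x", '
    '"pressure": {"some": {"total": 0}}}}}}}'
]
print(added_fields(main_lines, pr_lines))
```

With real captures, the two arguments would be the opened files, for example `added_fields(open(main_path), open(pr_path))`.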

Related issues

Closes elastic#47604
@botelastic botelastic bot added the needs_team label Dec 11, 2025
@github-actions

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@orestisfl orestisfl changed the title [Metricbeat][System module] Add memory pressure stall information (PS… [Metricbeat] Add PSI metrics for cgroupv2 Dec 11, 2025
@orestisfl orestisfl self-assigned this Dec 11, 2025
mergify bot commented Dec 11, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @orestisfl? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@orestisfl orestisfl added the enhancement and Team:Elastic-Agent-Data-Plane labels and removed the needs_team label Dec 11, 2025
github-actions bot commented Dec 11, 2025

@orestisfl orestisfl changed the title [Metricbeat] Add PSI metrics for cgroupv2 [Metricbeat] Add memory PSI metrics for cgroupv2 Dec 11, 2025
@orestisfl orestisfl added the backport-skip label Dec 16, 2025
@orestisfl orestisfl marked this pull request as ready for review December 16, 2025 12:50
@orestisfl orestisfl requested review from a team as code owners December 16, 2025 12:50
@orestisfl orestisfl requested review from AndersonQ and faec December 16, 2025 12:50
@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@pierrehilbert pierrehilbert removed the request for review from faec December 16, 2025 12:59
@pierrehilbert pierrehilbert added the Team:Docs label Dec 16, 2025
@colleenmcginnis colleenmcginnis left a comment

Starting with v9.0, there is no longer a new documentation set published with every minor release: the same page stays valid over time and shows version-related evolutions (ref). As a result, we add version information in the fields.yml and it adds version badges to the generated Markdown file. Read more in Contributing to the docs.

Based on the backport labels, I assumed these changes are targeting 9.3.0, but feel free to adjust as needed. After applying the suggestions you'll have to regenerate the docs.

Co-authored-by: Colleen McGinnis <colleen.j.mcginnis@gmail.com>
@colleenmcginnis

You'll have to regenerate the docs (make update) to get the check-docs check to pass.

@colleenmcginnis colleenmcginnis left a comment

@orestisfl orestisfl enabled auto-merge (squash) December 16, 2025 17:27
@orestisfl orestisfl merged commit c3f35a9 into elastic:main Dec 16, 2025
213 checks passed
@orestisfl orestisfl deleted the metricbeat-cgroup-memory-pressure branch December 16, 2025 19:21
Labels

  • backport-skip - Skip notification from the automated backport with mergify
  • enhancement
  • Team:Docs - Label for the Observability docs team
  • Team:Elastic-Agent-Data-Plane - Label for the Agent Data Plane team

Development

Successfully merging this pull request may close these issues.

[Metricbeat][System module] Report memory pressure stall information in metrics (Linux)