🧠 llmisvc: set kv-cache metric to vllm:kv_cache_usage_perc #1020
adam-d-young wants to merge 3 commits into opendatahub-io:master
Conversation
Refs: RHOAIENG-41868 Explicitly set --kv-cache-usage-percentage-metric so EPP does not rely on the legacy default (vllm:gpu_cache_usage_perc) from released GIE versions. Signed-off-by: Adam Young <adam.young@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: adam-d-young.

Hi @adam-d-young. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test.
Walkthrough: Two LLM scheduler configuration files were updated to add two new command-line arguments.
Please consider backporting to the ODH/RHOAI 3.0 and 3.2 streams, since those ship the EPP scheduler defaults that still reference `vllm:gpu_cache_usage_perc`.
/ok-to-test
```yaml
- --modelServerMetricsHttpsInsecureSkipVerify
- --certPath
- "/etc/ssl/certs"
- --kv-cache-usage-percentage-metric
```
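The diff above is truncated at the new flag; per the PR description, the args list pairs it with an explicit value. A sketch of the relevant fragment (not the full config file):

```yaml
args:
  - --modelServerMetricsHttpsInsecureSkipVerify
  - --certPath
  - "/etc/ssl/certs"
  - --kv-cache-usage-percentage-metric
  - "vllm:kv_cache_usage_perc"
```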
Hi, can this change break backwards compatibility? If not, I think we can go ahead with it.
A good question. I've found two potential issues with backward compatibility:
- Flag style: kebab-case flags look like they were introduced in GIE v1.0.0, but this repo appears to vendor GIE v0.5.0, which uses camelCase. I've pushed a new commit to switch to camelCase.
- vLLM compatibility: From what I can tell, `vllm:kv_cache_usage_perc` was introduced in vLLM 0.9.x (vLLM PR #18354), which predates LLMInferenceService. If that's correct, there shouldn't be any break to backward compatibility.
I'd appreciate confirmation on both points.
Note: I'll be opening a corresponding PR against red-hat-data-services/kserve with kebab-case (`--kv-cache-usage-percentage-metric`) for RHOAI 3.x, which uses GIE v1.0.0. It doesn't look like RHOAI, which is what I was intending to fix, is actually built from this repo, but please correct me if I'm wrong.
To fix RHOAI we need to cherry-pick it to the release-v0.15 branch, which will land in the next RHOAI version, 3.3 at this point.
- I am not sure about it, @bartoszmajsak can you please confirm it? I think that KServe upstream is updating the GIE to 1.0, but not sure how we will do it for ODH and RHOAI.
- ok, it should be fine, the vllm version we are using is v0.11.x.
@spolti Ad 1. It's coming with kserve#4886, it's similar to what @KillianGolds did with #996 (some of the feedback in upstream we shared is based on this amazing work). I think we will need to somehow consolidate them and push some further improvements upstream if we find gaps (one I can think of is zero downtime update)
The upstream scheduler (llm-d-inference-scheduler) vendors GIE v0.5.0, which uses camelCase flags (--kvCacheUsagePercentageMetric), not kebab-case (--kv-cache-usage-percentage-metric). GIE changed to kebab-case in v1.0.0, but opendatahub-io/kserve targets the upstream scheduler which is still on v0.5.0.
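Spelled out, the two flag styles discussed above would look like this in a scheduler args list (an illustrative sketch; only the flag names and metric value are taken from this thread):

```yaml
# GIE v0.5.0, as vendored by llm-d-inference-scheduler: camelCase flag
- --kvCacheUsagePercentageMetric
- "vllm:kv_cache_usage_perc"

# GIE v1.0.0 and later, targeted by the RHOAI 3.x follow-up: kebab-case flag
- --kv-cache-usage-percentage-metric
- "vllm:kv_cache_usage_perc"
```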
PR needs rebase.
@adam-d-young: The following tests failed.

Full PR test history. Your PR dashboard.
Refs: RHOAIENG-41868
What this PR does / why we need it:
Explicitly sets `--kv-cache-usage-percentage-metric` to `vllm:kv_cache_usage_perc` in the default LLM scheduler config so EPP does not rely on the legacy default `vllm:gpu_cache_usage_perc` from released GIE versions.

Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

RHOAIENG-41868
Type of changes
Please delete options that are not relevant.
Feature/Issue validation/testing:
Validated by running the scheduler with the flag override and confirming `Flags processed` shows `kv-cache-usage-percentage-metric=vllm:kv_cache_usage_perc`. Logs attached to RHOAIENG-41868.
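That check can be reproduced from the scheduler's startup logs; a sketch of the idea, where the sample log line is an assumption modeled on the description above (the real line comes from the EPP pod's output, e.g. via `kubectl logs` with deployment and namespace names that vary per install):

```shell
# Simulated EPP startup line confirming which KV-cache metric flag took effect.
log_line='Flags processed: kv-cache-usage-percentage-metric=vllm:kv_cache_usage_perc'

# Extract just the flag assignment to verify the override.
echo "$log_line" | grep -o 'kv-cache-usage-percentage-metric=[^ ]*'
# → kv-cache-usage-percentage-metric=vllm:kv_cache_usage_perc
```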
Special notes for your reviewer:
This PR does not change any image versions. It only sets an explicit metric flag in the default config (manifests + Helm) as a workaround until the scheduler/GIE released default is updated.
The legacy default comes from the EPP/GIE scheduler flag `--kv-cache-usage-percentage-metric` (GIE v1.2.1 defaults to `vllm:gpu_cache_usage_perc`, while vLLM exposes `vllm:kv_cache_usage_perc`). Setting the flag explicitly in the shipped LLMInferenceServiceConfig is the safest short-term fix until RHOAI picks up a scheduler image/GIE release where the default is corrected.

Upstream reference: gateway-api-inference-extension/pkg/epp/server/runserver.go in tag `v1.2.1` sets the legacy default; gateway-api-inference-extension/pkg/epp/server/options.go on `main` uses `vllm:kv_cache_usage_perc`.

Release note:
NONE