
refactor: kv cache manager repo #570

Merged
vMaroon merged 27 commits into llm-d:main from sagearc:rename-llmd-kv-cache
Jan 21, 2026

@sagearc
Collaborator

@sagearc sagearc commented Jan 19, 2026

@github-actions
Contributor

🚨 Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@sagearc sagearc marked this pull request as draft January 19, 2026 11:58
@github-actions github-actions Bot requested review from elevran and nilig January 19, 2026 11:58
@sagearc
Collaborator Author

sagearc commented Jan 19, 2026

@vMaroon

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc sagearc force-pushed the rename-llmd-kv-cache branch from 44e6130 to 1c15fd2 January 19, 2026 11:59
@github-actions github-actions Bot requested a review from kfswain January 19, 2026 11:59
@vMaroon vMaroon marked this pull request as ready for review January 19, 2026 13:22
@vMaroon
Member

vMaroon commented Jan 19, 2026

Can you rebase and confirm that makefile operations work as expected?

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@github-actions github-actions Bot requested a review from shmuelk January 19, 2026 16:33
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@github-actions github-actions Bot requested a review from nirrozenbaum January 19, 2026 16:34
@sagearc
Collaborator Author

sagearc commented Jan 19, 2026

closes #523
ref: #552

Review thread on go.mod (outdated)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…go mod replace

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@github-actions github-actions Bot requested a review from nirrozenbaum January 19, 2026 17:17
@sagearc sagearc marked this pull request as draft January 19, 2026 17:22
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@vMaroon vMaroon marked this pull request as ready for review January 20, 2026 10:12
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc sagearc force-pushed the rename-llmd-kv-cache branch from b0d37fb to a26d4b1 January 20, 2026 10:19
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…lmd-kv-cache

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
@hyeongyun0916
Contributor

I think this needs testing, but the tests were skipped because this PR also changed docs:
https://github.com/llm-d/llm-d-inference-scheduler/actions/runs/21184070368/workflow?pr=570#L28

@vMaroon
Member

vMaroon commented Jan 20, 2026

> I think this needs testing, but the tests were skipped because this PR also changed docs: https://github.com/llm-d/llm-d-inference-scheduler/actions/runs/21184070368/workflow?pr=570#L28

I was just looking at that; I think it's a serious CI bug. To unblock this PR immediately, can you disable the line that depends on check-changes?

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
@hyeongyun0916 hyeongyun0916 force-pushed the rename-llmd-kv-cache branch 2 times, most recently from 7bbbbce to 66e1907 January 21, 2026 05:16
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
@sagearc sagearc force-pushed the rename-llmd-kv-cache branch from 8437a4e to 2e59a41 January 21, 2026 10:31
@elevran
Collaborator

elevran commented Jan 21, 2026

Please split this into different PRs to allow an early merge (e.g., the LoRA and vLLM changes are independent).

sagearc and others added 2 commits January 21, 2026 13:32
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
@vMaroon vMaroon force-pushed the rename-llmd-kv-cache branch from e3f799d to 7861c04 January 21, 2026 12:20
…lmd-kv-cache

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Review thread on .dockerignore
__pycache__

# Docker files
Dockerfile
(no newline at end of file)
Collaborator

nit: newline missing

if: ${{ needs.check-changes.outputs.src == 'true' }}
runs-on: ubuntu-latest
steps:
- name: Free Disk Space (Ubuntu)
Collaborator

Q: can you briefly explain why this is needed and was not needed before?

Member

Because the dependencies now take up more disk space, and the runner was running out of space. This was discussed with Greg at some point: we could increase capacity by upgrading the runner tier, but this approach was preferred.

It is temporary in any case.

Collaborator

Since this is the first step, it clears disk space taken by preinstalled packages (which are hopefully unused) before anything from inference-scheduler actually runs?
It runs in the hope that afterwards there will be enough disk space on the worker to run the build.

- name: Install dependencies
run: |
go mod tidy
sudo -E env "PATH=$PATH" make install-dependencies
Collaborator

Q: is there any limitation to running `sudo -E env "PATH=$PATH" make install-dependencies install-python-deps` in one statement?

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@elevran
Collaborator

elevran commented Jan 21, 2026

/lgtm
/approve

This is under the assumption that #552 is top priority and will be completed ASAP after v0.5 is cut.
The two remaining items in this PR can be done in a quick follow-up, which allows cutting the 0.5 RC now.

@github-actions github-actions Bot added the lgtm label Jan 21, 2026
@vMaroon vMaroon merged commit 981f17a into llm-d:main Jan 21, 2026
9 checks passed
@github-project-automation github-project-automation Bot moved this from In review to Done in llm-d-inference-scheduler Jan 21, 2026
github-actions Bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 16, 2026
* chore: bump gie to v1.2.1 (llm-d#504)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* deps(go): bump sigs.k8s.io/gateway-api in the kubernetes group (llm-d#508)

Bumps the kubernetes group with 1 update: [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api).


Updates `sigs.k8s.io/gateway-api` from 1.4.0 to 1.4.1
- [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/gateway-api@v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/gateway-api
  dependency-version: 1.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump the go-dependencies group with 3 updates (llm-d#507)

Bumps the go-dependencies group with 3 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo), [github.com/onsi/gomega](https://github.com/onsi/gomega) and [golang.org/x/sync](https://github.com/golang/sync).


Updates `github.com/onsi/ginkgo/v2` from 2.27.2 to 2.27.3
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.2...v2.27.3)

Updates `github.com/onsi/gomega` from 1.38.2 to 1.38.3
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.38.2...v1.38.3)

Updates `golang.org/x/sync` from 0.18.0 to 0.19.0
- [Commits](golang/sync@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.38.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: golang.org/x/sync
  dependency-version: 0.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Miscellaneous dependency updates (llm-d#510)

* Miscellaneous dependency updates

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Use latest GIE CRDs

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Fixed references to kv-cache-manager

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* deps(go): bump the kubernetes group with 5 updates (llm-d#513)

Bumps the kubernetes group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.34.2` | `0.34.3` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.2` | `0.34.3` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.2` | `0.34.3` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.2` | `0.34.3` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.2` | `0.34.3` |


Updates `k8s.io/api` from 0.34.2 to 0.34.3
- [Commits](kubernetes/api@v0.34.2...v0.34.3)

Updates `k8s.io/apiextensions-apiserver` from 0.34.2 to 0.34.3
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](kubernetes/apiextensions-apiserver@v0.34.2...v0.34.3)

Updates `k8s.io/apimachinery` from 0.34.2 to 0.34.3
- [Commits](kubernetes/apimachinery@v0.34.2...v0.34.3)

Updates `k8s.io/client-go` from 0.34.2 to 0.34.3
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.34.2...v0.34.3)

Updates `k8s.io/component-base` from 0.34.2 to 0.34.3
- [Commits](kubernetes/component-base@v0.34.2...v0.34.3)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/component-base
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix kind-dev-env.sh (llm-d#512)

Running `make env-dev-kind` will fail if the vllm simulator image hasn't
been already pulled.

This fixes it by skipping the manual load & save of the image unless we're
dealing with a custom locally built image (using the dev tag).

The kubelet will anyway pull the right image when deploying the pod.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* test: add precise_prefix_cache_test (llm-d#505)

* test: add precise_prefix_cache_test

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* test: add precise_prefix_cache_test

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* test: reuse upstream data store and enable logr in unit tests (llm-d#518)

* enable logr in ut

Signed-off-by: MregXN <mregxn@gmail.com>

* fix package import order

Signed-off-by: MregXN <mregxn@gmail.com>

* apply comments

Signed-off-by: MregXN <mregxn@gmail.com>

---------

Signed-off-by: MregXN <mregxn@gmail.com>

* feat: allow pd_profile_handler to handle diverse plugin types (llm-d#516)

* Store the precise prefix cache score in cycleState.

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit test code

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

---------

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* deps(actions): bump crate-ci/typos from 1.40.0 to 1.40.1 (llm-d#526)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.0 to 1.40.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.40.0...v1.40.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.40.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump google.golang.org/grpc in the go-dependencies group (llm-d#527)

Bumps the go-dependencies group with 1 update: [google.golang.org/grpc](https://github.com/grpc/grpc-go).


Updates `google.golang.org/grpc` from 1.77.0 to 1.78.0
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.77.0...v1.78.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.78.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(metrics): add model_name label to PD decision metric (llm-d#528)

Signed-off-by: CYJiang <googs1025@gmail.com>

* deps(actions): bump crate-ci/typos from 1.40.1 to 1.41.0 (llm-d#532)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.1 to 1.41.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.40.1...v1.41.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Configure dependabot ignores Go version updates (llm-d#533)

* dependabot ignores Go version updates

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* allow semver patch level updates to Go

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

---------

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* Updates the architecture description with a reference to BBR and support for multiple GenAI models and LoRAs, to remove confusion about llm-d only supporting one model per cluster (llm-d#525)

* finer control over package updates (llm-d#542)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* port auto-assign action from llm-d-kv-cache (llm-d#551)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* refactor: set python version and pin docker image with tag (llm-d#543)

- set the default Python version to 3.12
- pin the UBI image to 9.7 (the current latest)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* chore(test): update API version for nixl test (llm-d#555)

- extensionRef was used in the old v1alpha2 API; in v1 it should be updated to
  endpointPickerRef
- remove InferenceModel
- update docs for test/sidecar

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#558)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.3 to 2.27.4
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.3...v2.27.4)

Updates `github.com/onsi/gomega` from 1.38.3 to 1.39.0
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.38.3...v1.39.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(actions): bump crate-ci/typos from 1.41.0 to 1.42.0 (llm-d#557)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.41.0 to 1.42.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.41.0...v1.42.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(actions): bump actions/checkout from 4 to 6 (llm-d#556)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update auto-assign logic (llm-d#560)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* remove newline in unsigned commit message (llm-d#561)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* bump gie to v1.3.0 rc2 (llm-d#562)

* update OWNERS (llm-d#559)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* refactor: Makefile, update docs (llm-d#463)

* refactor: Makefile, update docs

- split Makefile
  1. tools: installing and checking tools, downloading dependencies (gcc
     etc.) and the tokenizer; these are downloaded into the "bin" folder
     rather than a global path
  2. cluster: include k8s and ocp
  3. kind
- rename "openshift-base" to "kubernetes-base" to make its purpose clear
- bump the Go lint version to 2.1.6 to align with the version set in the
  GitHub Action
- rename make targets for better visibility, deprecating the old ones
- add more output to "make env"

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: code review

- move image tags from Makefile.tools.mk back to Makefile
- update documentation to reflect how the image and tag are created
- do not export the image tag env variable IMG_TAG
- fix patch-deployments.yaml, which should use only EPP_IMAGE now that
  EPP_TAG is no longer used
- fix kubernetes-dev-env.sh for EPP_IMAGE
- remove flag on golangci_lint fmt

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* code review:

- revert to 1.3.0
- remove comments
- use "default" as the default namespace

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update Makefile

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* docs: fix broken link in the docs

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* feat: add metrics validation in e2e test (llm-d#529)

Signed-off-by: CYJiang <googs1025@gmail.com>

* feat: make no-hit-lru P/D-aware (llm-d#522)

* feat: make no-hit-lru P/D-aware

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* hardcode prefill profile

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* remove spammy log

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* apply suggestions

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* Update disaggregated Prefill/Decode inference serving documentation (llm-d#571)

* update pd docs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typos

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 (llm-d#572)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.0 to 1.42.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.0...v1.42.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump github.com/onsi/ginkgo/v2 in the go-dependencies group (llm-d#573)

Bumps the go-dependencies group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo).


Updates `github.com/onsi/ginkgo/v2` from 2.27.4 to 2.27.5
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.4...v2.27.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix reviewers auto assign minor bug (llm-d#575)

* fix(scorer): make active request pd aware (llm-d#569)

* fix: decrement all pods on request complete instead of only final pod

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: append all pod endpoints from profile results

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
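The fix in llm-d#569 can be illustrated with a minimal sketch. It assumes each scheduling profile (for example prefill and decode) selects one pod. The type and method names (`activeRequestTracker`, `OnScheduled`, `OnComplete`) are hypothetical and are not the llm-d scorer API. When a request is scheduled, every pod chosen by any profile is counted. When the request completes, each of those pods is decremented, not only the final (decode) pod.

```go
package main

import (
	"fmt"
	"sync"
)

// activeRequestTracker is a hypothetical, simplified stand-in for an
// active-request scorer's bookkeeping; it is not the llm-d implementation.
type activeRequestTracker struct {
	mu        sync.Mutex
	counts    map[string]int      // pod name -> in-flight requests
	podsByReq map[string][]string // request ID -> every pod it was counted on
}

func newActiveRequestTracker() *activeRequestTracker {
	return &activeRequestTracker{
		counts:    make(map[string]int),
		podsByReq: make(map[string][]string),
	}
}

// OnScheduled counts the request against the pod picked by every profile
// (profile name -> pod name), e.g. both the prefill and the decode pod.
func (t *activeRequestTracker) OnScheduled(reqID string, profilePods map[string]string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, pod := range profilePods {
		t.counts[pod]++
		t.podsByReq[reqID] = append(t.podsByReq[reqID], pod)
	}
}

// OnComplete decrements every pod the request was counted against,
// instead of only the final pod, and then forgets the request.
func (t *activeRequestTracker) OnComplete(reqID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, pod := range t.podsByReq[reqID] {
		if t.counts[pod] > 0 {
			t.counts[pod]--
		}
	}
	delete(t.podsByReq, reqID)
}

// Count returns the current number of in-flight requests on a pod.
func (t *activeRequestTracker) Count(pod string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.counts[pod]
}

func main() {
	tr := newActiveRequestTracker()
	tr.OnScheduled("req-1", map[string]string{"prefill": "pod-p", "decode": "pod-d"})
	fmt.Println(tr.Count("pod-p"), tr.Count("pod-d")) // 1 1
	tr.OnComplete("req-1")
	fmt.Println(tr.Count("pod-p"), tr.Count("pod-d")) // 0 0
}
```

Storing the per-request pod list lets completion decrement the right pods without re-running scheduling. The cost is a small amount of per-request state, which is deleted on completion.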

* test(e2e): cleanup kind cluster (llm-d#563)

- if the e2e-tests cluster already exists, "make test-e2e" fails
- the main cleanup should be done in the AfterSuite() call
- in certain cases (kill/terminate) the cluster might remain locally;
  this PR adds a trap to properly clean it up

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* refactor: add early validation in DP profile handler (llm-d#554)

- validate that the number of schedulingProfiles in EPP is 1; otherwise return
  an empty map to avoid unnecessary filter and scorer computation.
- add unit test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
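Here is a minimal sketch of the early validation described in llm-d#554, assuming a handler that supports exactly one scheduling profile. `schedulingProfile` and `pickProfiles` are hypothetical simplifications and are not the GIE or llm-d profile-handler API.

```go
package main

import "fmt"

// schedulingProfile is a hypothetical stand-in for a configured profile.
type schedulingProfile struct {
	Name    string
	Filters []string
	Scorers []string
}

// pickProfiles returns the profiles to run. If the configuration does not
// contain exactly one profile, it returns an empty map, so the caller runs
// no filters or scorers at all.
func pickProfiles(profiles map[string]*schedulingProfile) map[string]*schedulingProfile {
	if len(profiles) != 1 {
		return map[string]*schedulingProfile{}
	}
	return profiles
}

func main() {
	one := map[string]*schedulingProfile{"default": {Name: "default"}}
	two := map[string]*schedulingProfile{"a": {Name: "a"}, "b": {Name: "b"}}
	fmt.Println(len(pickProfiles(one)), len(pickProfiles(two))) // 1 0
}
```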

* deps(go): bump the kubernetes group with 2 updates (llm-d#574)

Bumps the kubernetes group with 2 updates: [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) and [sigs.k8s.io/gateway-api-inference-extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension).


Updates `sigs.k8s.io/controller-runtime` from 0.22.4 to 0.22.5
- [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases)
- [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/controller-runtime@v0.22.4...v0.22.5)

Updates `sigs.k8s.io/gateway-api-inference-extension` from 1.3.0-rc.2 to 1.3.0-rc.3
- [Release notes](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/gateway-api-inference-extension@v1.3.0-rc.2...v1.3.0-rc.3)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/controller-runtime
  dependency-version: 0.22.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: sigs.k8s.io/gateway-api-inference-extension
  dependency-version: 1.3.0-rc.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* refactor: kv cache manager repo (llm-d#570)

* refactor: kv cache manager repo name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* go mod tidy

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fetch kv cache upstream instead of my fork

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* revert dockerfile to fetch kv cache manager from upstream instead of go mod replace

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update chat preprocessing structs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager version

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor kvblock.Key to kvblock.BlockHash

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add context

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add parent block key

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor encode

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* validate model name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* run setup.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* clone vllm into build

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit lint

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* delete fetch-python-wrapper.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit git workflow

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* refactor TokenProcessorConfig in config

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix kv cache repo name in docker file

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix e2e tests

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add ignore

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* update architecture docs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: HyunKyun Moon <mhg5303@gmail.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* bumping IGW version to the full released version (llm-d#583)

Signed-off-by: Kellen Swain <kfswain@google.com>

* Enable prefix-cache awareness in active-active multi-replica scheduler deployments (llm-d#578)

* - active-active-ha support

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>

* lint

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* Switch to pre-built vLLM wheels for CPU builds (llm-d#582)

* try use official vllm wheels in dockerfile.epp

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* use wheels in makefile

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* write permissions to setup.sh

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager commit

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* try installing Python deps without sudo

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* CR changes

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)

* update kvc version import

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add go.mod to testable changes

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-scorer we cannot use
  food-review, since it needs to call kv-cache-manager to get the tokenizer,
  which requires a real model from HF
- the change is to switch the "default model" to TinyLlama
- also, to make the tokenizer folder writable, its permissions need to be
  changed for the USER in the Dockerfile
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml, as it is used for
  local testing

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert some configs to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- prefix-cache-scorer was renamed to precise-prefix-cache-scorer in 0.3.0, so configs
  need to migrate from the old plugin to the new one and its spec:
  - rename the plugin name
  - remove parameters.autoTune, parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize and maxPrefixBlocksToMatch under indexerConfig
- configs using food-review keep the old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer, as
  KV and P/D both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if the cache_hit_threshold field is present in the completion request, perform a decode-first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode-first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable PD decider, which is responsible for deciding whether disaggregation is required; uses data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create pd decider plugin type with 2 implementations (prefix-based, and always for testing); update deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default NIXL port has been updated to 5600
- leave ZMQ on port 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Signed-off-by: MregXN <mregxn@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: CYJiang <googs1025@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Kellen Swain <kfswain@google.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Antonio Cardace <anto.cardace@gmail.com>
Co-authored-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Co-authored-by: MregXN <46479059+MregXN@users.noreply.github.com>
Co-authored-by: Hyunkyun Moon <mhg5303@gmail.com>
Co-authored-by: CYJiang <86391540+googs1025@users.noreply.github.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: David Breitgand <davidbreitgand@users.noreply.github.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com>
Co-authored-by: Kellen Swain <kfswain@google.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
github-actions Bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Mar 1, 2026
* feat: make no-hit-lru P/D-aware (llm-d#522)

* feat: make no-hit-lru P/D-aware

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* hardcode prefill profile

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* remove spammy log

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* apply suggestions

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* Update disaggregated Prefill/Decode inference serving documentation (llm-d#571)

* update pd docs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typos

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 (llm-d#572)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.0 to 1.42.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.0...v1.42.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump github.com/onsi/ginkgo/v2 in the go-dependencies group (llm-d#573)

Bumps the go-dependencies group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo).


Updates `github.com/onsi/ginkgo/v2` from 2.27.4 to 2.27.5
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.4...v2.27.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix reviewers auto assign minor bug (llm-d#575)

* fix(scorer): make active request pd aware (llm-d#569)

* fix: decrement all pods on request complete instead of only final pod

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: append all pod endpoints from profile results

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* test(e2e): cleanup kind cluster (llm-d#563)

- if the e2e-tests cluster already exists, "make test-e2e" fails
- the main cleanup should be done in the AfterSuite() call
- in certain cases (kill/terminate) the cluster might remain locally;
  this PR adds a trap to clean it up properly

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* refactor: add early validation in DP profile handler (llm-d#554)

- validate that the number of schedulingProfiles in EPP is 1; otherwise return
  an empty map to avoid unnecessary filter and scorer computation.
- add unit test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(go): bump the kubernetes group with 2 updates (llm-d#574)

Bumps the kubernetes group with 2 updates: [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) and [sigs.k8s.io/gateway-api-inference-extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension).


Updates `sigs.k8s.io/controller-runtime` from 0.22.4 to 0.22.5
- [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases)
- [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/controller-runtime@v0.22.4...v0.22.5)

Updates `sigs.k8s.io/gateway-api-inference-extension` from 1.3.0-rc.2 to 1.3.0-rc.3
- [Release notes](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/gateway-api-inference-extension@v1.3.0-rc.2...v1.3.0-rc.3)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/controller-runtime
  dependency-version: 0.22.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: sigs.k8s.io/gateway-api-inference-extension
  dependency-version: 1.3.0-rc.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* refactor: kv cache manager repo (llm-d#570)

* refactor: kv cache manager repo name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* go mod tidy

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fetch kv cache upstream instead of my fork

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* revert dockerfile to fetch kv cache manager from upstream instead of go mod replace

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update chat preprocessing structs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager version

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor kvblock.Key to kvblock.BlockHash

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add context

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add parent block key

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor encode

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* validate model name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* run setup.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* clone vllm into build

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit lint

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* delete fetch-python-wrapper.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit git workflow

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* refactor TokenProcessorConfig in config

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix kv cache repo name in docker file

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix e2e tests

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add ignore

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* update architecture docs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: HyunKyun Moon <mhg5303@gmail.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* bumping IGW version to the full released version (llm-d#583)

Signed-off-by: Kellen Swain <kfswain@google.com>

* Enable prefix-cache awareness in active-active multi-replica scheduler deployments (llm-d#578)

* - active-active-ha support

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>

* lint

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* Switch to pre-built vLLM wheels for CPU builds (llm-d#582)

* try using official vllm wheels in dockerfile.epp

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* use wheels in makefile

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* write permissions to setup.sh

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager commit

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* try installing py deps without sudo

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* CR changes

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)

* update kvc version import

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add go.mod to testable changes

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- to test precise-prefix-cache-scorer we cannot use food-review, since
  the scorer needs kv-cache-manager to fetch a tokenizer, which requires
  a real model from HF
- the change is to switch the "default model" to TinyLlama
- to make the tokenizer folder writable, the Dockerfile changes its
  permissions for the USER
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml, as it is used for
  local testing

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- prefix-cache-scorer was renamed to precise-prefix-cache-scorer in 0.3.0, so
  configs need to migrate from the old plugin to the new one and its spec:
  - rename the plugin
  - remove parameters.autoTune, parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize and maxPrefixBlocksToMatch under indexerConfig
- for configs using food-review, keep the old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer, as
  KV and PD both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if the cache_hit_threshold field is present in the completion request, perform a decode-first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode-first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable PD decider, which is responsible for deciding whether disaggregation is required; it uses data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create a PD decider plugin type with two implementations (prefix-based, and "always" for testing); update the deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default NIXL port has been updated to 5600
- keep ZMQ on port 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Use a PAT (BOT_TOKEN) instead of GITHUB_TOKEN to trigger the 'lgtm-gatekeeper' workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Use secrets.BOT_TOKEN in lgtm-reset.yml.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#33)

This PR tests the /lgtm command workflow automation.

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#34)

This PR tests the /lgtm command workflow automation.

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#35)

This PR tests the /lgtm command workflow automation.

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#36)

This PR tests the /lgtm command workflow automation.

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#39)

This PR tests the /lgtm command workflow automation.

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#42)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#43)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#44)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: success-path (#45)

Tests the happy path: authorized user LGTM on a clean PR.

Test timestamp: 1771126029

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: success-path (#47)

Tests the happy path: authorized user LGTM on a clean PR.

Test timestamp: 1771126298

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#50)

This PR tests the /lgtm command workflow automation.

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#54)

This PR tests the /lgtm command workflow automation.

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: blocking-labels (#58)

Tests that /lgtm is blocked when hold label is present.

Test timestamp: 1771128456

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: blocking-labels (#60)

Tests that /lgtm is blocked when hold label is present.

Test timestamp: 1771128602

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: open-pr (#62)

Tests that opening a PR triggers gatekeeper which blocks without lgtm label.

Test timestamp: 1771133038

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: open-pr (#63)

Tests that opening a PR triggers gatekeeper which blocks without lgtm label.

Test timestamp: 1771133183

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: open-pr (#64)

Tests that opening a PR triggers gatekeeper which blocks without lgtm label.

Test timestamp: 1771135205

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: open-pr (#68)

Tests that opening a PR triggers gatekeeper which blocks without lgtm label.

Test timestamp: 1771138616

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: open-pr

Tests that opening a PR triggers gatekeeper which blocks without lgtm label.

Test timestamp: 1771143330

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Kellen Swain <kfswain@google.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com>
Co-authored-by: HyunKyun Moon <mhg5303@gmail.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Kellen Swain <kfswain@google.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Labels

lgtm "Looks good to me", indicates that a PR is ready to be merged.

Projects

Archived in project

5 participants