Deprecate InferenceModel by learner0810 · Pull Request #129 · llm-d-incubation/llm-d-modelservice

learner0810 · 2025-10-13T06:51:05Z

FIX: #121

yankay · 2025-10-14T09:42:44Z

The chart version needs to be updated like: #125

learner0810 · 2025-10-14T09:47:07Z

Thanks for the review. The chart version has been updated.

jgchn

LGTM, please run make verify. Thanks for the PR!

learner0810 · 2025-10-15T01:55:03Z

Thanks for the reminder. The changes have been committed.

yankay · 2025-10-15T02:20:16Z

Thanks @learner0810

It is recommended to manually run make pre-commit-run for a check before merging #131.

learner0810 · 2025-10-15T02:53:22Z

Sorry to bother you. Running make pre-commit-run causes changes in PR #123 to be deleted.

My changes passed the pre-commit checks. Should we leave the #123 issue to be resolved by #131? What do you think?

host@hostdeMacBook-Pro llm-d-modelservice % make pre-commit-run                  
hack/install-tools.sh
ct is already installed. Skipping installation.
Set FORCE_INSTALL=true to reinstall.
'ct' has been installed successfully. Location: /Users/host/go/src/github/llm-d-modelservice/bin
helm is already installed. Skipping installation.
Set FORCE_INSTALL=true to reinstall.
'helm' has been installed successfully. Location: /Users/host/go/src/github/llm-d-modelservice/bin
pre-commit is already installed. Skipping installation.
Set FORCE_INSTALL=true to reinstall.
'precommit' has been installed successfully. Location: /Users/host/go/src/github/llm-d-modelservice/bin
pre-commit run --all-files
Generate jsonschema......................................................Failed
- hook id: helm-schema
- files were modified by this hook
check for added large files..............................................Passed
check for merge conflicts................................................Passed
check json...............................................................Passed
detect private key.......................................................Passed
fix end of files.........................................................Passed
mixed line ending........................................................Passed
trim trailing whitespace.................................................Passed
jsonschema-dereference...................................................Failed
- hook id: jsonschema-dereference
- files were modified by this hook
make: *** [pre-commit-run] Error 1
host@hostdeMacBook-Pro llm-d-modelservice % git diff
diff --git a/charts/llm-d-modelservice/values.schema.json b/charts/llm-d-modelservice/values.schema.json
index 8e7db4c..92c9ba2 100644
--- a/charts/llm-d-modelservice/values.schema.json
+++ b/charts/llm-d-modelservice/values.schema.json
@@ -2,10 +2,6 @@
     "$schema": "http://json-schema.org/draft-07/schema#",
     "additionalProperties": false,
     "properties": {
-        "enabled": {
-            "description": "Usually used when using llm-d-modelservice as a subchart.",
-            "type": "boolean"
-        },
         "accelerator": {
             "additionalProperties": false,
             "description": " Supported types: nvidia, intel-i915, intel-xe, amd, google",
diff --git a/charts/llm-d-modelservice/values.schema.tmpl.json b/charts/llm-d-modelservice/values.schema.tmpl.json
index 6fd64c1..86f4e33 100644
--- a/charts/llm-d-modelservice/values.schema.tmpl.json
+++ b/charts/llm-d-modelservice/values.schema.tmpl.json
@@ -2,10 +2,6 @@
   "$schema": "http://json-schema.org/draft-07/schema#",
   "additionalProperties": false,
   "properties": {
-    "enabled": {
-        "description": "Usually used when using llm-d-modelservice as a subchart.",
-        "type": "boolean"
-    },
     "accelerator": {
       "additionalProperties": false,
       "description": " Supported types: nvidia, intel-i915, intel-xe, amd, google",
host@hostdeMacBook-Pro llm-d-modelservice %
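
The diff above shows the regenerated schema dropping the top-level "enabled" property. A minimal Python sketch of how such a drop could be detected automatically is given below. The two schema snippets are trimmed stand-ins written for illustration, not the real chart files, and `removed_properties` is a hypothetical helper, not part of the repository's tooling.

```python
import json


def removed_properties(before: dict, after: dict) -> list:
    """Return top-level schema properties present in `before` but absent in `after`."""
    before_keys = set(before.get("properties", {}))
    after_keys = set(after.get("properties", {}))
    return sorted(before_keys - after_keys)


# Trimmed stand-in for the committed values.schema.json (assumption: only two properties shown).
committed = json.loads("""
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "additionalProperties": false,
  "properties": {
    "enabled": {"type": "boolean"},
    "accelerator": {"type": "object"}
  }
}
""")

# Trimmed stand-in for the schema after the pre-commit hooks rewrote it.
regenerated = json.loads("""
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "additionalProperties": false,
  "properties": {
    "accelerator": {"type": "object"}
  }
}
""")

print(removed_properties(committed, regenerated))  # ['enabled']
```

Because the schema sets "additionalProperties": false, a property that disappears from "properties" would likely cause values that still set it to fail schema validation, which is why a silent drop like this is worth catching before merging.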

yankay · 2025-10-15T03:05:46Z

It's a good idea :-)
/lgtm

poussa · 2025-10-17T07:40:04Z

Can we get this merged?

yankay · 2025-10-17T07:47:22Z

The chart version needs to be updated :-)

poussa · 2025-10-17T07:54:41Z

Not necessarily. We can collect multiple changes and then update/release a new version.

EDIT: Well, it was already done...that is good.

poussa · 2025-10-17T07:59:30Z

Hmm, I am getting an error when testing this:

{"level":"error","ts":"2025-10-17T07:47:42Z","logger":"controller-runtime.source.Kind","caller":"source/kind.go:80","msg":"failed to get informer from cache","error":"Timeout: failed waiting for *v1.InferencePool Informer to sync","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.0/pkg/internal/source/kind.go:80\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/poll.go:33\nsigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.0/pkg/internal/source/kind.go:68"}

I have the CRDs:

inferenceobjectives.inference.networking.x-k8s.io                 2025-10-17T07:30:05Z
inferencepools.inference.networking.k8s.io                        2025-10-17T07:30:06Z
inferencepools.inference.networking.x-k8s.io                      2025-10-17T07:30:07Z

Is this related to the PR?

learner0810 · 2025-10-17T08:05:19Z

Yes, I've upgraded the llm-d-inference-scheduler version. You can resolve this by running the following command:

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.1/v1-manifests.yaml

poussa · 2025-10-17T08:10:05Z

I have already done:

k apply -k "https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd/?ref=v1.0.1"

Isn't that the same thing?

learner0810 · 2025-10-17T08:12:40Z

Yeah, the same thing.

poussa · 2025-10-17T08:28:15Z

Actually, there is an earlier error, which is related to RBAC, I guess.

{"level":"error","ts":"2025-10-17T08:13:08Z","logger":"controller-runtime.cache.UnhandledError","caller":"runtime/runtime.go:221","msg":"Failed to watch","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290","type":"*v1alpha2.InferenceObjective","error":"failed to list *v1alpha2.InferenceObjective: inferenceobjectives.inference.networking.x-k8s.io is forbidden: User \"system:serviceaccount:llm-d:gaudi-llm-d-modelservice-epp\" cannot list resource \"inferenceobjectives\" in API group \"inference.networking.x-k8s.io\" in the namespace \"llm-d\"","stacktrace":"k8s.io/apimachinery/pkg/util/runtime.logError\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/runtime/runtime.go:221\nk8s.io/apimachinery/pkg/util/runtime.handleError\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/runtime/runtime.go:212\nk8s.io/apimachinery/pkg/util/runtime.HandleErrorWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/runtime/runtime.go:198\nk8s.io/client-go/tools/cache.DefaultWatchErrorHandler\n\t/go/pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:205\nk8s.io/client-go/tools/cache.(*Reflector).RunWithContext.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:361\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:233\nk8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:255\nk8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:256\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:233\nk8s.io/client-go/tools/cache.(*Reflector).RunWithContext\n\t/go/pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:359\nk8s.io/client-go/tools/cache.(*controller).RunWithContext.(*Group).StartWithContext.func3\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/wait.go:63\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/wait.go:72"}

Any clue where this is coming from?
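
The "forbidden" message in the log above already names the service account, verb, resource, and API group that the missing RBAC rule must cover. A minimal sketch of extracting those fields in Python follows. It assumes one JSON object per log line with the message under the "error" key, as in the line above; `parse_forbidden` and the regular expression are illustrative and not part of any Kubernetes library.

```python
import json
import re

# Matches the user, verb, resource, and API group in a Kubernetes "forbidden" message.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def parse_forbidden(log_line: str):
    """Return the denied access as a dict, or None if the line is not an RBAC denial."""
    entry = json.loads(log_line)
    match = FORBIDDEN_RE.search(entry.get("error", ""))
    return match.groupdict() if match else None


# Shortened copy of the log entry above (stack trace and other fields omitted).
line = json.dumps({
    "level": "error",
    "msg": "Failed to watch",
    "error": (
        "failed to list *v1alpha2.InferenceObjective: "
        "inferenceobjectives.inference.networking.x-k8s.io is forbidden: "
        'User "system:serviceaccount:llm-d:gaudi-llm-d-modelservice-epp" '
        'cannot list resource "inferenceobjectives" '
        'in API group "inference.networking.x-k8s.io" in the namespace "llm-d"'
    ),
})

info = parse_forbidden(line)
print(info)
```

The resulting group, resource, and verb are the values a Role rule for the EPP service account would need to list under apiGroups, resources, and verbs.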

learner0810 · 2025-10-17T08:37:21Z

Sorry, I missed the RBAC rules for the newly added CR resource. This has been fixed; please reinstall.

poussa · 2025-10-17T08:49:14Z

Getting closer but you still need to add this. Then it works (at least for me).

diff --git a/charts/llm-d-modelservice/templates/epp-role.yaml b/charts/llm-d-modelservice/templates/epp-role.yaml
index 4315ba3..8b912bd 100644
--- a/charts/llm-d-modelservice/templates/epp-role.yaml
+++ b/charts/llm-d-modelservice/templates/epp-role.yaml
@@ -5,9 +5,10 @@ metadata:
   name: {{ include "llm-d-modelservice.eppRoleName" . }}
 rules:
 - apiGroups:
-  - inference.networking.x-k8s.io
+  - inference.networking.k8s.io
   resources:
   - inferencepools
+  - inferenceobjectives
   verbs:
   - get
   - watch
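
When adjusting Role rules like the ones in this diff, it can help to check a rule set against the exact group, resource, and verb that the API server reported as denied. The sketch below is a simplified model of RBAC rule matching in Python. It ignores resourceNames, subresources, and non-resource URLs, and the rule list is hypothetical: it is not the chart's actual epp-role.yaml, and the "list" verb is an assumption based on the error message quoted earlier in this thread.

```python
def rule_allows(rule: dict, group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC check: every dimension must match exactly or via the '*' wildcard."""
    def covers(allowed, wanted):
        return "*" in allowed or wanted in allowed

    return (
        covers(rule.get("apiGroups", []), group)
        and covers(rule.get("resources", []), resource)
        and covers(rule.get("verbs", []), verb)
    )


def role_allows(rules: list, group: str, resource: str, verb: str) -> bool:
    """A Role grants access if any single rule covers the request."""
    return any(rule_allows(rule, group, resource, verb) for rule in rules)


# Hypothetical rule set for illustration only; the API groups are taken from
# the CRD names and error message quoted earlier in this thread.
rules = [
    {
        "apiGroups": ["inference.networking.k8s.io"],
        "resources": ["inferencepools"],
        "verbs": ["get", "watch", "list"],
    },
    {
        "apiGroups": ["inference.networking.x-k8s.io"],
        "resources": ["inferenceobjectives"],
        "verbs": ["get", "watch", "list"],
    },
]

# The request that was denied in the earlier error message.
print(role_allows(rules, "inference.networking.x-k8s.io", "inferenceobjectives", "list"))
# The same resource under the other API group is not covered by this rule set.
print(role_allows(rules, "inference.networking.k8s.io", "inferenceobjectives", "list"))
```

In this model a resource is only matched within the API groups listed in the same rule, so resources that live in different groups need either separate rules or a rule that lists both groups.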

poussa · 2025-10-17T08:52:28Z

Well, not quite. I get no error but the pod still terminates. Any clue why?

{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:144","msg":"GIE build","commit-sha":"unknown","build-ref":""}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:157","msg":"Flags processed","flags":{"cert-path":"","config-file":"config/default-config.yaml","config-text":"","enable-pprof":true,"grpc-health-port":9003,"grpc-port":9002,"ha-enable-leader-election":false,"health-checking":false,"kubeconfig":"","kv-cache-usage-percentage-metric":"vllm:gpu_cache_usage_perc","lora-info-metric":"vllm:lora_requests_info","metrics-port":9090,"metrics-staleness-threshold":2000000000,"model-server-metrics-https-insecure-skip-verify":true,"model-server-metrics-path":"/metrics","model-server-metrics-port":0,"model-server-metrics-scheme":"http","pool-group":"inference.networking.k8s.io","pool-name":"gaudi-llm-d-modelservice","pool-namespace":"llm-d","refresh-metrics-interval":50000000,"refresh-prometheus-metrics-interval":5000000000,"secure-serving":true,"total-queued-requests-metric":"vllm:num_requests_waiting","v":4,"zap-devel":true,"zap-encoder":{},"zap-log-level":{},"zap-stacktrace-level":{},"zap-time-encoding":{}}}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_QUEUE_DEPTH_THRESHOLD","defaultValue":5}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_KV_CACHE_UTIL_THRESHOLD","defaultValue":0.8}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_METRICS_STALENESS_THRESHOLD","defaultValue":0.2}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"saturationdetector/config.go:70","msg":"SaturationDetector configuration loaded from env","config":"&{QueueDepthThreshold:5 KVCacheUtilThreshold:0.8 MetricsStalenessThreshold:200ms}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"ENABLE_EXPERIMENTAL_DATALAYER_V2","defaultValue":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:226","msg":"Enabling pprof handlers"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/heap"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/goroutine"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/allocs"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/threadcreate"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/block"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/mutex"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:49","msg":"Loaded configuration","config":"{Plugins: [{/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {/decode-filter} {/max-score-picker} {/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:60","msg":"Configuration with defaults set","config":"{Plugins: [{prefix-cache-scorer/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {decode-filter/decode-filter} {max-score-picker/max-score-picker} {single-profile-handler/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runner/runner.go:343","msg":"loaded configuration from file/text successfully"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:249","msg":"parsed config","scheduler-config":"{ProfileHandler: single-profile-handler/single-profile-handler, Profiles: map[default:{Filters: [decode-filter/by-label], Scorers: [prefix-cache-scorer/prefix-cache-scorer: 50], Picker: max-score-picker/max-score-picker}]}"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"setup.SaturationDetector","caller":"saturationdetector/saturationdetector.go:89","msg":"Creating new SaturationDetector","queueDepthThreshold":5,"kvCacheUtilThreshold":0.8,"metricsStalenessThreshold":"200ms"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:446","msg":"ExtProc server runner added to manager."}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:290","msg":"Controller manager starting"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:208","msg":"Starting metrics server"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:247","msg":"Serving metrics server","bindAddress":":9090","secure":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"health"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"health","port":9003}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","source":"kind source: *v1.InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","source":"kind source: *v1alpha2.InferenceObjective"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"ext-proc","port":9002}
{"level":"Level(-2)","ts":"2025-10-17T08:48:39Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:46Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:49Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:56Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:59Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"plugins/plugin_state.go:109","msg":"Shutting down plugin state cleanup"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:555","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:550","msg":"Stopping and waiting for warmup runnables"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"metrics/logger.go:46","msg":"Shutting down prometheus metrics thread"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:559","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:567","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:571","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:574","msg":"Stopping and waiting for HTTP servers"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.metrics","caller":"server/server.go:254","msg":"Shutting down metrics server with timeout of 1 minute"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:578","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"setup","caller":"runner/runner.go:295","msg":"Controller manager terminated"}

learner0810 · 2025-10-17T10:06:44Z

Well, not quite. I get no error but the pod still terminates. Any clue why?

According to the IGW code, this is because the pool has not completed syncing.
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/health.go#L47
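
As a rough aid for reading a log like the one quoted above, the sketch below counts the distinct messages and measures the time between the first and last entry. The four sample lines are copied from that log; the function name `summarize` and the variable `SAMPLE_LINES` are made up for illustration and are not part of the EPP or IGW code.

```python
import json
from collections import Counter
from datetime import datetime

# Four lines copied verbatim from the EPP log quoted in this thread.
SAMPLE_LINES = [
    '{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:290","msg":"Controller manager starting"}',
    '{"level":"Level(-2)","ts":"2025-10-17T08:48:39Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}',
    '{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}',
    '{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"setup","caller":"runner/runner.go:295","msg":"Controller manager terminated"}',
]


def summarize(lines):
    """Return (message counts, seconds between first and last entry).

    Hypothetical helper: assumes one JSON object per line with "msg" and
    "ts" fields, as in the log above. Lines that do not parse are skipped.
    """
    counts = Counter()
    stamps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured log line
        if not isinstance(entry, dict) or "msg" not in entry or "ts" not in entry:
            continue
        counts[entry["msg"]] += 1
        stamps.append(datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%SZ"))
    lifetime = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return counts, lifetime


counts, lifetime = summarize(SAMPLE_LINES)
print(f"log spans {lifetime:.0f}s")
for msg, n in counts.most_common():
    print(n, msg)
```

Run over the full log instead of the sample, a summary like this would make it easy to see that "Pool is not initialized" and "gRPC health check not serving" repeat for the whole lifetime of the process, which fits the explanation that the pool never finished syncing.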

learner0810 · 2025-10-17T10:12:58Z

Getting closer but you still need to add this. Then it works (at least for me).

diff --git a/charts/llm-d-modelservice/templates/epp-role.yaml b/charts/llm-d-modelservice/templates/epp-role.yaml
index 4315ba3..8b912bd 100644
--- a/charts/llm-d-modelservice/templates/epp-role.yaml
+++ b/charts/llm-d-modelservice/templates/epp-role.yaml
@@ -5,9 +5,10 @@ metadata:
   name: {{ include "llm-d-modelservice.eppRoleName" . }}
 rules:
 - apiGroups:
-  - inference.networking.x-k8s.io
+  - inference.networking.k8s.io
   resources:
   - inferencepools
+  - inferenceobjectives
   verbs:
   - get
   - watch

Thanks for the reminder; it has been added. My previous change was wrong and resulted in a duplicate CRD group.
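
To illustrate why the API group in the Role matters, here is a minimal sketch of how an RBAC rule is matched against a request. It is a simplification: it ignores `resourceNames`, subresources, and non-resource URLs. The function names and the two rule dictionaries are hypothetical; the rules only mirror the lines visible in the diff hunk above, and the real Role may contain further rules and verbs outside that hunk.

```python
def rule_allows(rule, group, resource, verb):
    """Simplified RBAC match: every field must contain the value or "*"."""
    def matches(values, wanted):
        return "*" in values or wanted in values

    return (
        matches(rule.get("apiGroups", []), group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
    )


def role_allows(rules, group, resource, verb):
    """A Role allows a request if any one of its rules allows it."""
    return any(rule_allows(rule, group, resource, verb) for rule in rules)


# Rule as it looked before the diff (only the fields visible in the hunk).
OLD_RULE = {
    "apiGroups": ["inference.networking.x-k8s.io"],
    "resources": ["inferencepools"],
    "verbs": ["get", "watch"],
}

# Rule as it looks after the diff (again, only what the hunk shows).
NEW_RULE = {
    "apiGroups": ["inference.networking.k8s.io"],
    "resources": ["inferencepools", "inferenceobjectives"],
    "verbs": ["get", "watch"],
}

# The EPP log reports the InferencePool controller under this group.
POOL_GROUP = "inference.networking.k8s.io"

print("old rule allows watch:", role_allows([OLD_RULE], POOL_GROUP, "inferencepools", "watch"))
print("new rule allows watch:", role_allows([NEW_RULE], POOL_GROUP, "inferencepools", "watch"))
```

Under these assumptions, the old rule does not match a watch on `inferencepools` in `inference.networking.k8s.io` because the group string differs, while the updated rule does.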

poussa · 2025-10-17T10:17:40Z

According to the IGW code, this is because the pool has not completed syncing. https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/health.go#L47

Ah, yeah. I have no gateway etc. running. Let me fix that.

learner0810 · 2025-10-17T10:18:43Z

Well, not quite. I get no error but the pod still terminates. Any clue why?

{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:567","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:571","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:574","msg":"Stopping and waiting for HTTP servers"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.metrics","caller":"server/server.go:254","msg":"Shutting down metrics server with timeout of 1 minute"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:578","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"setup","caller":"runner/runner.go:295","msg":"Controller manager terminated"}

According to the IGW code, this is because the pool has not completed syncing. https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/health.go#L47
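
One way to check this reading against the log above is to tally the entries by message and see whether the "Pool is not initialized" and "health check not serving" lines dominate until shutdown. The following is a minimal sketch, assuming the EPP log is available as one JSON object per line; `tally_messages` is a hypothetical helper, not part of GAIE or llm-d, and the sample lines are copied from the log above.

```python
import json
from collections import Counter


def tally_messages(lines):
    """Count log entries by their "msg" field, skipping blank or non-JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log entry
        if not isinstance(entry, dict):
            continue
        counts[entry.get("msg", "")] += 1
    return counts


# Three entries copied from the EPP log above.
sample = [
    '{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}',
    '{"level":"Level(-2)","ts":"2025-10-17T08:48:46Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}',
    '{"level":"Level(-2)","ts":"2025-10-17T08:48:49Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}',
]

for msg, count in tally_messages(sample).most_common():
    print(count, msg)
```

Run against a full log file, the same function can be fed `open(path)` directly, since a file object iterates line by line.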

Excuse me, is the EPP Pod working properly now?

poussa · 2025-10-17T11:21:54Z

This works now (at least for me).

Details:

When I use only the model-service Helm chart for deployment, the EPP pod comes up and runs without any errors, but then it terminates because there is no gateway (or something similar), per @learner0810 above.
When I use this PR with llm-d/guides/inference-scheduling, all the pods work properly.

IMO, this is now ready for merge.

poussa · 2025-10-17T11:34:24Z

/cc @jgchn @kalantar

learner0810 · 2025-10-17T12:27:37Z

This works now (at least for me).

Details:

When I use only the model-service Helm chart for deployment, the EPP pod comes up and runs without any errors, but then it terminates because there is no gateway (or something similar), per @learner0810 above.

When I use this PR with llm-d/guides/inference-scheduling, all the pods work properly.

IMO, this is now ready for merge.

The GAIE chart only configures health checks when the replica count exceeds 1. https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config%2Fcharts%2Finferencepool%2Ftemplates%2Fepp-deployment.yaml#L85-L85

However, the llm-d-modelservice chart enables health checks by default.

learner0810 · 2025-10-18T07:57:31Z

Because GAIE changed its health check and readiness check logic, the EPP pod fails both checks and stays in a failed state indefinitely if the gateway is not installed or is malfunctioning. I have therefore changed the llm-d-modelservice defaults to disable both the health check and the readiness check.
cc @yankay @jgchn @kalantar
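
To illustrate why a health check that never reports serving leaves the pod stuck: Kubernetes restarts a container once its liveness probe has failed `failureThreshold` consecutive times (the Kubernetes default is 3). The sketch below is a toy model of that counting rule only, not the kubelet's implementation; `first_restart` is a hypothetical name, and the thread does not state which threshold the chart's probes use.

```python
def first_restart(probe_results, failure_threshold=3):
    """Return the 1-based probe attempt that triggers a restart, or None.

    probe_results is a sequence of booleans: True for a passing probe,
    False for a failing one. A restart is triggered once
    failure_threshold consecutive probes have failed.
    """
    consecutive_failures = 0
    for attempt, ok in enumerate(probe_results, start=1):
        if ok:
            consecutive_failures = 0  # any success resets the counter
        else:
            consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            return attempt
    return None


# An EPP whose health check never reports serving (for example, no gateway installed).
never_serving = [False] * 5
print(first_restart(never_serving))

# Intermittent failures that never reach the threshold.
print(first_restart([False, False, True, False, True]))
```

With the probes disabled by default, as in this PR, no such restart is triggered regardless of the health check result.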

jgchn

I think this looks fine to me. @kalantar WDYT?

kalantar · 2025-10-21T12:48:29Z

The endpoint picker/inferencepool pieces should all be removed from the modelservice chart. There is an upstream chart defined here: https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool (released versions at oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool) that has all these updates already. Can you try this and see where you are stuck?

See #135.

learner0810 · 2025-10-21T13:35:49Z

The endpoint picker/inferencepool pieces should all be removed from the modelservice chart. There is an upstream chart defined here: https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool (released versions at oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool) that has all these updates already. Can you try this and see where you are stuck?

See #135.

Could you please clarify whether you mean all EPP-related YAML files need to be deleted?

learner0810 · 2025-10-22T02:01:31Z

resolved #145

yankay reviewed Oct 14, 2025

Comment thread README.md Outdated

learner0810 force-pushed the Deprecate-InferenceModel branch 2 times, most recently from 63f43f0 to 9747017 Compare October 14, 2025 09:44

jgchn reviewed Oct 14, 2025

learner0810 force-pushed the Deprecate-InferenceModel branch from 9747017 to f4fad91 Compare October 15, 2025 01:53

JaredTan95 approved these changes Oct 15, 2025

learner0810 force-pushed the Deprecate-InferenceModel branch 5 times, most recently from a32542c to c5d00f2 Compare October 17, 2025 02:34

learner0810 force-pushed the Deprecate-InferenceModel branch from c5d00f2 to 44a1c12 Compare October 17, 2025 07:53

learner0810 force-pushed the Deprecate-InferenceModel branch from 44a1c12 to 61601e1 Compare October 17, 2025 07:55

learner0810 force-pushed the Deprecate-InferenceModel branch from 61601e1 to e65a82d Compare October 17, 2025 08:35

yankay reviewed Oct 17, 2025

Comment thread charts/llm-d-modelservice/templates/epp-role.yaml Outdated

learner0810 force-pushed the Deprecate-InferenceModel branch from e65a82d to f476f43 Compare October 17, 2025 10:11

github-actions Bot requested review from jgchn and kalantar October 17, 2025 11:34

yankay reviewed Oct 17, 2025

Comment thread charts/llm-d-modelservice/templates/epp-role.yaml

poussa mentioned this pull request Oct 17, 2025

Inference scheduling support for Intel Gaudi accelerator llm-d/llm-d#374

Merged

learner0810 force-pushed the Deprecate-InferenceModel branch 2 times, most recently from a5ad946 to 38b2e04 Compare October 18, 2025 08:04

jgchn reviewed Oct 18, 2025

Comment thread .github/workflows/lint-charts.yaml Outdated

Deprecate InferenceModel

b86b3e2

Signed-off-by: learner0810 <zhongjun.li@daocloud.io>

learner0810 force-pushed the Deprecate-InferenceModel branch from 38b2e04 to b86b3e2 Compare October 18, 2025 23:10

poussa approved these changes Oct 20, 2025

learner0810 closed this Oct 22, 2025
