Deprecate InferenceModel #129

Closed
learner0810 wants to merge 1 commit into llm-d-incubation:main from learner0810:Deprecate-InferenceModel

@learner0810

Contributor

FIX: #121

Comment thread README.md Outdated
@yankay

yankay commented Oct 14, 2025

Collaborator

The chart version needs to be updated like: #125

learner0810 force-pushed the Deprecate-InferenceModel branch 2 times, most recently from 63f43f0 to 9747017 (October 14, 2025 09:44)
@learner0810

Contributor Author

> The chart version needs to be updated like: #125

Thanks for the review. The chart version has been updated.

@jgchn jgchn left a comment

Collaborator

LGTM, please run make verify. Thanks for the PR!

learner0810 force-pushed the Deprecate-InferenceModel branch from 9747017 to f4fad91 (October 15, 2025 01:53)
@learner0810

Contributor Author

> LGTM, please run make verify. Thanks for the PR!

Thanks for the reminder. The changes have been committed.

@yankay

yankay commented Oct 15, 2025

Collaborator

> > LGTM, please run make verify. Thanks for the PR!
>
> Thanks for the reminder, the changes have been committed

Thanks @learner0810

It is recommended to manually run make pre-commit-run for a check before merging #131.

@learner0810

Contributor Author

> > > LGTM, please run make verify. Thanks for the PR!
> >
> > Thanks for the reminder, the changes have been committed
>
> Thanks @learner0810
>
> It is recommended to manually run make pre-commit-run for a check before merging #131.

Sorry to bother you. Running make pre-commit-run causes changes in PR #123 to be deleted.

My changes passed the pre-commit checks. Should we leave the #123 issue to be resolved by #131? What do you think?

host@hostdeMacBook-Pro llm-d-modelservice % make pre-commit-run                  
hack/install-tools.sh
ct is already installed. Skipping installation.
Set FORCE_INSTALL=true to reinstall.
'ct' has been installed successfully. Location: /Users/host/go/src/github/llm-d-modelservice/bin
helm is already installed. Skipping installation.
Set FORCE_INSTALL=true to reinstall.
'helm' has been installed successfully. Location: /Users/host/go/src/github/llm-d-modelservice/bin
pre-commit is already installed. Skipping installation.
Set FORCE_INSTALL=true to reinstall.
'precommit' has been installed successfully. Location: /Users/host/go/src/github/llm-d-modelservice/bin
pre-commit run --all-files
Generate jsonschema......................................................Failed
- hook id: helm-schema
- files were modified by this hook
check for added large files..............................................Passed
check for merge conflicts................................................Passed
check json...............................................................Passed
detect private key.......................................................Passed
fix end of files.........................................................Passed
mixed line ending........................................................Passed
trim trailing whitespace.................................................Passed
jsonschema-dereference...................................................Failed
- hook id: jsonschema-dereference
- files were modified by this hook
make: *** [pre-commit-run] Error 1
host@hostdeMacBook-Pro llm-d-modelservice % git diff
diff --git a/charts/llm-d-modelservice/values.schema.json b/charts/llm-d-modelservice/values.schema.json
index 8e7db4c..92c9ba2 100644
--- a/charts/llm-d-modelservice/values.schema.json
+++ b/charts/llm-d-modelservice/values.schema.json
@@ -2,10 +2,6 @@
     "$schema": "http://json-schema.org/draft-07/schema#",
     "additionalProperties": false,
     "properties": {
-        "enabled": {
-            "description": "Usually used when using llm-d-modelservice as a subchart.",
-            "type": "boolean"
-        },
         "accelerator": {
             "additionalProperties": false,
             "description": " Supported types: nvidia, intel-i915, intel-xe, amd, google",
diff --git a/charts/llm-d-modelservice/values.schema.tmpl.json b/charts/llm-d-modelservice/values.schema.tmpl.json
index 6fd64c1..86f4e33 100644
--- a/charts/llm-d-modelservice/values.schema.tmpl.json
+++ b/charts/llm-d-modelservice/values.schema.tmpl.json
@@ -2,10 +2,6 @@
   "$schema": "http://json-schema.org/draft-07/schema#",
   "additionalProperties": false,
   "properties": {
-    "enabled": {
-        "description": "Usually used when using llm-d-modelservice as a subchart.",
-        "type": "boolean"
-    },
     "accelerator": {
       "additionalProperties": false,
       "description": " Supported types: nvidia, intel-i915, intel-xe, amd, google",
host@hostdeMacBook-Pro llm-d-modelservice % 
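
For anyone who wants to confirm what a schema-regenerating hook dropped without reading the raw diff, here is a minimal Python sketch. It assumes the two schema versions have already been loaded into dicts (for example with `json.load`). The helper name `dropped_properties` and the sample dicts are hypothetical, and the `"type": "object"` entry for `accelerator` is an assumption; only the `enabled` boolean property comes from the diff above.

```python
def dropped_properties(before: dict, after: dict) -> list:
    """Top-level schema properties present in `before` but missing from `after`."""
    old = before.get("properties", {})
    new = after.get("properties", {})
    return sorted(set(old) - set(new))


# Hypothetical, heavily trimmed stand-ins for the committed and regenerated schemas.
committed = {
    "properties": {
        "enabled": {"type": "boolean"},
        "accelerator": {"type": "object"},  # assumed type, for illustration only
    }
}
regenerated = {
    "properties": {
        "accelerator": {"type": "object"},  # assumed type, for illustration only
    }
}

print(dropped_properties(committed, regenerated))
```

The check only looks at top-level property names, so it would not notice changes nested inside a property.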

@yankay

yankay commented Oct 15, 2025

Collaborator

> Sorry to bother you. Running make pre-commit-run causes changes in PR #123 to be deleted.
>
> My changes passed the pre-commit checks. Should we leave the #123 issue to be resolved by #131? What do you think?

it's a good idea :-)
/lgtm

learner0810 force-pushed the Deprecate-InferenceModel branch 5 times, most recently from a32542c to c5d00f2 (October 17, 2025 02:34)
@poussa

poussa commented Oct 17, 2025

Contributor

Can we get this merged?

@yankay

yankay commented Oct 17, 2025

Collaborator

The chart version needs to be updated :-)

learner0810 force-pushed the Deprecate-InferenceModel branch from c5d00f2 to 44a1c12 (October 17, 2025 07:53)
@poussa

poussa commented Oct 17, 2025

Contributor

> The chart version needs to be updated :-)

Not necessarily. We can collect multiple changes and then update/release a new version.

EDIT: Well, it was already done...that is good.

learner0810 force-pushed the Deprecate-InferenceModel branch from 44a1c12 to 61601e1 (October 17, 2025 07:55)
@poussa

poussa commented Oct 17, 2025

Contributor

Hmm, I am getting an error when testing this:

{"level":"error","ts":"2025-10-17T07:47:42Z","logger":"controller-runtime.source.Kind","caller":"source/kind.go:80","msg":"failed to get informer from cache","error":"Timeout: failed waiting for *v1.InferencePool Informer to sync","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.0/pkg/internal/source/kind.go:80\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/poll.go:33\nsigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.0/pkg/internal/source/kind.go:68"}

I have the CRDs:

inferenceobjectives.inference.networking.x-k8s.io                 2025-10-17T07:30:05Z
inferencepools.inference.networking.k8s.io                        2025-10-17T07:30:06Z
inferencepools.inference.networking.x-k8s.io                      2025-10-17T07:30:07Z

Is this related to the PR?

@learner0810

Contributor Author

> Is this related to the PR?

Yes, I've upgraded the llm-d-inference-scheduler version. You can resolve this by running the following command:

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.1/v1-manifests.yaml

@poussa

poussa commented Oct 17, 2025

Contributor

> Yes, I've upgraded the llm-d-inference-scheduler version. You can resolve this by running the following command:
>
> kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.1/v1-manifests.yaml

I have already done:

k apply -k "https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd/?ref=v1.0.1"

Isn't that the same thing?

@learner0810

Contributor Author

> I have already done:
>
> k apply -k "https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd/?ref=v1.0.1"
>
> Isn't that the same thing?

Yeah, the same thing.

@poussa

poussa commented Oct 17, 2025

Contributor

Actually, there is an earlier error, which is related to RBAC, I guess.

{"level":"error","ts":"2025-10-17T08:13:08Z","logger":"controller-runtime.cache.UnhandledError","caller":"runtime/runtime.go:221","msg":"Failed to watch","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290","type":"*v1alpha2.InferenceObjective","error":"failed to list *v1alpha2.InferenceObjective: inferenceobjectives.inference.networking.x-k8s.io is forbidden: User \"system:serviceaccount:llm-d:gaudi-llm-d-modelservice-epp\" cannot list resource \"inferenceobjectives\" in API group \"inference.networking.x-k8s.io\" in the namespace \"llm-d\"","stacktrace":"k8s.io/apimachinery/pkg/util/runtime.logError\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/runtime/runtime.go:221\nk8s.io/apimachinery/pkg/util/runtime.handleError\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/runtime/runtime.go:212\nk8s.io/apimachinery/pkg/util/runtime.HandleErrorWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/runtime/runtime.go:198\nk8s.io/client-go/tools/cache.DefaultWatchErrorHandler\n\t/go/pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:205\nk8s.io/client-go/tools/cache.(*Reflector).RunWithContext.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:361\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:233\nk8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:255\nk8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:256\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:233\nk8s.io/client-go/tools/cache.(*Reflector).RunWithContext\n\t/go/pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:359\nk8s.io/client-go/tools/cache.(*controller).RunWithContext.(*Group).StartWithContext.func3\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/wait.go:63\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/wait.go:72"}

Any clue where this is coming from?
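
A "forbidden" message like the one above already names the service account, verb, resource, API group and namespace, so the missing Role rule can be read straight off it. Below is a minimal Python sketch of that extraction. It assumes the message is the JSON-decoded `error` field, so the quotes are no longer escaped. The regular expression and the `missing_rule` helper are illustrative, not part of any Kubernetes tooling, and only cover this namespaced service-account form of the message.

```python
import re

# Pattern for the namespaced "forbidden" message format seen in the log above.
FORBIDDEN = re.compile(
    r'User "system:serviceaccount:(?P<sa_namespace>[^:"]+):(?P<sa_name>[^"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" '
    r'in the namespace "(?P<namespace>[^"]+)"'
)


def missing_rule(message: str) -> dict:
    """Return the Role rule that would grant the denied request."""
    match = FORBIDDEN.search(message)
    if match is None:
        raise ValueError("not a namespaced service-account 'forbidden' message")
    return {
        "serviceAccount": f'{match["sa_namespace"]}/{match["sa_name"]}',
        "namespace": match["namespace"],
        "rule": {
            "apiGroups": [match["group"]],
            "resources": [match["resource"]],
            "verbs": [match["verb"]],
        },
    }


# The decoded error text from the log entry above.
message = (
    'inferenceobjectives.inference.networking.x-k8s.io is forbidden: '
    'User "system:serviceaccount:llm-d:gaudi-llm-d-modelservice-epp" '
    'cannot list resource "inferenceobjectives" in API group '
    '"inference.networking.x-k8s.io" in the namespace "llm-d"'
)
print(missing_rule(message))
```

The result only covers the single verb that was denied. A controller that lists and watches a resource will usually need `get`, `list` and `watch` together.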

learner0810 force-pushed the Deprecate-InferenceModel branch from 61601e1 to e65a82d (October 17, 2025 08:35)
@learner0810

Contributor Author

> Any clue where this is coming from?

Sorry, I missed the RBAC rules for the newly added custom resource. That has been fixed now. Please reinstall.

@poussa

poussa commented Oct 17, 2025

Contributor

Getting closer but you still need to add this. Then it works (at least for me).

diff --git a/charts/llm-d-modelservice/templates/epp-role.yaml b/charts/llm-d-modelservice/templates/epp-role.yaml
index 4315ba3..8b912bd 100644
--- a/charts/llm-d-modelservice/templates/epp-role.yaml
+++ b/charts/llm-d-modelservice/templates/epp-role.yaml
@@ -5,9 +5,10 @@ metadata:
   name: {{ include "llm-d-modelservice.eppRoleName" . }}
 rules:
 - apiGroups:
-  - inference.networking.x-k8s.io
+  - inference.networking.k8s.io
   resources:
   - inferencepools
+  - inferenceobjectives
   verbs:
   - get
   - watch
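
Kubernetes RBAC evaluates each rule as a unit: a request is allowed only if one rule lists its API group, its resource and its verb together. The Python sketch below models that check for the two resources discussed in this thread. The API groups come from the CRD listing and controller logs quoted here. The `allows` helper and the rule set are hypothetical, the `list` verb is an assumption because the diff hunk shows only `get` and `watch`, and wildcard (`*`) entries are deliberately not handled.

```python
def allows(rules, group, resource, verb):
    """True if a single rule covers the group, resource and verb together."""
    return any(
        group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


# (API group, resource) pairs as they appear in the CRD listing and logs.
needed = [
    ("inference.networking.k8s.io", "inferencepools"),
    ("inference.networking.x-k8s.io", "inferenceobjectives"),
]

# Hypothetical rule set with one rule per API group (verbs are assumed).
rules = [
    {
        "apiGroups": ["inference.networking.k8s.io"],
        "resources": ["inferencepools"],
        "verbs": ["get", "watch", "list"],
    },
    {
        "apiGroups": ["inference.networking.x-k8s.io"],
        "resources": ["inferenceobjectives"],
        "verbs": ["get", "watch", "list"],
    },
]

for group, resource in needed:
    for verb in ("get", "watch", "list"):
        print(group, resource, verb, allows(rules, group, resource, verb))
```

Against a live cluster, `kubectl auth can-i` with `--as` set to the service account answers the same question authoritatively.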

@poussa

poussa commented Oct 17, 2025

Contributor

Well, not quite. I get no error but the pod still terminates. Any clue why?

{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:144","msg":"GIE build","commit-sha":"unknown","build-ref":""}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:157","msg":"Flags processed","flags":{"cert-path":"","config-file":"config/default-config.yaml","config-text":"","enable-pprof":true,"grpc-health-port":9003,"grpc-port":9002,"ha-enable-leader-election":false,"health-checking":false,"kubeconfig":"","kv-cache-usage-percentage-metric":"vllm:gpu_cache_usage_perc","lora-info-metric":"vllm:lora_requests_info","metrics-port":9090,"metrics-staleness-threshold":2000000000,"model-server-metrics-https-insecure-skip-verify":true,"model-server-metrics-path":"/metrics","model-server-metrics-port":0,"model-server-metrics-scheme":"http","pool-group":"inference.networking.k8s.io","pool-name":"gaudi-llm-d-modelservice","pool-namespace":"llm-d","refresh-metrics-interval":50000000,"refresh-prometheus-metrics-interval":5000000000,"secure-serving":true,"total-queued-requests-metric":"vllm:num_requests_waiting","v":4,"zap-devel":true,"zap-encoder":{},"zap-log-level":{},"zap-stacktrace-level":{},"zap-time-encoding":{}}}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_QUEUE_DEPTH_THRESHOLD","defaultValue":5}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_KV_CACHE_UTIL_THRESHOLD","defaultValue":0.8}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_METRICS_STALENESS_THRESHOLD","defaultValue":0.2}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"saturationdetector/config.go:70","msg":"SaturationDetector configuration loaded from env","config":"&{QueueDepthThreshold:5 KVCacheUtilThreshold:0.8 MetricsStalenessThreshold:200ms}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"ENABLE_EXPERIMENTAL_DATALAYER_V2","defaultValue":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:226","msg":"Enabling pprof handlers"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/heap"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/goroutine"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/allocs"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/threadcreate"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/block"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/mutex"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:49","msg":"Loaded configuration","config":"{Plugins: [{/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {/decode-filter} {/max-score-picker} {/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:60","msg":"Configuration with defaults set","config":"{Plugins: [{prefix-cache-scorer/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {decode-filter/decode-filter} {max-score-picker/max-score-picker} {single-profile-handler/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runner/runner.go:343","msg":"loaded configuration from file/text successfully"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:249","msg":"parsed config","scheduler-config":"{ProfileHandler: single-profile-handler/single-profile-handler, Profiles: map[default:{Filters: [decode-filter/by-label], Scorers: [prefix-cache-scorer/prefix-cache-scorer: 50], Picker: max-score-picker/max-score-picker}]}"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"setup.SaturationDetector","caller":"saturationdetector/saturationdetector.go:89","msg":"Creating new SaturationDetector","queueDepthThreshold":5,"kvCacheUtilThreshold":0.8,"metricsStalenessThreshold":"200ms"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:446","msg":"ExtProc server runner added to manager."}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:290","msg":"Controller manager starting"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:208","msg":"Starting metrics server"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:247","msg":"Serving metrics server","bindAddress":":9090","secure":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"health"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"health","port":9003}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","source":"kind source: *v1.InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","source":"kind source: *v1alpha2.InferenceObjective"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"ext-proc","port":9002}
{"level":"Level(-2)","ts":"2025-10-17T08:48:39Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:46Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:49Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:56Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:59Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"plugins/plugin_state.go:109","msg":"Shutting down plugin state cleanup"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:555","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:550","msg":"Stopping and waiting for warmup runnables"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"metrics/logger.go:46","msg":"Shutting down prometheus metrics thread"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:559","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:567","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:571","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:574","msg":"Stopping and waiting for HTTP servers"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.metrics","caller":"server/server.go:254","msg":"Shutting down metrics server with timeout of 1 minute"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:578","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"setup","caller":"runner/runner.go:295","msg":"Controller manager terminated"}

Comment thread charts/llm-d-modelservice/templates/epp-role.yaml Outdated
@learner0810

Contributor Author

Well, not quite. I get no error but the pod still terminates. Any clue why?

{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:144","msg":"GIE build","commit-sha":"unknown","build-ref":""}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:157","msg":"Flags processed","flags":{"cert-path":"","config-file":"config/default-config.yaml","config-text":"","enable-pprof":true,"grpc-health-port":9003,"grpc-port":9002,"ha-enable-leader-election":false,"health-checking":false,"kubeconfig":"","kv-cache-usage-percentage-metric":"vllm:gpu_cache_usage_perc","lora-info-metric":"vllm:lora_requests_info","metrics-port":9090,"metrics-staleness-threshold":2000000000,"model-server-metrics-https-insecure-skip-verify":true,"model-server-metrics-path":"/metrics","model-server-metrics-port":0,"model-server-metrics-scheme":"http","pool-group":"inference.networking.k8s.io","pool-name":"gaudi-llm-d-modelservice","pool-namespace":"llm-d","refresh-metrics-interval":50000000,"refresh-prometheus-metrics-interval":5000000000,"secure-serving":true,"total-queued-requests-metric":"vllm:num_requests_waiting","v":4,"zap-devel":true,"zap-encoder":{},"zap-log-level":{},"zap-stacktrace-level":{},"zap-time-encoding":{}}}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_QUEUE_DEPTH_THRESHOLD","defaultValue":5}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_KV_CACHE_UTIL_THRESHOLD","defaultValue":0.8}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_METRICS_STALENESS_THRESHOLD","defaultValue":0.2}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"saturationdetector/config.go:70","msg":"SaturationDetector configuration loaded from env","config":"&{QueueDepthThreshold:5 KVCacheUtilThreshold:0.8 MetricsStalenessThreshold:200ms}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"ENABLE_EXPERIMENTAL_DATALAYER_V2","defaultValue":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:226","msg":"Enabling pprof handlers"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/heap"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/goroutine"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/allocs"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/threadcreate"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/block"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/mutex"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:49","msg":"Loaded configuration","config":"{Plugins: [{/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {/decode-filter} {/max-score-picker} {/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:60","msg":"Configuration with defaults set","config":"{Plugins: [{prefix-cache-scorer/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {decode-filter/decode-filter} {max-score-picker/max-score-picker} {single-profile-handler/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runner/runner.go:343","msg":"loaded configuration from file/text successfully"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:249","msg":"parsed config","scheduler-config":"{ProfileHandler: single-profile-handler/single-profile-handler, Profiles: map[default:{Filters: [decode-filter/by-label], Scorers: [prefix-cache-scorer/prefix-cache-scorer: 50], Picker: max-score-picker/max-score-picker}]}"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"setup.SaturationDetector","caller":"saturationdetector/saturationdetector.go:89","msg":"Creating new SaturationDetector","queueDepthThreshold":5,"kvCacheUtilThreshold":0.8,"metricsStalenessThreshold":"200ms"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:446","msg":"ExtProc server runner added to manager."}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:290","msg":"Controller manager starting"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:208","msg":"Starting metrics server"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:247","msg":"Serving metrics server","bindAddress":":9090","secure":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"health"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"health","port":9003}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","source":"kind source: *v1.InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","source":"kind source: *v1alpha2.InferenceObjective"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"ext-proc","port":9002}
{"level":"Level(-2)","ts":"2025-10-17T08:48:39Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:46Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:49Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:56Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:59Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"plugins/plugin_state.go:109","msg":"Shutting down plugin state cleanup"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:555","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:550","msg":"Stopping and waiting for warmup runnables"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"metrics/logger.go:46","msg":"Shutting down prometheus metrics thread"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:559","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:567","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:571","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:574","msg":"Stopping and waiting for HTTP servers"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.metrics","caller":"server/server.go:254","msg":"Shutting down metrics server with timeout of 1 minute"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:578","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"setup","caller":"runner/runner.go:295","msg":"Controller manager terminated"}

According to the IGW code, this is because the pool has not completed syncing.
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/health.go#L47
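
The behaviour described here can be pictured roughly as follows. This is a minimal sketch, not the code in the linked health.go: `PoolSyncChecker`, `fakeDatastore`, `ServingStatus`, and `CheckHealth` are hypothetical names, and the assumption is that the health endpoint reports not-serving until the InferencePool has been synced. If that holds, any probe against the health port would keep failing while no pool is reconciled, which would be consistent with the repeated "gRPC health check not serving" lines and the shutdown about 30 seconds after startup in the log above.

```go
package main

import "fmt"

// PoolSyncChecker is a hypothetical stand-in for whatever the EPP uses to
// track the InferencePool; it is not the real datastore interface.
type PoolSyncChecker interface {
	PoolHasSynced() bool
}

// ServingStatus mirrors the two health states that matter for this sketch.
type ServingStatus string

const (
	Serving    ServingStatus = "SERVING"
	NotServing ServingStatus = "NOT_SERVING"
)

// fakeDatastore is an illustrative implementation with a fixed sync state.
type fakeDatastore struct{ synced bool }

func (f fakeDatastore) PoolHasSynced() bool { return f.synced }

// CheckHealth reports NOT_SERVING until the pool has been synced
// (assumed behaviour, based on the explanation in this thread).
func CheckHealth(ds PoolSyncChecker) ServingStatus {
	if !ds.PoolHasSynced() {
		return NotServing
	}
	return Serving
}

func main() {
	// No InferencePool reconciled yet: the check fails.
	fmt.Println(CheckHealth(fakeDatastore{synced: false}))
	// Pool reconciled: the check passes.
	fmt.Println(CheckHealth(fakeDatastore{synced: true}))
}
```

Under this assumption, the thing to verify first is that an InferencePool matching the `pool-name` and `pool-namespace` flags exists and that the EPP role is allowed to read it.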

@learner0810 learner0810 force-pushed the Deprecate-InferenceModel branch from e65a82d to f476f43 Compare October 17, 2025 10:11
@learner0810

learner0810 commented Oct 17, 2025

Contributor Author

Getting closer but you still need to add this. Then it works (at least for me).

diff --git a/charts/llm-d-modelservice/templates/epp-role.yaml b/charts/llm-d-modelservice/templates/epp-role.yaml
index 4315ba3..8b912bd 100644
--- a/charts/llm-d-modelservice/templates/epp-role.yaml
+++ b/charts/llm-d-modelservice/templates/epp-role.yaml
@@ -5,9 +5,10 @@ metadata:
   name: {{ include "llm-d-modelservice.eppRoleName" . }}
 rules:
 - apiGroups:
-  - inference.networking.x-k8s.io
+  - inference.networking.k8s.io
   resources:
   - inferencepools
+  - inferenceobjectives
   verbs:
   - get
   - watch

Thanks for the reminder, it has been added. My previous change was wrong and resulted in a duplicate CRD group.

@poussa

poussa commented Oct 17, 2025

Contributor

According to the IGW code, this is because the pool has not completed syncing. https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/health.go#L47

Ah, yeah. I have no gateway etc. running. Let me fix that.

@learner0810

Contributor Author

Well, not quite. I get no error but the pod still terminates. Any clue why?

{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:144","msg":"GIE build","commit-sha":"unknown","build-ref":""}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:157","msg":"Flags processed","flags":{"cert-path":"","config-file":"config/default-config.yaml","config-text":"","enable-pprof":true,"grpc-health-port":9003,"grpc-port":9002,"ha-enable-leader-election":false,"health-checking":false,"kubeconfig":"","kv-cache-usage-percentage-metric":"vllm:gpu_cache_usage_perc","lora-info-metric":"vllm:lora_requests_info","metrics-port":9090,"metrics-staleness-threshold":2000000000,"model-server-metrics-https-insecure-skip-verify":true,"model-server-metrics-path":"/metrics","model-server-metrics-port":0,"model-server-metrics-scheme":"http","pool-group":"inference.networking.k8s.io","pool-name":"gaudi-llm-d-modelservice","pool-namespace":"llm-d","refresh-metrics-interval":50000000,"refresh-prometheus-metrics-interval":5000000000,"secure-serving":true,"total-queued-requests-metric":"vllm:num_requests_waiting","v":4,"zap-devel":true,"zap-encoder":{},"zap-log-level":{},"zap-stacktrace-level":{},"zap-time-encoding":{}}}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_QUEUE_DEPTH_THRESHOLD","defaultValue":5}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_KV_CACHE_UTIL_THRESHOLD","defaultValue":0.8}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_METRICS_STALENESS_THRESHOLD","defaultValue":0.2}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"saturationdetector/config.go:70","msg":"SaturationDetector configuration loaded from env","config":"&{QueueDepthThreshold:5 KVCacheUtilThreshold:0.8 MetricsStalenessThreshold:200ms}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"ENABLE_EXPERIMENTAL_DATALAYER_V2","defaultValue":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:226","msg":"Enabling pprof handlers"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/heap"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/goroutine"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/allocs"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/threadcreate"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/block"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/mutex"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:49","msg":"Loaded configuration","config":"{Plugins: [{/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {/decode-filter} {/max-score-picker} {/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:60","msg":"Configuration with defaults set","config":"{Plugins: [{prefix-cache-scorer/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {decode-filter/decode-filter} {max-score-picker/max-score-picker} {single-profile-handler/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runner/runner.go:343","msg":"loaded configuration from file/text successfully"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:249","msg":"parsed config","scheduler-config":"{ProfileHandler: single-profile-handler/single-profile-handler, Profiles: map[default:{Filters: [decode-filter/by-label], Scorers: [prefix-cache-scorer/prefix-cache-scorer: 50], Picker: max-score-picker/max-score-picker}]}"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"setup.SaturationDetector","caller":"saturationdetector/saturationdetector.go:89","msg":"Creating new SaturationDetector","queueDepthThreshold":5,"kvCacheUtilThreshold":0.8,"metricsStalenessThreshold":"200ms"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:446","msg":"ExtProc server runner added to manager."}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:290","msg":"Controller manager starting"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:208","msg":"Starting metrics server"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:247","msg":"Serving metrics server","bindAddress":":9090","secure":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"health"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"health","port":9003}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","source":"kind source: *v1.InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","source":"kind source: *v1alpha2.InferenceObjective"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"ext-proc","port":9002}
{"level":"Level(-2)","ts":"2025-10-17T08:48:39Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:46Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:49Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:56Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:59Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"plugins/plugin_state.go:109","msg":"Shutting down plugin state cleanup"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:555","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:550","msg":"Stopping and waiting for warmup runnables"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"metrics/logger.go:46","msg":"Shutting down prometheus metrics thread"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:559","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:567","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:571","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:574","msg":"Stopping and waiting for HTTP servers"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.metrics","caller":"server/server.go:254","msg":"Shutting down metrics server with timeout of 1 minute"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:578","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"setup","caller":"runner/runner.go:295","msg":"Controller manager terminated"}

According to the IGW code, this is because the pool has not completed syncing. https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/health.go#L47
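
For readers following along, here is a minimal sketch of the behavior described above: a health check that reports "not serving" until the pool has synced. It is written in Python for brevity, not the EPP's Go, and is not the code at the linked URL; `Datastore`, `pool_has_synced`, and `check` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ServingStatus(Enum):
    SERVING = "SERVING"
    NOT_SERVING = "NOT_SERVING"


@dataclass
class Datastore:
    """Hypothetical stand-in for the EPP's view of the InferencePool."""

    pool_synced: bool = False

    def pool_has_synced(self) -> bool:
        return self.pool_synced


def check(datastore: Datastore) -> ServingStatus:
    """Report NOT_SERVING until the pool has been synced (assumed rule)."""
    if not datastore.pool_has_synced():
        return ServingStatus.NOT_SERVING
    return ServingStatus.SERVING


ds = Datastore()
print(check(ds).value)  # pool not yet synced
ds.pool_synced = True
print(check(ds).value)  # pool synced
```

Under this assumption, a probe that polls the check before any InferencePool has been reconciled would keep failing, which would match the repeated "not serving" lines in the log.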

Excuse me, is the EPP Pod working properly now?

@poussa

poussa commented Oct 17, 2025

Contributor

This works now (at least for me).

Details:

  • When I deploy with only the model-service Helm chart, the EPP pod comes up and runs without any errors. It then terminates because there is no gateway (or something similar), per @learner0810 above.
  • When I use this PR from llm-d/guides/inference-scheduling, all the pods work properly.

IMO, this is now ready for merge.

@poussa

poussa commented Oct 17, 2025

Contributor

/cc @jgchn @kalantar

@github-actions github-actions Bot requested review from jgchn and kalantar October 17, 2025 11:34
Comment thread charts/llm-d-modelservice/templates/epp-role.yaml
@learner0810

learner0810 commented Oct 17, 2025

Contributor Author

This works now (at least for me).

Details:

  • When I deploy with only the model-service Helm chart, the EPP pod comes up and runs without any errors. It then terminates because there is no gateway (or something similar), per @learner0810 above.
  • When I use this PR from llm-d/guides/inference-scheduling, all the pods work properly.

IMO, this is now ready for merge.

The GAIE charts only enable health checks when the replica count exceeds 1. https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config%2Fcharts%2Finferencepool%2Ftemplates%2Fepp-deployment.yaml#L85-L85

However, the llm-d model-service chart enables health checks by default.

Because GAIE changed the health check and readiness check logic, the EPP pod fails the checks and stays in a failed state indefinitely if the gateway is not installed or malfunctions. I have therefore changed the llm-d-modelservice defaults to disable both the health check and the readiness check.
cc @yankay @jgchn @kalantar
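
As a rough illustration of the defaults being compared here, the sketch below encodes the rule attributed to the GAIE charts in this comment (checks only when the replica count exceeds 1) with an optional explicit setting on top. It is Python pseudologic, not Helm template code; `probes_enabled` and its `override` parameter are hypothetical and do not correspond to real chart values.

```python
from typing import Optional


def probes_enabled(replicas: int, override: Optional[bool] = None) -> bool:
    """Decide whether health/readiness probes are rendered.

    override: an explicit user setting (hypothetical); when given, it wins.
    Otherwise fall back to the rule described above: probes only when
    more than one replica is requested.
    """
    if override is not None:
        return override
    return replicas > 1


# Single replica, no explicit setting: probes stay off under this rule.
print(probes_enabled(replicas=1))
# A chart that turns probes on by default behaves like override=True.
print(probes_enabled(replicas=1, override=True))
```

A chart that always renders the probes corresponds to `override=True`, which is the case where a single-replica EPP without a working gateway would keep failing its checks.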

@learner0810 learner0810 force-pushed the Deprecate-InferenceModel branch 2 times, most recently from a5ad946 to 38b2e04 Compare October 18, 2025 08:04

@jgchn jgchn left a comment

Collaborator

I think this looks fine to me. @kalantar WDYT?

Comment thread .github/workflows/lint-charts.yaml Outdated
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
@learner0810 learner0810 force-pushed the Deprecate-InferenceModel branch from 38b2e04 to b86b3e2 Compare October 18, 2025 23:10
@kalantar

Collaborator

The endpoint picker/inferencepool pieces should all be removed from the modelservice chart. There is an upstream chart defined here: https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool (released versions at oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool) that has all these updates already. Can you try this and see where you are stuck?

See #135.

@learner0810

Contributor Author

The endpoint picker/inferencepool pieces should all be removed from the modelservice chart. There is an upstream chart defined here: https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool (released versions at oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool) that has all these updates already. Can you try this and see where you are stuck?

See #135.

Could you please clarify whether you mean all EPP-related YAML files need to be deleted?

@learner0810

Contributor Author

resolved #145

Development

Successfully merging this pull request may close these issues.

Deprecate InferenceModel

6 participants