Deprecate InferenceModel #129
|
The chart version needs to be updated, as was done in #125. |
63f43f0 to
9747017
Thanks for the review, the chart version has been updated |
jgchn
left a comment
LGTM, please run make verify. Thanks for the PR!
9747017 to
f4fad91
Thanks for the reminder, the changes have been committed |
Thanks @learner0810 It is recommended to manually run |
Sorry to bother you. Running My changes passed the pre-commit checks. Should we leave the #123 issue to be resolved by #131? What do you think?
host@hostdeMacBook-Pro llm-d-modelservice % make pre-commit-run
hack/install-tools.sh
ct is already installed. Skipping installation.
Set FORCE_INSTALL=true to reinstall.
'ct' has been installed successfully. Location: /Users/host/go/src/github/llm-d-modelservice/bin
helm is already installed. Skipping installation.
Set FORCE_INSTALL=true to reinstall.
'helm' has been installed successfully. Location: /Users/host/go/src/github/llm-d-modelservice/bin
pre-commit is already installed. Skipping installation.
Set FORCE_INSTALL=true to reinstall.
'precommit' has been installed successfully. Location: /Users/host/go/src/github/llm-d-modelservice/bin
pre-commit run --all-files
Generate jsonschema......................................................Failed
- hook id: helm-schema
- files were modified by this hook
check for added large files..............................................Passed
check for merge conflicts................................................Passed
check json...............................................................Passed
detect private key.......................................................Passed
fix end of files.........................................................Passed
mixed line ending........................................................Passed
trim trailing whitespace.................................................Passed
jsonschema-dereference...................................................Failed
- hook id: jsonschema-dereference
- files were modified by this hook
make: *** [pre-commit-run] Error 1
host@hostdeMacBook-Pro llm-d-modelservice % git diff
diff --git a/charts/llm-d-modelservice/values.schema.json b/charts/llm-d-modelservice/values.schema.json
index 8e7db4c..92c9ba2 100644
--- a/charts/llm-d-modelservice/values.schema.json
+++ b/charts/llm-d-modelservice/values.schema.json
@@ -2,10 +2,6 @@
"$schema": "http://json-schema.org/draft-07/schema#",
"additionalProperties": false,
"properties": {
- "enabled": {
- "description": "Usually used when using llm-d-modelservice as a subchart.",
- "type": "boolean"
- },
"accelerator": {
"additionalProperties": false,
"description": " Supported types: nvidia, intel-i915, intel-xe, amd, google",
diff --git a/charts/llm-d-modelservice/values.schema.tmpl.json b/charts/llm-d-modelservice/values.schema.tmpl.json
index 6fd64c1..86f4e33 100644
--- a/charts/llm-d-modelservice/values.schema.tmpl.json
+++ b/charts/llm-d-modelservice/values.schema.tmpl.json
@@ -2,10 +2,6 @@
"$schema": "http://json-schema.org/draft-07/schema#",
"additionalProperties": false,
"properties": {
- "enabled": {
- "description": "Usually used when using llm-d-modelservice as a subchart.",
- "type": "boolean"
- },
"accelerator": {
"additionalProperties": false,
"description": " Supported types: nvidia, intel-i915, intel-xe, amd, google",
host@hostdeMacBook-Pro llm-d-modelservice % |
it's a good idea :-) |
a32542c to
c5d00f2
|
Can we get this merged? |
|
The chart version needs to be updated :-) |
c5d00f2 to
44a1c12
Not necessarily. We can collect multiple changes and then update/release a new version. EDIT: Well, it was already done...that is good. |
44a1c12 to
61601e1
|
Hmm, I am getting an error when testing this:
{"level":"error","ts":"2025-10-17T07:47:42Z","logger":"controller-runtime.source.Kind","caller":"source/kind.go:80","msg":"failed to get informer from cache","error":"Timeout: failed waiting for *v1.InferencePool Informer to sync","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.0/pkg/internal/source/kind.go:80\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/poll.go:33\nsigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.0/pkg/internal/source/kind.go:68"}
I have the CRDs:
inferenceobjectives.inference.networking.x-k8s.io 2025-10-17T07:30:05Z
inferencepools.inference.networking.k8s.io 2025-10-17T07:30:06Z
inferencepools.inference.networking.x-k8s.io 2025-10-17T07:30:07Z
Is this related to the PR? |
Yes, I've upgraded the llm-d-inference-scheduler version. You can resolve this by running the following command: kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.1/v1-manifests.yaml |
I have already done:
Isn't that the same thing? |
Yeah, the same thing. |
|
Actually, there is an earlier error, which is related to RBAC, I guess.
{"level":"error","ts":"2025-10-17T08:13:08Z","logger":"controller-runtime.cache.UnhandledError","caller":"runtime/runtime.go:221","msg":"Failed to watch","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290","type":"*v1alpha2.InferenceObjective","error":"failed to list *v1alpha2.InferenceObjective: inferenceobjectives.inference.networking.x-k8s.io is forbidden: User \"system:serviceaccount:llm-d:gaudi-llm-d-modelservice-epp\" cannot list resource \"inferenceobjectives\" in API group \"inference.networking.x-k8s.io\" in the namespace \"llm-d\"","stacktrace":"k8s.io/apimachinery/pkg/util/runtime.logError\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/runtime/runtime.go:221\nk8s.io/apimachinery/pkg/util/runtime.handleError\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/runtime/runtime.go:212\nk8s.io/apimachinery/pkg/util/runtime.HandleErrorWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/runtime/runtime.go:198\nk8s.io/client-go/tools/cache.DefaultWatchErrorHandler\n\t/go/pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:205\nk8s.io/client-go/tools/cache.(*Reflector).RunWithContext.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:361\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:233\nk8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:255\nk8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:256\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/backoff.go:233\nk8s.io/client-go/tools/cache.(*Reflector).RunWithContext\n\t/go/pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:359\nk8s.io/client-go/tools/cache.(*controller).RunWithContext.(*Group).StartWithContext.func3\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/wait.go:63\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.34.0/pkg/util/wait/wait.go:72"}
Any clue where this is coming from? |
61601e1 to
e65a82d
Sorry, I missed the RBAC rules for the newly added custom resource. This has been fixed; please reinstall the chart. |
|
Getting closer, but you still need to add this. Then it works (at least for me).
diff --git a/charts/llm-d-modelservice/templates/epp-role.yaml b/charts/llm-d-modelservice/templates/epp-role.yaml
index 4315ba3..8b912bd 100644
--- a/charts/llm-d-modelservice/templates/epp-role.yaml
+++ b/charts/llm-d-modelservice/templates/epp-role.yaml
@@ -5,9 +5,10 @@ metadata:
name: {{ include "llm-d-modelservice.eppRoleName" . }}
rules:
- apiGroups:
- - inference.networking.x-k8s.io
+ - inference.networking.k8s.io
resources:
- inferencepools
+ - inferenceobjectives
verbs:
- get
- watch |
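For reference, the logs in this thread show InferencePool served from `inference.networking.k8s.io` and InferenceObjective from `inference.networking.x-k8s.io`, and the forbidden error above is about `list` on `inferenceobjectives` in the `x-k8s.io` group. A minimal sketch of a Role covering both, assuming one rule per group; the name `example-epp-role` is hypothetical (the chart derives the real name from a template helper), and this is not necessarily what the chart ends up shipping:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-epp-role   # hypothetical name, for illustration only
  namespace: llm-d         # namespace taken from the error message above
rules:
  # InferencePool (v1) lives in the inference.networking.k8s.io group.
  - apiGroups:
      - inference.networking.k8s.io
    resources:
      - inferencepools
    verbs:
      - get
      - watch
      - list
  # InferenceObjective (v1alpha2) lives in the inference.networking.x-k8s.io group.
  - apiGroups:
      - inference.networking.x-k8s.io
    resources:
      - inferenceobjectives
    verbs:
      - get
      - watch
      - list
```

A Role like this only takes effect once it is bound to the EPP service account (here `gaudi-llm-d-modelservice-epp`) through a RoleBinding in the same namespace.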
|
Well, not quite. I get no error but the pod still terminates. Any clue why?
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:144","msg":"GIE build","commit-sha":"unknown","build-ref":""}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:157","msg":"Flags processed","flags":{"cert-path":"","config-file":"config/default-config.yaml","config-text":"","enable-pprof":true,"grpc-health-port":9003,"grpc-port":9002,"ha-enable-leader-election":false,"health-checking":false,"kubeconfig":"","kv-cache-usage-percentage-metric":"vllm:gpu_cache_usage_perc","lora-info-metric":"vllm:lora_requests_info","metrics-port":9090,"metrics-staleness-threshold":2000000000,"model-server-metrics-https-insecure-skip-verify":true,"model-server-metrics-path":"/metrics","model-server-metrics-port":0,"model-server-metrics-scheme":"http","pool-group":"inference.networking.k8s.io","pool-name":"gaudi-llm-d-modelservice","pool-namespace":"llm-d","refresh-metrics-interval":50000000,"refresh-prometheus-metrics-interval":5000000000,"secure-serving":true,"total-queued-requests-metric":"vllm:num_requests_waiting","v":4,"zap-devel":true,"zap-encoder":{},"zap-log-level":{},"zap-stacktrace-level":{},"zap-time-encoding":{}}}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_QUEUE_DEPTH_THRESHOLD","defaultValue":5}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_KV_CACHE_UTIL_THRESHOLD","defaultValue":0.8}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"SD_METRICS_STALENESS_THRESHOLD","defaultValue":0.2}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"saturation-detector-config","caller":"saturationdetector/config.go:70","msg":"SaturationDetector configuration loaded from env","config":"&{QueueDepthThreshold:5 KVCacheUtilThreshold:0.8 MetricsStalenessThreshold:200ms}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"env/env.go:34","msg":"Environment variable not set, using default value","key":"ENABLE_EXPERIMENTAL_DATALAYER_V2","defaultValue":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:226","msg":"Enabling pprof handlers"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/heap"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/goroutine"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/allocs"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/threadcreate"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/block"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","caller":"manager/internal.go:196","msg":"Registering metrics http server extra handler","path":"/debug/pprof/mutex"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:49","msg":"Loaded configuration","config":"{Plugins: [{/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {/decode-filter} {/max-score-picker} {/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"loader/configloader.go:60","msg":"Configuration with defaults set","config":"{Plugins: [{prefix-cache-scorer/prefix-cache-scorer, Parameters: {\"hashBlockSize\":5,\"lruCapacityPerServer\":31250,\"maxPrefixBlocksToMatch\":256}} {decode-filter/decode-filter} {max-score-picker/max-score-picker} {single-profile-handler/single-profile-handler}], SchedulingProfiles: [{Name: default, Plugins: [{PluginRef: decode-filter} {PluginRef: max-score-picker} {PluginRef: prefix-cache-scorer, Weight: 50}]}]}"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runner/runner.go:343","msg":"loaded configuration from file/text successfully"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:249","msg":"parsed config","scheduler-config":"{ProfileHandler: single-profile-handler/single-profile-handler, Profiles: map[default:{Filters: [decode-filter/by-label], Scorers: [prefix-cache-scorer/prefix-cache-scorer: 50], Picker: max-score-picker/max-score-picker}]}"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"setup.SaturationDetector","caller":"saturationdetector/saturationdetector.go:89","msg":"Creating new SaturationDetector","queueDepthThreshold":5,"kvCacheUtilThreshold":0.8,"metricsStalenessThreshold":"200ms"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:446","msg":"ExtProc server runner added to manager."}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"setup","caller":"runner/runner.go:290","msg":"Controller manager starting"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:208","msg":"Starting metrics server"}
{"level":"info","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.metrics","caller":"server/server.go:247","msg":"Serving metrics server","bindAddress":":9090","secure":false}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"health"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"health","port":9003}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","source":"kind source: *v1.InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2025-10-17T08:48:34Z","caller":"controller/controller.go:353","msg":"Starting EventSource","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","source":"kind source: *v1alpha2.InferenceObjective"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:358","msg":"Starting reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:404","msg":"Listing and watching","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.InferencePool","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1alpha2.InferenceObjective","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:34Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:436","msg":"Caches populated","type":"*v1.Pod","reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:286","msg":"Starting Controller","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"controller/controller.go:289","msg":"Starting workers","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective","worker count":1}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:35","msg":"gRPC server starting","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:48:35Z","caller":"runnable/grpc.go:44","msg":"gRPC server listening","name":"ext-proc","port":9002}
{"level":"Level(-2)","ts":"2025-10-17T08:48:39Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:44Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:46Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:49Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:54Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:56Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:48:59Z","caller":"metrics/logger.go:81","msg":"Pool is not initialized, skipping refreshing metrics"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","logger":"health","caller":"runner/health.go:52","msg":"gRPC health check not serving (leader election disabled)","service":"envoy.service.ext_proc.v3.ExternalProcessor"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"plugins/plugin_state.go:109","msg":"Shutting down plugin state cleanup"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:555","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:550","msg":"Stopping and waiting for warmup runnables"}
{"level":"Level(-2)","ts":"2025-10-17T08:49:04Z","caller":"metrics/logger.go:46","msg":"Shutting down prometheus metrics thread"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:54","msg":"gRPC server shutting down","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"health"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"runnable/grpc.go:65","msg":"gRPC server terminated","name":"ext-proc"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:559","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferenceobjective","controllerGroup":"inference.networking.x-k8s.io","controllerKind":"InferenceObjective"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"inferencepool","controllerGroup":"inference.networking.k8s.io","controllerKind":"InferencePool"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:309","msg":"Shutdown signal received, waiting for all workers to finish","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"controller/controller.go:311","msg":"All workers finished","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:567","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:571","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:574","msg":"Stopping and waiting for HTTP servers"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.Pod","resyncPeriod":34471.287448561,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1alpha2.InferenceObjective","resyncPeriod":37158.80205366,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"Level(-3)","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.cache","caller":"cache/reflector.go:364","msg":"Stopping reflector","type":"*v1.InferencePool","resyncPeriod":34881.462490506,"reflector":"pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"controller-runtime.metrics","caller":"server/server.go:254","msg":"Shutting down metrics server with timeout of 1 minute"}
{"level":"info","ts":"2025-10-17T08:49:04Z","caller":"manager/internal.go:578","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"info","ts":"2025-10-17T08:49:04Z","logger":"setup","caller":"runner/runner.go:295","msg":"Controller manager terminated"} |
According to the IGW code, this is because the pool has not completed syncing. |
e65a82d to
f476f43
Thanks for the reminder; it has been added. My previous change was wrong and resulted in a duplicate CRD group.
Ah, yeah. I have no gateway etc. running. Let me fix that. |
Excuse me, is the EPP Pod working properly now? |
|
This works now (at least for me). Details:
IMO, this is now ready for merge. |
The GAIE chart only enables health checks when the replica count exceeds 1: https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config%2Fcharts%2Finferencepool%2Ftemplates%2Fepp-deployment.yaml#L85-L85 However, the llm-d-modelservice chart enables health checks by default. |
Because GAIE changed its health check and readiness check logic, the EPP pod fails these checks and stays in a failed state indefinitely if the gateway is not installed or malfunctions. Therefore, I have changed the llm-d-modelservice defaults to disable both the health check and the readiness check.
|
a5ad946 to
38b2e04
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
38b2e04 to
b86b3e2
|
The endpoint picker/inferencepool pieces should all be removed from the modelservice chart. There is an upstream chart defined here: https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool (released versions at oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool) that has all these updates already. Can you try this and see where you are stuck? See #135. |
Could you please clarify whether you mean all EPP-related YAML files need to be deleted? |
|
resolved #145 |

FIX: #121