RHOAIENG-34472: [WIP] GIE v1 Migration by KillianGolds · Pull Request #948 · opendatahub-io/kserve

KillianGolds · 2025-10-20T14:21:19Z

Overview

This PR migrates KServe's LLMInferenceService to Gateway API Inference Extension (GIE) v1.0.0, implementing a dual-pool architecture for zero-downtime backward compatibility. 415 files changed with significant code generation, but core logic changes are focused in specific controller and API files.

Upstream Challenges & Contributions:
During this migration, critical bugs were discovered in upstream dependencies that required fixes:

Gateway API bug: kubernetes-sigs/gateway-api#4178 - Fixed issues blocking GIE v1 migration
GIE validation bug: kubernetes-sigs/gateway-api-inference-extension#1679 - Fixed InferencePool validation issues

Major Dependency Upgrades (merged from upstream during development):

Kubernetes: v0.33.1 → v0.34.1
Gateway API: v1.2.1 → v1.4.0
Gateway API Inference Extension: v0.3.0 → v1.0.0
KEDA: v2.16.1 → v2.18.0
controller-runtime: v0.19.7 → v0.22.3

What This PR Does

Updates KServe's LLMInferenceService from GIE v1alpha2 to v1.0.0 with:

✅ Dual InferencePool creation (GIE v1 + v1alpha2 simultaneously)
✅ v1alpha2 API versioning with conversion webhooks (v1alpha2 storage ↔ v1alpha1 served)
✅ One-way traffic migration from v1alpha2 → v1 InferencePools
✅ Zero-downtime migration with backward compatibility

🔍 Key Files for Review (Core Logic)

1. API Versioning & Conversion Webhooks (NEW)

v1alpha2 API (Storage Version - Hub)

pkg/apis/serving/v1alpha2/llm_inference_service_types.go - v1alpha2 LLMInferenceService types (storage version in etcd)
pkg/apis/serving/v1alpha2/llm_inference_service_conversion.go - Hub conversion methods
pkg/apis/serving/v1alpha2/llm_inference_service_defaults.go - v1alpha2 defaulting logic
pkg/apis/serving/v1alpha2/validation/llminferenceservice_validator.go - v1alpha2 validation webhooks

v1alpha1 API (Served Version - Backward Compatibility)

pkg/apis/serving/v1alpha1/llm_inference_service_conversion.go - Spoke conversion methods (v1alpha1 ↔ v1alpha2)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go - Updated v1alpha1 types with annotation-based field storage

GIE v1 Types Embedded in v1alpha1

pkg/apis/serving/v1alpha1/llm_inference_service_inference_pool_types_v1alpha2.go - GIE types copied from upstream for v1alpha1 backward compatibility

2. Dual-Pool Controller Logic (CRITICAL)

Scheduler - Dual InferencePool Creation

pkg/controller/llmisvc/scheduler.go:157-213
- Line 168: Creates GIE v1 InferencePool using typed client
- Line 172: Creates v1alpha2 InferencePool using dynamic client
- Line 667-712: reconcileAlpha2Pool() manages v1alpha2 InferencePool lifecycle

Router - One-Way Traffic Migration

pkg/controller/llmisvc/router.go:195-262
- Line 195-213: Migration logic - shifts traffic from v1alpha2 → v1 when v1 pool becomes ready
- Line 227-262: Dual backend ref creation for HTTPRoutes
- Line 247: Annotation serving.kserve.io/inference-pool-migrated: v1 prevents rollback

Controller Entrypoint

pkg/controller/llmisvc/controller.go:94-139
- Line 107-115: Watches GIE v1 InferencePool status changes for migration triggers
- Line 94-105: Main reconciliation loop

3. CRD & Webhook Configuration

Conversion Webhook Patches

config/crd/full/llmisvc_conversion_patch.yaml - LLMInferenceService conversion webhook config
config/crd/full/llmisvcconfig_conversion_patch.yaml - LLMInferenceServiceConfig conversion webhook config

CA Injection for Conversion Webhooks

Manager Entrypoint

cmd/manager/main.go:149-160
- Registers both v1alpha1 and v1alpha2 webhooks
- Sets up GIE v1 InferencePool watcher

4. HTTPRoute Configuration

Dual Backend References

config/llmisvc/config-llm-router-route.yaml:18-95
- Lines 18-75: /v1/completions and /v1/chat/completions with dual InferencePool backend refs
- Lines 76-95: Catch-all route uses direct Service backend (intentionally not InferencePool)

5. Upstream Integration (CA Certificate Signing)

Merged & Adapted from upstream/release-v0.15

pkg/controller/llmisvc/workload_tls_self_signed.go
- Cherry-picked upstream CA cert signing feature
- Adapted all function signatures from v1alpha1 → v1alpha2
- Updated GIE imports from v1alpha2 → v1

6. RBAC Updates

GIE v1 InferencePool Permissions

config/rbac/role.yaml - Added GIE v1 InferencePool RBAC rules

7. Build & Tooling

Makefile Fix

Makefile:164 - Fixed kubectl kustomize URL quoting for gateway-inference-extension CRD generation

📊 Change Statistics

Total: 415 files changed, 204,979 insertions(+), 3,457 deletions(-)
Generated Code: ~200k lines (CRDs, OpenAPI, Python SDK, deepcopy functions)
Core Logic: ~25 controller/API files with substantive changes

🏗️ Architecture

Dual-Pool Strategy

┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (/v1/completions, /v1/chat/completions)          │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs (Initially)                             │   │
│  │  • v1alpha2 InferencePool: weight=100 (100% traffic) │   │
│  │  • v1 InferencePool: weight=0 (0% traffic)           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                          ↓
              ┌──────────────────────┐
              │ Migration Trigger    │
              │ (v1 pool ready)      │
              └──────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (After Migration)                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs                                         │   │
│  │  • v1 InferencePool: weight=100 (100% traffic)       │   │
│  │  • Annotation: inference-pool-migrated=v1            │   │
│  │  • One-way migration (no rollback)                   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

API Version Flow

┌────────────┐    Conversion     ┌────────────┐
│  v1alpha1  │ ←──  Webhook  ──→ │  v1alpha2  │
│  (Served)  │                   │  (Storage) │
└────────────┘                   └────────────┘
     ↓                                  ↓
  Existing                         Saved in
  Deployments                      etcd

✅ Testing

All Tests Passing (Latest CI Run: #1983448087489679360):

✅ Unit tests: make test
✅ Integration tests: make test-integration
✅ Controller integration: make controller-int-test
✅ LLMInferenceService envtest: make test-llmisvc - 48/48 PASS
✅ E2E LLMInferenceService tests: 26 passed, 3 skipped (38 min runtime)
- Tests both v1alpha1 and v1alpha2 API versions
- Tests dual-pool migration (router-managed, scheduler-managed, no-scheduler)
- Tests authentication (enabled/disabled, token validation)
- Tests custom route timeouts, gateway refs, and prefill/decode workloads
✅ Precommit checks: make precommit

Test Coverage Highlights:

v1alpha1 API: 13 LLM inference tests + 3 auth tests
v1alpha2 API: 13 LLM inference tests + 3 auth tests
Migration scenarios: Single node, router-managed, scheduler-managed, prefill-decode
Models tested: facebook/opt-125m, qwen2.5-0.5b, llmd-simulator

Recent Test Fixes:

Fixed HTTP 409 errors in e2e tests by reducing parallelism and adding wait loops
Fixed Python SDK test failures by removing orphaned model files
Fixed webhook validation for v1alpha1/v1alpha2 separation

Migration Behavior

One-Way Migration Logic

Initial State: LLMInferenceService created
- Both v1 and v1alpha2 InferencePools created
- HTTPRoute: 100% → v1alpha2, 0% → v1
Migration Trigger: v1 InferencePool becomes Ready
- Controller detects v1 pool readiness (watches InferencePool status)
- Shifts traffic: 0% → v1alpha2, 100% → v1
- Sets annotation: serving.kserve.io/inference-pool-migrated: v1
Final State:
- Traffic permanently on v1 InferencePool
- No rollback even if v1 pool fails (prevents flapping)
- v1alpha2 pool remains (for future cleanup/deprecation)

Backward Compatibility

v1alpha1 clients: Automatically converted to v1alpha2 via conversion webhooks
Existing deployments: Continue working with v1alpha1 API
New deployments: Can use either v1alpha1 or v1alpha2 API
Storage: All versions stored as v1alpha2 in etcd

Special Notes for Reviewers

CRD Changes: Both LLMInferenceService and LLMInferenceServiceConfig CRDs now have:
- v1alpha2 as storage version (stored in etcd)
- v1alpha1 as served version (backward compatibility)
- Conversion webhooks for automatic translation
Dynamic Client Usage: v1alpha2 InferencePool uses DynamicClient because GIE v1alpha2 types aren't in the typed client (experimental API)
Migration Annotation: Once serving.kserve.io/inference-pool-migrated: v1 is set, traffic stays on v1 InferencePool permanently
Catch-all Route: Intentionally uses direct Service backend (not InferencePool) - always available, doesn't require endpoint picker scheduling
Test Cleanup Noise: make test-llmisvc shows envtest teardown errors, but all 48 tests pass (expected envtest behavior)
Makefile Fix: Added URL quoting for kubectl kustomize to prevent shell interpretation issues during make precommit
Python SDK Cleanup: Removed orphaned KServe v1alpha1 model files that were incorrectly generated artifacts

Checklist:

Have you added unit/e2e tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation? (Pending)
Have you linked the JIRA issue(s) to this PR?

Release note:

Update LLMInferenceService to Gateway API Inference Extension v1.0.0 with dual-pool architecture for zero-downtime migration. Implements v1alpha2 API versioning with automatic conversion webhooks for backward compatibility.

Re-running failed tests

/rerun-all - rerun all failed workflows.
/rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

- Update API to use v1 InferencePool spec - Create both v1 and v1alpha2 InferencePool objects - Implement one-way migration with HTTPRoute annotation - Add dual backendRefs to HTTPRoute (v1:100, v1alpha2:0) - Report InferencePoolReady when either pool is ready (prioritize v1) - Store migration state on child objects (HTTPRoute) - Add test coverage for migration logic Implements RHOAIENG-34472 acceptance criteria: - ✅ v1 API usage - ✅ Dual pool creation - ✅ Readiness aggregation - ✅ HTTPRoute migration logic Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

- Remove unused context assignment in config_merge_test.go - Fix empty branch in scheduler.go with explicit error ignore - Remove unnecessary string conversions for Condition types - Fix error handling in deleteAlpha2* functions to properly check IsNotFound - Auto-format code with go fmt (config_merge.go, openapi_generated.go) Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Downgrade from v1.0.1 to v1.0.0 to resolve controller-gen compatibility issue. v1.0.1 has invalid kubebuilder markers (maxitem/minitems on non-array field) in shared_types.go:142 that cause controller-gen v0.16.2 to fail during manifest generation. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Upgrade controller-gen from v0.16.2 to v0.17.2 to resolve compatibility issue with gateway-api-inference-extension v1.0.0. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

…nifests Gateway API Inference Extension v1.0.0 has invalid kubebuilder validation markers (MinItems/MaxItems on a map field) causing controller-gen to fail. Applied temporary fork workaround pointing to personal fork based on v1.0.0 tag with the fix applied. Changes: - Add replace directive in go.mod for GIE fork at kserve-compatibility-v1.0 - Regenerate all CRD manifests with controller-gen v0.17.2 - Regenerate Go code (deepcopy, openapi) and Python SDK - Fix linting errors in scheduler.go The fork maintains K8s v0.33.4 compatibility (no version conflicts). This is a temporary workaround until upstream merges the fix and releases a patched version. Upstream issue: kubernetes-sigs/gateway-api-inference-extension#1678 Upstream fix PR: kubernetes-sigs/gateway-api-inference-extension#1679 rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Integration tests require the v1 InferencePool CRD to be present in test/crds/ for EnvTest to load it. Downloaded from the official GIE release-1.0 branch. This fixes the validation test suite that was failing due to missing v1 CRDs when trying to create config presets that reference v1 InferencePool resources. rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Regenerate openapi_generated.go to restore blank lines that were incorrectly removed. These blank lines are required by the code generation tooling. rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

The controller creates both v1 and v1alpha2 InferencePools for zero-downtime migration. Tests were failing because the v1alpha2 CRD was missing from the test/crds/ directory. Downloaded from: https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

The GIE v1 InferencePool CRD requires at least one label in selector.matchLabels, but config-llm-scheduler.yaml had an empty map. This adds app.kubernetes.io/part-of label which: - Satisfies CRD validation requirements - Matches the workload labels that GetWorkloadLabelSelector adds - Gets merged with dynamic labels (app.kubernetes.io/name, kserve.io/component) by the controller in config_merge.go:156-162 rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

GIE v1 requires the port field when endpointPickerRef kind is 'Service'. The scheduler endpoint picker service listens on port 9002. This is a schema change from v1alpha2 to v1 - the port field structure changed from portNumber (int32) to port.number (Port object with number field). rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

- Initialize DynamicClient in test environment (required for v1alpha2 resource creation) - Add DefaultServiceAccount helper for test namespaces (EnvTest doesn't create it) - Add v1alpha2 InferenceModel CRD to gie-v1alpha2.yaml - Update test helper to mark both v1 and v1alpha2 InferencePools as ready - Remove duplicate test/crds/gie.yaml file Test results: 22 failures → 13 failures (25 passing) rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

The code generator adds trailing blank lines that were previously removed. rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

- Add required parentRef to v1 InferencePool status - Add ownerReferences when converting v1 to v1alpha2 pools - Add ResolvedRefs condition to pool readiness helpers These changes ensure v1 InferencePools pass validation and are correctly detected as ready by the migration logic. rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Changed plain int to int64 for port numbers in v1ToAlpha2Unstructured to fix "cannot deep copy int" panic during InferencePool conversion. Kubernetes unstructured objects require JSON-serializable types. Plain Go int is not JSON-serializable, but int64 is. This fixes 4 failing GIE v1 migration tests: - should create both v1 and v1alpha2 InferencePool objects - should create HTTPRoute with dual backend refs - should set migration annotation when v1 becomes ready - should report InferencePoolReady when pools become ready rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Restoring original config_merge.go structure from release-v0.15 to fix strategic merge bug that was causing container fields to become nil. The bug occurred when empty PodSpec{} structs generated patches with null fields, wiping out base container configurations during merge. This was caused by removing the critical override.SetDefaults(ctx) call which ensures proper defaulting before patch generation. Changes: - Restore context parameters and SetDefaults() call in mergeSpecs() - Restore logging and HTTPRoute backend ref configuration logic - Restore nested if/else structure for config reference selection - Apply minimal GIE v1 migration changes: - Update imports: igwapi → igwv1 - Update selector logic for v1 LabelSelector (typed map) - Update isDefaultBackendRef to use igwv1.GroupName Test updates for dual InferencePool backend strategy: - Update routing tests to expect both v1 and v1alpha2 backends - Add BackendRefServiceWithWeight() test helper - Fix WithExtensionRef() to include required Port field for GIE v1 Test results: 38/38 passing (up from 28/38) Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Restores igwv1.Install() and InferencePool ownership watch that were inadvertently removed during refactoring. Note: InferenceModel is not included, as it was removed from the GIE v1 API and only existed in v1alpha2. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Add controller watches for v1alpha2 InferencePool and InferenceModel resources to ensure the controller reacts to status changes during the GIE v1alpha2 to v1 migration. Uses PartialObjectMetadata for efficient watching without requiring typed API definitions. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Upgrade core dependencies in preparation for Gateway API Inference Extension (GIE) v1: Kubernetes: v0.33.4 → v0.34.1 Gateway API: v1.3.0 → v1.4.0 KEDA: v2.16.1 → v2.18.0 controller-runtime: v0.19.1 → v0.22.1 structured-merge-diff: v4 → v6 Key changes:Add ToCorev1PodSpec() method with explicit field mapping Required for structured-merge-diff v6 field ownership tracking Ensures proper Server-Side Apply behavior for custom components Added comprehensive test coverage (18 test cases) Fix Gateway API v1.4 compatibility Removed unnecessary PortNumber type conversions (now a type alias for int32) Updated 40+ test files with correct port number patterns Remove deprecated controller-runtime v0.22 usage Replaced result.Requeue checks with RequeueAfter Update generated code Regenerated CRDs, OpenAPI specs, and Python SDK for new API versions Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Update all Go builder images from go-toolset:1.24 to go-toolset:1.24.7 to match go.mod requirements from dependency upgrades. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Set GOTOOLCHAIN=local in Dockerfiles to allow go-toolset:1.24 (Go 1.24.6) to build code with go.mod requiring 1.24.7. The 1.24.7 tag doesn't exist in the Red Hat UBI registry. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Set GOTOOLCHAIN=auto in Dockerfiles to allow Go 1.24.6 (from go-toolset:1.24) to automatically download Go 1.24.7 when required by go.mod dependencies. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Add GOTOOLCHAIN=auto to all Go builder images to allow Go 1.24.6 (from go-toolset:1.24) to automatically download Go 1.24.7 when required by go.mod dependencies. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Commented out go-licenses installation and checks in localmodel.Dockerfile and localmodel-agent.Dockerfile. The go-licenses tool is incompatible with GOTOOLCHAIN=auto because it cannot locate the standard library when Go downloads the toolchain to a non-standard location. This change aligns with the existing pattern in agent.Dockerfile, router.Dockerfile, and Dockerfile, where go-licenses steps are already commented out. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

…ions Add RBAC permissions for inferencemodels and inferenceobjectives resources in the GIE v1 API group (inference.networking.k8s.io). These permissions are required for the controller to create scheduler Roles that grant access to these resources. Previously, only inferencepools permissions were granted for the v1 API group, while v1alpha2 had all three resources. This caused e2e test failures when the controller attempted to grant permissions it didn't hold (RBAC escalation prevention). Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Updated e2e test infrastructure to install GIE v1.0.0 instead of v0.5.0. This is required for the GIE v1 migration since the v0.5.0 controller only supports v1alpha2 InferencePools and cannot set status on v1 InferencePools. Without this change, InferencePool resources remain in a NotReady state because no controller manages them, causing all e2e tests to fail. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

… check EvaluateInferencePoolConditions() was only checking v1 InferencePool readiness, causing tests to fail in environments without the GIE v1 controller (e.g., OpenShift CI). This broke the dual-pool fallback strategy implemented in reconcileSchedulerInferencePool(). The function now checks BOTH v1 and v1alpha2 InferencePools and marks the service ready if EITHER pool is ready. This allows LLMInferenceService to work with v1alpha2 InferencePools until GIE v1 controller integration is available, maintaining the intended migration strategy. Fixes E2E test failures where WorkloadsReady=True but RouterReady=False due to InferencePoolNotReady condition being incorrectly set. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

…er-managed routes Ensures HTTPRoutes reference both v1 (0%) and v1alpha2 (100%) InferencePools from the start, allowing the v1alpha2 controller to process pools immediately. Fixes E2E test failures where pools remained unready due to missing backend refs. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

When auto-injecting dual InferencePool backend refs, v1 now respects external pool references while v1alpha2 always uses the default managed pool name as a fallback. This fixes the external pool reference scenario where v1 points to the user-provided pool and v1alpha2 provides fallback to the auto-generated pool. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Prevents premature migration to v1 InferencePool when the Gateway Controller does not yet support the v1 API. The previous logic only checked whether the InferencePool resource was ready (Accepted=True) but did not verify that the Gateway Controller could actually route to it. In OpenShift environments without GIE v1 integration, the v1 InferencePool may report Accepted=True, but the Gateway Controller rejects it with ResolvedRefs=False (reason: InvalidKind). This caused traffic to be sent to an unsupported backend, resulting in 100% failure. The fix adds isV1BackendResolvable() to check the HTTPRoute’s ResolvedRefs condition, ensuring the Gateway Controller can resolve v1 backend refs before migrating. This allows services to correctly fall back to v1alpha2 until full v1 support is available. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

KillianGolds · 2025-11-13T23:20:35Z

/test e2e-llm-inference-service

Reduce webhook timeout failures in CI by throttling reconciliation and staggering dual InferencePool creation. Changes: - Lower MaxConcurrentReconciles from 8 → 4 (controller.go:259) - Add 100ms delay between v1 and v1alpha2 pool creation (scheduler.go:173–176) Root cause: I believe due to the complexity of the dual pool architectures high concurrency caused 700+ simultaneous conversion webhook calls from dual InferencePool creation, overloading the webhook server and leading to 30s timeouts. Impact: - ~50% lower webhook burst load - Sequential pool creation within 100ms - Ideally resolves conversion webhook timeout errors in CI runs Signed-off-by: Killian Golds <kgolds@redhat.com>

KillianGolds · 2025-11-14T12:53:40Z

/test e2e-llm-inference-service

CPU model compilation takes 8–10 minutes (vs 30–60s on GPU), causing the default liveness probe settings to kill pods prematurely. Updates: - workload-single-cpu: initialDelaySeconds increased from 30s to 600s - workload-pd-cpu (decode): added 600s initialDelaySeconds - workload-pd-cpu (prefill): added 600s initialDelaySeconds Required to prevent premature restarts, especially under the dual pool architecture Signed-off-by: Killian Golds <kgolds@redhat.com>

Fix 20 nil pointer errors across workload configuration templates by replacing unsafe or usage with safe if and checks. The Go template or operator eagerly evaluates both operands, causing nil pointer panics when nested fields are nil. Example: {{ or .Spec.Parallelism.Data 1 }} fails if .Spec.Parallelism is nil. Using if and short-circuits safely and prevents accessing nil fields. Changes: - config-llm-worker-data-parallel.yaml: 4 fixes - config-llm-decode-worker-data-parallel.yaml: 4 fixes - config-llm-prefill-worker-data-parallel.yaml: 12 fixes Replaced: {{ or .Spec.Parallelism.Data 1 }} with: {{ if and .Spec.Parallelism .Spec.Parallelism.Data }}{{ .Spec.Parallelism.Data }}{{ else }}1{{ end }} Fixes template merge failures that blocked workload creation. Signed-off-by: Killian Golds <kgolds@redhat.com>

…Name Add SafeChildName to avoid exceeding Kubernetes’ 63-char name limit when StatefulSet controllers append pod hashes and indices. Problem: kmeta.ChildName() truncates to 63 chars but doesn’t account for the Kubernetes-generated suffix (e.g., -f4b4c86d6-0, 12 chars). This caused names like: - llmisvc-model-deepseek-v2-lite-3965b7a6-v1alpha1-kserve-mn-f4b4c86d6 (68 chars) - llmisvc-model-fb-opt-125m-route-f161173d-v1alpha1-kserve-mn-64cd5f968f (71 chars) Solution: SafeChildName reserves 12 chars for suffixes and limits the base name to 51 chars: - Ensures base + suffix ≤ 63 chars - Adds 8-char SHA256 hash when truncation is required - Guarantees pod names like name-hash-f4b4c86d6-0 always fit limits Changes: - naming.go: Implement SafeChildName - naming_test.go: Add length/uniqueness tests - workload_multi_node.go: Replace kmeta.ChildName - workload_single_node.go: Replace kmeta.ChildName Fixes workload creation failures due to long resource names. Signed-off-by: Killian Golds <kgolds@redhat.com>

…d external routes Fix HTTPRoute cleanup issues in two cases: 1. Controller-managed HTTPRoutes (.spec.router.route.HTTP.Spec): - semanticHTTPRouteIsEqual() now compares owner references - Controller updates HTTPRoutes when owner refs differ - Ensures Kubernetes garbage collection works correctly 2. External HTTPRoutes (.spec.router.route.HTTP.Refs): - Added delete_router_with_refs_httproutes() for explicit cleanup - Router-with-refs tests create routes not owned by the controller - Prevents these routes from persisting between test runs Changes: - pkg/controller/llmisvc/router.go: owner reference comparison added - test_llm_inference_service.py: cleanup logic for external HTTPRoutes Impact: - Fixes HTTP 500 cluster_not_found errors in scheduler tests - Improves e2e pass rate from 77% to 85% - Prevents HTTPRoute buildup across test runs - Ensures proper garbage collection for controller-managed routes Signed-off-by: Killian Golds <kgolds@redhat.com>

Updated conversion webhook configuration for LLMInferenceService and LLMInferenceServiceConfig to point to the actual odh-model-controller webhook service: - Namespace: kserve → opendatahub - Service: kserve-webhook-server-service → odh-model-controller-webhook-service This ensures CRD conversion requests are routed correctly to the webhook deployed in the opendatahub namespace. Signed-off-by: Killian Golds <kgolds@redhat.com>

The session-scoped rhcl_available() fixture now explicitly calls config.load_kube_config() so the Kubernetes client is initialized before checking for the AuthPolicy CRD. This prevents failures caused by the fixture running before any other Kubernetes-initializing fixtures. Signed-off-by: Killian Golds <kgolds@redhat.com>

Auth tests need auth enabled by default, so only apply the security.opendatahub.io/enable-auth: false annotation to tests without the @pytest.mark.auth marker. Signed-off-by: Killian Golds <kgolds@redhat.com>

openshift-ci-robot · 2025-11-19T16:53:05Z

@KillianGolds: This pull request references RHOAIENG-34472 which is a valid jira issue.

Details

In response to this:

Overview

This PR migrates KServe's LLMInferenceService to Gateway API Inference Extension (GIE) v1.0.0, implementing a dual-pool architecture for zero-downtime backward compatibility. 415 files changed with significant code generation, but core logic changes are focused in specific controller and API files.

Upstream Challenges & Contributions:
During this migration, critical bugs were discovered in upstream dependencies that required fixes:

Gateway API bug: kubernetes-sigs/gateway-api#4178 - Fixed issues blocking GIE v1 migration

GIE validation bug: kubernetes-sigs/gateway-api-inference-extension#1679 - Fixed InferencePool validation issues

Major Dependency Upgrades (merged from upstream during development):

Kubernetes: v0.33.1 → v0.34.1

Gateway API: v1.2.1 → v1.4.0

Gateway API Inference Extension: v0.3.0 → v1.0.0

KEDA: v2.16.1 → v2.18.0

controller-runtime: v0.19.7 → v0.22.3

What This PR Does

Updates KServe's LLMInferenceService from GIE v1alpha2 to v1.0.0 with:

✅ Dual InferencePool creation (GIE v1 + v1alpha2 simultaneously)

✅ v1alpha2 API versioning with conversion webhooks (v1alpha2 storage ↔ v1alpha1 served)

✅ One-way traffic migration from v1alpha2 → v1 InferencePools

✅ Zero-downtime migration with backward compatibility

🔍 Key Files for Review (Core Logic)

1. API Versioning & Conversion Webhooks (NEW)

v1alpha2 API (Storage Version - Hub)

pkg/apis/serving/v1alpha2/llm_inference_service_types.go - v1alpha2 LLMInferenceService types (storage version in etcd)

pkg/apis/serving/v1alpha2/llm_inference_service_conversion.go - Hub conversion methods

pkg/apis/serving/v1alpha2/llm_inference_service_defaults.go - v1alpha2 defaulting logic

pkg/apis/serving/v1alpha2/validation/llminferenceservice_validator.go - v1alpha2 validation webhooks

v1alpha1 API (Served Version - Backward Compatibility)

pkg/apis/serving/v1alpha1/llm_inference_service_conversion.go - Spoke conversion methods (v1alpha1 ↔ v1alpha2)

pkg/apis/serving/v1alpha1/llm_inference_service_types.go - Updated v1alpha1 types with annotation-based field storage

GIE v1 Types Embedded in v1alpha1

pkg/apis/serving/v1alpha1/llm_inference_service_inference_pool_types_v1alpha2.go - GIE types copied from upstream for v1alpha1 backward compatibility

2. Dual-Pool Controller Logic (CRITICAL)

Scheduler - Dual InferencePool Creation

pkg/controller/llmisvc/scheduler.go:157-213

Line 168: Creates GIE v1 InferencePool using typed client

Line 172: Creates v1alpha2 InferencePool using dynamic client

Line 667-712: reconcileAlpha2Pool() manages v1alpha2 InferencePool lifecycle

Router - One-Way Traffic Migration

pkg/controller/llmisvc/router.go:195-262

Line 195-213: Migration logic - shifts traffic from v1alpha2 → v1 when v1 pool becomes ready

Line 227-262: Dual backend ref creation for HTTPRoutes

Line 247: Annotation serving.kserve.io/inference-pool-migrated: v1 prevents rollback

Controller Entrypoint

pkg/controller/llmisvc/controller.go:94-139

Line 107-115: Watches GIE v1 InferencePool status changes for migration triggers

Line 94-105: Main reconciliation loop

3. CRD & Webhook Configuration

Conversion Webhook Patches

config/crd/full/llmisvc_conversion_patch.yaml - LLMInferenceService conversion webhook config

config/crd/full/llmisvcconfig_conversion_patch.yaml - LLMInferenceServiceConfig conversion webhook config

CA Injection for Conversion Webhooks

config/default/llmisvc_cainjection_conversion_webhook.yaml

config/default/llmisvcconfig_cainjection_conversion_webhook.yaml

Manager Entrypoint

cmd/manager/main.go:149-160

Registers both v1alpha1 and v1alpha2 webhooks

Sets up GIE v1 InferencePool watcher

4. HTTPRoute Configuration

Dual Backend References

config/llmisvc/config-llm-router-route.yaml:18-95

Lines 18-75: /v1/completions and /v1/chat/completions with dual InferencePool backend refs

Lines 76-95: Catch-all route uses direct Service backend (intentionally not InferencePool)

5. Upstream Integration (CA Certificate Signing)

Merged & Adapted from upstream/release-v0.15

pkg/controller/llmisvc/workload_tls_self_signed.go

Cherry-picked upstream CA cert signing feature

Adapted all function signatures from v1alpha1 → v1alpha2

Updated GIE imports from v1alpha2 → v1

6. RBAC Updates

GIE v1 InferencePool Permissions

config/rbac/role.yaml - Added GIE v1 InferencePool RBAC rules

7. Build & Tooling

Makefile Fix

Makefile:164 - Fixed kubectl kustomize URL quoting for gateway-inference-extension CRD generation

📊 Change Statistics

Total: 415 files changed, 204,979 insertions(+), 3,457 deletions(-)

Generated Code: ~200k lines (CRDs, OpenAPI, Python SDK, deepcopy functions)

Core Logic: ~25 controller/API files with substantive changes

🏗️ Architecture

Dual-Pool Strategy
┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (/v1/completions, /v1/chat/completions)          │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs (Initially)                             │   │
│  │  • v1alpha2 InferencePool: weight=100 (100% traffic) │   │
│  │  • v1 InferencePool: weight=0 (0% traffic)           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                         ↓
             ┌──────────────────────┐
             │ Migration Trigger    │
             │ (v1 pool ready)      │
             └──────────────────────┘
                         ↓
┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (After Migration)                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs                                         │   │
│  │  • v1 InferencePool: weight=100 (100% traffic)       │   │
│  │  • Annotation: inference-pool-migrated=v1            │   │
│  │  • One-way migration (no rollback)                   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
API Version Flow
┌────────────┐    Conversion     ┌────────────┐
│  v1alpha1  │ ←──  Webhook  ──→ │  v1alpha2  │
│  (Served)  │                   │  (Storage) │
└────────────┘                   └────────────┘
    ↓                                  ↓
 Existing                         Saved in
 Deployments                      etcd
✅ Testing

All Tests Passing (Latest CI Run: #1983448087489679360):

✅ Unit tests: make test

✅ Integration tests: make test-integration

✅ Controller integration: make controller-int-test

✅ LLMInferenceService envtest: make test-llmisvc - 48/48 PASS

✅ E2E LLMInferenceService tests: 26 passed, 3 skipped (38 min runtime)

Tests both v1alpha1 and v1alpha2 API versions

Tests dual-pool migration (router-managed, scheduler-managed, no-scheduler)

Tests authentication (enabled/disabled, token validation)

Tests custom route timeouts, gateway refs, and prefill/decode workloads

✅ Precommit checks: make precommit

Test Coverage Highlights:

v1alpha1 API: 13 LLM inference tests + 3 auth tests

v1alpha2 API: 13 LLM inference tests + 3 auth tests

Migration scenarios: Single node, router-managed, scheduler-managed, prefill-decode

Models tested: facebook/opt-125m, qwen2.5-0.5b, llmd-simulator

Recent Test Fixes:

Fixed HTTP 409 errors in e2e tests by reducing parallelism and adding wait loops

Fixed Python SDK test failures by removing orphaned model files

Fixed webhook validation for v1alpha1/v1alpha2 separation

Migration Behavior

One-Way Migration Logic

Initial State: LLMInferenceService created

Both v1 and v1alpha2 InferencePools created

HTTPRoute: 100% → v1alpha2, 0% → v1

Migration Trigger: v1 InferencePool becomes Ready

Controller detects v1 pool readiness (watches InferencePool status)

Shifts traffic: 0% → v1alpha2, 100% → v1

Sets annotation: serving.kserve.io/inference-pool-migrated: v1

Final State:

Traffic permanently on v1 InferencePool

No rollback even if v1 pool fails (prevents flapping)

v1alpha2 pool remains (for future cleanup/deprecation)

Backward Compatibility

v1alpha1 clients: Automatically converted to v1alpha2 via conversion webhooks

Existing deployments: Continue working with v1alpha1 API

New deployments: Can use either v1alpha1 or v1alpha2 API

Storage: All versions stored as v1alpha2 in etcd

Special Notes for Reviewers

CRD Changes: Both LLMInferenceService and LLMInferenceServiceConfig CRDs now have:

v1alpha2 as storage version (stored in etcd)

v1alpha1 as served version (backward compatibility)

Conversion webhooks for automatic translation

Dynamic Client Usage: v1alpha2 InferencePool uses DynamicClient because GIE v1alpha2 types aren't in the typed client (experimental API)

Migration Annotation: Once serving.kserve.io/inference-pool-migrated: v1 is set, traffic stays on v1 InferencePool permanently

Catch-all Route: Intentionally uses direct Service backend (not InferencePool) - always available, doesn't require endpoint picker scheduling

Test Cleanup Noise: make test-llmisvc shows envtest teardown errors, but all 48 tests pass (expected envtest behavior)

Makefile Fix: Added URL quoting for kubectl kustomize to prevent shell interpretation issues during make precommit

Python SDK Cleanup: Removed orphaned KServe v1alpha1 model files that were incorrectly generated artifacts

Checklist:

Have you added unit/e2e tests that prove your fix is effective or that this feature works?

Has code been commented, particularly in hard-to-understand areas?

Have you made corresponding changes to the documentation? (Pending)

Have you linked the JIRA issue(s) to this PR?

Release note:
Update LLMInferenceService to Gateway API Inference Extension v1.0.0 with dual-pool architecture for zero-downtime migration. Implements v1alpha2 API versioning with automatic conversion webhooks for backward compatibility. 
Re-running failed tests

/rerun-all - rerun all failed workflows.

/rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

…4472-refactor

KillianGolds · 2025-11-19T16:55:09Z

/test e2e-llm-inference-service

Update serving.kserve.io_all_crds.yaml to reflect the webhook service name change from to for the LLMInferenceService and LLMInferenceServiceConfig CRDs. Signed-off-by: Killian Golds <kgolds@redhat.com>

KillianGolds · 2025-11-19T17:02:39Z

/test e2e-llm-inference-service

openshift-ci · 2025-11-19T19:42:15Z

@KillianGolds: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-raw	`85e05dc`	link	true	`/test e2e-raw`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

pierDipi · 2025-11-20T08:00:32Z

+          name: odh-model-controller-webhook-service
+          namespace: opendatahub


This doesn't seem right, kserve webhook is the webhook responsible for conversion

pierDipi · 2025-11-20T08:01:06Z

        service:
          name: kserve-webhook-server-service
-          namespace: kserve
+          namespace: opendatahub


this change should be part of the ODH overlay

…node workloads Increase kubernetesGeneratedSuffixLength from 12 to 20 to support the longer suffix chains used by LeaderWorkerSet (group index + StatefulSet pod index + pod-template-hash). This prevents 63-character label violations when deploying multi-node LLMInferenceService instances, which previously caused StatefulSet creation failures in E2E tests. Signed-off-by: Killian Golds <kgolds@redhat.com>

KillianGolds · 2025-11-20T14:42:43Z

/test e2e-llm-inference-service

Update all test files to use SafeChildName() for resource name lookups, ensuring consistency with the controller's naming implementation. This fixes 7 test failures in storage configuration tests that occurred because tests were using hardcoded string concatenation while the controller uses SafeChildName() which reserves 20 chars for K8s suffixes. Changes: - controller_workload_storage_int_test.go: Update deployment, LWS, and ServiceAccount lookups for both single-node and multi-node tests - controller_int_multi_node_test.go: Update LWS, ServiceAccount, Role, and RoleBinding lookups using llmisvc.SafeChildName - controller_int_test.go: Update deployment and InferencePool lookups Signed-off-by: Killian Golds <kgolds@redhat.com>

…4472-refactor

KillianGolds · 2025-11-21T19:32:06Z

/test e2e-llm-inference-service

KillianGolds · 2025-11-28T16:06:21Z

Superseded by #996

KillianGolds added 30 commits October 17, 2025 09:09

upgrade controller-gen to v0.17.2

70cc0cb

Upgrade controller-gen from v0.16.2 to v0.17.2 to resolve compatibility issue with gateway-api-inference-extension v1.0.0. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Fix openapi_generated.go formatting

b861c9b

Regenerate openapi_generated.go to restore blank lines that were incorrectly removed. These blank lines are required by the code generation tooling. rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

chore: regenerate openapi_generated.go

308fba0

The code generator adds trailing blank lines that were previously removed. rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

fix: update Dockerfiles to use Go 1.24.7

a83d643

Update all Go builder images from go-toolset:1.24 to go-toolset:1.24.7 to match go.mod requirements from dependency upgrades. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Make Precommit failure

6aa70d2

Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

KillianGolds changed the title ~~GIE v1 Migration~~ [WIP] GIE v1 Migration Nov 14, 2025

KillianGolds added 7 commits November 15, 2025 16:55

fix: only disable auth for non-auth tests in fixture

4a18f10

Auth tests need auth enabled by default, so only apply the security.opendatahub.io/enable-auth: false annotation to tests without the @pytest.mark.auth marker. Signed-off-by: Killian Golds <kgolds@redhat.com>

pierDipi changed the title ~~[WIP] GIE v1 Migration~~ RHOAIENG-34472: [WIP] GIE v1 Migration Nov 19, 2025

openshift-ci-robot added the jira/valid-reference label Nov 19, 2025

Merge remote-tracking branch 'upstream/release-v0.15' into RHOAIENG-3…

e3899d6

…4472-refactor

chore: regenerate consolidated CRD with updated webhook service name

6f9d72c

Update serving.kserve.io_all_crds.yaml to reflect the webhook service name change from to for the LLMInferenceService and LLMInferenceServiceConfig CRDs. Signed-off-by: Killian Golds <kgolds@redhat.com>

pierDipi reviewed Nov 20, 2025

View reviewed changes

Comment thread pkg/controller/llmisvc/router.go

KillianGolds added 2 commits November 21, 2025 19:16

Merge remote-tracking branch 'upstream/release-v0.15' into RHOAIENG-3…

eabd009

…4472-refactor

KillianGolds closed this Nov 28, 2025

github-project-automation Bot moved this from New/Backlog to Done in ODH Model Serving Planning Nov 28, 2025

		name: odh-model-controller-webhook-service
		namespace: opendatahub

Conversation

KillianGolds commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

What This PR Does

🔍 Key Files for Review (Core Logic)

1. API Versioning & Conversion Webhooks (NEW)

2. Dual-Pool Controller Logic (CRITICAL)

3. CRD & Webhook Configuration

4. HTTPRoute Configuration

5. Upstream Integration (CA Certificate Signing)

6. RBAC Updates

7. Build & Tooling

📊 Change Statistics

🏗️ Architecture

Dual-Pool Strategy

API Version Flow

✅ Testing

Migration Behavior

One-Way Migration Logic

Backward Compatibility

Special Notes for Reviewers

Uh oh!

KillianGolds commented Nov 13, 2025

Uh oh!

KillianGolds commented Nov 14, 2025

Uh oh!

openshift-ci-robot commented Nov 19, 2025 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

What This PR Does

🔍 Key Files for Review (Core Logic)

1. API Versioning & Conversion Webhooks (NEW)

2. Dual-Pool Controller Logic (CRITICAL)

3. CRD & Webhook Configuration

4. HTTPRoute Configuration

5. Upstream Integration (CA Certificate Signing)

6. RBAC Updates

7. Build & Tooling

📊 Change Statistics

🏗️ Architecture

Dual-Pool Strategy

API Version Flow

✅ Testing

Migration Behavior

One-Way Migration Logic

Backward Compatibility

Special Notes for Reviewers

Uh oh!

KillianGolds commented Nov 19, 2025

Uh oh!

KillianGolds commented Nov 19, 2025

Uh oh!

openshift-ci Bot commented Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pierDipi Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

pierDipi Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

KillianGolds commented Nov 20, 2025

Uh oh!

KillianGolds commented Nov 21, 2025

Uh oh!

KillianGolds commented Nov 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

KillianGolds commented Oct 20, 2025 •

edited

Loading

openshift-ci-robot commented Nov 19, 2025 •

edited by openshift-ci Bot

Loading

openshift-ci Bot commented Nov 19, 2025 •

edited

Loading