Skip to content

RHOAIENG-34472: [WIP] GIE v1 Migration#948

Closed
KillianGolds wants to merge 166 commits into
opendatahub-io:release-v0.15from
KillianGolds:RHOAIENG-34472-refactor
Closed

RHOAIENG-34472: [WIP] GIE v1 Migration#948
KillianGolds wants to merge 166 commits into
opendatahub-io:release-v0.15from
KillianGolds:RHOAIENG-34472-refactor

Conversation

@KillianGolds
Copy link
Copy Markdown

@KillianGolds KillianGolds commented Oct 20, 2025

Overview

This PR migrates KServe's LLMInferenceService to Gateway API Inference Extension (GIE) v1.0.0, implementing a dual-pool architecture for zero-downtime backward compatibility. 415 files changed with significant code generation, but core logic changes are focused in specific controller and API files.

Upstream Challenges & Contributions:
During this migration, critical bugs were discovered in upstream dependencies that required fixes:

Major Dependency Upgrades (merged from upstream during development):

  • Kubernetes: v0.33.1 → v0.34.1
  • Gateway API: v1.2.1 → v1.4.0
  • Gateway API Inference Extension: v0.3.0 → v1.0.0
  • KEDA: v2.16.1 → v2.18.0
  • controller-runtime: v0.19.7 → v0.22.3

What This PR Does

Updates KServe's LLMInferenceService from GIE v1alpha2 to v1.0.0 with:

  • ✅ Dual InferencePool creation (GIE v1 + v1alpha2 simultaneously)
  • ✅ v1alpha2 API versioning with conversion webhooks (v1alpha2 storage ↔ v1alpha1 served)
  • ✅ One-way traffic migration from v1alpha2 → v1 InferencePools
  • ✅ Zero-downtime migration with backward compatibility

🔍 Key Files for Review (Core Logic)

1. API Versioning & Conversion Webhooks (NEW)

v1alpha2 API (Storage Version - Hub)

v1alpha1 API (Served Version - Backward Compatibility)

GIE v1 Types Embedded in v1alpha1

2. Dual-Pool Controller Logic (CRITICAL)

Scheduler - Dual InferencePool Creation

  • pkg/controller/llmisvc/scheduler.go:157-213
    • Line 168: Creates GIE v1 InferencePool using typed client
    • Line 172: Creates v1alpha2 InferencePool using dynamic client
    • Line 667-712: reconcileAlpha2Pool() manages v1alpha2 InferencePool lifecycle

Router - One-Way Traffic Migration

  • pkg/controller/llmisvc/router.go:195-262
    • Line 195-213: Migration logic - shifts traffic from v1alpha2 → v1 when v1 pool becomes ready
    • Line 227-262: Dual backend ref creation for HTTPRoutes
    • Line 247: Annotation serving.kserve.io/inference-pool-migrated: v1 prevents rollback

Controller Entrypoint

3. CRD & Webhook Configuration

Conversion Webhook Patches

CA Injection for Conversion Webhooks

Manager Entrypoint

4. HTTPRoute Configuration

Dual Backend References

5. Upstream Integration (CA Certificate Signing)

Merged & Adapted from upstream/release-v0.15

6. RBAC Updates

GIE v1 InferencePool Permissions

7. Build & Tooling

Makefile Fix

  • Makefile:164 - Fixed kubectl kustomize URL quoting for gateway-inference-extension CRD generation

📊 Change Statistics

  • Total: 415 files changed, 204,979 insertions(+), 3,457 deletions(-)
  • Generated Code: ~200k lines (CRDs, OpenAPI, Python SDK, deepcopy functions)
  • Core Logic: ~25 controller/API files with substantive changes

🏗️ Architecture

Dual-Pool Strategy

┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (/v1/completions, /v1/chat/completions)          │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs (Initially)                             │   │
│  │  • v1alpha2 InferencePool: weight=100 (100% traffic) │   │
│  │  • v1 InferencePool: weight=0 (0% traffic)           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                          ↓
              ┌──────────────────────┐
              │ Migration Trigger    │
              │ (v1 pool ready)      │
              └──────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (After Migration)                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs                                         │   │
│  │  • v1 InferencePool: weight=100 (100% traffic)       │   │
│  │  • Annotation: inference-pool-migrated=v1            │   │
│  │  • One-way migration (no rollback)                   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

API Version Flow

┌────────────┐    Conversion     ┌────────────┐
│  v1alpha1  │ ←──  Webhook  ──→ │  v1alpha2  │
│  (Served)  │                   │  (Storage) │
└────────────┘                   └────────────┘
     ↓                                  ↓
  Existing                         Saved in
  Deployments                      etcd

✅ Testing

All Tests Passing (Latest CI Run: #1983448087489679360):

  • ✅ Unit tests: make test
  • ✅ Integration tests: make test-integration
  • ✅ Controller integration: make controller-int-test
  • ✅ LLMInferenceService envtest: make test-llmisvc - 48/48 PASS
  • E2E LLMInferenceService tests: 26 passed, 3 skipped (38 min runtime)
    • Tests both v1alpha1 and v1alpha2 API versions
    • Tests dual-pool migration (router-managed, scheduler-managed, no-scheduler)
    • Tests authentication (enabled/disabled, token validation)
    • Tests custom route timeouts, gateway refs, and prefill/decode workloads
  • ✅ Precommit checks: make precommit

Test Coverage Highlights:

  • v1alpha1 API: 13 LLM inference tests + 3 auth tests
  • v1alpha2 API: 13 LLM inference tests + 3 auth tests
  • Migration scenarios: Single node, router-managed, scheduler-managed, prefill-decode
  • Models tested: facebook/opt-125m, qwen2.5-0.5b, llmd-simulator

Recent Test Fixes:

  • Fixed HTTP 409 errors in e2e tests by reducing parallelism and adding wait loops
  • Fixed Python SDK test failures by removing orphaned model files
  • Fixed webhook validation for v1alpha1/v1alpha2 separation

Migration Behavior

One-Way Migration Logic

  1. Initial State: LLMInferenceService created

    • Both v1 and v1alpha2 InferencePools created
    • HTTPRoute: 100% → v1alpha2, 0% → v1
  2. Migration Trigger: v1 InferencePool becomes Ready

    • Controller detects v1 pool readiness (watches InferencePool status)
    • Shifts traffic: 0% → v1alpha2, 100% → v1
    • Sets annotation: serving.kserve.io/inference-pool-migrated: v1
  3. Final State:

    • Traffic permanently on v1 InferencePool
    • No rollback even if v1 pool fails (prevents flapping)
    • v1alpha2 pool remains (for future cleanup/deprecation)

Backward Compatibility

  • v1alpha1 clients: Automatically converted to v1alpha2 via conversion webhooks
  • Existing deployments: Continue working with v1alpha1 API
  • New deployments: Can use either v1alpha1 or v1alpha2 API
  • Storage: All versions stored as v1alpha2 in etcd

Special Notes for Reviewers

  1. CRD Changes: Both LLMInferenceService and LLMInferenceServiceConfig CRDs now have:

    • v1alpha2 as storage version (stored in etcd)
    • v1alpha1 as served version (backward compatibility)
    • Conversion webhooks for automatic translation
  2. Dynamic Client Usage: v1alpha2 InferencePool uses DynamicClient because GIE v1alpha2 types aren't in the typed client (experimental API)

  3. Migration Annotation: Once serving.kserve.io/inference-pool-migrated: v1 is set, traffic stays on v1 InferencePool permanently

  4. Catch-all Route: Intentionally uses direct Service backend (not InferencePool) - always available, doesn't require endpoint picker scheduling

  5. Test Cleanup Noise: make test-llmisvc shows envtest teardown errors, but all 48 tests pass (expected envtest behavior)

  6. Makefile Fix: Added URL quoting for kubectl kustomize to prevent shell interpretation issues during make precommit

  7. Python SDK Cleanup: Removed orphaned KServe v1alpha1 model files that were incorrectly generated artifacts

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation? (Pending)
  • Have you linked the JIRA issue(s) to this PR?

Release note:

Update LLMInferenceService to Gateway API Inference Extension v1.0.0 with dual-pool architecture for zero-downtime migration. Implements v1alpha2 API versioning with automatic conversion webhooks for backward compatibility. 

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

- Update API to use v1 InferencePool spec
- Create both v1 and v1alpha2 InferencePool objects
- Implement one-way migration with HTTPRoute annotation
- Add dual backendRefs to HTTPRoute (v1:100, v1alpha2:0)
- Report InferencePoolReady when either pool is ready (prioritize v1)
- Store migration state on child objects (HTTPRoute)
- Add test coverage for migration logic

Implements RHOAIENG-34472 acceptance criteria:
- ✅ v1 API usage
- ✅ Dual pool creation
- ✅ Readiness aggregation
- ✅ HTTPRoute migration logic

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
  - Remove unused context assignment in config_merge_test.go
  - Fix empty branch in scheduler.go with explicit error ignore
  - Remove unnecessary string conversions for Condition types
  - Fix error handling in deleteAlpha2* functions to properly check IsNotFound
  - Auto-format code with go fmt (config_merge.go, openapi_generated.go)

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
  Downgrade from v1.0.1 to v1.0.0 to resolve controller-gen compatibility issue.
  v1.0.1 has invalid kubebuilder markers (maxitem/minitems on non-array field)
  in shared_types.go:142 that cause controller-gen v0.16.2 to fail during
  manifest generation.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
  Upgrade controller-gen from v0.16.2 to v0.17.2 to resolve compatibility
  issue with gateway-api-inference-extension v1.0.0.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
…nifests

Gateway API Inference Extension v1.0.0 has invalid kubebuilder validation
markers (MinItems/MaxItems on a map field) causing controller-gen to fail.
Applied temporary fork workaround pointing to personal fork based on v1.0.0
tag with the fix applied.

Changes:
- Add replace directive in go.mod for GIE fork at kserve-compatibility-v1.0
- Regenerate all CRD manifests with controller-gen v0.17.2
- Regenerate Go code (deepcopy, openapi) and Python SDK
- Fix linting errors in scheduler.go

The fork maintains K8s v0.33.4 compatibility (no version conflicts).
This is a temporary workaround until upstream merges the fix and releases
a patched version.

Upstream issue: kubernetes-sigs/gateway-api-inference-extension#1678
Upstream fix PR: kubernetes-sigs/gateway-api-inference-extension#1679

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Integration tests require the v1 InferencePool CRD to be present in test/crds/
for EnvTest to load it. Downloaded from the official GIE release-1.0 branch.

This fixes the validation test suite that was failing due to missing v1 CRDs
when trying to create config presets that reference v1 InferencePool resources.

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Regenerate openapi_generated.go to restore blank lines that were incorrectly
removed. These blank lines are required by the code generation tooling.

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
The controller creates both v1 and v1alpha2 InferencePools for zero-downtime
migration. Tests were failing because the v1alpha2 CRD was missing from the
test/crds/ directory.

Downloaded from:
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
The GIE v1 InferencePool CRD requires at least one label in
selector.matchLabels, but config-llm-scheduler.yaml had an empty map.

This adds app.kubernetes.io/part-of label which:
- Satisfies CRD validation requirements
- Matches the workload labels that GetWorkloadLabelSelector adds
- Gets merged with dynamic labels (app.kubernetes.io/name, kserve.io/component)
  by the controller in config_merge.go:156-162

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
GIE v1 requires the port field when endpointPickerRef kind is 'Service'.
The scheduler endpoint picker service listens on port 9002.

This is a schema change from v1alpha2 to v1 - the port field structure
changed from portNumber (int32) to port.number (Port object with number field).

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
- Initialize DynamicClient in test environment (required for v1alpha2 resource creation)
- Add DefaultServiceAccount helper for test namespaces (EnvTest doesn't create it)
- Add v1alpha2 InferenceModel CRD to gie-v1alpha2.yaml
- Update test helper to mark both v1 and v1alpha2 InferencePools as ready
- Remove duplicate test/crds/gie.yaml file

Test results: 22 failures → 13 failures (25 passing)

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
The code generator adds trailing blank lines that were previously removed.

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
- Add required parentRef to v1 InferencePool status
- Add ownerReferences when converting v1 to v1alpha2 pools
- Add ResolvedRefs condition to pool readiness helpers

These changes ensure v1 InferencePools pass validation and are
correctly detected as ready by the migration logic.

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Changed plain int to int64 for port numbers in v1ToAlpha2Unstructured
to fix "cannot deep copy int" panic during InferencePool conversion.

Kubernetes unstructured objects require JSON-serializable types.
Plain Go int is not JSON-serializable, but int64 is.

This fixes 4 failing GIE v1 migration tests:
- should create both v1 and v1alpha2 InferencePool objects
- should create HTTPRoute with dual backend refs
- should set migration annotation when v1 becomes ready
- should report InferencePoolReady when pools become ready

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
  Restoring original config_merge.go structure from release-v0.15 to fix
  strategic merge bug that was causing container fields to become nil.

  The bug occurred when empty PodSpec{} structs generated patches with
  null fields, wiping out base container configurations during merge.
  This was caused by removing the critical override.SetDefaults(ctx)
  call which ensures proper defaulting before patch generation.

  Changes:
  - Restore context parameters and SetDefaults() call in mergeSpecs()
  - Restore logging and HTTPRoute backend ref configuration logic
  - Restore nested if/else structure for config reference selection
  - Apply minimal GIE v1 migration changes:
    - Update imports: igwapi → igwv1
    - Update selector logic for v1 LabelSelector (typed map)
    - Update isDefaultBackendRef to use igwv1.GroupName

  Test updates for dual InferencePool backend strategy:
  - Update routing tests to expect both v1 and v1alpha2 backends
  - Add BackendRefServiceWithWeight() test helper
  - Fix WithExtensionRef() to include required Port field for GIE v1

  Test results: 38/38 passing (up from 28/38)

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Restores igwv1.Install() and InferencePool ownership watch that were inadvertently removed during refactoring.
Note: InferenceModel is not included, as it was removed from the GIE v1 API and only existed in v1alpha2.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Add controller watches for v1alpha2 InferencePool and InferenceModel resources to ensure the controller reacts to status changes during the GIE v1alpha2 to v1 migration. Uses PartialObjectMetadata for efficient watching without requiring typed API definitions.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Upgrade core dependencies in preparation for Gateway API Inference Extension (GIE) v1:

Kubernetes: v0.33.4 → v0.34.1

Gateway API: v1.3.0 → v1.4.0

KEDA: v2.16.1 → v2.18.0

controller-runtime: v0.19.1 → v0.22.1

structured-merge-diff: v4 → v6

Key changes:Add ToCorev1PodSpec() method with explicit field mapping
Required for structured-merge-diff v6 field ownership tracking
Ensures proper Server-Side Apply behavior for custom components
Added comprehensive test coverage (18 test cases)
Fix Gateway API v1.4 compatibility
Removed unnecessary PortNumber type conversions (now a type alias for int32)
Updated 40+ test files with correct port number patterns
Remove deprecated controller-runtime v0.22 usage
Replaced result.Requeue checks with RequeueAfter
Update generated code
Regenerated CRDs, OpenAPI specs, and Python SDK for new API versions

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
  Update all Go builder images from go-toolset:1.24 to go-toolset:1.24.7
  to match go.mod requirements from dependency upgrades.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Set GOTOOLCHAIN=local in Dockerfiles to allow go-toolset:1.24 (Go 1.24.6)
to build code with go.mod requiring 1.24.7. The 1.24.7 tag doesn't exist
in the Red Hat UBI registry.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
  Set GOTOOLCHAIN=auto in Dockerfiles to allow Go 1.24.6 (from go-toolset:1.24)
  to automatically download Go 1.24.7 when required by go.mod dependencies.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
  Add GOTOOLCHAIN=auto to all Go builder images to allow Go 1.24.6
  (from go-toolset:1.24) to automatically download Go 1.24.7 when
  required by go.mod dependencies.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Commented out go-licenses installation and checks in localmodel.Dockerfile
and localmodel-agent.Dockerfile. The go-licenses tool is incompatible with
GOTOOLCHAIN=auto because it cannot locate the standard library when Go
downloads the toolchain to a non-standard location.

This change aligns with the existing pattern in agent.Dockerfile,
router.Dockerfile, and Dockerfile, where go-licenses steps are already
commented out.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
…ions

  Add RBAC permissions for inferencemodels and inferenceobjectives resources
  in the GIE v1 API group (inference.networking.k8s.io). These permissions
  are required for the controller to create scheduler Roles that grant
  access to these resources.

  Previously, only inferencepools permissions were granted for the v1 API
  group, while v1alpha2 had all three resources. This caused e2e test
  failures when the controller attempted to grant permissions it didn't
  hold (RBAC escalation prevention).

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Updated e2e test infrastructure to install GIE v1.0.0 instead of v0.5.0.
This is required for the GIE v1 migration since the v0.5.0 controller
only supports v1alpha2 InferencePools and cannot set status on v1
InferencePools.

Without this change, InferencePool resources remain in a NotReady state
because no controller manages them, causing all e2e tests to fail.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
… check

  EvaluateInferencePoolConditions() was only checking v1 InferencePool
  readiness, causing tests to fail in environments without the GIE v1
  controller (e.g., OpenShift CI). This broke the dual-pool fallback
  strategy implemented in reconcileSchedulerInferencePool().

  The function now checks BOTH v1 and v1alpha2 InferencePools and marks
  the service ready if EITHER pool is ready. This allows LLMInferenceService
  to work with v1alpha2 InferencePools until GIE v1 controller integration
  is available, maintaining the intended migration strategy.

  Fixes E2E test failures where WorkloadsReady=True but RouterReady=False
  due to InferencePoolNotReady condition being incorrectly set.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
…er-managed routes

Ensures HTTPRoutes reference both v1 (0%) and v1alpha2 (100%) InferencePools
from the start, allowing the v1alpha2 controller to process pools immediately.
Fixes E2E test failures where pools remained unready due to missing backend refs.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
When auto-injecting dual InferencePool backend refs, v1 now respects
external pool references while v1alpha2 always uses the default managed
pool name as a fallback. This fixes the external pool reference scenario
where v1 points to the user-provided pool and v1alpha2 provides fallback
to the auto-generated pool.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Prevents premature migration to v1 InferencePool when the Gateway Controller
does not yet support the v1 API. The previous logic only checked whether
the InferencePool resource was ready (Accepted=True) but did not verify
that the Gateway Controller could actually route to it.

In OpenShift environments without GIE v1 integration, the v1 InferencePool
may report Accepted=True, but the Gateway Controller rejects it with
ResolvedRefs=False (reason: InvalidKind). This caused traffic to be sent to
an unsupported backend, resulting in 100% failure.

The fix adds isV1BackendResolvable() to check the HTTPRoute’s ResolvedRefs
condition, ensuring the Gateway Controller can resolve v1 backend refs before
migrating. This allows services to correctly fall back to v1alpha2 until full
v1 support is available.

Signed-off-by: Killian Golds <kgolds@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
@KillianGolds
Copy link
Copy Markdown
Author

/test e2e-llm-inference-service

Reduce webhook timeout failures in CI by throttling reconciliation and
staggering dual InferencePool creation.

Changes:
- Lower MaxConcurrentReconciles from 8 → 4 (controller.go:259)
- Add 100ms delay between v1 and v1alpha2 pool creation (scheduler.go:173–176)

Root cause: I believe due to the complexity of the dual pool architectures high concurrency caused 700+ simultaneous conversion webhook
calls from dual InferencePool creation, overloading the webhook server
and leading to 30s timeouts.

Impact:
- ~50% lower webhook burst load
- Sequential pool creation within 100ms
- Ideally resolves conversion webhook timeout errors in CI runs

Signed-off-by: Killian Golds <kgolds@redhat.com>
@KillianGolds
Copy link
Copy Markdown
Author

/test e2e-llm-inference-service

@KillianGolds KillianGolds changed the title GIE v1 Migration [WIP] GIE v1 Migration Nov 14, 2025
CPU model compilation takes 8–10 minutes (vs 30–60s on GPU), causing the
default liveness probe settings to kill pods prematurely.

Updates:
- workload-single-cpu: initialDelaySeconds increased from 30s to 600s
- workload-pd-cpu (decode): added 600s initialDelaySeconds
- workload-pd-cpu (prefill): added 600s initialDelaySeconds

Required to prevent premature restarts, especially under the dual
pool architecture

Signed-off-by: Killian Golds <kgolds@redhat.com>
Fix 20 nil pointer errors across workload configuration templates by
replacing unsafe or usage with safe if and checks.

The Go template or operator eagerly evaluates both operands, causing
nil pointer panics when nested fields are nil. Example:
{{ or .Spec.Parallelism.Data 1 }} fails if .Spec.Parallelism is nil.

Using if and short-circuits safely and prevents accessing nil fields.

Changes:
- config-llm-worker-data-parallel.yaml: 4 fixes
- config-llm-decode-worker-data-parallel.yaml: 4 fixes
- config-llm-prefill-worker-data-parallel.yaml: 12 fixes

Replaced:
  {{ or .Spec.Parallelism.Data 1 }}
with:
  {{ if and .Spec.Parallelism .Spec.Parallelism.Data }}{{ .Spec.Parallelism.Data }}{{ else }}1{{ end }}

Fixes template merge failures that blocked workload creation.

Signed-off-by: Killian Golds <kgolds@redhat.com>
…Name

Add SafeChildName to avoid exceeding Kubernetes’ 63-char name limit when
StatefulSet controllers append pod hashes and indices.

Problem:
kmeta.ChildName() truncates to 63 chars but doesn’t account for the
Kubernetes-generated suffix (e.g., -f4b4c86d6-0, 12 chars). This caused
names like:
- llmisvc-model-deepseek-v2-lite-3965b7a6-v1alpha1-kserve-mn-f4b4c86d6 (68 chars)
- llmisvc-model-fb-opt-125m-route-f161173d-v1alpha1-kserve-mn-64cd5f968f (71 chars)

Solution:
SafeChildName reserves 12 chars for suffixes and limits the base name to
51 chars:
- Ensures base + suffix ≤ 63 chars
- Adds 8-char SHA256 hash when truncation is required
- Guarantees pod names like name-hash-f4b4c86d6-0 always fit limits

Changes:
- naming.go: Implement SafeChildName
- naming_test.go: Add length/uniqueness tests
- workload_multi_node.go: Replace kmeta.ChildName
- workload_single_node.go: Replace kmeta.ChildName

Fixes workload creation failures due to long resource names.

Signed-off-by: Killian Golds <kgolds@redhat.com>
…d external routes

Fix HTTPRoute cleanup issues in two cases:

1. Controller-managed HTTPRoutes (.spec.router.route.HTTP.Spec):
   - semanticHTTPRouteIsEqual() now compares owner references
   - Controller updates HTTPRoutes when owner refs differ
   - Ensures Kubernetes garbage collection works correctly

2. External HTTPRoutes (.spec.router.route.HTTP.Refs):
   - Added delete_router_with_refs_httproutes() for explicit cleanup
   - Router-with-refs tests create routes not owned by the controller
   - Prevents these routes from persisting between test runs

Changes:
- pkg/controller/llmisvc/router.go: owner reference comparison added
- test_llm_inference_service.py: cleanup logic for external HTTPRoutes

Impact:
- Fixes HTTP 500 cluster_not_found errors in scheduler tests
- Improves e2e pass rate from 77% to 85%
- Prevents HTTPRoute buildup across test runs
- Ensures proper garbage collection for controller-managed routes

Signed-off-by: Killian Golds <kgolds@redhat.com>
Updated conversion webhook configuration for LLMInferenceService and
LLMInferenceServiceConfig to point to the actual odh-model-controller
webhook service:

- Namespace: kserve → opendatahub
- Service: kserve-webhook-server-service → odh-model-controller-webhook-service

This ensures CRD conversion requests are routed correctly to the webhook
deployed in the opendatahub namespace.

Signed-off-by: Killian Golds <kgolds@redhat.com>
The session-scoped rhcl_available() fixture now explicitly calls
config.load_kube_config() so the Kubernetes client is initialized before
checking for the AuthPolicy CRD. This prevents failures caused by the
fixture running before any other Kubernetes-initializing fixtures.

Signed-off-by: Killian Golds <kgolds@redhat.com>
Auth tests need auth enabled by default, so only apply the
security.opendatahub.io/enable-auth: false annotation to
tests without the @pytest.mark.auth marker.

Signed-off-by: Killian Golds <kgolds@redhat.com>
@pierDipi pierDipi changed the title [WIP] GIE v1 Migration RHOAIENG-34472: [WIP] GIE v1 Migration Nov 19, 2025
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Nov 19, 2025

@KillianGolds: This pull request references RHOAIENG-34472 which is a valid jira issue.

Details

In response to this:

Overview

This PR migrates KServe's LLMInferenceService to Gateway API Inference Extension (GIE) v1.0.0, implementing a dual-pool architecture for zero-downtime backward compatibility. 415 files changed with significant code generation, but core logic changes are focused in specific controller and API files.

Upstream Challenges & Contributions:
During this migration, critical bugs were discovered in upstream dependencies that required fixes:

Major Dependency Upgrades (merged from upstream during development):

  • Kubernetes: v0.33.1 → v0.34.1
  • Gateway API: v1.2.1 → v1.4.0
  • Gateway API Inference Extension: v0.3.0 → v1.0.0
  • KEDA: v2.16.1 → v2.18.0
  • controller-runtime: v0.19.7 → v0.22.3

What This PR Does

Updates KServe's LLMInferenceService from GIE v1alpha2 to v1.0.0 with:

  • ✅ Dual InferencePool creation (GIE v1 + v1alpha2 simultaneously)
  • ✅ v1alpha2 API versioning with conversion webhooks (v1alpha2 storage ↔ v1alpha1 served)
  • ✅ One-way traffic migration from v1alpha2 → v1 InferencePools
  • ✅ Zero-downtime migration with backward compatibility

🔍 Key Files for Review (Core Logic)

1. API Versioning & Conversion Webhooks (NEW)

v1alpha2 API (Storage Version - Hub)

v1alpha1 API (Served Version - Backward Compatibility)

GIE v1 Types Embedded in v1alpha1

2. Dual-Pool Controller Logic (CRITICAL)

Scheduler - Dual InferencePool Creation

  • pkg/controller/llmisvc/scheduler.go:157-213
  • Line 168: Creates GIE v1 InferencePool using typed client
  • Line 172: Creates v1alpha2 InferencePool using dynamic client
  • Line 667-712: reconcileAlpha2Pool() manages v1alpha2 InferencePool lifecycle

Router - One-Way Traffic Migration

  • pkg/controller/llmisvc/router.go:195-262
  • Line 195-213: Migration logic - shifts traffic from v1alpha2 → v1 when v1 pool becomes ready
  • Line 227-262: Dual backend ref creation for HTTPRoutes
  • Line 247: Annotation serving.kserve.io/inference-pool-migrated: v1 prevents rollback

Controller Entrypoint

3. CRD & Webhook Configuration

Conversion Webhook Patches

CA Injection for Conversion Webhooks

Manager Entrypoint

4. HTTPRoute Configuration

Dual Backend References

5. Upstream Integration (CA Certificate Signing)

Merged & Adapted from upstream/release-v0.15

6. RBAC Updates

GIE v1 InferencePool Permissions

7. Build & Tooling

Makefile Fix

  • Makefile:164 - Fixed kubectl kustomize URL quoting for gateway-inference-extension CRD generation

📊 Change Statistics

  • Total: 415 files changed, 204,979 insertions(+), 3,457 deletions(-)
  • Generated Code: ~200k lines (CRDs, OpenAPI, Python SDK, deepcopy functions)
  • Core Logic: ~25 controller/API files with substantive changes

🏗️ Architecture

Dual-Pool Strategy

┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (/v1/completions, /v1/chat/completions)          │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs (Initially)                             │   │
│  │  • v1alpha2 InferencePool: weight=100 (100% traffic) │   │
│  │  • v1 InferencePool: weight=0 (0% traffic)           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                         ↓
             ┌──────────────────────┐
             │ Migration Trigger    │
             │ (v1 pool ready)      │
             └──────────────────────┘
                         ↓
┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (After Migration)                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs                                         │   │
│  │  • v1 InferencePool: weight=100 (100% traffic)       │   │
│  │  • Annotation: inference-pool-migrated=v1            │   │
│  │  • One-way migration (no rollback)                   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

API Version Flow

┌────────────┐    Conversion     ┌────────────┐
│  v1alpha1  │ ←──  Webhook  ──→ │  v1alpha2  │
│  (Served)  │                   │  (Storage) │
└────────────┘                   └────────────┘
    ↓                                  ↓
 Existing                         Saved in
 Deployments                      etcd

✅ Testing

All Tests Passing (Latest CI Run: #1983448087489679360):

  • ✅ Unit tests: make test
  • ✅ Integration tests: make test-integration
  • ✅ Controller integration: make controller-int-test
  • ✅ LLMInferenceService envtest: make test-llmisvc - 48/48 PASS
  • E2E LLMInferenceService tests: 26 passed, 3 skipped (38 min runtime)
  • Tests both v1alpha1 and v1alpha2 API versions
  • Tests dual-pool migration (router-managed, scheduler-managed, no-scheduler)
  • Tests authentication (enabled/disabled, token validation)
  • Tests custom route timeouts, gateway refs, and prefill/decode workloads
  • ✅ Precommit checks: make precommit

Test Coverage Highlights:

  • v1alpha1 API: 13 LLM inference tests + 3 auth tests
  • v1alpha2 API: 13 LLM inference tests + 3 auth tests
  • Migration scenarios: Single node, router-managed, scheduler-managed, prefill-decode
  • Models tested: facebook/opt-125m, qwen2.5-0.5b, llmd-simulator

Recent Test Fixes:

  • Fixed HTTP 409 errors in e2e tests by reducing parallelism and adding wait loops
  • Fixed Python SDK test failures by removing orphaned model files
  • Fixed webhook validation for v1alpha1/v1alpha2 separation

Migration Behavior

One-Way Migration Logic

  1. Initial State: LLMInferenceService created
  • Both v1 and v1alpha2 InferencePools created
  • HTTPRoute: 100% → v1alpha2, 0% → v1
  1. Migration Trigger: v1 InferencePool becomes Ready
  • Controller detects v1 pool readiness (watches InferencePool status)
  • Shifts traffic: 0% → v1alpha2, 100% → v1
  • Sets annotation: serving.kserve.io/inference-pool-migrated: v1
  1. Final State:
  • Traffic permanently on v1 InferencePool
  • No rollback even if v1 pool fails (prevents flapping)
  • v1alpha2 pool remains (for future cleanup/deprecation)

Backward Compatibility

  • v1alpha1 clients: Automatically converted to v1alpha2 via conversion webhooks
  • Existing deployments: Continue working with v1alpha1 API
  • New deployments: Can use either v1alpha1 or v1alpha2 API
  • Storage: All versions stored as v1alpha2 in etcd

Special Notes for Reviewers

  1. CRD Changes: Both LLMInferenceService and LLMInferenceServiceConfig CRDs now have:
  • v1alpha2 as storage version (stored in etcd)
  • v1alpha1 as served version (backward compatibility)
  • Conversion webhooks for automatic translation
  1. Dynamic Client Usage: v1alpha2 InferencePool uses DynamicClient because GIE v1alpha2 types aren't in the typed client (experimental API)

  2. Migration Annotation: Once serving.kserve.io/inference-pool-migrated: v1 is set, traffic stays on v1 InferencePool permanently

  3. Catch-all Route: Intentionally uses direct Service backend (not InferencePool) - always available, doesn't require endpoint picker scheduling

  4. Test Cleanup Noise: make test-llmisvc shows envtest teardown errors, but all 48 tests pass (expected envtest behavior)

  5. Makefile Fix: Added URL quoting for kubectl kustomize to prevent shell interpretation issues during make precommit

  6. Python SDK Cleanup: Removed orphaned KServe v1alpha1 model files that were incorrectly generated artifacts

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation? (Pending)
  • Have you linked the JIRA issue(s) to this PR?

Release note:

Update LLMInferenceService to Gateway API Inference Extension v1.0.0 with dual-pool architecture for zero-downtime migration. Implements v1alpha2 API versioning with automatic conversion webhooks for backward compatibility. 

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@KillianGolds
Copy link
Copy Markdown
Author

/test e2e-llm-inference-service

Update serving.kserve.io_all_crds.yaml to reflect the webhook service
name change from  to
 for the LLMInferenceService and
LLMInferenceServiceConfig CRDs.

Signed-off-by: Killian Golds <kgolds@redhat.com>
@KillianGolds
Copy link
Copy Markdown
Author

/test e2e-llm-inference-service

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Nov 19, 2025

@KillianGolds: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-raw 85e05dc link true /test e2e-raw

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +28120 to +28121
name: odh-model-controller-webhook-service
namespace: opendatahub
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right, kserve webhook is the webhook responsible for conversion

service:
name: kserve-webhook-server-service
namespace: kserve
namespace: opendatahub
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change should be part of the ODH overlay

Comment thread pkg/controller/llmisvc/router.go
…node workloads

Increase kubernetesGeneratedSuffixLength from 12 to 20 to support the
longer suffix chains used by LeaderWorkerSet (group index + StatefulSet
pod index + pod-template-hash).

This prevents 63-character label violations when deploying multi-node
LLMInferenceService instances, which previously caused StatefulSet
creation failures in E2E tests.

Signed-off-by: Killian Golds <kgolds@redhat.com>
@KillianGolds
Copy link
Copy Markdown
Author

/test e2e-llm-inference-service

Update all test files to use SafeChildName() for resource name lookups,
ensuring consistency with the controller's naming implementation.

This fixes 7 test failures in storage configuration tests that occurred
because tests were using hardcoded string concatenation while the
controller uses SafeChildName() which reserves 20 chars for K8s suffixes.

Changes:
- controller_workload_storage_int_test.go: Update deployment, LWS, and
  ServiceAccount lookups for both single-node and multi-node tests
- controller_int_multi_node_test.go: Update LWS, ServiceAccount, Role,
  and RoleBinding lookups using llmisvc.SafeChildName
- controller_int_test.go: Update deployment and InferencePool lookups

Signed-off-by: Killian Golds <kgolds@redhat.com>
@KillianGolds
Copy link
Copy Markdown
Author

/test e2e-llm-inference-service

@github-project-automation github-project-automation Bot moved this from New/Backlog to Done in ODH Model Serving Planning Nov 28, 2025
@KillianGolds
Copy link
Copy Markdown
Author

Superseded by #996

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants