[WIP] GIE v1 Migration#941
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: KillianGolds The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
7 similar comments
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
d72fdae to
08dc6e6
Compare
|
/test e2e-llm-inference-service |
3 similar comments
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
|
/test e2e-llm-inference-service |
…TPRoute creation Removed the premature InferencePool readiness check that caused E2E test failures. The Gateway Controller can only populate InferencePool status after an HTTPRoute references the pool. Checking readiness before the route exists always fails due to empty status. Readiness is now evaluated in EvaluateInferencePoolConditions(), which runs after reconcileHTTPRoutes() creates the route. This allows the Gateway Controller to populate pool status before evaluation, fixing the issue where neither v1 nor v1alpha2 pools were becoming ready. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
…pport OpenShift CI was using the standard Gateway Controller, which cannot manage InferencePools, causing all scheduler-based E2E tests to fail. Switched to upstream Istio with GIE v1.0.0 support to enable proper InferencePool reconciliation. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED fix(ci): install GIE controller in OpenShift E2E for InferencePool support OpenShift CI was using the standard Gateway Controller, which cannot manage InferencePools, causing scheduler E2E tests to fail with Neither v1 nor v1alpha2 InferencePool is ready. Switched to Istio with GIE support to provide a Gateway Controller that can reconcile InferencePool resources and populate their Status.Parents[]. Changes: - Install GIE v1.0.0 CRDs (InferencePool, InferenceModel) - Deploy Istio v1.27-alpha with GIE controller support - Create GatewayClass pointing to istio.io/gateway-controller - Skip Gateway API CRDs (OpenShift 4.19.9+ manages them) Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Added ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true to istiod to enable InferencePool status population by Istio’s Gateway Controller. This hopefully fixes InferencePool is not ready errors in OpenShift E2E tests. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Implemented retry.RetryOnConflict in the Update function and reconcileInferencePoolMigration to handle resource version conflicts during v1 InferencePool migration. Fixes a race condition where HTTPRoute status and spec updates conflicted, causing four integration test failures. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
E2E tests expect the gatewayClassName openshift-default, but the Istio setup script was creating istio, causing all tests to fail with GatewayClass not found errors. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Replaced personal fork workaround with the upstream gateway-api main branch now that PR kserve#4178 (validation markers fix) has been merged upstream. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
…test timing Added retry logic with exponential backoff for finalizer add/remove operations to handle resource version conflicts during concurrent reconciliation. Updated the integration test to wait for asynchronous InferencePool migration completion before asserting backend ref configuration, fixing a race condition where the test checked HTTPRoute backend refs before the controller finished updating weights. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Updated go.sum and openapi_generated.go with changes from the precommit hook. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Moved HTTPRoute trimming logic to apply to both scheduler and no-scheduler cases. The well-known config defines three rules, but GIE v1 migration requires a single rule with dual backend refs. Fixes four failing integration tests. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com> Signed-off-by: Killian Golds <91667190+KillianGolds@users.noreply.github.com>
…validation Added missing required fields to the InferencePool status to satisfy GIE v1 validation requirements. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Fixed a pointer comparison bug where Service backends were never detected.
Changed from pointer comparison to value comparison:
- if backendRef.Kind == &serviceKind {
+ if backendRef.Kind != nil && *backendRef.Kind == serviceKind {
This bug caused status URLs to incorrectly use InferencePool paths instead
of Service paths, leading to duplicated paths in some test scenarios
(e.g., /v1/completions/v1/completions).
Signed-off-by: Killian Golds <kgolds@redhat.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Added rhcl_available fixture and skipif decorators to auth tests per PR #939. Tests now skip gracefully when RHCL is not installed in the environment. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
…ions LLMInferenceServiceConfig now uses plain string types instead of GIE v1 types directly. This fixes deserialization issues in the ODH Model Controller. The controller now converts these plain types to GIE v1 types at runtime when creating actual InferencePool resources. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Fixed linter issues and updated generated code: - Removed unnecessary string conversion in EPPServiceName - Pre-allocated targetPorts slice in ConvertKServePoolToV1 - Fixed whitespace alignment in tests - Regenerated OpenAPI schemas - Updated go.sum dependencies Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Changed v1alpha2 pool weight from 0 to 1 to enable automatic fallback. Previous behavior (weight: 0): - v1 ready → uses v1 - v1 NOT ready → route fails New behavior (weight: 1): - v1 ready → uses v1 (priority with 100:1 ratio) - v1 NOT ready → falls back to v1alpha2 Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Changed from @pytest.mark.skipif decorator to runtime pytest.skip() within the test body. The fixture cannot be referenced in the skipif expression context. Error: NameError: name 'rhcl_available' is not defined Also includes precommit whitespace fixes in openapi_generated.go. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Removed duplicate test fixture builder functions with hardcoded weights in favor of flexible, weight-parameterized versions: - Removed BackendRefInferencePool() → use BackendRefInferencePoolV1Alpha2(name, 1) - Removed BackendRefService() → use BackendRefServiceWithWeight(name, 1) Also fixed test expectations for the dual-pool fallback strategy: - Changed v1alpha2 InferencePool weight from 0 to 1 to enable automatic fallback when v1 is not ready Updated all test usages to use the parameterized builder functions. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
e68bedd to
2f227a8
Compare
|
/test e2e-llm-inference-service |
The controller was using .Owns() for v1 InferencePool, which only watches metadata and spec changes, not status updates. This caused the migration annotation test to timeout because the controller did not reconcile when the pool became ready. Added a .Watches() handler to trigger reconciliation on v1 InferencePool status changes, allowing the migration logic to detect when v1 becomes ready and set the migration annotation to switch traffic from v1alpha2. Signed-off-by: Killian Golds <kgolds@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
|
/test e2e-llm-inference-service |
|
@KillianGolds: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Superseded by #948 |
What this PR does / why we need it:
Migrates the Gateway API Inference Extension (GIE) from v1alpha2 to v1.0.0 while maintaining backward compatibility through a dual-pool strategy.
The controller creates both v1 and v1alpha2 InferencePool objects simultaneously, using v1 as the primary interface while allowing v1alpha2 to function as a fallback in environments without GIE v1 controller integration (e.g., current OpenShift).
Key changes:
ToCorev1PodSpec()method for structured-merge-diff v6 compatibilityWhich issue(s) this PR fixes:
Fixes RHOAIENG-34472
Type of changes
Upstream Blockers Resolved:
This PR encountered several upstream dependency issues requiring coordination with multiple projects:
GIE v1.0.0 Validation Bug (kubernetes-sigs/gateway-api-inference-extension#1679)
Kubernetes v0.34 Upgrade Requirement
Gateway API Validation Markers Bug (kubernetes-sigs/gateway-api#4172)
Feature/Issue validation/testing:
Special notes for your reviewer:
Known Limitations:
Please confirm that if this PR changes any image versions, then that’s the sole change this PR makes.
Checklist:
Release note:
Re-running failed tests: