feat: implement postgres provisioning logic for database trait (#34)#50
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughPR này triển khai logic cấp phát cơ sở dữ liệu Postgres trong Operator bằng cách thêm chức năng tạo StatefulSet và Service, hằng số mặc định, bộ kiểm thử, và tích hợp reconciliation instance cơ sở dữ liệu vào luồng HeliosApp controller. Changes
Sequence Diagram(s)sequenceDiagram
participant HC as HeliosApp Controller
participant DB as Database Reconciler
participant K8s as Kubernetes API
participant STS as StatefulSet
participant SEC as Secret
HC->>DB: reconcileDatabaseInstance(ctx, app)
DB->>K8s: Check if database secret exists
K8s-->>DB: Secret found
DB->>K8s: Check if StatefulSet exists
alt StatefulSet does not exist
DB->>DB: GenerateDatabaseStatefulSet()
DB->>STS: Create StatefulSet with<br/>POSTGRES_DB, POSTGRES_USER,<br/>POSTGRES_PASSWORD from Secret
STS-->>K8s: StatefulSet created
end
DB->>K8s: Check if Service exists
alt Service does not exist
DB->>DB: GenerateDatabaseService()
DB->>K8s: Create headless Service
K8s-->>DB: Service created
end
DB-->>HC: Reconciliation complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/operator/internal/controller/database_resources.go`:
- Around line 360-384: The reconcile currently logs and skips when a StatefulSet
(existingSts) or Service already exists (after r.Get/r.Create paths), which
causes spec drift when dbName, version, storage, or port change; instead, fetch
the existing objects (existingSts / existingSvc), compare relevant mutable
fields from the desired sts and svc (container image/version, resources/storage
requests, container ports, service ports, labels/annotations) and perform a
patch/update (using r.Update or a strategic merge/patch) to reconcile mutable
changes; for immutable fields detect differences and return a clear error
indicating which field is immutable so the caller/user can take corrective
action, and ensure the same compare-and-patch logic is applied in the analogous
Service handling (the block around lines 395-418 referencing svc/existingSvc).
- Around line 337-341: The Postgres container isn't configured to listen on a
custom port from dbTrait.Properties.Port: ensure the postgres container's
environment includes PGPORT set to the resolved port (the same value used in
ContainerPort and Service) and update readiness/liveness probe commands that
call pg_isready to pass -p <port> (or use the resolved port variable) so probes
check the correct port; alternatively, implement validation in the
reconciliation (using dbTrait.Properties.Port) to reject non-default ports.
Update the postgres container setup (where the ContainerPort and Service are
created) to add the PGPORT env var and to modify probe Command args to include
"-p" with the resolved port, or add a validation block that errors on port !=
DefaultPostgresPort.
- Around line 555-557: The code currently calls resourceMustParse(storage)
(which panics on invalid input) inside GenerateDatabaseStatefulSet; replace that
usage with resource.ParseQuantity to safely parse the storage string, check and
handle the returned error, and remove/stop using resourceMustParse(). Update
GenerateDatabaseStatefulSet and its callers (notably reconcileDatabaseResources)
to return an error when parsing fails so reconcileDatabaseResources can
propagate the error and update the HeliosApp status instead of letting the
controller panic; ensure error flows from resource.ParseQuantity ->
GenerateDatabaseStatefulSet -> reconcileDatabaseResources and is used to set the
CR status/condition accordingly.
In `@apps/operator/internal/controller/heliosapp_controller.go`:
- Around line 131-141: The database provisioning block
(reconcileDatabaseInstance) is currently executed after the image-guard in
Reconcile, so new apps without an image skip Phase 0.5/0.7; move or call
reconcileDatabaseInstance(ctx, &heliosApp) before the image-existence guard (or
extract it out of the image check) so stateful provisioning runs regardless of
component image presence; ensure error handling/logging and status update still
use the same logic (log.Error and r.updateStatus with PhaseFailed) when
reconcileDatabaseInstance returns an error.
- Around line 521-522: The controller calls Owns(&appsv1.StatefulSet{}) and uses
reconcileDatabaseInstance() to create/manage StatefulSets but the
kubebuilder:rbac markers lack permissions for statefulsets; update the
kubebuilder:rbac markers (near the existing markers around lines 57-62) to add a
rule for resources=statefulsets (apps) with verbs including
get,list,watch,create,update,patch,delete so the generated RBAC grants the
controller permissions to manage StatefulSets at runtime.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0b3a9d39-dbab-4b89-ab6a-6b897adf0b7e
📒 Files selected for processing (5)
apps/operator/internal/controller/database_resources.goapps/operator/internal/controller/database_resources_test.goapps/operator/internal/controller/heliosapp_controller.goapps/operator/internal/controller/suite_test.gocue/definitions/traits/database.cue
| Resources: corev1.VolumeResourceRequirements{ | ||
| Requests: corev1.ResourceList{ | ||
| corev1.ResourceStorage: resourceMustParse(storage), |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n apps/operator/internal/controller/database_resources.go | sed -n '540,570p'Repository: helios-platform-team/helios-platform
Length of output: 1117
🏁 Script executed:
cat -n apps/operator/internal/controller/database_resources.go | sed -n '580,610p'Repository: helios-platform-team/helios-platform
Length of output: 1075
🏁 Script executed:
# Find the resourceMustParse function definition
rg -n "func.*resourceMustParse|resourceMustParse\s*:=" apps/operator/internal/controller/ -A 3Repository: helios-platform-team/helios-platform
Length of output: 405
🏁 Script executed:
# Check what MustParse does and locate its definition
rg -n "MustParse" apps/operator/internal/controller/database_resources.go -B 2 -A 2Repository: helios-platform-team/helios-platform
Length of output: 912
🏁 Script executed:
cat -n apps/operator/internal/controller/database_resources.go | sed -n '480,565p'Repository: helios-platform-team/helios-platform
Length of output: 3163
🏁 Script executed:
# Search for function definitions near line 555 and understand storage parameter
rg -n "func.*GenerateDatabase" apps/operator/internal/controller/database_resources.go -A 60 | head -100Repository: helios-platform-team/helios-platform
Length of output: 3448
🏁 Script executed:
# Check HeliosApp type/struct to understand status field
rg -n "type HeliosApp" apps/operator/ -A 20Repository: helios-platform-team/helios-platform
Length of output: 8304
🏁 Script executed:
rg -n "GenerateDatabaseStatefulSet" apps/operator/ -B 3 -A 3Repository: helios-platform-team/helios-platform
Length of output: 2443
🏁 Script executed:
# Check the reconciliation loop to see error handling patterns
rg -n "func.*Reconcile" apps/operator/internal/controller/heliosapp_controller.go -A 50 | head -80Repository: helios-platform-team/helios-platform
Length of output: 3578
🏁 Script executed:
# Check if there's other places using ParseQuantity for error handling
rg -n "ParseQuantity" apps/operator/ -B 2 -A 2Repository: helios-platform-team/helios-platform
Length of output: 62
🏁 Script executed:
cat -n apps/operator/internal/controller/database_resources.go | sed -n '330,365p'Repository: helios-platform-team/helios-platform
Length of output: 1488
🏁 Script executed:
# Check if there are any error handling patterns after calling GenerateDatabaseStatefulSet
cat -n apps/operator/internal/controller/database_resources.go | sed -n '340,380p'Repository: helios-platform-team/helios-platform
Length of output: 1703
🏁 Script executed:
# Search for where storage parameter comes from in the database trait
rg -n "type DatabaseTraitProperties" apps/operator/ -A 10Repository: helios-platform-team/helios-platform
Length of output: 1084
🏁 Script executed:
# Find all usages of resourceMustParse in the file
rg -n "resourceMustParse" apps/operator/internal/controller/database_resources.go -B 2 -A 1Repository: helios-platform-team/helios-platform
Length of output: 874
Xử lý lỗi giá trị storage không hợp lệ để tránh panic controller
Khi người dùng cung cấp giá trị storage sai định dạng trong CR (ví dụ: "invalid-size"), hàm resourceMustParse() sẽ gọi resource.MustParse() và gây ra panic, làm crash toàn bộ controller process. Hiện tại không có xác thực giá trị storage trước khi truyền vào GenerateDatabaseStatefulSet() ở dòng 350-352.
Thay vì sử dụng resourceMustParse(), nên chuyển sang resource.ParseQuantity() để có thể xử lý lỗi, sau đó propagate lỗi qua return value của reconcileDatabaseResources() để cập nhật trạng thái của HeliosApp thay vì để controller panic.
Vị trí ảnh hưởng
- Dòng 557:
resourceMustParse(storage)trong VolumeResourceRequirements - Dòng 599: Định nghĩa hàm
resourceMustParse()có comment "Panics on invalid input"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/operator/internal/controller/database_resources.go` around lines 555 -
557, The code currently calls resourceMustParse(storage) (which panics on
invalid input) inside GenerateDatabaseStatefulSet; replace that usage with
resource.ParseQuantity to safely parse the storage string, check and handle the
returned error, and remove/stop using resourceMustParse(). Update
GenerateDatabaseStatefulSet and its callers (notably reconcileDatabaseResources)
to return an error when parsing fails so reconcileDatabaseResources can
propagate the error and update the HeliosApp status instead of letting the
controller panic; ensure error flows from resource.ParseQuantity ->
GenerateDatabaseStatefulSet -> reconcileDatabaseResources and is used to set the
CR status/condition accordingly.
| // ------------------------------------------------------------------ | ||
| // PHASE 0.7: Database Instance Provisioning | ||
| // Provision StatefulSets and headless Services for database traits. | ||
| // Runs AFTER secrets so that the credential Secret already exists | ||
| // when the database pod starts. | ||
| // ------------------------------------------------------------------ | ||
| if err := r.reconcileDatabaseInstance(ctx, &heliosApp); err != nil { | ||
| log.Error(err, "Failed to reconcile database instance") | ||
| r.updateStatus(ctx, &heliosApp, appv1alpha1.PhaseFailed, fmt.Sprintf("Database instance provisioning failed: %v", err)) | ||
| return ctrl.Result{}, err | ||
| } |
There was a problem hiding this comment.
Pha provision DB vẫn bị chặn bởi guard image phía trên
Reconcile return sớm ở Line 93-97 khi component chưa có image, nên phase 0.5/0.7 ở block này sẽ không chạy cho app mới đang đợi build. Như vậy Postgres vẫn còn phụ thuộc vào pipeline build/GitOps, trái với mục tiêu tách stateful provisioning khỏi pipeline. Nên dời phần database lên trước guard đó, hoặc tách nó khỏi check image.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/operator/internal/controller/heliosapp_controller.go` around lines 131 -
141, The database provisioning block (reconcileDatabaseInstance) is currently
executed after the image-guard in Reconcile, so new apps without an image skip
Phase 0.5/0.7; move or call reconcileDatabaseInstance(ctx, &heliosApp) before
the image-existence guard (or extract it out of the image check) so stateful
provisioning runs regardless of component image presence; ensure error
handling/logging and status update still use the same logic (log.Error and
r.updateStatus with PhaseFailed) when reconcileDatabaseInstance returns an
error.
… "- Move database provisioning above image validation guard - Fix panic on invalid storage by using ParseQuantity - Add PGPORT env var and update health probes to use custom port - Handle StatefulSet and Service drift by actively updating mutable fields
This PR addresses Issue #34 by implementing the core stateful provisioning logic in the Helios Operator. It reacts to the CUE
#DatabaseTraitdefinition and provisions a working Postgres instance directly within the target cluster.The implementation ensures proper interaction between the CUE schema and the Go Operator, cleanly separating the responsibility of deploying stateful workloads from the GitOps pipeline.
Closes #34
Summary by CodeRabbit
Ghi chú Phát hành
Tính năng Mới
Kiểm thử