
feat: implement postgres provisioning logic for database trait (#34)#50

Merged
NgocAnhDo26 merged 3 commits into features/database-persistence from features/operator/postgres-provisioning on Mar 13, 2026

Conversation

@PhamHoangKha1403
Member

@PhamHoangKha1403 PhamHoangKha1403 commented Mar 7, 2026

This PR addresses Issue #34 by implementing the core stateful provisioning logic in the Helios Operator. It reacts to the CUE #DatabaseTrait definition and provisions a working Postgres instance directly within the target cluster.
The implementation ensures proper interaction between the CUE schema and the Go Operator, cleanly separating the responsibility of deploying stateful workloads from the GitOps pipeline.
Closes #34

Summary by CodeRabbit

Release Notes

  • New Features

    • Support for provisioning PostgreSQL with an automatically created StatefulSet and Service.
    • Added a reconciliation flow that creates databases from a HeliosApp.
    • Default configuration for PostgreSQL (version 16, port 5432, 1Gi storage).
  • Tests

    • Added a comprehensive test suite for database resource provisioning and the reconciliation flow.

@coderabbitai

coderabbitai Bot commented Mar 8, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e427b091-c851-4f97-8383-74ab232f12e5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This PR implements Postgres database provisioning logic in the Operator. It adds functions that generate the StatefulSet and Service, default constants, and a test suite, and it integrates database instance reconciliation into the HeliosApp controller flow.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Database Resource Implementation<br/>apps/operator/internal/controller/database_resources.go | Adds database provisioning logic with the GenerateDatabaseStatefulSet and GenerateDatabaseService functions, default Postgres constants (version, port, storage, data path), the reconcileDatabaseInstance reconciliation method, and the resourceMustParse helper for parsing Kubernetes resource quantities. |
| Database Tests<br/>apps/operator/internal/controller/database_resources_test.go | Adds a comprehensive test suite for the StatefulSet and Service generators, covering metadata validation, Postgres environment variables, liveness probes, storage configuration, and end-to-end reconciliation tests for scenarios with no trait or a non-Postgres type. |
| Controller Integration<br/>apps/operator/internal/controller/heliosapp_controller.go | Adds PHASE 0.7 "Database Instance Provisioning" to the Reconcile flow after secret handling, and extends SetupWithManager to watch StatefulSets via Owns(&appsv1.StatefulSet{}). |
| Test Infrastructure<br/>apps/operator/internal/controller/suite_test.go | Updates getFirstFoundEnvTestBinaryDir to use filepath.WalkDir to look for the etcd binary under bin/k8s first, before falling back to the original ReadDir behavior. |
| CUE Trait Definitions<br/>cue/definitions/traits/database.cue | Adds the internal helper bindings _dbHostName and _secretName to centralize naming, and updates the documentation to clarify that the Go Operator handles StatefulSet/Service provisioning. |

Sequence Diagram(s)

sequenceDiagram
    participant HC as HeliosApp Controller
    participant DB as Database Reconciler
    participant K8s as Kubernetes API
    participant STS as StatefulSet
    participant SEC as Secret
    
    HC->>DB: reconcileDatabaseInstance(ctx, app)
    DB->>K8s: Check if database secret exists
    K8s-->>DB: Secret found
    DB->>K8s: Check if StatefulSet exists
    alt StatefulSet does not exist
        DB->>DB: GenerateDatabaseStatefulSet()
        DB->>STS: Create StatefulSet with<br/>POSTGRES_DB, POSTGRES_USER,<br/>POSTGRES_PASSWORD from Secret
        STS-->>K8s: StatefulSet created
    end
    DB->>K8s: Check if Service exists
    alt Service does not exist
        DB->>DB: GenerateDatabaseService()
        DB->>K8s: Create headless Service
        K8s-->>DB: Service created
    end
    DB-->>HC: Reconciliation complete
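
The flow in the diagram can be sketched as a create-if-missing loop. This is a minimal illustration rather than the PR's code: `fakeCluster`, `ensure`, and `reconcileDatabase` are hypothetical names, the in-memory map stands in for the Kubernetes API, and returning an error when the Secret is missing is an assumption (the diagram only shows the "Secret found" path).

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes "not found" API error.
var errNotFound = errors.New("not found")

// fakeCluster is a hypothetical in-memory stand-in for the Kubernetes API.
type fakeCluster struct {
	objects map[string]bool // key: "Kind/name"
}

func (c *fakeCluster) get(key string) error {
	if c.objects[key] {
		return nil
	}
	return errNotFound
}

func (c *fakeCluster) create(key string) { c.objects[key] = true }

// ensure creates the object only when the lookup reports it as missing,
// and reports whether it created anything.
func ensure(c *fakeCluster, key string) (bool, error) {
	err := c.get(key)
	if err == nil {
		return false, nil
	}
	if !errors.Is(err, errNotFound) {
		return false, err
	}
	c.create(key)
	return true, nil
}

// reconcileDatabase follows the diagram's order: Secret check first,
// then the StatefulSet, then the Service.
func reconcileDatabase(c *fakeCluster, name string) ([]string, error) {
	if err := c.get("Secret/" + name); err != nil {
		return nil, fmt.Errorf("database secret %q: %w", name, err)
	}
	var created []string
	for _, kind := range []string{"StatefulSet", "Service"} {
		key := kind + "/" + name
		ok, err := ensure(c, key)
		if err != nil {
			return created, err
		}
		if ok {
			created = append(created, key)
		}
	}
	return created, nil
}

func main() {
	c := &fakeCluster{objects: map[string]bool{"Secret/orders-db": true}}
	first, _ := reconcileDatabase(c, "orders-db")
	second, _ := reconcileDatabase(c, "orders-db")
	fmt.Println(first, second)
}
```

Running the reconcile twice against the same cluster creates both objects on the first pass and nothing on the second, which is the idempotence the diagram's `alt` branches imply.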

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hoppy code hops through the database door,
StatefulSets sprouting, Secrets galore,
Postgres pods dance with grace and style,
The Operator's magic makes infrastructure smile!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely describes the main change: implementing Postgres provisioning logic for the database trait, which aligns with the primary objective of the changeset. |
| Linked Issues check | ✅ Passed | The PR successfully implements both core requirements from issue #34: provisions a Postgres StatefulSet with database.name via POSTGRES_DB env var, and configures pods to use K8S Secrets for credentials. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to Postgres provisioning. The addition of etcd binary detection in suite_test.go is a minor supporting change for test infrastructure and does not constitute out-of-scope work. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@NgocAnhDo26
Contributor

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@NgocAnhDo26 NgocAnhDo26 linked an issue Mar 8, 2026 that may be closed by this pull request

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/operator/internal/controller/database_resources.go`:
- Around line 360-384: The reconcile currently logs and skips when a StatefulSet
(existingSts) or Service already exists (after r.Get/r.Create paths), which
causes spec drift when dbName, version, storage, or port change; instead, fetch
the existing objects (existingSts / existingSvc), compare relevant mutable
fields from the desired sts and svc (container image/version, resources/storage
requests, container ports, service ports, labels/annotations) and perform a
patch/update (using r.Update or a strategic merge/patch) to reconcile mutable
changes; for immutable fields detect differences and return a clear error
indicating which field is immutable so the caller/user can take corrective
action, and ensure the same compare-and-patch logic is applied in the analogous
Service handling (the block around lines 395-418 referencing svc/existingSvc).
- Around line 337-341: The Postgres container isn't configured to listen on a
custom port from dbTrait.Properties.Port: ensure the postgres container's
environment includes PGPORT set to the resolved port (the same value used in
ContainerPort and Service) and update readiness/liveness probe commands that
call pg_isready to pass -p <port> (or use the resolved port variable) so probes
check the correct port; alternatively, implement validation in the
reconciliation (using dbTrait.Properties.Port) to reject non-default ports.
Update the postgres container setup (where the ContainerPort and Service are
created) to add the PGPORT env var and to modify probe Command args to include
"-p" with the resolved port, or add a validation block that errors on port !=
DefaultPostgresPort.
- Around line 555-557: The code currently calls resourceMustParse(storage)
(which panics on invalid input) inside GenerateDatabaseStatefulSet; replace that
usage with resource.ParseQuantity to safely parse the storage string, check and
handle the returned error, and remove/stop using resourceMustParse(). Update
GenerateDatabaseStatefulSet and its callers (notably reconcileDatabaseResources)
to return an error when parsing fails so reconcileDatabaseResources can
propagate the error and update the HeliosApp status instead of letting the
controller panic; ensure error flows from resource.ParseQuantity ->
GenerateDatabaseStatefulSet -> reconcileDatabaseResources and is used to set the
CR status/condition accordingly.

In `@apps/operator/internal/controller/heliosapp_controller.go`:
- Around line 131-141: The database provisioning block
(reconcileDatabaseInstance) is currently executed after the image-guard in
Reconcile, so new apps without an image skip Phase 0.5/0.7; move or call
reconcileDatabaseInstance(ctx, &heliosApp) before the image-existence guard (or
extract it out of the image check) so stateful provisioning runs regardless of
component image presence; ensure error handling/logging and status update still
use the same logic (log.Error and r.updateStatus with PhaseFailed) when
reconcileDatabaseInstance returns an error.
- Around line 521-522: The controller calls Owns(&appsv1.StatefulSet{}) and uses
reconcileDatabaseInstance() to create/manage StatefulSets but the
kubebuilder:rbac markers lack permissions for statefulsets; update the
kubebuilder:rbac markers (near the existing markers around lines 57-62) to add a
rule for resources=statefulsets (apps) with verbs including
get,list,watch,create,update,patch,delete so the generated RBAC grants the
controller permissions to manage StatefulSets at runtime.
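
The drift-handling suggestion in the first item above can be sketched as a compare-and-plan step. This is only an illustration under stated assumptions: `dbSpec` and `planUpdate` are hypothetical names, the image tags are made up, and treating the storage request as immutable reflects the fact that a StatefulSet's volume claim templates cannot be changed after creation. Which fields the PR itself treats as mutable is not shown in this conversation.

```go
package main

import "fmt"

// dbSpec is a hypothetical, flattened view of the fields the review mentions.
type dbSpec struct {
	Image   string // mutable: lives in the pod template
	Port    int32  // mutable: container and Service port
	Storage string // treated as immutable: volume claim template request
}

// planUpdate returns the spec to write back, whether anything changed,
// and an error naming the immutable field when it differs.
func planUpdate(existing, desired dbSpec) (dbSpec, bool, error) {
	if existing.Storage != desired.Storage {
		return existing, false, fmt.Errorf(
			"storage is immutable: existing %q, desired %q",
			existing.Storage, desired.Storage)
	}
	updated := existing
	updated.Image = desired.Image
	updated.Port = desired.Port
	return updated, updated != existing, nil
}

func main() {
	existing := dbSpec{Image: "postgres:16", Port: 5432, Storage: "1Gi"}
	desired := dbSpec{Image: "postgres:17", Port: 5432, Storage: "1Gi"}

	updated, changed, err := planUpdate(existing, desired)
	fmt.Println(updated.Image, changed, err)

	// A storage change is reported instead of being silently skipped.
	desired.Storage = "5Gi"
	_, _, err = planUpdate(existing, desired)
	fmt.Println(err)
}
```

In a real controller the `changed` flag would decide whether to issue an update or patch, and the error would be surfaced on the custom resource's status.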

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b3a9d39-dbab-4b89-ab6a-6b897adf0b7e

📥 Commits

Reviewing files that changed from the base of the PR and between d3f4193 and 0c1561d.

📒 Files selected for processing (5)
  • apps/operator/internal/controller/database_resources.go
  • apps/operator/internal/controller/database_resources_test.go
  • apps/operator/internal/controller/heliosapp_controller.go
  • apps/operator/internal/controller/suite_test.go
  • cue/definitions/traits/database.cue

Comment thread apps/operator/internal/controller/database_resources.go
Comment thread apps/operator/internal/controller/database_resources.go
Comment on lines +555 to +557
Resources: corev1.VolumeResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceStorage: resourceMustParse(storage),

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n apps/operator/internal/controller/database_resources.go | sed -n '540,570p'

Repository: helios-platform-team/helios-platform

Length of output: 1117


🏁 Script executed:

cat -n apps/operator/internal/controller/database_resources.go | sed -n '580,610p'

Repository: helios-platform-team/helios-platform

Length of output: 1075


🏁 Script executed:

# Find the resourceMustParse function definition
rg -n "func.*resourceMustParse|resourceMustParse\s*:=" apps/operator/internal/controller/ -A 3

Repository: helios-platform-team/helios-platform

Length of output: 405


🏁 Script executed:

# Check what MustParse does and locate its definition
rg -n "MustParse" apps/operator/internal/controller/database_resources.go -B 2 -A 2

Repository: helios-platform-team/helios-platform

Length of output: 912


🏁 Script executed:

cat -n apps/operator/internal/controller/database_resources.go | sed -n '480,565p'

Repository: helios-platform-team/helios-platform

Length of output: 3163


🏁 Script executed:

# Search for function definitions near line 555 and understand storage parameter
rg -n "func.*GenerateDatabase" apps/operator/internal/controller/database_resources.go -A 60 | head -100

Repository: helios-platform-team/helios-platform

Length of output: 3448


🏁 Script executed:

# Check HeliosApp type/struct to understand status field
rg -n "type HeliosApp" apps/operator/ -A 20

Repository: helios-platform-team/helios-platform

Length of output: 8304


🏁 Script executed:

rg -n "GenerateDatabaseStatefulSet" apps/operator/ -B 3 -A 3

Repository: helios-platform-team/helios-platform

Length of output: 2443


🏁 Script executed:

# Check the reconciliation loop to see error handling patterns
rg -n "func.*Reconcile" apps/operator/internal/controller/heliosapp_controller.go -A 50 | head -80

Repository: helios-platform-team/helios-platform

Length of output: 3578


🏁 Script executed:

# Check if there's other places using ParseQuantity for error handling
rg -n "ParseQuantity" apps/operator/ -B 2 -A 2

Repository: helios-platform-team/helios-platform

Length of output: 62


🏁 Script executed:

cat -n apps/operator/internal/controller/database_resources.go | sed -n '330,365p'

Repository: helios-platform-team/helios-platform

Length of output: 1488


🏁 Script executed:

# Check if there are any error handling patterns after calling GenerateDatabaseStatefulSet
cat -n apps/operator/internal/controller/database_resources.go | sed -n '340,380p'

Repository: helios-platform-team/helios-platform

Length of output: 1703


🏁 Script executed:

# Search for where storage parameter comes from in the database trait
rg -n "type DatabaseTraitProperties" apps/operator/ -A 10

Repository: helios-platform-team/helios-platform

Length of output: 1084


🏁 Script executed:

# Find all usages of resourceMustParse in the file
rg -n "resourceMustParse" apps/operator/internal/controller/database_resources.go -B 2 -A 1

Repository: helios-platform-team/helios-platform

Length of output: 874


Handle invalid storage values to avoid a controller panic

When a user supplies a malformed storage value in the CR (for example, "invalid-size"), resourceMustParse() calls resource.MustParse() and panics, crashing the entire controller process. Currently the storage value is not validated before it is passed to GenerateDatabaseStatefulSet() at lines 350-352.

Instead of resourceMustParse(), switch to resource.ParseQuantity() so the error can be handled, then propagate it through the return value of reconcileDatabaseResources() to update the HeliosApp status rather than letting the controller panic.

Affected locations
  • Line 557: resourceMustParse(storage) in VolumeResourceRequirements
  • Line 599: definition of resourceMustParse(), with the comment "Panics on invalid input"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/operator/internal/controller/database_resources.go` around lines 555 -
557, The code currently calls resourceMustParse(storage) (which panics on
invalid input) inside GenerateDatabaseStatefulSet; replace that usage with
resource.ParseQuantity to safely parse the storage string, check and handle the
returned error, and remove/stop using resourceMustParse(). Update
GenerateDatabaseStatefulSet and its callers (notably reconcileDatabaseResources)
to return an error when parsing fails so reconcileDatabaseResources can
propagate the error and update the HeliosApp status instead of letting the
controller panic; ensure error flows from resource.ParseQuantity ->
GenerateDatabaseStatefulSet -> reconcileDatabaseResources and is used to set the
CR status/condition accordingly.
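
The error flow described above (parse, then generator, then reconciler, then status) can be sketched without the Kubernetes libraries. Everything here is a stand-in: `parseStorage` is a deliberately simplified substitute for `resource.ParseQuantity` and does not implement the real quantity grammar, while `statefulSetSpec`, `generateStatefulSet`, `reconcilePhase`, and the "Ready" phase name are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// storagePattern is a simplified stand-in for Kubernetes quantity parsing.
// It accepts only forms like "1Gi" or "500Mi".
var storagePattern = regexp.MustCompile(`^[1-9][0-9]*(Ki|Mi|Gi|Ti)$`)

// parseStorage returns an error instead of panicking on bad input.
func parseStorage(s string) (string, error) {
	if !storagePattern.MatchString(s) {
		return "", fmt.Errorf("invalid storage quantity %q", s)
	}
	return s, nil
}

// statefulSetSpec is a hypothetical, minimal result of the generator.
type statefulSetSpec struct {
	Name           string
	StorageRequest string
}

// generateStatefulSet propagates the parse error to its caller.
func generateStatefulSet(name, storage string) (*statefulSetSpec, error) {
	q, err := parseStorage(storage)
	if err != nil {
		return nil, fmt.Errorf("statefulset %s: %w", name, err)
	}
	return &statefulSetSpec{Name: name, StorageRequest: q}, nil
}

// reconcilePhase turns a generator error into a status phase and message,
// so bad user input marks the resource as failed rather than crashing.
func reconcilePhase(name, storage string) (phase, message string) {
	if _, err := generateStatefulSet(name, storage); err != nil {
		return "Failed", err.Error()
	}
	return "Ready", ""
}

func main() {
	fmt.Println(reconcilePhase("orders-db", "1Gi"))
	fmt.Println(reconcilePhase("orders-db", "invalid-size"))
}
```

The point of the design is that invalid input in a custom resource becomes a reported condition on that one resource, while the controller keeps reconciling everything else.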

Comment on lines +131 to +141
// ------------------------------------------------------------------
// PHASE 0.7: Database Instance Provisioning
// Provision StatefulSets and headless Services for database traits.
// Runs AFTER secrets so that the credential Secret already exists
// when the database pod starts.
// ------------------------------------------------------------------
if err := r.reconcileDatabaseInstance(ctx, &heliosApp); err != nil {
	log.Error(err, "Failed to reconcile database instance")
	r.updateStatus(ctx, &heliosApp, appv1alpha1.PhaseFailed, fmt.Sprintf("Database instance provisioning failed: %v", err))
	return ctrl.Result{}, err
}

⚠️ Potential issue | 🟠 Major

The DB provisioning phase is still blocked by the image guard above

Reconcile returns early at lines 93-97 when a component has no image yet, so phases 0.5/0.7 in this block do not run for a new app that is still waiting for its build. Postgres therefore still depends on the build/GitOps pipeline, contrary to the goal of separating stateful provisioning from the pipeline. Move the database section above that guard, or decouple it from the image check.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/operator/internal/controller/heliosapp_controller.go` around lines 131 -
141, The database provisioning block (reconcileDatabaseInstance) is currently
executed after the image-guard in Reconcile, so new apps without an image skip
Phase 0.5/0.7; move or call reconcileDatabaseInstance(ctx, &heliosApp) before
the image-existence guard (or extract it out of the image check) so stateful
provisioning runs regardless of component image presence; ensure error
handling/logging and status update still use the same logic (log.Error and
r.updateStatus with PhaseFailed) when reconcileDatabaseInstance returns an
error.
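
The ordering change requested above can be illustrated with a toy reconcile that records which phases run. The `app` type, its fields, and the step names are hypothetical. The only point is that the database step sits before the early return for a missing image.

```go
package main

import "fmt"

// app is a hypothetical, reduced view of a HeliosApp.
type app struct {
	Image            string // empty while the build pipeline has not produced one
	HasDatabaseTrait bool
}

// reconcile returns the phases it executed, in order.
func reconcile(a app) []string {
	steps := []string{"secrets"}

	// Database provisioning runs before the image guard, so it does not
	// depend on the build pipeline having finished.
	if a.HasDatabaseTrait {
		steps = append(steps, "database")
	}

	// Image guard: stop here until an image is available.
	if a.Image == "" {
		return append(steps, "waiting-for-image")
	}
	return append(steps, "workload")
}

func main() {
	fmt.Println(reconcile(app{HasDatabaseTrait: true}))
	fmt.Println(reconcile(app{Image: "registry.example/app:1", HasDatabaseTrait: true}))
}
```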

Comment thread apps/operator/internal/controller/heliosapp_controller.go
… "- Move database provisioning above image validation guard

- Fix panic on invalid storage by using ParseQuantity
- Add PGPORT env var and update health probes to use custom port
- Handle StatefulSet and Service drift by actively updating mutable fields
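
The PGPORT and probe change listed in this commit message might look roughly like the following. It is a sketch, not the merged code: `envVar`, `resolvePort`, `portEnv`, and `readinessCommand` are hypothetical helpers, and only the `PGPORT` variable, the `pg_isready -p` flag, and the 5432 default come from the conversation above.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultPostgresPort mirrors the default port named in the PR summary.
const defaultPostgresPort int32 = 5432

// envVar is a hypothetical stand-in for a container environment variable.
type envVar struct {
	Name  string
	Value string
}

// resolvePort falls back to the default when the trait leaves the port unset.
func resolvePort(requested int32) int32 {
	if requested == 0 {
		return defaultPostgresPort
	}
	return requested
}

// portEnv tells the Postgres server which port to listen on.
func portEnv(port int32) envVar {
	return envVar{Name: "PGPORT", Value: strconv.FormatInt(int64(port), 10)}
}

// readinessCommand makes the probe check the same port the server uses.
func readinessCommand(port int32) []string {
	return []string{"pg_isready", "-p", strconv.FormatInt(int64(port), 10)}
}

func main() {
	port := resolvePort(6543)
	fmt.Println(portEnv(port), readinessCommand(port))
}
```

Deriving the environment variable, the probe arguments, and the Service port from one resolved value keeps the three from drifting apart when a non-default port is requested.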
@NgocAnhDo26 NgocAnhDo26 merged commit b0a4d24 into features/database-persistence Mar 13, 2026
1 check passed
@NgocAnhDo26 NgocAnhDo26 deleted the features/operator/postgres-provisioning branch March 13, 2026 01:10


Development

Successfully merging this pull request may close these issues.

[Impl] Postgres Provisioning Operator Logic

2 participants