Skip to content

add build expiry#239

Merged
bennyz merged 1 commit into
centos-automotive-suite:mainfrom
bennyz:expirable-builds
Apr 20, 2026
Merged

add build expiry#239
bennyz merged 1 commit into
centos-automotive-suite:mainfrom
bennyz:expirable-builds

Conversation

@bennyz

@bennyz bennyz commented Apr 19, 2026

Copy link
Copy Markdown
Contributor

Builds will be expired by default, removing their ImageBuild and clearing up resources

Summary

Related Issues

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • CI/CD improvement
  • Refactoring

Testing

  • Unit tests pass (make test)
  • Linter passes (make lint)
  • Manifests are up to date (make manifests generate)
  • Tested on OpenShift cluster (if applicable)

Summary by CodeRabbit

  • New Features

    • Builds support automatic expiration with configurable TTL; build requests may include TTL (empty = server default, "0" = no expiry).
    • CLI gained a --ttl option for build, disk, and build-dev commands.
    • Build responses now include an expiresAt timestamp.
  • Configuration

    • Server/operator configurable default and maximum TTLs, plus an opt-out annotation to disable expiry.
  • Tests

    • Added unit tests covering TTL parsing, clamping, and expiry behavior.

@coderabbitai

coderabbitai Bot commented Apr 19, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds optional per-build TTL and expiry tracking: API types gain spec.ttl and status.expiresAt; OperatorConfig gains default/max TTL knobs; CLI accepts --ttl; server resolves/clamps TTL; controller computes, stores, requeues for, and deletes expired ImageBuilds.

Changes

Cohort / File(s) Summary
API Types & Deepcopy
api/v1alpha1/imagebuild_types.go, api/v1alpha1/operatorconfig_types.go, api/v1alpha1/zz_generated.deepcopy.go
Added ImageBuildSpec.TTL + GetTTL(); added ImageBuildStatus.ExpiresAt; added OSBuildsConfig.DefaultBuildTTL/MaxBuildTTL fields, DefaultBuildTTL and NoExpireAnnotation constants, and getters; updated deepcopy to deep-copy ExpiresAt.
CLI & Runtime Wiring
cmd/caib/buildcmd/build.go, cmd/caib/image/image.go, cmd/caib/main.go, cmd/caib/runtime_wiring.go, cmd/caib/buildcmd/build_disk_test.go
Added nullable TTL *string to Options, new --ttl flags wired into subcommands, package-level buildTTL var, and threaded TTL through runtimeState into handlers; test helper initializes TTL.
Build API Server & Types
internal/buildapi/server.go, internal/buildapi/types.go, internal/buildapi/openapi.yaml, docs/openapi.yaml
Added ttl to BuildRequest and expiresAt to BuildResponse; implemented resolveAndClampTTL that parses/validates/clamps TTL against OperatorConfig; createBuild persists resolved TTL; getBuild/getBuildTemplate expose TTL/ExpiresAt; OpenAPI schemas updated.
Controller Reconciliation & Tests
internal/controller/imagebuild/controller.go, internal/controller/imagebuild/expiry_test.go
Refactored Reconcile flow and added checkExpiry/resolveEffectiveTTL logic to compute/update status.ExpiresAt, schedule requeues, emit expiry events, and delete expired ImageBuilds; adjusted phase result merging and cleanup error handling; added controller expiry tests.
Server TTL Unit Tests
internal/buildapi/ttl_test.go
Added Ginkgo tests covering TTL parsing, negative/invalid values, max-TTL clamping semantics, passthrough behavior for "" and "0", and OperatorConfig interaction.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant APIServer
    participant K8sAPI
    participant OperatorConfig
    participant Controller

    Client->>APIServer: POST /build (body includes ttl)
    APIServer->>APIServer: resolveAndClampTTL(req.TTL)
    APIServer->>OperatorConfig: loadOperatorConfig()
    OperatorConfig-->>APIServer: max/default TTL
    APIServer->>APIServer: parse & clamp TTL
    APIServer->>K8sAPI: Create ImageBuild (spec.ttl)
    K8sAPI-->>APIServer: Created
    APIServer-->>Client: BuildResponse (expiresAt)

    K8sAPI->>Controller: Notify ImageBuild creation/update
    Controller->>Controller: Reconcile()
    Controller->>Controller: checkExpiry() -> resolveEffectiveTTL()
    alt Not expired
        Controller->>K8sAPI: Update status.expiresAt
        K8sAPI-->>Controller: Status updated
        Controller->>Controller: Requeue until expiresAt
    else Expired
        Controller->>K8sAPI: Emit BuildExpired event
        Controller->>K8sAPI: Delete ImageBuild
        K8sAPI-->>Controller: Deleted
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • bkhizgiy

Poem

🐰
Builds now bound by hours, not by chance,
I nibbled a TTL, then I danced—
When time is up, I give a cheer,
Sweep stale artifacts far from here,
Hop, delete, and bounce—reconcile!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.89% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title "add build expiry" directly and clearly describes the main feature added across the changeset: build expiration functionality with TTL configuration, status tracking, and reconciliation logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/controller/imagebuild/controller.go (1)

142-172: ⚠️ Potential issue | 🔴 Critical

Critical: checkExpiry's RequeueAfter is dropped for non-expired builds — expiry will not fire on time.

When the build is not yet expired, checkExpiry returns ctrl.Result{RequeueAfter: remaining}, false, nil. The condition expired || err != nil is false, so the returned result is discarded and reconciliation falls through to the phase switch. Terminal phases (phaseCompleted returns ctrl.Result{}, nil; phaseFailed the same when no cleanup error) return without any requeue, so the controller will not wake up at expiresAt and the ImageBuild will only be deleted on the next controller-runtime resync (up to ~10h later, assuming defaults), or whenever something else triggers reconciliation. The feature therefore does not delete builds reliably at their TTL.

Additionally, requeue signals from checkExpiry during non-terminal phases are similarly discarded.

🐛 Proposed fix: propagate the requeue and take the shorter of the two
-	if result, expired, err := r.checkExpiry(ctx, imageBuild); expired || err != nil {
-		return result, err
-	}
-
-	switch imageBuild.Status.Phase {
+	expiryResult, expired, err := r.checkExpiry(ctx, imageBuild)
+	if expired || err != nil {
+		return expiryResult, err
+	}
+
+	var phaseResult ctrl.Result
+	var phaseErr error
+	switch imageBuild.Status.Phase {
 	case "":
-		return r.handleInitialState(ctx, imageBuild)
+		phaseResult, phaseErr = r.handleInitialState(ctx, imageBuild)
 	case "Uploading":
-		return r.handleUploadingState(ctx, imageBuild)
+		phaseResult, phaseErr = r.handleUploadingState(ctx, imageBuild)
 	case phaseBuilding:
-		return r.handleBuildingState(ctx, imageBuild)
+		phaseResult, phaseErr = r.handleBuildingState(ctx, imageBuild)
 	case "Pushing":
-		return r.handlePushingState(ctx, imageBuild)
+		phaseResult, phaseErr = r.handlePushingState(ctx, imageBuild)
 	case "Flashing":
-		return r.handleFlashingState(ctx, imageBuild)
+		phaseResult, phaseErr = r.handleFlashingState(ctx, imageBuild)
 	case phaseCompleted:
-		return r.handleCompletedState(ctx, imageBuild)
+		phaseResult, phaseErr = r.handleCompletedState(ctx, imageBuild)
 	case phaseFailed:
 		if err := r.cleanupTransientSecrets(ctx, imageBuild, r.Log); err != nil {
-			return ctrl.Result{RequeueAfter: secretCleanupRequeue}, nil
+			phaseResult = ctrl.Result{RequeueAfter: secretCleanupRequeue}
 		}
-		return ctrl.Result{}, nil
 	default:
 		log.Info("Unknown phase", "phase", imageBuild.Status.Phase)
-		return ctrl.Result{}, nil
 	}
+	if phaseErr != nil {
+		return phaseResult, phaseErr
+	}
+	// Ensure we still wake up at TTL even if the phase result has no sooner requeue.
+	if expiryResult.RequeueAfter > 0 &&
+		(phaseResult.RequeueAfter == 0 || expiryResult.RequeueAfter < phaseResult.RequeueAfter) {
+		phaseResult.RequeueAfter = expiryResult.RequeueAfter
+	}
+	return phaseResult, nil

Alternatively, keep the early-return flow but also return when expiryResult.RequeueAfter > 0 && imageBuild.Status.Phase is terminal, and watch Status.ExpiresAt changes to re-trigger.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 142 - 172,
checkExpiry's ctrl.Result (RequeueAfter) is being dropped when expired==false,
so expiry requeues never fire; capture the result returned by checkExpiry (e.g.
expiryResult, expired, err := r.checkExpiry(...)) and propagate its RequeueAfter
when appropriate instead of discarding it: call checkExpiry first, return
immediately if expired or err!=nil as before, but for non-expired cases keep
expiryResult and when returning from the phase switch (including terminal phases
handled by handleCompletedState, phaseFailed/cleanupTransientSecrets, and other
handlers like
handleInitialState/handleUploadingState/handleBuildingState/handlePushingState/handleFlashingState)
ensure you return the shorter/min RequeueAfter between the handler's result and
expiryResult (or expiryResult if the handler returns no requeue) so the
controller wakes at ExpiresAt; alternatively, if keeping early returns, ensure
terminal-phase returns honor expiryResult.RequeueAfter > 0 and requeue
accordingly.
🧹 Nitpick comments (2)
internal/controller/imagebuild/controller.go (2)

52-73: Nit: colocate eventReasonBuildExpired with the other event-reason constants.

eventReasonBuildExpired was added to the maxK8sNameLength block at line 54, but all other eventReason* constants live in the separate block at lines 60-73. Moving it there improves discoverability and keeps the grouping intentional.

♻️ Proposed diff
 	maxK8sNameLength = 63
-
-	eventReasonBuildExpired = "BuildExpired"
 )
@@
 const (
 	eventReasonPhaseChanged     = "PhaseChanged"
@@
 	eventReasonDiskBuildDone    = "DiskBuildCompleted"
+	eventReasonBuildExpired     = "BuildExpired"
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 52 - 73, Move the
eventReasonBuildExpired constant out of the block with maxK8sNameLength and
colocate it with the other eventReason* constants; specifically, remove
eventReasonBuildExpired from the block containing maxK8sNameLength and add it
into the const block that contains eventReasonPhaseChanged,
eventReasonPipelineRunReady, eventReasonUploadPodReady, etc., so all
eventReason* constants (including eventReasonBuildExpired) are grouped together
for discoverability.

456-480: Default TTL fallback short-circuits the operator-config getter.

When r.Get(...) succeeds but operatorConfig.Spec.OSBuilds == nil, the else branch sets ttlStr = automotivev1alpha1.DefaultBuildTTL. This is equivalent to what GetDefaultBuildTTL() would return on a nil receiver, so the logic is correct, but the asymmetry (sometimes consulting the getter, sometimes the constant directly) is a bit awkward. More importantly: if the API fetch fails for any non-NotFound reason (e.g., transient API-server hiccup), the error is swallowed and we silently fall back to the hardcoded default, which can quietly override a cluster-admin-configured TTL. Consider surfacing non-NotFound errors so the caller can requeue instead of applying the wrong TTL.

♻️ Proposed diff
 	if ttlStr == "" {
 		operatorConfig := &automotivev1alpha1.OperatorConfig{}
-		if err := r.Get(ctx, types.NamespacedName{
-			Name: "config", Namespace: OperatorNamespace,
-		}, operatorConfig); err == nil && operatorConfig.Spec.OSBuilds != nil {
-			ttlStr = operatorConfig.Spec.OSBuilds.GetDefaultBuildTTL()
-		} else {
-			ttlStr = automotivev1alpha1.DefaultBuildTTL
-		}
+		err := r.Get(ctx, types.NamespacedName{
+			Name: "config", Namespace: OperatorNamespace,
+		}, operatorConfig)
+		switch {
+		case err == nil:
+			ttlStr = operatorConfig.Spec.OSBuilds.GetDefaultBuildTTL()
+		case errors.IsNotFound(err):
+			ttlStr = automotivev1alpha1.DefaultBuildTTL
+		default:
+			return 0, fmt.Errorf("failed to read OperatorConfig for TTL: %w", err)
+		}
 	}

(OSBuildsConfig.GetDefaultBuildTTL() already nil-checks the receiver, so this is safe.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 456 - 480,
resolveEffectiveTTL currently swallows any r.Get(...) error and falls back to
the hardcoded DefaultBuildTTL; change the error handling so non-NotFound API
errors are returned to the caller (so the reconcile can requeue) while only
treating NotFound as "use default". Concretely: after calling r.Get(ctx, ...,
operatorConfig) if err != nil then if !apierrors.IsNotFound(err) return 0, err
else set ttlStr = automotivev1alpha1.DefaultBuildTTL; if err == nil set ttlStr =
operatorConfig.Spec.OSBuilds.GetDefaultBuildTTL() (GetDefaultBuildTTL already
nil-checks the receiver). Import k8s.io/apimachinery/pkg/api/errors as apierrors
and update resolveEffectiveTTL accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v1alpha1/imagebuild_types.go`:
- Around line 80-85: You added a new field Spec.TTL and Status.ExpiresAt but did
not regenerate code/CRDs; run the generator to update deepcopy and CRD
manifests: execute "make generate manifests" (or the project equivalent) to
regenerate zz_generated.deepcopy.go and the CRD YAMLs so
ImageBuildStatus.ExpiresAt (pointer) is deep-copied and the CRD
validation/printer columns include Spec.TTL, then commit the generated files.
- Around line 80-85: The TTL behavior is inconsistent with its documentation:
either make expiry only apply to terminal builds or document that TTL can delete
in-flight builds; to fix, decide which behavior you want and implement one of
two changes — (A) restrict the expiry check in checkExpiry
(internal/controller/imagebuild/controller.go) to only consider builds with
CompletionTime set or Phase == "Completed"/"Failed" (so in-progress phases are
excluded), or (B) if mid-flight expiry is intended, update the TTL field comment
in ImageBuild (api/v1alpha1/imagebuild_types.go) and the related OperatorConfig
comment (api/v1alpha1/operatorconfig_types.go) to state TTL applies to any phase
and warn about short user-set TTLs interacting with long uploads; ensure the
updated docs mention the "0" disable sentinel and the potential for in-progress
deletion.

In `@api/v1alpha1/operatorconfig_types.go`:
- Around line 560-564: The field comment for MaxBuildTTL is inaccurate:
resolveAndClampTTL in internal/buildapi/server.go currently returns an error
when a requested TTL exceeds MaxBuildTTL rather than silently clamping it.
Update the MaxBuildTTL docstring to state that requests with ttl > MaxBuildTTL
are rejected (include that it uses Go duration format and that "0" means no
maximum) so the comment matches the behavior in resolveAndClampTTL, or
alternatively modify resolveAndClampTTL to actually clamp values to MaxBuildTTL
if you prefer the original wording — but ensure the MaxBuildTTL comment and the
resolveAndClampTTL behavior are consistent.

In `@internal/buildapi/server.go`:
- Around line 1520-1527: The code currently ignores the error from
loadOperatorConfigFn; change the logic in the block that calls
loadOperatorConfigFn so that if loadOperatorConfigFn returns a non-nil err you
only treat it as benign when apierrors.IsNotFound(err) is true—otherwise
return/surface that error (fail closed) instead of proceeding; likewise, when
reading operatorCfg.Spec.OSBuilds.GetMaxBuildTTL(), treat a time.ParseDuration
parseErr as an actual error to return (not silently ignore) so malformed TTL
policies also fail closed; reference loadOperatorConfigFn, operatorCfg,
Spec.OSBuilds.GetMaxBuildTTL, time.ParseDuration and ensure the function returns
an error when these conditions occur instead of proceeding as if no policy
exists.

In `@internal/buildapi/types.go`:
- Around line 147-148: getBuildTemplate in internal/buildapi/server.go is not
preserving BuildRequest.TTL from the template, so recreated builds drop the
original TTL; update getBuildTemplate to copy the TTL from the stored template
into the returned BuildRequest (e.g., assign template.TTL into the
BuildRequest/ctx.Request that getBuildTemplate returns) so the TTL value is
preserved when templates are used to recreate builds.
- Around line 147-148: Update the OpenAPI schemas used for API docs/clients so
BuildRequest includes a "ttl" property and BuildResponse includes an "expiresAt"
property: add "ttl" (type: string, JSON name "ttl", include the existing TTL
semantics in the description) to the BuildRequest schema that maps to the
BuildRequest TTL field, and add "expiresAt" (type: string, format: date-time,
JSON name "expiresAt") to the BuildResponse schema to match the BuildResponse
ExpiresAt field; ensure both the internal buildapi OpenAPI schema and the public
docs OpenAPI schema are updated so generated clients and docs reflect
BuildRequest.TTL and BuildResponse.ExpiresAt.

In `@internal/controller/imagebuild/controller.go`:
- Around line 391-454: The checkExpiry function should use the reconciler's
nil-safe emitter and surface TTL parse failures: replace the direct
r.Recorder.Eventf call in checkExpiry with r.emitEventf(...) (same message and
args) so it won't NPE when Recorder is nil, and when resolveEffectiveTTL returns
an error emit a Warning event via r.emitEventf(...) that includes the parse
error (and optionally set a status condition) instead of silently skipping
expiry; refer to the checkExpiry, resolveEffectiveTTL, and emitEventf symbols to
locate the changes.

---

Outside diff comments:
In `@internal/controller/imagebuild/controller.go`:
- Around line 142-172: checkExpiry's ctrl.Result (RequeueAfter) is being dropped
when expired==false, so expiry requeues never fire; capture the result returned
by checkExpiry (e.g. expiryResult, expired, err := r.checkExpiry(...)) and
propagate its RequeueAfter when appropriate instead of discarding it: call
checkExpiry first, return immediately if expired or err!=nil as before, but for
non-expired cases keep expiryResult and when returning from the phase switch
(including terminal phases handled by handleCompletedState,
phaseFailed/cleanupTransientSecrets, and other handlers like
handleInitialState/handleUploadingState/handleBuildingState/handlePushingState/handleFlashingState)
ensure you return the shorter/min RequeueAfter between the handler's result and
expiryResult (or expiryResult if the handler returns no requeue) so the
controller wakes at ExpiresAt; alternatively, if keeping early returns, ensure
terminal-phase returns honor expiryResult.RequeueAfter > 0 and requeue
accordingly.

---

Nitpick comments:
In `@internal/controller/imagebuild/controller.go`:
- Around line 52-73: Move the eventReasonBuildExpired constant out of the block
with maxK8sNameLength and colocate it with the other eventReason* constants;
specifically, remove eventReasonBuildExpired from the block containing
maxK8sNameLength and add it into the const block that contains
eventReasonPhaseChanged, eventReasonPipelineRunReady, eventReasonUploadPodReady,
etc., so all eventReason* constants (including eventReasonBuildExpired) are
grouped together for discoverability.
- Around line 456-480: resolveEffectiveTTL currently swallows any r.Get(...)
error and falls back to the hardcoded DefaultBuildTTL; change the error handling
so non-NotFound API errors are returned to the caller (so the reconcile can
requeue) while only treating NotFound as "use default". Concretely: after
calling r.Get(ctx, ..., operatorConfig) if err != nil then if
!apierrors.IsNotFound(err) return 0, err else set ttlStr =
automotivev1alpha1.DefaultBuildTTL; if err == nil set ttlStr =
operatorConfig.Spec.OSBuilds.GetDefaultBuildTTL() (GetDefaultBuildTTL already
nil-checks the receiver). Import k8s.io/apimachinery/pkg/api/errors as apierrors
and update resolveEffectiveTTL accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e0860e76-8b8b-4abb-a5f6-f9dd74f1010c

📥 Commits

Reviewing files that changed from the base of the PR and between 0ac14ab and d3b6481.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (11)
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/buildcmd/build.go
  • cmd/caib/buildcmd/build_disk_test.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/runtime_wiring.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/controller/imagebuild/controller.go

Comment on lines +80 to +85
// TTL is the time-to-live for this build. The build is automatically deleted
// after this duration past its completion (or creation if still in progress).
// Uses Go duration format (e.g. "24h", "72h", "168h").
// Empty uses the OperatorConfig default. Set to "0" to disable expiry.
// +optional
TTL string `json:"ttl,omitempty"`

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Reminder: regenerate CRDs/DeepCopy after type changes.

The PR checklist shows make manifests generate is unchecked. Adding Spec.TTL and Status.ExpiresAt requires regenerating both the CRD manifests (for validation, additionalPrinterColumns, etc.) and zz_generated.deepcopy.go so ImageBuildStatus.ExpiresAt (pointer type) is correctly deep-copied. Please ensure make generate manifests is run and committed.

As per coding guidelines: "After modifying types in api/v1alpha1/, run make generate manifests to regenerate DeepCopy methods and CRDs".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/imagebuild_types.go` around lines 80 - 85, You added a new field
Spec.TTL and Status.ExpiresAt but did not regenerate code/CRDs; run the
generator to update deepcopy and CRD manifests: execute "make generate
manifests" (or the project equivalent) to regenerate zz_generated.deepcopy.go
and the CRD YAMLs so ImageBuildStatus.ExpiresAt (pointer) is deep-copied and the
CRD validation/printer columns include Spec.TTL, then commit the generated
files.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check the TTL-related code in the controller
fd 'controller.go' api/v1alpha1 internal/controller

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 423


🏁 Script executed:

# Search for DefaultBuildTTL and DefaultBuildTimeoutMinutes definitions
rg 'DefaultBuildTTL|DefaultBuildTimeoutMinutes' --type go

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 1471


🏁 Script executed:

# Look at the controller.go file around lines 425-430
rg -n 'TTL' internal/controller --type go -B 3 -A 3

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 3777


🏁 Script executed:

# Check the imagebuild_types.go file for the TTL field
fd 'imagebuild_types.go' -type f

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 261


🏁 Script executed:

# Read the controller.go file to see the complete checkExpiry implementation
cat -n internal/controller/imagebuild/controller.go | sed -n '391,454p'

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 2551


🏁 Script executed:

# Also check around the lines mentioned (425-430)
cat -n internal/controller/imagebuild/controller.go | sed -n '415,440p'

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 957


🏁 Script executed:

# Look at ImageBuild status phases
rg -A 20 'type ImageBuildStatus' api/v1alpha1/imagebuild_types.go

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 910


🏁 Script executed:

# Search for test files related to TTL logic
find . -name '*test*.go' -type f | xargs grep -l 'TTL\|checkExpiry' 2>/dev/null

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 129


🏁 Script executed:

# Look for any phase-related logic in the reconciler
rg 'Phase|Status' internal/controller/imagebuild/controller.go | head -20

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 1223


🏁 Script executed:

# Check the current doc comment in imagebuild_types.go more carefully
cat -n api/v1alpha1/imagebuild_types.go | sed -n '75,90p'

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 961


🏁 Script executed:

# Look for any NoExpireAnnotation usage or documentation
rg 'NoExpireAnnotation|no.*expire|disable.*expiry' --type go -B 2 -A 2

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 8935


🏁 Script executed:

# Check if there are any integration tests or examples for TTL behavior
find . -name '*.yaml' -o -name '*.yml' | xargs grep -l 'ttl' 2>/dev/null | head -5

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 220


🏁 Script executed:

# Check the DefaultBuildTTL comment in operatorconfig_types.go to see if intent is documented
cat -n api/v1alpha1/operatorconfig_types.go | grep -A 5 'DefaultBuildTTL ='

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 396


🏁 Script executed:

# Check if there are any tests that verify TTL behavior with different phases
rg -B 5 -A 10 'checkExpiry|TTL.*in.*progress|in.*progress.*TTL' internal/controller/imagebuild --type go

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 2729


🏁 Script executed:

# Look at the reconcile loop to see when checkExpiry is called (in what phase)
rg -B 10 'checkExpiry' internal/controller/imagebuild/controller.go | head -40

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 1013


🏁 Script executed:

# Check the full DefaultBuildTTL comment for context
cat -n api/v1alpha1/operatorconfig_types.go | sed -n '60,75p'

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 909


TTL can delete builds in any phase, including in-progress ones — clarify intent and update documentation.

The code allows deletion of builds in Pending/Uploading/Building/Pushing/Flashing phases if their CreationTimestamp + TTL elapses (since CompletionTime is nil). The constant comment in api/v1alpha1/operatorconfig_types.go (line 63) states "for completed/failed builds," but the actual checkExpiry logic (lines 425-430 of internal/controller/imagebuild/controller.go) applies TTL deletion to any phase without filtering. This creates a documentation-code mismatch.

Either confirm this behavior is intentional and update the doc comments accordingly, or restrict expiry to terminal phases (Completed/Failed) only. If intentional, also note that user-set short TTLs combined with long artifact uploads could trigger mid-flight deletion despite the 24h default.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/imagebuild_types.go` around lines 80 - 85, The TTL behavior is
inconsistent with its documentation: either make expiry only apply to terminal
builds or document that TTL can delete in-flight builds; to fix, decide which
behavior you want and implement one of two changes — (A) restrict the expiry
check in checkExpiry (internal/controller/imagebuild/controller.go) to only
consider builds with CompletionTime set or Phase == "Completed"/"Failed" (so
in-progress phases are excluded), or (B) if mid-flight expiry is intended,
update the TTL field comment in ImageBuild (api/v1alpha1/imagebuild_types.go)
and the related OperatorConfig comment (api/v1alpha1/operatorconfig_types.go) to
state TTL applies to any phase and warn about short user-set TTLs interacting
with long uploads; ensure the updated docs mention the "0" disable sentinel and
the potential for in-progress deletion.

Comment thread api/v1alpha1/operatorconfig_types.go
Comment thread internal/buildapi/server.go Outdated
Comment thread internal/buildapi/types.go
Comment thread internal/controller/imagebuild/controller.go Outdated
@bennyz bennyz force-pushed the expirable-builds branch from d3b6481 to 19d95b5 Compare April 19, 2026 12:27
@bennyz

bennyz commented Apr 19, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 19, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
internal/buildapi/server.go (1)

1524-1528: ⚠️ Potential issue | 🟠 Major

Fail closed on invalid maxBuildTTL values.

parseErr is ignored, and parsed non-positive values like -1h/0h can poison the effective TTL policy. Reject malformed or non-positive max values, keeping only exact "0" as the “no maximum” sentinel.

Proposed fix
 	if operatorCfg != nil && operatorCfg.Spec.OSBuilds != nil {
 		if maxStr := operatorCfg.Spec.OSBuilds.GetMaxBuildTTL(); maxStr != "" && maxStr != "0" {
-			if maxDur, parseErr := time.ParseDuration(maxStr); parseErr == nil && dur > maxDur {
+			maxDur, parseErr := time.ParseDuration(maxStr)
+			if parseErr != nil {
+				return "", fmt.Errorf("invalid maxBuildTTL %q: %w", maxStr, parseErr)
+			}
+			if maxDur <= 0 {
+				return "", fmt.Errorf("maxBuildTTL must be positive or \"0\"")
+			}
+			if dur > maxDur {
 				return maxStr, nil
 			}
 		}
 	}

After the fix, add coverage for malformed, negative, and zero-duration max policies:

#!/bin/bash
# Expect tests covering invalid/negative/zero-form MaxBuildTTL policy values.
rg -n -C2 'MaxBuildTTL|resolveAndClampTTL|invalid maxBuildTTL|negative|max.*0h' internal/buildapi --type go
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1524 - 1528, The current code
ignores parseErr and treats parsed non-positive durations as valid; change the
logic in the TTL resolution path (where
operatorCfg.Spec.OSBuilds.GetMaxBuildTTL() is read and time.ParseDuration is
called) so that any parse error or a parsed duration <= 0 (except the exact
string "0" which remains the “no maximum” sentinel) is rejected and does not
override the effective TTL; specifically, if GetMaxBuildTTL() != "" then first
check if the string == "0" (allow as sentinel), otherwise call
time.ParseDuration and if parseErr != nil or parsedDur <= 0 return an error (or
skip using the max) instead of accepting it; update the code paths that return
maxStr (the branch that currently returns when maxDur < dur) to only do so for
valid positive parsed durations, and add unit tests covering malformed values,
negative durations (e.g. "-1h"), and zero-duration policies.
🧹 Nitpick comments (2)
internal/controller/imagebuild/expiry_test.go (1)

88-107: Test gap: NoExpireAnnotation branch doesn't cover clearing an existing status.ExpiresAt.

checkExpiry (controller.go Lines 406-413) has a branch that calls clearExpiresAt when the no-expire annotation is present and status.ExpiresAt != nil. This test seeds an ImageBuild without a pre-existing ExpiresAt, so the clearExpiresAt path is never exercised. Consider adding a variant that pre-sets status.ExpiresAt and asserts it is reset to nil after checkExpiry. The same gap applies to TestCheckExpiry_TTLZeroDisablesExpiry (Lines 109-123) for the ttl == 0 clear branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/expiry_test.go` around lines 88 - 107, Add
test cases that exercise the branch where an existing status.ExpiresAt is
cleared: modify TestCheckExpiry_NoExpireAnnotation to seed the ImageBuild's
status.ExpiresAt with a non-nil timestamp before calling
expiryReconciler.checkExpiry (so the controller path that calls clearExpiresAt
is exercised), then after calling r.checkExpiry assert that the returned object
in the API has status.ExpiresAt == nil; repeat the same pattern for
TestCheckExpiry_TTLZeroDisablesExpiry to cover the ttl==0 branch that should
also call clearExpiresAt. Ensure you reference the ImageBuild under test (e.g.,
the "pinned-build" instance) and use the existing r.checkExpiry and r.Get
assertions to validate the ExpiresAt field is reset.
internal/controller/imagebuild/controller.go (1)

513-525: clearExpiresAt issues an unconditional Status().Update even when ExpiresAt is already nil.

Both callers in checkExpiry already gate on imageBuild.Status.ExpiresAt != nil before invoking clearExpiresAt (Lines 407, 421), so this is currently fine — but the helper itself doesn't re-check the fresh object. If a future caller skips the guard, this writes on every reconcile. Consider defensively no-op'ing when fresh.Status.ExpiresAt == nil after the GET:

♻️ Suggested tightening
 	if err := r.Get(ctx, types.NamespacedName{
 		Name: imageBuild.Name, Namespace: imageBuild.Namespace,
 	}, fresh); err != nil {
 		return err
 	}
+	if fresh.Status.ExpiresAt == nil {
+		return nil
+	}
 	fresh.Status.ExpiresAt = nil
 	return r.Status().Update(ctx, fresh)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 513 - 525, The
clearExpiresAt helper currently always calls r.Status().Update even if the fresh
ImageBuild already has Status.ExpiresAt == nil; after fetching fresh in
clearExpiresAt (function ImageBuildReconciler.clearExpiresAt) add a defensive
no-op: if fresh.Status.ExpiresAt == nil return nil to avoid an unnecessary
status update (this preserves existing callers in checkExpiry but prevents
future callers from causing a redundant write).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controller/imagebuild/controller.go`:
- Around line 492-510: The ensureExpiresAt function is comparing a
high-precision expiresAt against the stored metav1.Time which is
second-precision, causing unnecessary Status().Update calls; fix it by
truncating expiresAt to API storage precision using metav1.NewTime(expiresAt)
and compare/set against that truncated time (use t := metav1.NewTime(expiresAt)
and compare imageBuild.Status.ExpiresAt.Time.Equal(t.Time) or set
fresh.Status.ExpiresAt = &t before Update), and consider switching to
client.MergeFrom patch semantics around r.Get / r.Status().Update to avoid
conflicts with concurrent status writers in
handleBuildingState/checkBuildProgress.

---

Duplicate comments:
In `@internal/buildapi/server.go`:
- Around line 1524-1528: The current code ignores parseErr and treats parsed
non-positive durations as valid; change the logic in the TTL resolution path
(where operatorCfg.Spec.OSBuilds.GetMaxBuildTTL() is read and time.ParseDuration
is called) so that any parse error or a parsed duration <= 0 (except the exact
string "0" which remains the “no maximum” sentinel) is rejected and does not
override the effective TTL; specifically, if GetMaxBuildTTL() != "" then first
check if the string == "0" (allow as sentinel), otherwise call
time.ParseDuration and if parseErr != nil or parsedDur <= 0 return an error (or
skip using the max) instead of accepting it; update the code paths that return
maxStr (the branch that currently returns when maxDur < dur) to only do so for
valid positive parsed durations, and add unit tests covering malformed values,
negative durations (e.g. "-1h"), and zero-duration policies.

---

Nitpick comments:
In `@internal/controller/imagebuild/controller.go`:
- Around line 513-525: The clearExpiresAt helper currently always calls
r.Status().Update even if the fresh ImageBuild already has Status.ExpiresAt ==
nil; after fetching fresh in clearExpiresAt (function
ImageBuildReconciler.clearExpiresAt) add a defensive no-op: if
fresh.Status.ExpiresAt == nil return nil to avoid an unnecessary status update
(this preserves existing callers in checkExpiry but prevents future callers from
causing a redundant write).

In `@internal/controller/imagebuild/expiry_test.go`:
- Around line 88-107: Add test cases that exercise the branch where an existing
status.ExpiresAt is cleared: modify TestCheckExpiry_NoExpireAnnotation to seed
the ImageBuild's status.ExpiresAt with a non-nil timestamp before calling
expiryReconciler.checkExpiry (so the controller path that calls clearExpiresAt
is exercised), then after calling r.checkExpiry assert that the returned object
in the API has status.ExpiresAt == nil; repeat the same pattern for
TestCheckExpiry_TTLZeroDisablesExpiry to cover the ttl==0 branch that should
also call clearExpiresAt. Ensure you reference the ImageBuild under test (e.g.,
the "pinned-build" instance) and use the existing r.checkExpiry and r.Get
assertions to validate the ExpiresAt field is reset.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9871d6c9-b2b7-43f0-b8ed-0bffac21f2ec

📥 Commits

Reviewing files that changed from the base of the PR and between d3b6481 and 19d95b5.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (15)
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/buildcmd/build.go
  • cmd/caib/buildcmd/build_disk_test.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/runtime_wiring.go
  • docs/openapi.yaml
  • internal/buildapi/openapi.yaml
  • internal/buildapi/server.go
  • internal/buildapi/ttl_test.go
  • internal/buildapi/types.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/imagebuild/expiry_test.go
✅ Files skipped from review due to trivial changes (2)
  • cmd/caib/main.go
  • internal/buildapi/types.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/buildcmd/build_disk_test.go
  • cmd/caib/buildcmd/build.go

Comment thread internal/controller/imagebuild/controller.go Outdated
@bennyz bennyz force-pushed the expirable-builds branch 3 times, most recently from 4906efe to 09a75a9 Compare April 19, 2026 16:30

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
internal/buildapi/server.go (1)

1598-1617: ⚠️ Potential issue | 🟠 Major

Enforce maxBuildTTL for no-expire requests and malformed policy.

Returning "0" before loading OperatorConfig lets users disable expiry even when maxBuildTTL is configured. Also, a malformed maxBuildTTL currently disables clamping silently because parseErr is ignored.

Proposed fix
-	if requestedTTL == "" || requestedTTL == "0" {
+	if requestedTTL == "" {
 		return requestedTTL, nil
 	}
-	dur, err := time.ParseDuration(requestedTTL)
-	if err != nil {
-		return "", fmt.Errorf("invalid TTL %q: %w", requestedTTL, err)
-	}
-	if dur < 0 {
-		return "", fmt.Errorf("TTL must not be negative")
+	requestedNoExpire := requestedTTL == "0"
+	var dur time.Duration
+	if !requestedNoExpire {
+		var err error
+		dur, err = time.ParseDuration(requestedTTL)
+		if err != nil {
+			return "", fmt.Errorf("invalid TTL %q: %w", requestedTTL, err)
+		}
+		if dur < 0 {
+			return "", fmt.Errorf("TTL must not be negative")
+		}
 	}
 	operatorCfg, cfgErr := loadOperatorConfigFn(ctx, k8sClient, namespace)
 	if cfgErr != nil && !k8serrors.IsNotFound(cfgErr) {
 		return "", fmt.Errorf("failed to load OperatorConfig: %w", cfgErr)
 	}
 	if operatorCfg != nil && operatorCfg.Spec.OSBuilds != nil {
 		if maxStr := operatorCfg.Spec.OSBuilds.GetMaxBuildTTL(); maxStr != "" && maxStr != "0" {
-			if maxDur, parseErr := time.ParseDuration(maxStr); parseErr == nil && dur > maxDur {
+			maxDur, parseErr := time.ParseDuration(maxStr)
+			if parseErr != nil {
+				return "", fmt.Errorf("invalid maxBuildTTL %q: %w", maxStr, parseErr)
+			}
+			if requestedNoExpire || dur > maxDur {
 				return maxStr, nil
 			}
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1598 - 1617, The current logic
returns early for requestedTTL == "" or "0" and ignores parse errors for
operatorCfg.Spec.OSBuilds.GetMaxBuildTTL(), which allows bypassing maxBuildTTL
and silently disables clamping; change the flow so you always load operator
config (call loadOperatorConfigFn with ctx, k8sClient, namespace) before
deciding to return for a no-expire "0" or blank TTL, and enforce the operator
max when present: if operatorCfg.Spec.OSBuilds.GetMaxBuildTTL() yields a
non-empty/non-"0" maxStr, parse it (using time.ParseDuration) and if parsing
fails return an error (do not ignore parseErr), and if requestedTTL == "0" or
parsed dur > maxDur return maxStr; otherwise proceed with the parsed
requestedTTL logic and keep the existing negative-duration check—use the symbols
requestedTTL, loadOperatorConfigFn, operatorCfg.Spec.OSBuilds.GetMaxBuildTTL(),
time.ParseDuration to locate the code to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/buildapi/server.go`:
- Around line 930-937: When cancelling a build you only flip the CR to
phaseCancelled (build.Status.Phase = phaseCancelled) but do not tear down upload
resources for builds that were in the Uploading state; add logic around the
cancellation block to detect if the build was in "Uploading" (check the previous
phase or build.Status/Spec fields that record the upload) and explicitly delete
the upload pod/related secrets before (or immediately after) updating status —
e.g., find the upload pod by the build's upload-related identifier (pod
name/labels stored on the build, or build.Spec.Upload fields) and call the k8s
client Delete (k8sClient.Delete(ctx, pod) or clientset equivalent);
alternatively, invoke the controller's cancelled-upload handler if one is
available instead of manual deletion. Ensure this runs in the same cancellation
path where you set phaseCancelled and call k8sClient.Status().Update so uploads
are torn down immediately.
- Around line 896-904: The cancellation check in the handler treats an empty
initial phase as non-cancellable, causing newly created ImageBuilds (created by
createBuild with no status) to return 409 until the controller reconciles;
update the switch on build.Status.Phase (currently comparing against
phasePending, phaseUploading, phaseBuilding) to also allow the zero/empty phase
("" or whatever the controller treats as initial) as cancellable so new builds
can be cancelled before the first reconcile; reference build.Status.Phase,
phasePending, phaseUploading, phaseBuilding and createBuild when locating the
logic to change.

In `@internal/controller/imagebuild/controller.go`:
- Around line 481-485: The TTL parsing currently returns
time.ParseDuration(ttlStr) and treats "0" specially but doesn't reject negative
durations; update the TTL parsing function that reads ttlStr to parse the
duration (using time.ParseDuration), then if the parsed duration is negative
return an error (same error semantics used by the API) so negative TTLs from
ImageBuild/OperatorConfig are rejected before being applied (i.e., check d < 0
and return a descriptive error instead of returning the negative duration).

---

Duplicate comments:
In `@internal/buildapi/server.go`:
- Around line 1598-1617: The current logic returns early for requestedTTL == ""
or "0" and ignores parse errors for operatorCfg.Spec.OSBuilds.GetMaxBuildTTL(),
which allows bypassing maxBuildTTL and silently disables clamping; change the
flow so you always load operator config (call loadOperatorConfigFn with ctx,
k8sClient, namespace) before deciding to return for a no-expire "0" or blank
TTL, and enforce the operator max when present: if
operatorCfg.Spec.OSBuilds.GetMaxBuildTTL() yields a non-empty/non-"0" maxStr,
parse it (using time.ParseDuration) and if parsing fails return an error (do not
ignore parseErr), and if requestedTTL == "0" or parsed dur > maxDur return
maxStr; otherwise proceed with the parsed requestedTTL logic and keep the
existing negative-duration check—use the symbols requestedTTL,
loadOperatorConfigFn, operatorCfg.Spec.OSBuilds.GetMaxBuildTTL(),
time.ParseDuration to locate the code to change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b80bdf1f-679a-4a4a-8a5d-2e86b60e2d2b

📥 Commits

Reviewing files that changed from the base of the PR and between 19d95b5 and 4906efe.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (19)
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/buildcmd/build.go
  • cmd/caib/buildcmd/build_disk_test.go
  • cmd/caib/buildcmd/logs.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/runtime_wiring.go
  • docs/openapi.yaml
  • internal/buildapi/client/client.go
  • internal/buildapi/container_builds.go
  • internal/buildapi/openapi.yaml
  • internal/buildapi/server.go
  • internal/buildapi/server_test.go
  • internal/buildapi/ttl_test.go
  • internal/buildapi/types.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/imagebuild/expiry_test.go
✅ Files skipped from review due to trivial changes (2)
  • cmd/caib/main.go
  • internal/buildapi/openapi.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • cmd/caib/buildcmd/build_disk_test.go
  • internal/buildapi/types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • internal/buildapi/ttl_test.go
  • api/v1alpha1/operatorconfig_types.go

Comment thread internal/buildapi/server.go
Comment thread internal/buildapi/server.go Outdated
Comment thread internal/controller/imagebuild/controller.go Outdated
@bennyz bennyz force-pushed the expirable-builds branch from 09a75a9 to 7b8d7b8 Compare April 19, 2026 18:17

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
internal/buildapi/server.go (1)

1614-1617: ⚠️ Potential issue | 🟠 Major

Fail closed on malformed maxBuildTTL.

Line 1615 silently ignores time.ParseDuration(maxStr) errors, so a bad OperatorConfig value disables the max-TTL policy for user requests. Return the parse error instead, and reject non-positive max values unless the literal "0" no-limit sentinel was used.

Proposed fix
 	if operatorCfg != nil && operatorCfg.Spec.OSBuilds != nil {
 		if maxStr := operatorCfg.Spec.OSBuilds.GetMaxBuildTTL(); maxStr != "" && maxStr != "0" {
-			if maxDur, parseErr := time.ParseDuration(maxStr); parseErr == nil && dur > maxDur {
+			maxDur, parseErr := time.ParseDuration(maxStr)
+			if parseErr != nil {
+				return "", fmt.Errorf("invalid maxBuildTTL %q: %w", maxStr, parseErr)
+			}
+			if maxDur <= 0 {
+				return "", fmt.Errorf("maxBuildTTL must be positive or \"0\"")
+			}
+			if dur > maxDur {
 				return maxStr, nil
 			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1614 - 1617, The code currently
swallows errors from time.ParseDuration when reading
operatorCfg.Spec.OSBuilds.GetMaxBuildTTL(); update the logic in the block that
checks GetMaxBuildTTL() so that parseErr is returned instead of ignored, and
validate the parsed duration (maxDur) is positive — treat non-positive durations
as an error unless the literal "0" sentinel was used to indicate no limit;
specifically modify the branch that calls time.ParseDuration(maxStr) (the
GetMaxBuildTTL check) to return parseErr on parse failure and to reject maxDur
<= 0 with an appropriate error.
🧹 Nitpick comments (4)
internal/controller/imagebuild/expiry_test.go (4)

53-70: Consider asserting on the returned ctrl.Result for expired paths.

TestCheckExpiry_ExpiredBuild and TestCheckExpiry_FailedBuildExpires discard the result value. Per checkExpiry (controller.go:401-460), the expired path returns an empty ctrl.Result{} — worth asserting result.RequeueAfter == 0 so a future regression that accidentally requeues a deleted build is caught.

Also applies to: 139-150

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/expiry_test.go` around lines 53 - 70, Update
the tests TestCheckExpiry_ExpiredBuild and TestCheckExpiry_FailedBuildExpires to
also assert the returned ctrl.Result from checkExpiry: capture the first return
value (result) and assert result.RequeueAfter == 0 (and/or result ==
ctrl.Result{}), referencing the checkExpiry call in the tests so they explicitly
verify the expired path returns an empty/no-requeue result; keep the existing
expired boolean and deletion assertions intact.

125-137: Also assert the in-progress build is actually deleted.

This test verifies the expired flag but not that Delete actually removed the object, unlike TestCheckExpiry_ExpiredBuild. Adding a follow-up r.Get(...) and expecting NotFound would close the loop and protect against a regression where the in-progress branch resolves expiry but skips deletion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/expiry_test.go` around lines 125 - 137, After
asserting expired in TestCheckExpiry_InProgressUsesCreationTimestamp (which
calls r.checkExpiry), add a follow-up retrieval using the reconciler's client
(r.Get) for the same ImageBuild object and assert that the call returns a
NotFound error (use apierrors.IsNotFound) to verify the object was actually
deleted; this mirrors TestCheckExpiry_ExpiredBuild and prevents regression where
expiry is reported but Delete is skipped.

31-51: Confusing parameter semantics in newTestImageBuild.

completedAgo is used to compute both CreationTimestamp (now - completedAgo - 1h) and CompletionTime (now - completedAgo), which couples two unrelated anchors and makes the test fixtures hard to reason about. It also makes the phaseBuilding case in TestCheckExpiry_InProgressUsesCreationTimestamp (Line 127) immediately overwrite CreationTimestamp, indicating the helper doesn't fit that case well.

Consider splitting into two helpers (e.g. newCompletedBuild(name, ttl, completedAgo) and newInProgressBuild(name, ttl, createdAgo)) or accepting both anchors explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/expiry_test.go` around lines 31 - 51, The
helper newTestImageBuild conflates creation and completion anchors by using
completedAgo for both CreationTimestamp and CompletionTime, making fixtures
confusing and unsuitable for in-progress cases; split it into two focused
helpers—e.g. newCompletedBuild(name, ttl, completedAgo) that sets
CreationTimestamp relative to completedAgo minus a small delta and sets
CompletionTime to now - completedAgo, and newInProgressBuild(name, ttl,
createdAgo) that sets CreationTimestamp to now - createdAgo and leaves
CompletionTime nil (or accept both createdAgo and completedAgo explicitly if
needed); update tests such as TestCheckExpiry_InProgressUsesCreationTimestamp to
use the new helpers (replace usages of newTestImageBuild) so creation and
completion times are set independently.

178-295: Optional: deduplicate the four resolveEffectiveTTL tests.

The four TestResolveEffectiveTTL_* cases share nearly identical scaffolding (scheme + reconciler + optional OperatorConfig); a small table-driven test would cut ~80 lines and make the precedence rules (spec → OperatorConfig → hardcoded → "0" disables) easier to read at a glance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/expiry_test.go` around lines 178 - 295, The
tests TestResolveEffectiveTTL_SpecOverride,
TestResolveEffectiveTTL_OperatorConfigDefault,
TestResolveEffectiveTTL_HardcodedFallback, and
TestResolveEffectiveTTL_ZeroDisables/OperatorConfigZeroDisables duplicate
scaffolding; refactor them into a single table-driven test that iterates cases
describing input IB TTL, optional OperatorConfig.DefaultBuildTTL, and expected
time.Duration, calling r.resolveEffectiveTTL for each. Reuse
newTestSchemeWithTekton, newTestImageBuild and ImageBuildReconciler setup in a
loop (create the fake client with or without the OperatorConfig per case) and
assert ttl/err per table row to capture precedence (spec → OperatorConfig →
hardcoded → "0" disables) while removing the four near-identical test functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@internal/buildapi/server.go`:
- Around line 1614-1617: The code currently swallows errors from
time.ParseDuration when reading operatorCfg.Spec.OSBuilds.GetMaxBuildTTL();
update the logic in the block that checks GetMaxBuildTTL() so that parseErr is
returned instead of ignored, and validate the parsed duration (maxDur) is
positive — treat non-positive durations as an error unless the literal "0"
sentinel was used to indicate no limit; specifically modify the branch that
calls time.ParseDuration(maxStr) (the GetMaxBuildTTL check) to return parseErr
on parse failure and to reject maxDur <= 0 with an appropriate error.

---

Nitpick comments:
In `@internal/controller/imagebuild/expiry_test.go`:
- Around line 53-70: Update the tests TestCheckExpiry_ExpiredBuild and
TestCheckExpiry_FailedBuildExpires to also assert the returned ctrl.Result from
checkExpiry: capture the first return value (result) and assert
result.RequeueAfter == 0 (and/or result == ctrl.Result{}), referencing the
checkExpiry call in the tests so they explicitly verify the expired path returns
an empty/no-requeue result; keep the existing expired boolean and deletion
assertions intact.
- Around line 125-137: After asserting expired in
TestCheckExpiry_InProgressUsesCreationTimestamp (which calls r.checkExpiry), add
a follow-up retrieval using the reconciler's client (r.Get) for the same
ImageBuild object and assert that the call returns a NotFound error (use
apierrors.IsNotFound) to verify the object was actually deleted; this mirrors
TestCheckExpiry_ExpiredBuild and prevents regression where expiry is reported
but Delete is skipped.
- Around line 31-51: The helper newTestImageBuild conflates creation and
completion anchors by using completedAgo for both CreationTimestamp and
CompletionTime, making fixtures confusing and unsuitable for in-progress cases;
split it into two focused helpers—e.g. newCompletedBuild(name, ttl,
completedAgo) that sets CreationTimestamp relative to completedAgo minus a small
delta and sets CompletionTime to now - completedAgo, and
newInProgressBuild(name, ttl, createdAgo) that sets CreationTimestamp to now -
createdAgo and leaves CompletionTime nil (or accept both createdAgo and
completedAgo explicitly if needed); update tests such as
TestCheckExpiry_InProgressUsesCreationTimestamp to use the new helpers (replace
usages of newTestImageBuild) so creation and completion times are set
independently.
- Around line 178-295: The tests TestResolveEffectiveTTL_SpecOverride,
TestResolveEffectiveTTL_OperatorConfigDefault,
TestResolveEffectiveTTL_HardcodedFallback, and
TestResolveEffectiveTTL_ZeroDisables/OperatorConfigZeroDisables duplicate
scaffolding; refactor them into a single table-driven test that iterates cases
describing input IB TTL, optional OperatorConfig.DefaultBuildTTL, and expected
time.Duration, calling r.resolveEffectiveTTL for each. Reuse
newTestSchemeWithTekton, newTestImageBuild and ImageBuildReconciler setup in a
loop (create the fake client with or without the OperatorConfig per case) and
assert ttl/err per table row to capture precedence (spec → OperatorConfig →
hardcoded → "0" disables) while removing the four near-identical test functions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 41cb26f6-68f6-42a8-b076-1937e74aba07

📥 Commits

Reviewing files that changed from the base of the PR and between 4906efe and 7b8d7b8.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (15)
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/buildcmd/build.go
  • cmd/caib/buildcmd/build_disk_test.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/runtime_wiring.go
  • docs/openapi.yaml
  • internal/buildapi/openapi.yaml
  • internal/buildapi/server.go
  • internal/buildapi/ttl_test.go
  • internal/buildapi/types.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/imagebuild/expiry_test.go
✅ Files skipped from review due to trivial changes (2)
  • cmd/caib/main.go
  • internal/buildapi/ttl_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • docs/openapi.yaml
  • cmd/caib/buildcmd/build_disk_test.go
  • internal/buildapi/openapi.yaml
  • cmd/caib/buildcmd/build.go
  • api/v1alpha1/operatorconfig_types.go
  • internal/controller/imagebuild/controller.go

@bennyz bennyz force-pushed the expirable-builds branch from 7b8d7b8 to 4ddd68b Compare April 20, 2026 06:51

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/buildapi/server.go (1)

2101-2136: ⚠️ Potential issue | 🟠 Major

Validate TTL before starting extra-repo side effects.

resolveExtraRepos can start HTTP servers in workspace pods before lines 2132-2136 reject an invalid or over-limit TTL. Move TTL validation ahead of resolveExtraRepos so bad requests fail without mutating cluster/workspace state.

Proposed refactor
 	if err := applyBuildDefaults(&req); err != nil {
 		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
 		return
 	}
 
+	k8sClient, err := getClientFromRequest(c)
+	if err != nil {
+		c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("k8s client error: %v", err)})
+		return
+	}
+
+	ctx := c.Request.Context()
+	namespace := resolveNamespace()
+	requestedBy := a.resolveRequester(c)
+
+	effectiveTTL, ttlErr := resolveAndClampTTL(ctx, k8sClient, namespace, req.TTL)
+	if ttlErr != nil {
+		c.JSON(http.StatusBadRequest, gin.H{"error": ttlErr.Error()})
+		return
+	}
+
 	// Resolve --extra-repo workspace:path pairs into extra_repos custom defines
 	if len(req.ExtraRepos) > 0 {
-		k8sClientForRepos, err := getClientFromRequest(c)
-		if err != nil {
-			c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to create kubernetes client"})
-			return
-		}
 		restCfgForRepos, err := getRESTConfigFromRequest(c)
 		if err != nil {
 			c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to get kubernetes config"})
 			return
 		}
-		if err := a.resolveExtraRepos(c.Request.Context(), k8sClientForRepos, restCfgForRepos, &req); err != nil {
+		if err := a.resolveExtraRepos(ctx, k8sClient, restCfgForRepos, &req); err != nil {
 			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
 			return
 		}
 	}
 
 	// Append a short random suffix to ensure unique names for parallel builds
 	req.Name = fmt.Sprintf("%s-%s", req.Name, uuid.New().String()[:5])
-
-	k8sClient, err := getClientFromRequest(c)
-	if err != nil {
-		c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("k8s client error: %v", err)})
-		return
-	}
-
-	ctx := c.Request.Context()
-	namespace := resolveNamespace()
-	requestedBy := a.resolveRequester(c)
-
-	effectiveTTL, ttlErr := resolveAndClampTTL(ctx, k8sClient, namespace, req.TTL)
-	if ttlErr != nil {
-		c.JSON(http.StatusBadRequest, gin.H{"error": ttlErr.Error()})
-		return
-	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 2101 - 2136, The TTL is validated
too late — move the resolveAndClampTTL call (and its error handling) to occur
before calling a.resolveExtraRepos so invalid/over-limit TTLs fail without
mutating cluster state; obtain the k8s client/REST config once (reuse
getClientFromRequest/getRESTConfigFromRequest results) or get a lightweight
client for TTL validation if possible, then run effectiveTTL, ttlErr :=
resolveAndClampTTL(...) and return on ttlErr before invoking
a.resolveExtraRepos(c.Request.Context(), k8sClientForRepos, restCfgForRepos,
&req); keep the UUID suffix and downstream logic unchanged.
🧹 Nitpick comments (1)
api/v1alpha1/zz_generated.deepcopy.go (1)

676-679: Keep generated deepcopy artifacts out of the PR if this repo policy still applies.

The ExpiresAt pointer is deep-copied correctly, but this generated file should be omitted if the source type changes are regenerated during deploy/build as documented for this repo. Based on learnings, Generated files that are not committed should be ignored in PRs. Specifically, do not include api/v1alpha1/zz_generated.deepcopy.go.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/zz_generated.deepcopy.go` around lines 676 - 679, This
autogenerated deepcopy file (api/v1alpha1/zz_generated.deepcopy.go) containing
DeepCopy logic for types like the ExpiresAt pointer was committed but must be
omitted per repo policy; remove zz_generated.deepcopy.go from the PR (revert the
commit that added/modified it), add the generated path
(api/v1alpha1/zz_generated.deepcopy.go) to the repo ignore rules (e.g.,
.gitignore) or the project’s generated-files exclusion list, and ensure the
canonical codegen step (the DeepCopy generator used by your build or
make/generate script) is documented/run during CI/build so DeepCopy methods are
regenerated rather than checked in.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/buildapi/server.go`:
- Around line 1597-1628: resolveAndClampTTL currently returns early when
requestedTTL == "0", allowing no-expiry even if
OperatorConfig.spec.osBuilds.maxBuildTTL is set; change the logic in
resolveAndClampTTL to defer handling of the "0" (no-expiry) case until after
loading OperatorConfig via loadOperatorConfigFn, and if
operatorCfg.Spec.OSBuilds.GetMaxBuildTTL() yields a non-empty, non-"0" value
then treat requestedTTL "0" as disallowed (return an error stating no-expiry
exceeds the configured max); keep the existing duration parsing/validation for
non-zero values and retain the existing error messages for invalid or negative
durations.

In `@internal/controller/imagebuild/controller.go`:
- Around line 467-496: The current logic trusts imageBuild.Spec.GetTTL() without
applying the operator's MaxBuildTTL; load the OperatorConfig (same way as the
existing branch that handles empty ttlStr) even when ttlStr != "" to obtain
operatorConfig.Spec.OSBuilds.GetMaxBuildTTL()/MaxBuildTTL and enforce it: parse
ttlStr into ttl, parse operator max (if set) into maxTTL, then if ttl > maxTTL
set ttl = maxTTL (or return an error if your project's policy prefers); preserve
existing handling for "0" (return 0) and negative values (error), and reuse the
same variables/operatorConfig lookup used for DefaultBuildTTL to avoid
duplication.

---

Outside diff comments:
In `@internal/buildapi/server.go`:
- Around line 2101-2136: The TTL is validated too late — move the
resolveAndClampTTL call (and its error handling) to occur before calling
a.resolveExtraRepos so invalid/over-limit TTLs fail without mutating cluster
state; obtain the k8s client/REST config once (reuse
getClientFromRequest/getRESTConfigFromRequest results) or get a lightweight
client for TTL validation if possible, then run effectiveTTL, ttlErr :=
resolveAndClampTTL(...) and return on ttlErr before invoking
a.resolveExtraRepos(c.Request.Context(), k8sClientForRepos, restCfgForRepos,
&req); keep the UUID suffix and downstream logic unchanged.

---

Nitpick comments:
In `@api/v1alpha1/zz_generated.deepcopy.go`:
- Around line 676-679: This autogenerated deepcopy file
(api/v1alpha1/zz_generated.deepcopy.go) containing DeepCopy logic for types like
the ExpiresAt pointer was committed but must be omitted per repo policy; remove
zz_generated.deepcopy.go from the PR (revert the commit that added/modified it),
add the generated path (api/v1alpha1/zz_generated.deepcopy.go) to the repo
ignore rules (e.g., .gitignore) or the project’s generated-files exclusion list,
and ensure the canonical codegen step (the DeepCopy generator used by your build
or make/generate script) is documented/run during CI/build so DeepCopy methods
are regenerated rather than checked in.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dc090379-5008-4b18-9ddb-38ca24786a66

📥 Commits

Reviewing files that changed from the base of the PR and between 7b8d7b8 and 4ddd68b.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (15)
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/buildcmd/build.go
  • cmd/caib/buildcmd/build_disk_test.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/runtime_wiring.go
  • docs/openapi.yaml
  • internal/buildapi/openapi.yaml
  • internal/buildapi/server.go
  • internal/buildapi/ttl_test.go
  • internal/buildapi/types.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/imagebuild/expiry_test.go
✅ Files skipped from review due to trivial changes (6)
  • cmd/caib/buildcmd/build_disk_test.go
  • cmd/caib/main.go
  • internal/buildapi/openapi.yaml
  • docs/openapi.yaml
  • api/v1alpha1/imagebuild_types.go
  • internal/controller/imagebuild/expiry_test.go

Comment thread internal/buildapi/server.go Outdated
Comment thread internal/controller/imagebuild/controller.go
@bennyz bennyz force-pushed the expirable-builds branch 2 times, most recently from 1ca7117 to 1a75a8b Compare April 20, 2026 07:16
@bennyz bennyz requested a review from bkhizgiy April 20, 2026 09:53
Comment thread internal/controller/imagebuild/controller.go
Builds will be expired by default, removing their ImageBuild and clearing up resources

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
@bennyz bennyz force-pushed the expirable-builds branch from 1a75a8b to 13902aa Compare April 20, 2026 12:44
@bennyz bennyz merged commit d76efa5 into centos-automotive-suite:main Apr 20, 2026
4 checks passed
@bennyz bennyz deleted the expirable-builds branch April 20, 2026 15:39
@coderabbitai coderabbitai Bot mentioned this pull request Jun 11, 2026
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants