Skip to content

add auto-stop for idle workspaces#181

Merged
bennyz merged 1 commit into
centos-automotive-suite:mainfrom
bennyz:auto-pause
Mar 25, 2026
Merged

add auto-stop for idle workspaces#181
bennyz merged 1 commit into
centos-automotive-suite:mainfrom
bennyz:auto-pause

Conversation

@bennyz

@bennyz bennyz commented Mar 25, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • New Features

    • Workspace auto-pause with configurable idle timeout (default 30 minutes)
    • Per-workspace override (including explicit “disabled” option) and CLI flag to set timeout on create
    • Activity tracking (last active time), shown in workspace commands
    • Automatic detection and pausing of idle running workspaces
  • Bug Fixes

    • Improved workspace status handling and deepcopy to avoid shared-pointer/status copy issues

@coderabbitai

coderabbitai Bot commented Mar 25, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Adds per-workspace and global auto-pause configuration, workspace activity tracking, CLI and API surface changes, and controller logic that probes running workspace pods (via pods/exec) to update activity timestamps and auto-pause idle workspaces.

Changes

Cohort / File(s) Summary
CRD & operator config
api/v1alpha1/operatorconfig_types.go, api/v1alpha1/workspace_types.go
Added DefaultAutoPauseTimeoutMinutes constant and WorkspacesConfig.AutoPauseTimeoutMinutes with getter; added WorkspaceSpec.AutoPauseTimeoutMinutes (*int32) and WorkspaceStatus.LastActivityTime (*metav1.Time).
Generated deepcopy
api/v1alpha1/zz_generated.deepcopy.go
Updated deepcopy logic to deep-copy new pointer/time fields and Workspace status correctly.
CLI & bootstrap
cmd/caib/workspace/workspace.go, cmd/main.go
Added --auto-pause-timeout flag to workspace create; workspace show prints auto-pause and last-activity; reconciler initialized with RestConfig in manager bootstrap.
Build API / HTTP layer
internal/buildapi/workspace.go
Extended WorkspaceRequest/WorkspaceResponse with auto-pause and last-activity fields; formatting logic for display; added touchWorkspaceActivity to update status.lastActivityTime; invoked from sync/exec/shell/deploy flows.
Auto-pause controller
internal/controller/workspace/controller.go, internal/controller/workspace/controller_test.go
Added RestConfig and cached clientset; implemented auto-pause flow: effective timeout resolution, periodic checks, in-pod activity probe via pods/exec, status patching, and set Spec.Stopped=true when idle; updated RBAC for pods/exec; test constant updated for running phase.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Controller
    participant K8sAPI as K8s API
    participant Pod

    User->>K8sAPI: Create/Update Workspace (autoPauseTimeoutMinutes)
    K8sAPI-->>Controller: Workspace event

    Controller->>K8sAPI: Get Workspace resource
    K8sAPI-->>Controller: WorkspaceSpec + Status

    rect rgba(100,150,200,0.5)
        Note over Controller: Auto-pause check (only when Running)
        Controller->>Controller: Compute effective timeout
        Controller->>Controller: Calculate idle duration (now - lastActivityTime)
    end

    alt Idle duration < timeout
        Controller->>K8sAPI: Exec request (pods/exec) -> probe activity
        K8sAPI->>Pod: Exec probe
        Pod-->>K8sAPI: Probe result
        K8sAPI-->>Controller: Activity detected
        Controller->>K8sAPI: Patch Status.LastActivityTime
        Controller->>Controller: Requeue for next check
    else Idle duration >= timeout
        Controller->>K8sAPI: Patch Spec.Stopped = true
        K8sAPI-->>K8sAPI: Workspace transitions to Stopped
        Controller->>K8sAPI: Update Status message (auto-paused)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • initial workspace #147: Introduced the workspace CRD and operator config types that this PR extends with auto-pause fields and controller behavior.

Suggested reviewers

  • bkhizgiy

Poem

🐰 I peek inside pods where terminals hum,
I tally the ticks until keyboards are numb,
If silence lingers and no processes play,
I fold the workspace and tuck it away,
A soft little pause — nap time, hip-hip hooray.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main feature: adding automatic stopping (pausing) of idle workspaces, which is the core functionality across all changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/v1alpha1/zz_generated.deepcopy.go (1)

1-28: ⚠️ Potential issue | 🟡 Minor

Generated file should not be committed.

Based on learnings: api/v1alpha1/zz_generated.deepcopy.go is explicitly listed in .gitignore and regenerated at deploy/build time via make generate manifests. This file should be removed from the PR to follow repository conventions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/zz_generated.deepcopy.go` around lines 1 - 28, Remove the
generated file zz_generated.deepcopy.go from this PR/branch (do not commit
generated artifacts); delete the file (so it no longer appears in the commit)
and rely on the repo’s generation step (make generate manifests) to recreate it
during build/deploy; confirm package v1alpha1 files are untouched aside from
removing this generated file and do not alter .gitignore which already lists
zz_generated.deepcopy.go.
🧹 Nitpick comments (1)
internal/controller/workspace/controller.go (1)

443-450: Two-patch approach may cause inconsistent state on partial failure.

The auto-pause logic patches Spec.Stopped = true first, then separately calls setStatus to update the status. If the spec patch succeeds but setStatus fails, the workspace is stopped but the status message/phase is stale. The next reconcile will fix this, but consider combining into an atomic operation or handling the error more explicitly.

Additionally, the setStatus call uses the in-memory ws object which still has the old resourceVersion from before the spec patch. This could fail with a conflict error if another actor (like touchWorkspaceActivity) modified the resource.

Consider re-fetching after spec patch
 		specPatch := client.MergeFrom(ws.DeepCopy())
 		ws.Spec.Stopped = true
 		if err := r.Patch(ctx, ws, specPatch); err != nil {
 			return ctrl.Result{}, fmt.Errorf("failed to auto-pause workspace: %w", err)
 		}

+		// Re-fetch to get updated resourceVersion for status patch
+		if err := r.Get(ctx, client.ObjectKeyFromObject(ws), ws); err != nil {
+			return ctrl.Result{}, err
+		}
+
 		msg := fmt.Sprintf("Auto-paused after %s of inactivity", idleDuration.Truncate(time.Minute))
 		return ctrl.Result{}, r.setStatus(ctx, ws, "Stopped", msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 443 - 450, The
current two-step auto-pause can leave Spec updated but Status stale or conflict
because ws still holds the old resourceVersion; after successfully calling
r.Patch(ctx, ws, specPatch) (the specPatch/ r.Patch/ ws code path), re-fetch the
latest workspace into ws (e.g., r.Get(ctx, namespacedName, ws) or load a fresh
copy) to refresh resourceVersion and then call r.setStatus (or r.Status().Patch)
using a new patch built from that fresh copy; also handle setStatus conflicts by
retrying (or requeueing) and consider reverting the spec change or requeuing on
persistent failures so state stays consistent (references: specPatch, r.Patch,
setStatus, touchWorkspaceActivity, resourceVersion).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v1alpha1/operatorconfig_types.go`:
- Around line 418-425: The accessor GetAutoPauseTimeoutMinutes currently treats
any non-positive AutoPauseTimeoutMinutes as missing and returns
DefaultAutoPauseTimeoutMinutes, which prevents an explicit 0 from disabling
auto-pause; update GetAutoPauseTimeoutMinutes (in WorkspacesConfig) to first
check for a non-nil receiver and return c.AutoPauseTimeoutMinutes when it is
explicitly set (including 0 to indicate disabled), and only return
DefaultAutoPauseTimeoutMinutes when the config is nil or the field is unset
(e.g., negative/absent) so that 0 properly disables auto-pause cluster-wide.

In `@internal/buildapi/workspace.go`:
- Around line 466-479: touchWorkspaceActivity currently patches
ws.Status.LastActivityTime using the original ws which can race with the
auto-pause controller; change the function to re-fetch the latest Workspace from
the API (using k8sClient.Get) immediately before creating the MergeFrom patch,
check that the fresh object's Spec.Stopped is false (abort if true), update the
fresh object's Status.LastActivityTime and then call k8sClient.Status().Patch
with the MergeFrom of that fresh copy, and handle/ignore conflict errors as
best-effort (do not blindly patch the stale ws or overwrite stopped state).

---

Outside diff comments:
In `@api/v1alpha1/zz_generated.deepcopy.go`:
- Around line 1-28: Remove the generated file zz_generated.deepcopy.go from this
PR/branch (do not commit generated artifacts); delete the file (so it no longer
appears in the commit) and rely on the repo’s generation step (make generate
manifests) to recreate it during build/deploy; confirm package v1alpha1 files
are untouched aside from removing this generated file and do not alter
.gitignore which already lists zz_generated.deepcopy.go.

---

Nitpick comments:
In `@internal/controller/workspace/controller.go`:
- Around line 443-450: The current two-step auto-pause can leave Spec updated
but Status stale or conflict because ws still holds the old resourceVersion;
after successfully calling r.Patch(ctx, ws, specPatch) (the specPatch/ r.Patch/
ws code path), re-fetch the latest workspace into ws (e.g., r.Get(ctx,
namespacedName, ws) or load a fresh copy) to refresh resourceVersion and then
call r.setStatus (or r.Status().Patch) using a new patch built from that fresh
copy; also handle setStatus conflicts by retrying (or requeueing) and consider
reverting the spec change or requeuing on persistent failures so state stays
consistent (references: specPatch, r.Patch, setStatus, touchWorkspaceActivity,
resourceVersion).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 263a22a6-4145-49d0-a7e6-b4dcace49fd6

📥 Commits

Reviewing files that changed from the base of the PR and between a20f06b and 18c6398.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (8)
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • internal/buildapi/workspace.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go

Comment on lines +418 to +425
// GetAutoPauseTimeoutMinutes returns the global auto-pause timeout in minutes.
// Returns DefaultAutoPauseTimeoutMinutes (30) when not configured.
func (c *WorkspacesConfig) GetAutoPauseTimeoutMinutes() int32 {
if c != nil && c.AutoPauseTimeoutMinutes > 0 {
return c.AutoPauseTimeoutMinutes
}
return DefaultAutoPauseTimeoutMinutes
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential semantic inconsistency: 0 should disable auto-pause, not fall back to default.

The workspace spec documents that 0 = disable auto-pause, but this accessor returns DefaultAutoPauseTimeoutMinutes when the value is <= 0. This means setting WorkspacesConfig.AutoPauseTimeoutMinutes = 0 at the cluster level won't actually disable auto-pause globally.

If 0 should disable auto-pause cluster-wide (consistent with per-workspace behavior), consider:

Suggested fix
 func (c *WorkspacesConfig) GetAutoPauseTimeoutMinutes() int32 {
-	if c != nil && c.AutoPauseTimeoutMinutes > 0 {
+	if c != nil && c.AutoPauseTimeoutMinutes != 0 {
 		return c.AutoPauseTimeoutMinutes
 	}
 	return DefaultAutoPauseTimeoutMinutes
 }

Alternatively, if 0 at the cluster level should not disable but only per-workspace 0 should disable, document this distinction clearly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/operatorconfig_types.go` around lines 418 - 425, The accessor
GetAutoPauseTimeoutMinutes currently treats any non-positive
AutoPauseTimeoutMinutes as missing and returns DefaultAutoPauseTimeoutMinutes,
which prevents an explicit 0 from disabling auto-pause; update
GetAutoPauseTimeoutMinutes (in WorkspacesConfig) to first check for a non-nil
receiver and return c.AutoPauseTimeoutMinutes when it is explicitly set
(including 0 to indicate disabled), and only return
DefaultAutoPauseTimeoutMinutes when the config is nil or the field is unset
(e.g., negative/absent) so that 0 properly disables auto-pause cluster-wide.

Comment on lines +466 to +479
// touchWorkspaceActivity updates LastActivityTime on the workspace status.
// Called from handlers that represent actual workspace usage (exec, shell, sync, deploy)
// so the auto-pause controller knows the workspace is in use.
func (a *APIServer) touchWorkspaceActivity(c *gin.Context, ws *automotivev1alpha1.Workspace) {
k8sClient, err := getClientFromRequest(c)
if err != nil {
return // best-effort, don't fail the operation
}

now := metav1.Now()
patch := client.MergeFrom(ws.DeepCopy())
ws.Status.LastActivityTime = &now
_ = k8sClient.Status().Patch(c.Request.Context(), ws, patch)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Race condition between activity touch and auto-pause controller.

This function patches LastActivityTime using a stale ws object (fetched at handler entry). If called after the controller decides to auto-pause but before setStatus clears LastActivityTime, this patch may:

  1. Fail with a conflict error (silently ignored)
  2. Succeed and set activity time on a workspace that's already being stopped

While the best-effort approach is reasonable for activity tracking, consider re-fetching the workspace before patching, or at minimum checking that ws.Spec.Stopped is still false:

Suggested improvement
 func (a *APIServer) touchWorkspaceActivity(c *gin.Context, ws *automotivev1alpha1.Workspace) {
+	// Don't touch activity on workspaces being stopped
+	if ws.Spec.Stopped {
+		return
+	}
 	k8sClient, err := getClientFromRequest(c)
 	if err != nil {
 		return // best-effort, don't fail the operation
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 466 - 479,
touchWorkspaceActivity currently patches ws.Status.LastActivityTime using the
original ws which can race with the auto-pause controller; change the function
to re-fetch the latest Workspace from the API (using k8sClient.Get) immediately
before creating the MergeFrom patch, check that the fresh object's Spec.Stopped
is false (abort if true), update the fresh object's Status.LastActivityTime and
then call k8sClient.Status().Patch with the MergeFrom of that fresh copy, and
handle/ignore conflict errors as best-effort (do not blindly patch the stale ws
or overwrite stopped state).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/controller/workspace/controller.go (1)

366-375: ⚠️ Potential issue | 🟡 Minor

Don't let the no-op fast path skip the new activity reset.

LastActivityTime is not part of the equality guard. If it gets re-populated after a workspace is already Stopped, Pending, or Creating, later setStatus calls return early and never clear it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 366 - 375, The
early-return equality check (comparing ws.Status.Phase, ws.Status.Message and
ws.Status.PodName) can skip clearing ws.Status.LastActivityTime; move the
LastActivityTime reset logic so it executes before the no-op fast path or
include LastActivityTime in the equality comparison in the setStatus block (the
code touching ws.Status.Phase, ws.Status.Message, ws.Status.PodName and
ws.Status.LastActivityTime). Concretely, ensure that when phase is
"Stopped"|"Pending"|"Creating" you set ws.Status.LastActivityTime = nil prior to
returning early in the no-op path (or update the equality guard to also compare
LastActivityTime) so subsequent calls cannot leave LastActivityTime incorrectly
populated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/buildapi/workspace.go`:
- Around line 943-950: The code currently prints the compiled
DefaultAutoPauseTimeoutMinutes whenever ws.Spec.AutoPauseTimeoutMinutes is nil,
but nil means "inherit from OperatorConfig"; update the nil branch to read the
operator-level setting instead: fetch the OperatorConfig's
AutoPauseTimeoutMinutes (e.g. via your reconciler/helper that exposes
operatorConfig or a getOperatorConfig call), then if that operator value is nil
fall back to automotivev1alpha1.DefaultAutoPauseTimeoutMinutes, if the operator
value is 0 print "disabled", otherwise print the operator value with "%dm". Use
the same semantics for 0/ptr vs nil as with ws.Spec.AutoPauseTimeoutMinutes
(reference symbols: ws.Spec.AutoPauseTimeoutMinutes,
automotivev1alpha1.DefaultAutoPauseTimeoutMinutes, and
OperatorConfig.AutoPauseTimeoutMinutes or your reconciler method that returns
it).

In `@internal/controller/workspace/controller.go`:
- Around line 390-397: The current lookup of OperatorConfig (r.Get(..., oc))
treats any error as “not configured” and returns DefaultAutoPauseTimeoutMinutes;
change the logic to only fall back to the default when errors.IsNotFound(err) is
true, and for any other error propagate the error back to the caller so the
reconcile loop can retry; specifically, update the block that calls r.Get(ctx,
client.ObjectKey{Name: "config", Namespace: ws.Namespace}, oc) to check
errors.IsNotFound(err) and return the default only in that case, and otherwise
return/forward the Get error instead of silently using
DefaultAutoPauseTimeoutMinutes (preserve use of
oc.Spec.Workspaces.GetAutoPauseTimeoutMinutes() when oc is found).
- Around line 38-45: The clientset initialization error is currently stored in a
local initErr inside getClientset(), so if the once initializer fails subsequent
calls return (nil, nil) and cause panics; add a new Reconciler field (e.g.,
clientsetInitErr error), set that field from within the sync.Once initializer in
getClientset(), and change getClientset() to return (r.clientset,
r.clientsetInitErr); update any callers (like isWorkspaceActive()) to handle the
returned error. This ensures the initialization error is persisted across calls
and prevents nil clientset usage.

---

Outside diff comments:
In `@internal/controller/workspace/controller.go`:
- Around line 366-375: The early-return equality check (comparing
ws.Status.Phase, ws.Status.Message and ws.Status.PodName) can skip clearing
ws.Status.LastActivityTime; move the LastActivityTime reset logic so it executes
before the no-op fast path or include LastActivityTime in the equality
comparison in the setStatus block (the code touching ws.Status.Phase,
ws.Status.Message, ws.Status.PodName and ws.Status.LastActivityTime).
Concretely, ensure that when phase is "Stopped"|"Pending"|"Creating" you set
ws.Status.LastActivityTime = nil prior to returning early in the no-op path (or
update the equality guard to also compare LastActivityTime) so subsequent calls
cannot leave LastActivityTime incorrectly populated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a9369c7-7adc-4f89-97d6-d753eea2369e

📥 Commits

Reviewing files that changed from the base of the PR and between 18c6398 and 2dd8e5c.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (8)
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • internal/buildapi/workspace.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (3)
  • api/v1alpha1/workspace_types.go
  • internal/controller/workspace/controller_test.go
  • api/v1alpha1/zz_generated.deepcopy.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • api/v1alpha1/operatorconfig_types.go

Comment on lines +943 to +950
var autoPauseTimeout string
switch {
case ws.Spec.AutoPauseTimeoutMinutes == nil:
autoPauseTimeout = fmt.Sprintf("default (%dm)", automotivev1alpha1.DefaultAutoPauseTimeoutMinutes)
case *ws.Spec.AutoPauseTimeoutMinutes == 0:
autoPauseTimeout = "disabled"
default:
autoPauseTimeout = fmt.Sprintf("%dm", *ws.Spec.AutoPauseTimeoutMinutes)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't render the compiled-in default as the effective timeout.

nil here means "inherit from OperatorConfig", but this branch always prints DefaultAutoPauseTimeoutMinutes. Any namespace with a custom global timeout — or a global disable — will return the wrong autoPauseTimeout in list/show responses.

Minimal safe fix
 	case ws.Spec.AutoPauseTimeoutMinutes == nil:
-		autoPauseTimeout = fmt.Sprintf("default (%dm)", automotivev1alpha1.DefaultAutoPauseTimeoutMinutes)
+		autoPauseTimeout = "default"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var autoPauseTimeout string
switch {
case ws.Spec.AutoPauseTimeoutMinutes == nil:
autoPauseTimeout = fmt.Sprintf("default (%dm)", automotivev1alpha1.DefaultAutoPauseTimeoutMinutes)
case *ws.Spec.AutoPauseTimeoutMinutes == 0:
autoPauseTimeout = "disabled"
default:
autoPauseTimeout = fmt.Sprintf("%dm", *ws.Spec.AutoPauseTimeoutMinutes)
var autoPauseTimeout string
switch {
case ws.Spec.AutoPauseTimeoutMinutes == nil:
autoPauseTimeout = "default"
case *ws.Spec.AutoPauseTimeoutMinutes == 0:
autoPauseTimeout = "disabled"
default:
autoPauseTimeout = fmt.Sprintf("%dm", *ws.Spec.AutoPauseTimeoutMinutes)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 943 - 950, The code currently
prints the compiled DefaultAutoPauseTimeoutMinutes whenever
ws.Spec.AutoPauseTimeoutMinutes is nil, but nil means "inherit from
OperatorConfig"; update the nil branch to read the operator-level setting
instead: fetch the OperatorConfig's AutoPauseTimeoutMinutes (e.g. via your
reconciler/helper that exposes operatorConfig or a getOperatorConfig call), then
if that operator value is nil fall back to
automotivev1alpha1.DefaultAutoPauseTimeoutMinutes, if the operator value is 0
print "disabled", otherwise print the operator value with "%dm". Use the same
semantics for 0/ptr vs nil as with ws.Spec.AutoPauseTimeoutMinutes (reference
symbols: ws.Spec.AutoPauseTimeoutMinutes,
automotivev1alpha1.DefaultAutoPauseTimeoutMinutes, and
OperatorConfig.AutoPauseTimeoutMinutes or your reconciler method that returns
it).

Comment thread internal/controller/workspace/controller.go
Comment on lines +390 to +397
oc := &automotivev1alpha1.OperatorConfig{}
if err := r.Get(ctx, client.ObjectKey{Name: "config", Namespace: ws.Namespace}, oc); err == nil {
if oc.Spec.Workspaces != nil {
return time.Duration(oc.Spec.Workspaces.GetAutoPauseTimeoutMinutes()) * time.Minute
}
}

return time.Duration(automotivev1alpha1.DefaultAutoPauseTimeoutMinutes) * time.Minute

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Only fall back to the built-in timeout on NotFound.

This helper currently turns any r.Get error into DefaultAutoPauseTimeoutMinutes. A transient cache/API failure can therefore change the effective idle policy and auto-pause with the wrong timeout instead of retrying.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 390 - 397, The
current lookup of OperatorConfig (r.Get(..., oc)) treats any error as “not
configured” and returns DefaultAutoPauseTimeoutMinutes; change the logic to only
fall back to the default when errors.IsNotFound(err) is true, and for any other
error propagate the error back to the caller so the reconcile loop can retry;
specifically, update the block that calls r.Get(ctx, client.ObjectKey{Name:
"config", Namespace: ws.Namespace}, oc) to check errors.IsNotFound(err) and
return the default only in that case, and otherwise return/forward the Get error
instead of silently using DefaultAutoPauseTimeoutMinutes (preserve use of
oc.Spec.Workspaces.GetAutoPauseTimeoutMinutes() when oc is found).

@bennyz

bennyz commented Mar 25, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 25, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
internal/buildapi/workspace.go (1)

469-481: ⚠️ Potential issue | 🟡 Minor

Still patching activity from a stale Workspace snapshot.

ws was loaded before the handler began its main operation, so this MergeFrom(ws.DeepCopy()) can still race with auto-pause/stop. A late best-effort patch can conflict or repopulate LastActivityTime after the workspace has already transitioned toward Stopped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 469 - 481,
touchWorkspaceActivity is creating a MergeFrom patch from an earlier ws snapshot
which can race with auto-pause/stop; instead re-read the current Workspace
before patching and build the patch from that fresh copy: after
getClientFromRequest succeeds, call k8sClient.Get to fetch the latest Workspace
into a new variable (e.g., latestWS), check latestWS.Spec.Stopped again, set
latestWS.Status.LastActivityTime = &now and then call k8sClient.Status().Patch
using client.MergeFrom(latestWS.DeepCopy()) (or bail if the workspace is now
stopped) so you never patch from a stale snapshot.
internal/controller/workspace/controller.go (1)

391-398: ⚠️ Potential issue | 🟠 Major

Don't turn OperatorConfig read failures into the built-in timeout.

Any transient r.Get() error here currently changes the effective idle policy instead of retrying, so the controller can auto-pause with the wrong timeout. Only fall back on IsNotFound; bubble other errors back to checkAutoPause().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 391 - 398, The
current logic around r.Get(ctx, client.ObjectKey{Name: "config", Namespace:
ws.Namespace}, oc) swallows all errors and falls back to the built-in timeout;
change this so only a NotFound error falls back to
automotivev1alpha1.DefaultAutoPauseTimeoutMinutes, while any other r.Get error
is returned to the caller (so checkAutoPause() can retry/handle it). Concretely,
update the method using OperatorConfig/oc to inspect the error with
apierrors.IsNotFound(err) and return the default timeout only in that case; for
any other non-nil err return it (adjust the function signature to return
(time.Duration, error) if needed) and keep the existing path that uses
oc.Spec.Workspaces.GetAutoPauseTimeoutMinutes() when err == nil.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v1alpha1/workspace_types.go`:
- Around line 68-73: The AutoPauseTimeoutMinutes field currently allows negative
values; add a kubebuilder validation marker to reject negatives by annotating
the AutoPauseTimeoutMinutes field (the pointer int32 field named
AutoPauseTimeoutMinutes) with "+kubebuilder:validation:Minimum=0" so the CRD
schema enforces values >= 0 (nil and 0 still allowed), then regenerate CRDs
(e.g., run controller-gen / repo's generate task) so the CRD rejects negative
numbers at the API boundary.

In `@cmd/caib/workspace/workspace.go`:
- Around line 297-300: The code currently treats any negative autoPauseTimeout
like "-1" by only setting req.AutoPauseTimeoutMinutes when autoPauseTimeout >=
0; update the validation logic where autoPauseTimeout is processed (the block
that sets req.AutoPauseTimeoutMinutes) to explicitly reject values less than -1
by returning an error (or exiting with a clear message) if autoPauseTimeout <
-1, and keep the existing behavior for -1 (leave req.AutoPauseTimeoutMinutes
nil) and for >= 0 (convert to int32 and set req.AutoPauseTimeoutMinutes = &v).

In `@internal/controller/workspace/controller.go`:
- Around line 403-405: The current early return when timeout :=
r.getAutoPauseTimeout(ctx, ws) yields zero prevents workspaces from observing
later global (OperatorConfig) changes; detect whether the zero came from an
inherited OperatorConfig value (add a second return or helper like
getAutoPauseTimeoutInherited or isAutoPauseInherited) and, if timeout==0 AND
inherited==true, return a slow requeue (e.g., ctrl.Result{RequeueAfter:
<reasonable duration>}, nil) so global enables/changes propagate; alternatively
wire a watch on OperatorConfig to trigger reconciles, but implement one of these
in the reconciliation code path around r.getAutoPauseTimeout.
- Around line 374-376: The early-return in setStatus() only compares phase and
can skip updates when ws.Status.LastActivityTime should be changed; update the
change-detection logic in setStatus (or the function doing the no-op check) to
also compare the previous ws.Status.LastActivityTime with the new computed
LastActivityTime and treat them as a difference if they are not equal, so the
code that sets ws.Status.LastActivityTime for phases "Stopped", "Pending", or
"Creating" runs even when phase is unchanged; ensure comparisons handle nil vs
non-nil correctly and use the same time equality semantics used elsewhere.

---

Duplicate comments:
In `@internal/buildapi/workspace.go`:
- Around line 469-481: touchWorkspaceActivity is creating a MergeFrom patch from
an earlier ws snapshot which can race with auto-pause/stop; instead re-read the
current Workspace before patching and build the patch from that fresh copy:
after getClientFromRequest succeeds, call k8sClient.Get to fetch the latest
Workspace into a new variable (e.g., latestWS), check latestWS.Spec.Stopped
again, set latestWS.Status.LastActivityTime = &now and then call
k8sClient.Status().Patch using client.MergeFrom(latestWS.DeepCopy()) (or bail if
the workspace is now stopped) so you never patch from a stale snapshot.

In `@internal/controller/workspace/controller.go`:
- Around line 391-398: The current logic around r.Get(ctx,
client.ObjectKey{Name: "config", Namespace: ws.Namespace}, oc) swallows all
errors and falls back to the built-in timeout; change this so only a NotFound
error falls back to automotivev1alpha1.DefaultAutoPauseTimeoutMinutes, while any
other r.Get error is returned to the caller (so checkAutoPause() can
retry/handle it). Concretely, update the method using OperatorConfig/oc to
inspect the error with apierrors.IsNotFound(err) and return the default timeout
only in that case; for any other non-nil err return it (adjust the function
signature to return (time.Duration, error) if needed) and keep the existing path
that uses oc.Spec.Workspaces.GetAutoPauseTimeoutMinutes() when err == nil.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4080a934-0722-45bb-8be5-b932c8b5cfc0

📥 Commits

Reviewing files that changed from the base of the PR and between 2dd8e5c and 7477bb2.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (8)
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • internal/buildapi/workspace.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (2)
  • internal/controller/workspace/controller_test.go
  • api/v1alpha1/zz_generated.deepcopy.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/main.go
  • api/v1alpha1/operatorconfig_types.go

Comment thread api/v1alpha1/workspace_types.go
Comment thread cmd/caib/workspace/workspace.go
Comment thread internal/controller/workspace/controller.go
Comment thread internal/controller/workspace/controller.go
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
internal/controller/workspace/controller.go (3)

391-396: ⚠️ Potential issue | 🟠 Major

Only fall back to the default timeout on NotFound errors.

This lookup treats any r.Get error (including transient API failures or timeouts) the same as "not configured", falling through to the built-in default. A transient failure could cause the workspace to use an incorrect timeout. Only NotFound should trigger the default fallback; other errors should be propagated.

Suggested fix
 	oc := &automotivev1alpha1.OperatorConfig{}
-	if err := r.Get(ctx, client.ObjectKey{Name: "config", Namespace: ws.Namespace}, oc); err == nil {
+	if err := r.Get(ctx, client.ObjectKey{Name: "config", Namespace: ws.Namespace}, oc); err != nil {
+		if !k8serrors.IsNotFound(err) {
+			// Log transient error but don't fail — use default
+			// Consider returning error to let reconcile retry
+		}
+	} else {
 		if oc.Spec.Workspaces != nil {
 			return time.Duration(oc.Spec.Workspaces.GetAutoPauseTimeoutMinutes()) * time.Minute
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 391 - 396, The
current r.Get call in the workspace timeout lookup treats any error as "not
configured" and falls back to the default; change it to only fall back when the
error is a NotFound. After calling r.Get(ctx, client.ObjectKey{Name: "config",
Namespace: ws.Namespace}, oc), check apierrors.IsNotFound(err) and only then use
the built-in default timeout; for any other non-nil err return/propagate the
error from the surrounding function. Keep the existing logic that returns
oc.Spec.Workspaces.GetAutoPauseTimeoutMinutes() * time.Minute when oc is
present.

367-376: ⚠️ Potential issue | 🟡 Minor

Early return doesn't account for LastActivityTime needing to be cleared.

If the phase/message/podName are unchanged but LastActivityTime is non-nil when transitioning to Stopped/Pending/Creating (e.g., due to a race with the build API), the early return would skip clearing it. This is a minor edge case but could leave stale activity data.

Suggested fix
+	shouldClearActivity := phase == "Stopped" || phase == "Pending" || phase == "Creating"
+	activityNeedsClear := shouldClearActivity && ws.Status.LastActivityTime != nil
-	if ws.Status.Phase == phase && ws.Status.Message == message && ws.Status.PodName == podName {
+	if ws.Status.Phase == phase && ws.Status.Message == message && ws.Status.PodName == podName && !activityNeedsClear {
 		return nil // no change
 	}
 	patch := client.MergeFrom(ws.DeepCopy())
 	ws.Status.Phase = phase
 	ws.Status.Message = message
 	ws.Status.PodName = podName
-	if phase == "Stopped" || phase == "Pending" || phase == "Creating" {
+	if shouldClearActivity {
 		ws.Status.LastActivityTime = nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 367 - 376, The
early return only compares ws.Status.Phase, Message and PodName and can skip
clearing ws.Status.LastActivityTime when transitioning to
"Stopped"/"Pending"/"Creating"; update the guard so it also checks whether
LastActivityTime must be cleared: before returning ensure that if phase is
"Stopped" or "Pending" or "Creating" and ws.Status.LastActivityTime != nil you
proceed to clear it (set to nil) and perform the status patch (use the existing
client.MergeFrom(ws.DeepCopy()) flow), otherwise return early as before;
reference ws.Status.LastActivityTime, ws.Status.Phase, ws.Status.Message,
ws.Status.PodName, client.MergeFrom and ws.DeepCopy in your change.

403-406: ⚠️ Potential issue | 🟡 Minor

Inherited global disables won't observe later OperatorConfig changes.

When timeout == 0 comes from the inherited OperatorConfig (rather than an explicit workspace override), returning without a requeue means the workspace won't re-check if the operator later enables auto-pause. Consider either:

  1. Returning a slow requeue (e.g., 30 minutes) for inherited-zero cases
  2. Watching OperatorConfig changes to trigger workspace reconciles
Suggested improvement
 	timeout := r.getAutoPauseTimeout(ctx, ws)
 	if timeout == 0 {
-		return ctrl.Result{}, nil
+		// Still requeue slowly to observe OperatorConfig changes
+		// when auto-pause is inherited (not explicitly disabled per-workspace)
+		if ws.Spec.AutoPauseTimeoutMinutes == nil {
+			return ctrl.Result{RequeueAfter: 30 * time.Minute}, nil
+		}
+		return ctrl.Result{}, nil
 	}
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@internal/controller/workspace/controller.go`:
- Around line 391-396: The current r.Get call in the workspace timeout lookup
treats any error as "not configured" and falls back to the default; change it to
only fall back when the error is a NotFound. After calling r.Get(ctx,
client.ObjectKey{Name: "config", Namespace: ws.Namespace}, oc), check
apierrors.IsNotFound(err) and only then use the built-in default timeout; for
any other non-nil err return/propagate the error from the surrounding function.
Keep the existing logic that returns
oc.Spec.Workspaces.GetAutoPauseTimeoutMinutes() * time.Minute when oc is
present.
- Around line 367-376: The early return only compares ws.Status.Phase, Message
and PodName and can skip clearing ws.Status.LastActivityTime when transitioning
to "Stopped"/"Pending"/"Creating"; update the guard so it also checks whether
LastActivityTime must be cleared: before returning ensure that if phase is
"Stopped" or "Pending" or "Creating" and ws.Status.LastActivityTime != nil you
proceed to clear it (set to nil) and perform the status patch (use the existing
client.MergeFrom(ws.DeepCopy()) flow), otherwise return early as before;
reference ws.Status.LastActivityTime, ws.Status.Phase, ws.Status.Message,
ws.Status.PodName, client.MergeFrom and ws.DeepCopy in your change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 887a78fc-5661-47d5-a59a-99daf50778c0

📥 Commits

Reviewing files that changed from the base of the PR and between 7477bb2 and e7d580e.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (8)
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • internal/buildapi/workspace.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (1)
  • internal/controller/workspace/controller_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • api/v1alpha1/operatorconfig_types.go

@bennyz bennyz requested a review from bkhizgiy March 25, 2026 12:22
@bennyz bennyz merged commit d11c83b into centos-automotive-suite:main Mar 25, 2026
4 checks passed
@bennyz bennyz deleted the auto-pause branch March 25, 2026 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants