refactor: extract flash out of server.go#246
Conversation
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
|
@ambient-code please review |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 32 minutes and 23 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughAdds a Flash Build API (create/list/get/stream logs) with request validation, Kubernetes secret creation, and TaskRun generation; introduces Prometheus metrics for Flash API and integrates flash metrics into the imagebuild controller; adds shared helpers for validation, pagination, namespace/client resolution, and streaming; includes tests for metrics and Flash endpoints. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant APIServer as API Server
participant K8s as Kubernetes API
participant Secrets as Secret Store
participant TaskRun as Tekton TaskRun
Client->>APIServer: POST /v1/flash (imageRef, clientConfig)
APIServer->>APIServer: Validate request, sanitize/generate name
APIServer->>K8s: Resolve namespace, load OperatorConfig
APIServer->>APIServer: Derive flash command, substitute placeholders
APIServer->>Secrets: Create Jumpstarter client Secret
APIServer->>Secrets: Create OCI pull-auth Secret (optional)
APIServer->>TaskRun: Build Tekton TaskRun spec & bind secrets
APIServer->>K8s: Create TaskRun
K8s->>Secrets: Attach ownerReferences for cleanup
K8s-->>TaskRun: TaskRun created
APIServer-->>Client: HTTP 202 with FlashResponse
APIServer->>APIServer: Increment FlashCreatedTotal metric
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review — PR #246: Extract flash out of server.go
Overall: Clean refactoring. The flash logic and shared helpers are extracted into well-scoped files without changing external behavior. A few observations below.
Highlights
- Testability improvement:
flash.gonow callsgetClientFromRequestFn/getRESTConfigFromRequestFn(function variables) instead of the bare functions. The old code inserver.gocalled the concrete functions directly, meaning flash handlers couldn't be unit-tested with mock clients. This is a nice improvement that aligns flash code with the existing testing pattern in the package. - helpers.go: Good choice to extract shared utilities (
writeJSON,parsePagination,applyPagination,resolveNamespace, validation funcs, etc.) into a single helpers file — they're used by bothserver.goandflash.go. - Test coverage:
flash_test.gocoversgetTaskRunStatus,getFlash,listFlash, andstreamFlashLogsedge cases well.
Notes
-
createFlashClientConfigSecret/createFlashOCIAuthSecretlocation — These functions live inflash_helpers.goand are called exclusively byflash.go. That's fine as-is, but if you plan further flash-related refactoring, consider whether they belong inflash.godirectly to keep the call graph more localized. -
strconvimport removal fromserver.go— Correct, sinceparsePagination(the only user) moved tohelpers.go.
No blocking issues. LGTM with the minor inline notes.
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/buildapi/flash.go`:
- Around line 67-76: The code validates req.Name with validateBuildName but
never replaces it with the sanitized value, so non-kubernetes-safe names (e.g.,
"My_Flash") can pass validation and later fail during TaskRun creation after
secrets are created; fix by calling sanitizeBuildNameForValidation(req.Name) and
assigning its result back to req.Name (or use a new variable and then set
req.Name) before calling validateBuildName and before any Kubernetes resource
creation (ensure this change is applied where req.Name is used to create
Secrets/TaskRun/etc., e.g., in the flash handler that calls validateBuildName
and later creates TaskRun).
- Around line 346-356: getTaskRunStatus may return an empty message when a
TaskRun has Succeeded=False but cond.Message is empty; update getTaskRunStatus
to return a sensible fallback (e.g., "Flash failed") whenever you would return
phaseFailed with an empty cond.Message. Specifically, inside getTaskRunStatus
where you inspect tr.Status.Conditions and currently return phaseFailed,
cond.Message, add a check for cond.Message == "" and return phaseFailed, "Flash
failed" instead (preserve existing phaseCompleted and final fallback behavior).
- Around line 421-430: The current code opens req :=
clientset.CoreV1().Pods(namespace).GetLogs(...); stream, err :=
req.Stream(streamCtx) and immediately returns on error, which can close the 200
response if the container is still starting; change this to retry opening the
log stream while the pod/container is initializing: loop until either
req.Stream(streamCtx) succeeds or streamCtx is cancelled, and on transient
failure check the pod status via clientset.CoreV1().Pods(namespace).Get(ctx,
podName, metav1.GetOptions{}) and inspect pod.Status.Phase and
pod.Status.ContainerStatuses[*].State.Waiting.Reason (look for
Pending/ContainerCreating/ContainerCreating/PodInitializing) to decide to retry;
use a short backoff (e.g., time.Sleep(500ms) with a max wait or context
cancellation) before reattempting, only write a terminal error to c.Writer and
return if the pod/container is not initializing or the context expired; keep
using symbols req, streamCtx, c.Writer, sinceTime, podName, containerName,
clientset.CoreV1().Pods(...) to locate the code.
In `@internal/buildapi/server.go`:
- Around line 582-583: Locate the flash route group where v1.Group("/flash") is
created and currently calls flashGroup.Use(a.authMiddleware(),
flashMetricsMiddleware()); swap the middleware order so metrics wrap auth by
changing the Use call to flashGroup.Use(flashMetricsMiddleware(),
a.authMiddleware()); ensure you reference the flashGroup, authMiddleware, and
flashMetricsMiddleware symbols so the auth check runs inside the metrics
middleware's Next/after logic.
In `@internal/controller/imagebuild/controller.go`:
- Around line 2006-2025: The code increments FlashTotal using
FlashTotal.WithLabelValues(target, status).Inc() before confirming a flash
TaskRun exists; move the increment so it only runs after locating a child with
child.PipelineTaskName == "flash-image" and successfully retrieving its TaskRun
(the same block that computes FlashDuration from taskRun.Status.CompletionTime),
ensuring you do not increment when the TaskRun lookup errors or the flash task
was never created; update the logic around pipelineRun.Status.ChildReferences,
child.PipelineTaskName, r.Get(...) into the same conditional scope where
FlashDuration.WithLabelValues(target, status).Observe(duration) is called so
both metrics are recorded only when the flash TaskRun is present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7e5da9a5-14d1-4995-97a5-8cfaf7aadc4f
📒 Files selected for processing (8)
internal/buildapi/flash.gointernal/buildapi/flash_metrics.gointernal/buildapi/flash_metrics_test.gointernal/buildapi/flash_test.gointernal/buildapi/helpers.gointernal/buildapi/server.gointernal/controller/imagebuild/controller.gointernal/controller/imagebuild/metrics.go
| // Auto-generate name if not provided | ||
| if req.Name == "" { | ||
| req.Name = fmt.Sprintf("flash-%s", uuid.New().String()[:5]) | ||
| } | ||
|
|
||
| // Validate name | ||
| if err := validateBuildName(req.Name); err != nil { | ||
| c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}) | ||
| return | ||
| } |
There was a problem hiding this comment.
Apply the sanitized flash name before creating Kubernetes resources.
Line 73 validates using validateBuildName, but req.Name is never replaced with sanitizeBuildNameForValidation(req.Name). Names like My_Flash pass validation, then TaskRun creation fails later with a Kubernetes name error after secrets were already created.
Proposed fix
// Validate name
if err := validateBuildName(req.Name); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
+ req.Name = sanitizeBuildNameForValidation(req.Name)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/flash.go` around lines 67 - 76, The code validates req.Name
with validateBuildName but never replaces it with the sanitized value, so
non-kubernetes-safe names (e.g., "My_Flash") can pass validation and later fail
during TaskRun creation after secrets are created; fix by calling
sanitizeBuildNameForValidation(req.Name) and assigning its result back to
req.Name (or use a new variable and then set req.Name) before calling
validateBuildName and before any Kubernetes resource creation (ensure this
change is applied where req.Name is used to create Secrets/TaskRun/etc., e.g.,
in the flash handler that calls validateBuildName and later creates TaskRun).
0fd7f48 to
8b09089
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/imagebuild/controller.go (1)
1645-1663:⚠️ Potential issue | 🟡 MinorRecord flash metrics only after the terminal status patch succeeds.
Lines 1648 and 1652 emit Prometheus side effects before
Status().Patchsucceeds. If the patch fails or conflicts, the reconcile retries the same completed TaskRun and can double-countFlashTotal/FlashDuration.📊 Proposed fix
+ flashStatus := buildStatusFailure if isTaskRunSuccessful(taskRun) { fresh.Status.Phase = phaseCompleted fresh.Status.Message = "Build, push, and flash completed successfully" - recordFlashMetrics(imageBuild, taskRun, buildStatusSuccess) + flashStatus = buildStatusSuccess } else { fresh.Status.Phase = phaseFailed fresh.Status.Message = taskRunFailureMessage(taskRun, "Flash to device failed") - recordFlashMetrics(imageBuild, taskRun, buildStatusFailure) } if fresh.Status.CompletionTime == nil { @@ if err := r.Status().Patch(ctx, fresh, patch); err != nil { log.Error(err, "Failed to patch status after flash completion") return ctrl.Result{}, err } + recordFlashMetrics(fresh, taskRun, flashStatus)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/imagebuild/controller.go` around lines 1645 - 1663, The recordFlashMetrics calls are executed before persisting the terminal status and can be double-counted if r.Status().Patch fails; move the recordFlashMetrics(imageBuild, taskRun, ...) invocation(s) so they run only after r.Status().Patch(ctx, fresh, patch) returns nil. Concretely, remove or comment out the two calls to recordFlashMetrics in the if/else that sets fresh.Status.Phase/Message, then after the Patch succeeds (immediately after the if err := r.Status().Patch(...) check), call recordFlashMetrics with the appropriate buildStatusSuccess or buildStatusFailure based on fresh.Status.Phase (or re-evaluate isTaskRunSuccessful(taskRun) again) so metrics are emitted only after the status update is persisted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/controller/imagebuild/controller.go`:
- Around line 1645-1663: The recordFlashMetrics calls are executed before
persisting the terminal status and can be double-counted if r.Status().Patch
fails; move the recordFlashMetrics(imageBuild, taskRun, ...) invocation(s) so
they run only after r.Status().Patch(ctx, fresh, patch) returns nil. Concretely,
remove or comment out the two calls to recordFlashMetrics in the if/else that
sets fresh.Status.Phase/Message, then after the Patch succeeds (immediately
after the if err := r.Status().Patch(...) check), call recordFlashMetrics with
the appropriate buildStatusSuccess or buildStatusFailure based on
fresh.Status.Phase (or re-evaluate isTaskRunSuccessful(taskRun) again) so
metrics are emitted only after the status update is persisted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1b13f94e-099c-4e55-b97f-6e97269cc478
📒 Files selected for processing (6)
internal/buildapi/flash.gointernal/buildapi/flash_metrics.gointernal/buildapi/flash_metrics_test.gointernal/buildapi/server.gointernal/controller/imagebuild/controller.gointernal/controller/imagebuild/metrics.go
✅ Files skipped from review due to trivial changes (1)
- internal/buildapi/flash.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/controller/imagebuild/metrics.go
- internal/buildapi/flash_metrics_test.go
- Replace log.Printf with a.log.Error for structured logging in flash - Add /metrics endpoint with prometheus handler - Add ado_flash_created_total counter (incremented on flash creation) - Add ado_flash_request_duration_seconds histogram with per-endpoint latency tracking via gin middleware on /v1/flash/* routes Assisted-by: claude-opus-4.6 Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
8b09089 to
a3ec827
Compare
Summary
Related Issues
Type of Change
Testing
make test)make lint)make manifests generate)Summary by CodeRabbit
New Features
Tests