refactor: extract flash out of server.go #246

Merged
bennyz merged 2 commits into centos-automotive-suite:main from bennyz:refactor-server-flash on Apr 21, 2026
Conversation

@bennyz bennyz commented Apr 20, 2026

Contributor

Summary

Related Issues

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • CI/CD improvement
  • Refactoring

Testing

  • Unit tests pass (make test)
  • Linter passes (make lint)
  • Manifests are up to date (make manifests generate)
  • Tested on OpenShift cluster (if applicable)

Summary by CodeRabbit

  • New Features

    • Flash API for creating, listing, fetching and streaming logs of flash build tasks; HTTP metrics endpoint exposing flash metrics.
    • Prometheus instrumentation for flash activity and request durations; controller now records flash-related metrics and durations.
  • Tests

    • Comprehensive tests validating Flash API behavior, log streaming edge cases, and Prometheus metric exposure/values.

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
bennyz commented Apr 20, 2026

Contributor Author

@ambient-code please review

coderabbitai Bot commented Apr 20, 2026

Warning

Rate limit exceeded

@bennyz has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 32 minutes and 23 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 32 minutes and 23 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5643daf7-f075-4591-946e-2527cfea5282

📥 Commits

Reviewing files that changed from the base of the PR and between 8b09089 and a3ec827.

📒 Files selected for processing (6)
  • internal/buildapi/flash.go
  • internal/buildapi/flash_metrics.go
  • internal/buildapi/flash_metrics_test.go
  • internal/buildapi/server.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/imagebuild/metrics.go
📝 Walkthrough

Walkthrough

Adds a Flash Build API (create/list/get/stream logs) with request validation, Kubernetes secret creation, and TaskRun generation; introduces Prometheus metrics for Flash API and integrates flash metrics into the imagebuild controller; adds shared helpers for validation, pagination, namespace/client resolution, and streaming; includes tests for metrics and Flash endpoints.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Flash API: internal/buildapi/flash.go | New handlers and core logic to create/list/get flash TaskRuns, perform request validation, create Jumpstarter/OCI secrets, construct Tekton TaskRuns, compute lease duration, label/annotate TaskRuns, stream step-flash logs, and return JSON responses. |
| Flash metrics (API): internal/buildapi/flash_metrics.go, internal/buildapi/flash_metrics_test.go | Adds Prometheus collectors FlashCreatedTotal and FlashRequestDuration, Gin middleware to record request durations, metric exposition handler, and tests validating metric instrumentation and /metrics exposure. |
| Flash tests: internal/buildapi/flash_test.go | Ginkgo/Gomega tests for getTaskRunStatus, getFlash, listFlash ordering/labels, and streamFlashLogs error cases using fake controller-runtime clients and test namespace handling. |
| Shared helpers: internal/buildapi/helpers.go | New utilities: writeJSON, pagination (parsePagination, applyPagination), parseSinceTime, namespace resolution, Kubernetes client/REST config helpers, log-stream header setup, input validation/sanitization (validateInput, validateContainerRef, validateBuildName, sanitizeBuildNameForValidation), and (*APIServer).resolveRequester. |
| Server routing / metrics wiring: internal/buildapi/server.go | Registers global GET /metrics, applies flashMetricsMiddleware() to /v1/flash routes, and removes now-extracted Flash handlers/helpers from this file. |
| Controller flash metrics: internal/controller/imagebuild/controller.go, internal/controller/imagebuild/metrics.go | Adds FlashTotal (CounterVec) and FlashDuration (HistogramVec), records pipeline-triggered and standalone flash metrics (status and duration) during build/flash state transitions. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant APIServer as API Server
    participant K8s as Kubernetes API
    participant Secrets as Secret Store
    participant TaskRun as Tekton TaskRun

    Client->>APIServer: POST /v1/flash (imageRef, clientConfig)
    APIServer->>APIServer: Validate request, sanitize/generate name
    APIServer->>K8s: Resolve namespace, load OperatorConfig
    APIServer->>APIServer: Derive flash command, substitute placeholders
    APIServer->>Secrets: Create Jumpstarter client Secret
    APIServer->>Secrets: Create OCI pull-auth Secret (optional)
    APIServer->>TaskRun: Build Tekton TaskRun spec & bind secrets
    APIServer->>K8s: Create TaskRun
    K8s->>Secrets: Attach ownerReferences for cleanup
    K8s-->>TaskRun: TaskRun created
    APIServer-->>Client: HTTP 202 with FlashResponse
    APIServer->>APIServer: Increment FlashCreatedTotal metric

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • bkhizgiy

Poem

🐰 A little hop, a flash, a build so bright,
Secrets snug and TaskRuns taking flight,
Metrics counting hops with rhythmic cheer,
Helpers tidy paths so logs appear,
Hooray — the rabbit coded through the night!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'refactor: extract flash out of server.go' clearly and concisely describes the main change in the pull request: refactoring the codebase by extracting flash-related functionality from server.go into separate dedicated files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@ambient-code ambient-code Bot left a comment

Code Review — PR #246: Extract flash out of server.go

Overall: Clean refactoring. The flash logic and shared helpers are extracted into well-scoped files without changing external behavior. A few observations below.

Highlights

  • Testability improvement: flash.go now calls getClientFromRequestFn / getRESTConfigFromRequestFn (function variables) instead of the bare functions. The old code in server.go called the concrete functions directly, meaning flash handlers couldn't be unit-tested with mock clients. This is a nice improvement that aligns flash code with the existing testing pattern in the package.
  • helpers.go: Good choice to extract shared utilities (writeJSON, parsePagination, applyPagination, resolveNamespace, validation funcs, etc.) into a single helpers file — they're used by both server.go and flash.go.
  • Test coverage: flash_test.go covers getTaskRunStatus, getFlash, listFlash, and streamFlashLogs edge cases well.
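
The function-variable seam mentioned in the first highlight can be illustrated with a minimal, standard-library-only sketch. Every name here (`Client`, `getClientFn`, `newRealClient`, `flashPhase`, `fakeClient`) is hypothetical and only mirrors the idea behind `getClientFromRequestFn`; the actual signatures in the repository are not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a hypothetical stand-in for the Kubernetes client the handlers use.
type Client interface {
	GetTaskRunPhase(name string) (string, error)
}

// getClientFn is a package-level function variable. Production code leaves it
// pointing at the real constructor; tests overwrite it with a fake.
var getClientFn = newRealClient

// newRealClient always fails here because this sketch has no cluster to talk to.
func newRealClient(token string) (Client, error) {
	return nil, errors.New("no cluster available in this sketch")
}

// flashPhase is a hypothetical handler helper that goes through the seam
// instead of calling newRealClient directly.
func flashPhase(token, name string) (string, error) {
	client, err := getClientFn(token)
	if err != nil {
		return "", fmt.Errorf("resolve client: %w", err)
	}
	return client.GetTaskRunPhase(name)
}

// fakeClient returns a fixed phase, which is all a unit test needs.
type fakeClient struct{ phase string }

func (f fakeClient) GetTaskRunPhase(string) (string, error) { return f.phase, nil }

func main() {
	// Swap the seam the way a test would, and restore it afterwards.
	orig := getClientFn
	defer func() { getClientFn = orig }()
	getClientFn = func(string) (Client, error) { return fakeClient{phase: "Completed"}, nil }

	phase, err := flashPhase("token", "flash-abc12")
	fmt.Println(phase, err)
}
```

The trade-off of this pattern is shared mutable package state: tests that replace the variable should restore it, and they should not run in parallel with other tests that depend on it.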

Notes

  1. createFlashClientConfigSecret / createFlashOCIAuthSecret location — These functions live in flash_helpers.go and are called exclusively by flash.go. That's fine as-is, but if you plan further flash-related refactoring, consider whether they belong in flash.go directly to keep the call graph more localized.

  2. strconv import removal from server.go — Correct, since parsePagination (the only user) moved to helpers.go.

No blocking issues. LGTM with the minor inline notes.

Comment thread internal/buildapi/flash.go Outdated
Comment thread internal/buildapi/flash.go

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/buildapi/flash.go`:
- Around line 67-76: The code validates req.Name with validateBuildName but
never replaces it with the sanitized value, so non-kubernetes-safe names (e.g.,
"My_Flash") can pass validation and later fail during TaskRun creation after
secrets are created; fix by calling sanitizeBuildNameForValidation(req.Name) and
assigning its result back to req.Name (or use a new variable and then set
req.Name) before calling validateBuildName and before any Kubernetes resource
creation (ensure this change is applied where req.Name is used to create
Secrets/TaskRun/etc., e.g., in the flash handler that calls validateBuildName
and later creates TaskRun).
- Around line 346-356: getTaskRunStatus may return an empty message when a
TaskRun has Succeeded=False but cond.Message is empty; update getTaskRunStatus
to return a sensible fallback (e.g., "Flash failed") whenever you would return
phaseFailed with an empty cond.Message. Specifically, inside getTaskRunStatus
where you inspect tr.Status.Conditions and currently return phaseFailed,
cond.Message, add a check for cond.Message == "" and return phaseFailed, "Flash
failed" instead (preserve existing phaseCompleted and final fallback behavior).
- Around line 421-430: The current code opens req :=
clientset.CoreV1().Pods(namespace).GetLogs(...); stream, err :=
req.Stream(streamCtx) and immediately returns on error, which can close the 200
response if the container is still starting; change this to retry opening the
log stream while the pod/container is initializing: loop until either
req.Stream(streamCtx) succeeds or streamCtx is cancelled, and on transient
failure check the pod status via clientset.CoreV1().Pods(namespace).Get(ctx,
podName, metav1.GetOptions{}) and inspect pod.Status.Phase and
pod.Status.ContainerStatuses[*].State.Waiting.Reason (look for
Pending/ContainerCreating/PodInitializing) to decide to retry;
use a short backoff (e.g., time.Sleep(500ms) with a max wait or context
cancellation) before reattempting, only write a terminal error to c.Writer and
return if the pod/container is not initializing or the context expired; keep
using symbols req, streamCtx, c.Writer, sinceTime, podName, containerName,
clientset.CoreV1().Pods(...) to locate the code.
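
The log-stream retry suggested in the last flash.go item can be sketched, under assumptions, as a generic loop: keep trying to open the stream while the failure is transient, pause briefly between attempts, and stop when the context is done. The names `openWithRetry`, `errNotReady`, and `runDemo` are hypothetical; a real version would call `req.Stream(streamCtx)` and inspect the pod status to decide whether an error is transient.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical sentinel meaning "container still starting".
var errNotReady = errors.New("container not ready")

// openWithRetry calls open until it succeeds, fails with a non-transient
// error, or ctx is done. isTransient decides whether an error is worth a retry.
func openWithRetry(
	ctx context.Context,
	backoff time.Duration,
	open func(context.Context) (string, error),
	isTransient func(error) bool,
) (string, error) {
	for {
		stream, err := open(ctx)
		if err == nil {
			return stream, nil
		}
		if !isTransient(err) {
			return "", err // terminal: report it to the caller once
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(backoff):
			// pause elapsed, try again
		}
	}
}

// runDemo simulates a container that becomes ready on the third attempt.
func runDemo() (stream string, attempts int, err error) {
	open := func(context.Context) (string, error) {
		attempts++
		if attempts < 3 {
			return "", errNotReady
		}
		return "log stream", nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	stream, err = openWithRetry(ctx, 10*time.Millisecond, open,
		func(e error) bool { return errors.Is(e, errNotReady) })
	return stream, attempts, err
}

func main() {
	fmt.Println(runDemo())
}
```

Bounding the loop with the context rather than a fixed attempt count lets the HTTP request's own cancellation end the wait when the client disconnects.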

In `@internal/buildapi/server.go`:
- Around line 582-583: Locate the flash route group where v1.Group("/flash") is
created and currently calls flashGroup.Use(a.authMiddleware(),
flashMetricsMiddleware()); swap the middleware order so metrics wrap auth by
changing the Use call to flashGroup.Use(flashMetricsMiddleware(),
a.authMiddleware()); ensure you reference the flashGroup, authMiddleware, and
flashMetricsMiddleware symbols so the auth check runs inside the metrics
middleware's Next/after logic.
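
Why the order matters can be shown with a small sketch that uses plain `net/http` rather than Gin, so the types and helper names (`middleware`, `countRequests`, `requireToken`, `chain`, `countUnauthorized`) are illustrative assumptions only. When the metrics stand-in is outermost it observes requests that auth rejects; when auth is outermost those requests never reach it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

type middleware func(http.Handler) http.Handler

// countRequests is a stand-in for a metrics middleware: it counts every
// request that reaches it, whatever happens downstream.
func countRequests(seen *int) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*seen++
			next.ServeHTTP(w, r)
		})
	}
}

// requireToken is a stand-in for auth: it rejects requests without an
// Authorization header and never calls next for them.
func requireToken(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// chain applies middlewares so that the first one listed is outermost.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// countUnauthorized sends one unauthenticated request and reports how many
// requests the metrics stand-in saw.
func countUnauthorized(metricsFirst bool) int {
	seen := 0
	ok := http.HandlerFunc(func(http.ResponseWriter, *http.Request) {})
	var h http.Handler
	if metricsFirst {
		h = chain(ok, countRequests(&seen), requireToken)
	} else {
		h = chain(ok, requireToken, countRequests(&seen))
	}
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/v1/flash", nil))
	return seen
}

func main() {
	fmt.Println("metrics outside auth:", countUnauthorized(true))
	fmt.Println("auth outside metrics:", countUnauthorized(false))
}
```

Whether rejected requests should be counted is a product decision; the sketch only shows that the ordering determines it.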

In `@internal/controller/imagebuild/controller.go`:
- Around line 2006-2025: The code increments FlashTotal using
FlashTotal.WithLabelValues(target, status).Inc() before confirming a flash
TaskRun exists; move the increment so it only runs after locating a child with
child.PipelineTaskName == "flash-image" and successfully retrieving its TaskRun
(the same block that computes FlashDuration from taskRun.Status.CompletionTime),
ensuring you do not increment when the TaskRun lookup errors or the flash task
was never created; update the logic around pipelineRun.Status.ChildReferences,
child.PipelineTaskName, r.Get(...) into the same conditional scope where
FlashDuration.WithLabelValues(target, status).Observe(duration) is called so
both metrics are recorded only when the flash TaskRun is present.
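
A hedged sketch of the controller suggestion: update the counter and the duration together, and only once the flash-image child has been found and its TaskRun fetched. The types `childRef`, `taskRun`, and `flashStats` are trimmed-down, hypothetical stand-ins for the Tekton types and the Prometheus collectors; only the task name "flash-image" comes from the review comment.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// childRef and taskRun are hypothetical stand-ins for the Tekton child
// reference and TaskRun types.
type childRef struct{ PipelineTaskName, Name string }

type taskRun struct{ Start, Completion time.Time }

// flashStats is a stand-in for the FlashTotal / FlashDuration collectors.
type flashStats struct {
	total     int
	durations []time.Duration
}

// recordFlash updates both stats together, and only when the flash-image
// child exists and its TaskRun can be fetched.
func recordFlash(stats *flashStats, children []childRef, get func(name string) (taskRun, error)) bool {
	for _, child := range children {
		if child.PipelineTaskName != "flash-image" {
			continue
		}
		tr, err := get(child.Name)
		if err != nil {
			return false // lookup failed: record nothing
		}
		stats.total++
		stats.durations = append(stats.durations, tr.Completion.Sub(tr.Start))
		return true
	}
	return false // the flash task was never created
}

// demoTotals runs three scenarios and returns the counter value for each:
// flash child present, flash child missing, and TaskRun lookup failing.
func demoTotals() (found, missing, failed int) {
	start := time.Date(2026, 4, 21, 0, 0, 0, 0, time.UTC)
	ok := func(string) (taskRun, error) {
		return taskRun{Start: start, Completion: start.Add(90 * time.Second)}, nil
	}
	broken := func(string) (taskRun, error) { return taskRun{}, errors.New("not found") }
	withFlash := []childRef{{"build-image", "tr-build"}, {"flash-image", "tr-flash"}}
	buildOnly := []childRef{{"build-image", "tr-build"}}

	var a, b, c flashStats
	recordFlash(&a, withFlash, ok)
	recordFlash(&b, buildOnly, ok)
	recordFlash(&c, withFlash, broken)
	return a.total, b.total, c.total
}

func main() {
	fmt.Println(demoTotals())
}
```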
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7e5da9a5-14d1-4995-97a5-8cfaf7aadc4f

📥 Commits

Reviewing files that changed from the base of the PR and between 53a5df4 and 0fd7f48.

📒 Files selected for processing (8)
  • internal/buildapi/flash.go
  • internal/buildapi/flash_metrics.go
  • internal/buildapi/flash_metrics_test.go
  • internal/buildapi/flash_test.go
  • internal/buildapi/helpers.go
  • internal/buildapi/server.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/imagebuild/metrics.go

Comment on lines +67 to +76
// Auto-generate name if not provided
if req.Name == "" {
	req.Name = fmt.Sprintf("flash-%s", uuid.New().String()[:5])
}

// Validate name
if err := validateBuildName(req.Name); err != nil {
	c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
	return
}

⚠️ Potential issue | 🟠 Major

Apply the sanitized flash name before creating Kubernetes resources.

Line 73 validates using validateBuildName, but req.Name is never replaced with sanitizeBuildNameForValidation(req.Name). Names like My_Flash pass validation, then TaskRun creation fails later with a Kubernetes name error after secrets were already created.

Proposed fix
 	// Validate name
 	if err := validateBuildName(req.Name); err != nil {
 		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
 		return
 	}
+	req.Name = sanitizeBuildNameForValidation(req.Name)
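
As a self-contained illustration of the idea, the sketch below normalizes a requested name first, validates the result, and returns the one value to use for every resource created afterwards. It assumes the target format is a DNS-1123 label of at most 63 characters; `sanitizeName`, `validateName`, and `resolveName` are hypothetical helpers, and the rules the repository's `validateBuildName` and `sanitizeBuildNameForValidation` actually enforce are not shown in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dns1123Label matches a Kubernetes DNS-1123 label: lowercase alphanumerics
// and '-', starting and ending with an alphanumeric.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// invalidChars matches any run of characters not allowed in a label.
var invalidChars = regexp.MustCompile(`[^a-z0-9-]+`)

// sanitizeName lowercases the input, replaces disallowed runs with '-',
// and trims leading and trailing dashes.
func sanitizeName(name string) string {
	s := invalidChars.ReplaceAllString(strings.ToLower(name), "-")
	return strings.Trim(s, "-")
}

// validateName rejects anything that is not a valid label of at most 63 characters.
func validateName(name string) error {
	if len(name) > 63 || !dns1123Label.MatchString(name) {
		return fmt.Errorf("invalid name %q: must be a DNS-1123 label", name)
	}
	return nil
}

// resolveName sanitizes first, validates second, and returns the name that
// should be used for every resource created afterwards.
func resolveName(requested string) (string, error) {
	name := sanitizeName(requested)
	if err := validateName(name); err != nil {
		return "", err
	}
	return name, nil
}

func main() {
	fmt.Println(resolveName("My_Flash"))
	_, err := resolveName("___")
	fmt.Println(err)
}
```

Resolving the name once, before any Secret is created, avoids the partial-failure case the comment describes, where Secrets exist but TaskRun creation is rejected.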

Comment thread internal/buildapi/flash.go
Comment thread internal/buildapi/flash.go Outdated
Comment thread internal/buildapi/server.go Outdated
Comment thread internal/controller/imagebuild/controller.go
@bennyz bennyz force-pushed the refactor-server-flash branch from 0fd7f48 to 8b09089 Compare April 21, 2026 04:30

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/controller/imagebuild/controller.go (1)

1645-1663: ⚠️ Potential issue | 🟡 Minor

Record flash metrics only after the terminal status patch succeeds.

Lines 1648 and 1652 emit Prometheus side effects before Status().Patch succeeds. If the patch fails or conflicts, the reconcile retries the same completed TaskRun and can double-count FlashTotal/FlashDuration.

📊 Proposed fix
+	flashStatus := buildStatusFailure
 	if isTaskRunSuccessful(taskRun) {
 		fresh.Status.Phase = phaseCompleted
 		fresh.Status.Message = "Build, push, and flash completed successfully"
-		recordFlashMetrics(imageBuild, taskRun, buildStatusSuccess)
+		flashStatus = buildStatusSuccess
 	} else {
 		fresh.Status.Phase = phaseFailed
 		fresh.Status.Message = taskRunFailureMessage(taskRun, "Flash to device failed")
-		recordFlashMetrics(imageBuild, taskRun, buildStatusFailure)
 	}
 
 	if fresh.Status.CompletionTime == nil {
@@
 	if err := r.Status().Patch(ctx, fresh, patch); err != nil {
 		log.Error(err, "Failed to patch status after flash completion")
 		return ctrl.Result{}, err
 	}
+	recordFlashMetrics(fresh, taskRun, flashStatus)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 1645 - 1663, The
recordFlashMetrics calls are executed before persisting the terminal status and
can be double-counted if r.Status().Patch fails; move the
recordFlashMetrics(imageBuild, taskRun, ...) invocation(s) so they run only
after r.Status().Patch(ctx, fresh, patch) returns nil. Concretely, remove or
comment out the two calls to recordFlashMetrics in the if/else that sets
fresh.Status.Phase/Message, then after the Patch succeeds (immediately after the
if err := r.Status().Patch(...) check), call recordFlashMetrics with the
appropriate buildStatusSuccess or buildStatusFailure based on fresh.Status.Phase
(or re-evaluate isTaskRunSuccessful(taskRun) again) so metrics are emitted only
after the status update is persisted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1b13f94e-099c-4e55-b97f-6e97269cc478

📥 Commits

Reviewing files that changed from the base of the PR and between 0fd7f48 and 8b09089.

📒 Files selected for processing (6)
  • internal/buildapi/flash.go
  • internal/buildapi/flash_metrics.go
  • internal/buildapi/flash_metrics_test.go
  • internal/buildapi/server.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/imagebuild/metrics.go
✅ Files skipped from review due to trivial changes (1)
  • internal/buildapi/flash.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/controller/imagebuild/metrics.go
  • internal/buildapi/flash_metrics_test.go

- Replace log.Printf with a.log.Error for structured logging in flash
- Add /metrics endpoint with prometheus handler
- Add ado_flash_created_total counter (incremented on flash creation)
- Add ado_flash_request_duration_seconds histogram with per-endpoint
  latency tracking via gin middleware on /v1/flash/* routes

Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
@bennyz bennyz force-pushed the refactor-server-flash branch from 8b09089 to a3ec827 Compare April 21, 2026 04:58
@bennyz bennyz requested a review from bkhizgiy April 21, 2026 05:17
@bennyz bennyz merged commit 52273b7 into centos-automotive-suite:main Apr 21, 2026
4 checks passed
@bennyz bennyz deleted the refactor-server-flash branch April 21, 2026 08:08
@coderabbitai coderabbitai Bot mentioned this pull request Apr 26, 2026
10 tasks
@coderabbitai coderabbitai Bot mentioned this pull request May 7, 2026
10 tasks

2 participants