Codex/phase1 hardening#8
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughWorkflow and CI tweaks, Go linter and module version bumps, multiple Go files refactored to ignore rollback errors and improve error-variable scoping, HTTP servers given ReadHeaderTimeout, minor runtime/formatting fixes, and targeted runtime/behavior adjustments in several services. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
.github/workflows/ci.yml (1)
97-99: 🧹 Nitpick | 🔵 TrivialTypecheck fallback causes inconsistent behavior and unnecessary full builds.
The fallback
npm run typecheck || npm run buildproduces different results per app:
- gateway: Has
typecheckscript → runstsc --noEmit(lightweight)- bff: No
typecheckscript → falls back totsc(full build)- jobs-worker: No
typecheckscript → falls back totsc(full build)- dashboard: No
typecheckscript → falls back totsc -b && vite build(heavy, includes full Vite bundle)For
bff,jobs-worker, and especiallydashboard, the fallback runs unnecessary full builds instead of type checking alone, increasing CI time. Add explicittypecheckscripts to each app for consistent and faster CI runs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci.yml around lines 97 - 99, The CI step named "Typecheck" currently falls back to running a full build when an app lacks a typecheck script; add explicit "typecheck" scripts to the package.json of the bff, jobs-worker, and dashboard apps so the workflow's Typecheck step runs lightweight type checks only. For each app add a "typecheck" script that invokes TypeScript without emitting (e.g., "tsc --noEmit") or, if the repo uses composite/project refs, "tsc -b --noEmit" for that project; ensure the script names are exactly "typecheck" so the existing workflow step (Typecheck) picks them up and no changes to .github/workflows/ci.yml are required.apps/read-model-builder/internal/projection/telemetry_projection.go (2)
185-190:⚠️ Potential issue | 🟡 MinorRemove duplicate
redisClientnil check.Lines 185-190 contain a duplicate nil check. The second check at line 188 is dead code since if
redisClient == nil, the function already returned at line 186.🧹 Proposed fix
if redisClient == nil { return nil } - if redisClient == nil { - return nil - } pipe := redisClient.Pipeline()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/read-model-builder/internal/projection/telemetry_projection.go` around lines 185 - 190, Remove the duplicate nil check for redisClient in telemetry_projection.go: locate the function/block that contains the two consecutive "if redisClient == nil { return nil }" checks and delete the second one so there is only a single guard that returns when redisClient is nil (preserve the first check and surrounding logic unchanged).
305-311: 🧹 Nitpick | 🔵 TrivialMissing
rows.Err()check after firstrows.Close().Per coding guidelines,
rows.Err()should be checked afterrows.Close(). The second query result (lines 377-387) correctly checksrows.Err(), but this first iteration does not. While the impact is lower here (only scanning event IDs), it's good practice to check consistently.♻️ Proposed fix
for rows.Next() { var id string if scanErr := rows.Scan(&id); scanErr == nil { newEventIDs[id] = struct{}{} } } rows.Close() + if err := rows.Err(); err != nil { + observability.EventsRetry.Inc() + return err + }As per coding guidelines: "Check pgx rows.Err() after rows.Close()."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/read-model-builder/internal/projection/telemetry_projection.go` around lines 305 - 311, The rows returned for populating newEventIDs are not checking rows.Err() after rows.Close(); update the block that iterates rows.Next() filling newEventIDs to call rows.Close() then check if err := rows.Err(); handle that error the same way as the later query (e.g., return the error or log and exit), ensuring you reference the same variables (rows, newEventIDs) and follow the error-handling pattern used in the second query that already checks rows.Err().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 30-32: The CI currently pairs golangci/golangci-lint-action@v8
with golangci-lint version v1.62.0 which is incompatible; update the workflow to
either change the action to a v6 or v7 tag (e.g.,
golangci/golangci-lint-action@v7) to keep using golangci-lint v1.x, or keep
golangci/golangci-lint-action@v8 and upgrade the referenced golangci-lint
`version:` to a v2.x release (e.g., v2.1.0+) so the action and linter versions
are compatible.
In @.github/workflows/security.yml:
- Line 47: Replace both occurrences of the compromised action string "uses:
aquasecurity/trivy-action@0.33.1" with the safe pinned tag "uses:
aquasecurity/trivy-action@v0.35.0" (update the two locations in the workflow
where that exact string appears) and then trigger a secrets audit for any tokens
or credentials that were available to runs during March 19–20, 2026 to
rotate/revoke any possibly exposed secrets.
In `@apps/cassandra-writer/main.go`:
- Around line 330-335: In the branch that handles non-matching event types (when
event.EventType != "telemetry.recorded"), increment the skip counter by calling
skipped.Add(1) before continuing; specifically, update the block that commits
the message via reader.CommitMessages(ctx, msg) to also call skipped.Add(1)
(ensure this happens regardless of commitErr) so the skip path for this
condition matches the other skip paths (which call skipped.Add(1)).
In `@apps/saga-orchestrator/internal/health/handler.go`:
- Around line 59-63: The http.Server returned in the health server (the return
block creating &http.Server with Addr, Handler, ReadHeaderTimeout) only sets
ReadHeaderTimeout; add ReadTimeout, WriteTimeout, and IdleTimeout on that same
&http.Server to harden against resource exhaustion (e.g., set small
ReadTimeout/WriteTimeout like a few seconds and an IdleTimeout for keep-alives);
update the return in handler.go where the server is constructed so the server
config includes ReadTimeout, WriteTimeout, and IdleTimeout alongside
ReadHeaderTimeout.
In `@apps/telemetry-service/internal/application/record_telemetry.go`:
- Around line 73-75: The defer currently swallows tx.Rollback(ctx) and uses the
request ctx which may be cancelled; change the defer to call tx.Rollback using a
non-request cleanup context (e.g., context.Background() or a short-lived context
with timeout) and check its error, logging any failure instead of discarding it.
Locate the rollback defer around tx.Rollback and replace it with a guarded defer
that does something like: create a cleanupCtx, if err :=
tx.Rollback(cleanupCtx); err != nil && err != sql.ErrTxDone { logger.Errorf("tx
rollback failed: %v", err) } so rollback failures are observed and the request
context cancellation cannot silently prevent cleanup.
In `@apps/telemetry-service/internal/worker/outbox_worker.go`:
- Around line 268-270: The deferred rollback currently discards errors and uses
the incoming ctx which may be canceled; instead update the defer that calls
tx.Rollback(ctx) to call Rollback with a fresh non-canceled context (e.g.,
context.Background() or a short timeout context) and check the returned
error—log it or wrap it (do not ignore) unless it is sql.ErrTxDone; locate the
defer around tx.Rollback in outbox_worker.go (the tx variable and its deferred
rollback) and replace the silent discard `_ = tx.Rollback(ctx)` with explicit
error handling that distinguishes a harmless "transaction already finished"
error from real rollback failures and records the failure via the component
logger/error reporter used in this file.
---
Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 97-99: The CI step named "Typecheck" currently falls back to
running a full build when an app lacks a typecheck script; add explicit
"typecheck" scripts to the package.json of the bff, jobs-worker, and dashboard
apps so the workflow's Typecheck step runs lightweight type checks only. For
each app add a "typecheck" script that invokes TypeScript without emitting
(e.g., "tsc --noEmit") or, if the repo uses composite/project refs, "tsc -b
--noEmit" for that project; ensure the script names are exactly "typecheck" so
the existing workflow step (Typecheck) picks them up and no changes to
.github/workflows/ci.yml are required.
In `@apps/read-model-builder/internal/projection/telemetry_projection.go`:
- Around line 185-190: Remove the duplicate nil check for redisClient in
telemetry_projection.go: locate the function/block that contains the two
consecutive "if redisClient == nil { return nil }" checks and delete the second
one so there is only a single guard that returns when redisClient is nil
(preserve the first check and surrounding logic unchanged).
- Around line 305-311: The rows returned for populating newEventIDs are not
checking rows.Err() after rows.Close(); update the block that iterates
rows.Next() filling newEventIDs to call rows.Close() then check if err :=
rows.Err(); handle that error the same way as the later query (e.g., return the
error or log and exit), ensuring you reference the same variables (rows,
newEventIDs) and follow the error-handling pattern used in the second query that
already checks rows.Err().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 98f1c8d9-4e4c-4385-ab24-c541daf76db3
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (20)
.github/workflows/ci.yml.github/workflows/security.yml.golangci.ymlapps/cassandra-writer/main.goapps/ingest-service/main.goapps/read-model-builder/internal/projection/device_projection.goapps/read-model-builder/internal/projection/telemetry_projection.goapps/read-model-builder/internal/projection/telemetry_projection_integration_test.goapps/saga-orchestrator/internal/consumer/kafka_consumer.goapps/saga-orchestrator/internal/health/handler.goapps/saga-orchestrator/internal/recovery/recovery_worker.goapps/saga-orchestrator/internal/repository/postgres_saga_repository.goapps/telemetry-service/cmd/main.goapps/telemetry-service/internal/application/create_device.goapps/telemetry-service/internal/application/record_telemetry.goapps/telemetry-service/internal/worker/outbox_worker.gogo.modlibs/health/health.golibs/migrate/migrate.golibs/observability/metrics.go
| - name: Run Trivy vulnerability scan | ||
| continue-on-error: true | ||
| uses: aquasecurity/trivy-action@master | ||
| uses: aquasecurity/trivy-action@0.33.1 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify trivy-action version 0.33.1 exists and check for newer versions
gh api repos/aquasecurity/trivy-action/releases --jq '[.[] | {tag: .tag_name, published: .published_at}] | .[0:10]'Repository: pahuldeepp/GrainGuard-
Length of output: 226
🌐 Web query:
trivy-action 0.33.1 release aquasecurity
💡 Result:
The trivy-action v0.33.1 release by Aqua Security was a legitimate version of the GitHub Action for running Trivy vulnerability scans in CI pipelines. It is referenced in the official repository README and example workflows as "aquasecurity/trivy-action@0.33.1". However, on March 19, 2026 (~17:43 UTC), the repository suffered a supply chain attack where attackers with compromised credentials force-pushed 76 out of 77 version tags (all except v0.35.0) to malicious commits using a credential stealer payload. The tag v0.33.1 was specifically force-pushed to commit SHA 91e7c2c36dcad14149d8e455b960af62a2ffb275, which contains malicious code that reads GitHub Actions runner memory and exfiltrates data. The malicious tags were active until remediation around March 20, 2026 (05:40-07:12 UTC). As of 2026-03-28, Aqua Security recommends pinning to the verified safe version v0.35.0 or specific safe commit SHAs. Avoid using v0.33.1 or other affected tags; update workflows immediately and check for exposure during the attack window.
Citations:
- 1: https://github.com/aquasecurity/trivy-action/releases
- 2: https://www.aquasec.com/blog/trivy-supply-chain-attack-what-you-need-to-know/
- 3: https://github.com/aquasecurity/trivy-action/tree/v0.35.0
- 4: https://www.stepsecurity.io/blog/trivy-compromised-a-second-time---malicious-v0-69-4-release
- 5: https://github.com/aquasecurity/trivy-action
- 6: https://noxalis-lab.io/trivy-under-attack-again-widespread-github-actions-tag-compromise-secrets/
🏁 Script executed:
# Check the workflow file and verify both lines 47 and 84 have the version pinning
cat -n .github/workflows/security.yml | sed -n '40,90p'Repository: pahuldeepp/GrainGuard-
Length of output: 2349
CRITICAL: trivy-action v0.33.1 is compromised—update to v0.35.0 immediately.
The pinned version v0.33.1 was force-pushed to malicious code during a supply chain attack on March 19, 2026. The compromised tags contain a credential stealer payload that exfiltrates data from GitHub Actions runner memory. Update both lines 47 and 84 to the verified safe version aquasecurity/trivy-action@v0.35.0 and audit any secrets potentially exposed if this workflow ran during the attack window (March 19-20, 2026).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/security.yml at line 47, Replace both occurrences of the
compromised action string "uses: aquasecurity/trivy-action@0.33.1" with the safe
pinned tag "uses: aquasecurity/trivy-action@v0.35.0" (update the two locations
in the workflow where that exact string appears) and then trigger a secrets
audit for any tokens or credentials that were available to runs during March
19–20, 2026 to rotate/revoke any possibly exposed secrets.
| if event.EventType != "telemetry.recorded" { | ||
| reader.CommitMessages(ctx, msg) | ||
| if commitErr := reader.CommitMessages(ctx, msg); commitErr != nil { | ||
| log.Printf("commit error after skipping event type: %v", commitErr) | ||
| } | ||
| continue | ||
| } |
There was a problem hiding this comment.
Missing skipped.Add(1) for non-matching event types.
When event.EventType != "telemetry.recorded", the message is committed and skipped but skipped.Add(1) is not called, unlike other skip paths (lines 317, 323, 340, 349). This causes undercounting in the stats output.
🔧 Proposed fix
if event.EventType != "telemetry.recorded" {
+ skipped.Add(1)
if commitErr := reader.CommitMessages(ctx, msg); commitErr != nil {
log.Printf("commit error after skipping event type: %v", commitErr)
}
continue
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if event.EventType != "telemetry.recorded" { | |
| reader.CommitMessages(ctx, msg) | |
| if commitErr := reader.CommitMessages(ctx, msg); commitErr != nil { | |
| log.Printf("commit error after skipping event type: %v", commitErr) | |
| } | |
| continue | |
| } | |
| if event.EventType != "telemetry.recorded" { | |
| skipped.Add(1) | |
| if commitErr := reader.CommitMessages(ctx, msg); commitErr != nil { | |
| log.Printf("commit error after skipping event type: %v", commitErr) | |
| } | |
| continue | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/cassandra-writer/main.go` around lines 330 - 335, In the branch that
handles non-matching event types (when event.EventType != "telemetry.recorded"),
increment the skip counter by calling skipped.Add(1) before continuing;
specifically, update the block that commits the message via
reader.CommitMessages(ctx, msg) to also call skipped.Add(1) (ensure this happens
regardless of commitErr) so the skip path for this condition matches the other
skip paths (which call skipped.Add(1)).
| return &http.Server{ | ||
| Addr: addr, | ||
| Handler: mux, | ||
| ReadHeaderTimeout: 5 * time.Second, | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding additional server timeouts for defense-in-depth.
While ReadHeaderTimeout protects against Slowloris attacks on header reads, the server lacks ReadTimeout, WriteTimeout, and IdleTimeout. For a health check server with minimal traffic, this is lower risk, but adding these provides better protection against resource exhaustion.
♻️ Suggested timeout configuration
return &http.Server{
Addr: addr,
Handler: mux,
ReadHeaderTimeout: 5 * time.Second,
+ ReadTimeout: 10 * time.Second,
+ WriteTimeout: 10 * time.Second,
+ IdleTimeout: 60 * time.Second,
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return &http.Server{ | |
| Addr: addr, | |
| Handler: mux, | |
| ReadHeaderTimeout: 5 * time.Second, | |
| } | |
| return &http.Server{ | |
| Addr: addr, | |
| Handler: mux, | |
| ReadHeaderTimeout: 5 * time.Second, | |
| ReadTimeout: 10 * time.Second, | |
| WriteTimeout: 10 * time.Second, | |
| IdleTimeout: 60 * time.Second, | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/saga-orchestrator/internal/health/handler.go` around lines 59 - 63, The
http.Server returned in the health server (the return block creating
&http.Server with Addr, Handler, ReadHeaderTimeout) only sets ReadHeaderTimeout;
add ReadTimeout, WriteTimeout, and IdleTimeout on that same &http.Server to
harden against resource exhaustion (e.g., set small ReadTimeout/WriteTimeout
like a few seconds and an IdleTimeout for keep-alives); update the return in
handler.go where the server is constructed so the server config includes
ReadTimeout, WriteTimeout, and IdleTimeout alongside ReadHeaderTimeout.
| defer func() { | ||
| _ = tx.Rollback(ctx) | ||
| }() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n apps/telemetry-service/internal/application/record_telemetry.go | head -100Repository: pahuldeepp/GrainGuard-
Length of output: 3251
🏁 Script executed:
cat -n apps/telemetry-service/internal/application/record_telemetry.go | sed -n '100,130p'Repository: pahuldeepp/GrainGuard-
Length of output: 423
Use robust rollback handling instead of swallowing rollback failures.
The defer rollback at lines 73–75 ignores rollback errors and uses the request ctx; if the request context is cancelled, cleanup may fail silently without any logging or observability.
🔧 Proposed fix
import (
"context"
"encoding/json"
+ "errors"
"fmt"
"time"
"github.com/google/uuid"
+ "github.com/jackc/pgx/v5"
"github.com/jackc/pgx/v5/pgxpool"
@@
- defer func() {
- _ = tx.Rollback(ctx)
- }()
+ defer func() {
+ rbCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ if rbErr := tx.Rollback(rbCtx); rbErr != nil && !errors.Is(rbErr, pgx.ErrTxClosed) {
+ // replace with project logger if available
+ fmt.Printf("tx rollback error: %v\n", rbErr)
+ }
+ }()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| defer func() { | |
| _ = tx.Rollback(ctx) | |
| }() | |
| defer func() { | |
| rbCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second) | |
| defer cancel() | |
| if rbErr := tx.Rollback(rbCtx); rbErr != nil && !errors.Is(rbErr, pgx.ErrTxClosed) { | |
| // replace with project logger if available | |
| fmt.Printf("tx rollback error: %v\n", rbErr) | |
| } | |
| }() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/telemetry-service/internal/application/record_telemetry.go` around lines
73 - 75, The defer currently swallows tx.Rollback(ctx) and uses the request ctx
which may be cancelled; change the defer to call tx.Rollback using a non-request
cleanup context (e.g., context.Background() or a short-lived context with
timeout) and check its error, logging any failure instead of discarding it.
Locate the rollback defer around tx.Rollback and replace it with a guarded defer
that does something like: create a cleanupCtx, if err :=
tx.Rollback(cleanupCtx); err != nil && err != sql.ErrTxDone { logger.Errorf("tx
rollback failed: %v", err) } so rollback failures are observed and the request
context cancellation cannot silently prevent cleanup.
| defer func() { | ||
| _ = tx.Rollback(ctx) | ||
| }() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -nP --type=go 'defer\s+func\(\)\s*\{\s*_\s*=\s*tx\.Rollback\(ctx\)\s*\}\(\)'Repository: pahuldeepp/GrainGuard-
Length of output: 48
🏁 Script executed:
find . -path "*apps/telemetry-service/internal/worker/outbox_worker.go" -type fRepository: pahuldeepp/GrainGuard-
Length of output: 123
🏁 Script executed:
cat -n apps/telemetry-service/internal/worker/outbox_worker.go | sed -n '260,280p'Repository: pahuldeepp/GrainGuard-
Length of output: 647
🏁 Script executed:
cat -n apps/telemetry-service/internal/worker/outbox_worker.go | sed -n '280,330p'Repository: pahuldeepp/GrainGuard-
Length of output: 1334
🏁 Script executed:
cat -n apps/telemetry-service/internal/worker/outbox_worker.go | sed -n '330,370p'Repository: pahuldeepp/GrainGuard-
Length of output: 973
🏁 Script executed:
find ./apps/telemetry-service -name "*.go" -type f -exec grep -l "tx.Rollback" {} \;Repository: pahuldeepp/GrainGuard-
Length of output: 252
🏁 Script executed:
rg -nP --type=go 'defer.*tx\.Rollback' ./apps/telemetry-serviceRepository: pahuldeepp/GrainGuard-
Length of output: 48
🏁 Script executed:
grep -n "Rollback" ./apps/telemetry-service/internal/application/create_device.go ./apps/telemetry-service/internal/application/record_telemetry.go ./apps/telemetry-service/internal/worker/outbox_worker.goRepository: pahuldeepp/GrainGuard-
Length of output: 331
🏁 Script executed:
cat -n ./apps/telemetry-service/internal/application/create_device.go | sed -n '50,70p'Repository: pahuldeepp/GrainGuard-
Length of output: 714
🏁 Script executed:
cat -n ./apps/telemetry-service/internal/application/record_telemetry.go | sed -n '65,85p'Repository: pahuldeepp/GrainGuard-
Length of output: 606
Handle rollback errors explicitly and avoid canceled request context for rollback.
At line 269, rollback errors are silently discarded with _, and the rollback uses ctx which may already be canceled at error boundaries. This can mask transaction cleanup failures and is a systemic pattern across device creation, telemetry recording, and outbox publishing paths in the service.
🔧 Proposed fix
import (
"context"
"encoding/json"
+ "errors"
"fmt"
"log"
"os"
"strings"
"time"
+ "github.com/jackc/pgx/v5"
"github.com/jackc/pgx/v5/pgxpool"
"github.com/segmentio/kafka-go"
"google.golang.org/protobuf/proto"
@@
- defer func() {
- _ = tx.Rollback(ctx)
- }()
+ defer func() {
+ rbCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ if rbErr := tx.Rollback(rbCtx); rbErr != nil && !errors.Is(rbErr, pgx.ErrTxClosed) {
+ log.Println("tx rollback error:", rbErr)
+ }
+ }()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| defer func() { | |
| _ = tx.Rollback(ctx) | |
| }() | |
| import ( | |
| "context" | |
| "encoding/json" | |
| "errors" | |
| "fmt" | |
| "log" | |
| "os" | |
| "strings" | |
| "time" | |
| "github.com/jackc/pgx/v5" | |
| "github.com/jackc/pgx/v5/pgxpool" | |
| "github.com/segmentio/kafka-go" | |
| "google.golang.org/protobuf/proto" | |
| ) | |
| // ... (other code) | |
| defer func() { | |
| rbCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second) | |
| defer cancel() | |
| if rbErr := tx.Rollback(rbCtx); rbErr != nil && !errors.Is(rbErr, pgx.ErrTxClosed) { | |
| log.Println("tx rollback error:", rbErr) | |
| } | |
| }() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/telemetry-service/internal/worker/outbox_worker.go` around lines 268 -
270, The deferred rollback currently discards errors and uses the incoming ctx
which may be canceled; instead update the defer that calls tx.Rollback(ctx) to
call Rollback with a fresh non-canceled context (e.g., context.Background() or a
short timeout context) and check the returned error—log it or wrap it (do not
ignore) unless it is sql.ErrTxDone; locate the defer around tx.Rollback in
outbox_worker.go (the tx variable and its deferred rollback) and replace the
silent discard `_ = tx.Rollback(ctx)` with explicit error handling that
distinguishes a harmless "transaction already finished" error from real rollback
failures and records the failure via the component logger/error reporter used in
this file.
HIGH PRIORITY: - #9: Fix CSRF timing attack — pad buffers to fixed length before timingSafeEqual - #10: Upgrade circuit breaker to distributed (Redis-backed state sharing across pods) - #8: Fix saga recovery infinite loop — mark corrupted payloads as FAILED instead of retrying MEDIUM PRIORITY: - #11: Add webhook idempotency check (dedup by endpoint_id + event_type) - #12: Assert Stripe webhook body is Buffer before signature verification - #13: Eliminate N+1 query in deviceTelemetry resolver (single JOIN query) - #15: Add ORDER BY + LIMIT to saga FindByCorrelationID for deterministic results - #17: Make critical audit events (auth, admin) throw on failure instead of silent swallow LOW PRIORITY: - #20: Add 10s query timeout to all saga repository DB operations - #23: Add IsValidStatus validator for saga status constants - #24: Set httpServer.timeout (30s) and keepAliveTimeout (65s) on BFF - #25: Add RabbitMQ heartbeat (30s) and connection error/close handlers - #7: Fix remaining saga JSON marshal error check (initialErr) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary by CodeRabbit
Bug Fixes
Chores
Refactor
Tests