feat: SBOM scanner sidecar for memory-isolated SBOM generation#335
feat: SBOM scanner sidecar for memory-isolated SBOM generation#335
Conversation
Kubevuln OOM-kills when Syft generates SBOMs for large container images, crashing the entire pod and losing Grype DB state + in-flight scans. This adds an opt-in sidecar container that runs Syft in a separate memory cgroup, following the same pattern implemented for node-agent (PR #753). - gRPC service over Unix domain socket for registry-based SBOM scanning - SidecarSBOMAdapter implements ports.SBOMCreator with crash detection and retry logic (3 retries before marking as TooLarge) - Scanner binary entrypoint (cmd/sbom-scanner) included in Docker image - Backward compatible: without SBOM_SCANNER_SOCKET env var, kubevuln uses the in-process SyftAdapter as before Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ben <ben@armosec.io>
📝 WalkthroughWalkthroughAdds a sidecar-based SBOM scanning system: a gRPC scanner service, Unix-socket client, a Sidecar SBOM adapter, runtime selection in the HTTP server, build updates to produce the sidecar binary, and tests for client/server and adapter behavior. Changes
Sequence DiagramsequenceDiagram
participant HTTP as HTTP Server (cmd/http)
participant Adapter as SidecarSBOMAdapter (adapters/v1)
participant Client as SBOMScannerClient (pkg/sbomscanner/v1)
participant Server as SBOMScanner gRPC (cmd/sbom-scanner)
participant Syft as Syft Engine
HTTP->>Adapter: CreateSBOM(name,imageID,imageTag,opts)
Adapter->>Client: CreateSBOM(ScanRequest)
Client->>Server: CreateSBOM RPC
Server->>Syft: Fetch image + Run scan (timeout/size enforced)
Syft-->>Server: SBOM
Server-->>Client: CreateSBOMResponse{status, sbom_size, sbom_document}
Client-->>Adapter: ScanResult{Status, SBOMSize, SyftDocument, ErrorMessage}
Adapter->>Adapter: Map status, set annotations/labels, update retry state
Adapter-->>HTTP: domain.SBOM{status, content, annotations}
alt Scanner crash (ErrScannerCrashed)
Adapter->>Adapter: increment retry counter, return crash error (retry)
end
alt Max retries exceeded
Adapter->>Adapter: clear retry counter, mark TooLarge, annotate memory-limit
Adapter-->>HTTP: domain.SBOM{status: TooLarge}
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Add opt-in sidecar container for kubevuln that isolates Syft SBOM generation in a separate memory cgroup, preventing OOM kills from crashing the main kubevuln pod. - kubevuln.sbomScanner.enabled (default: false) adds sbom-scanner sidecar container using the same kubevuln image - Shared emptyDir volume (sbom-comm) for Unix domain socket IPC - SBOM_SCANNER_SOCKET and SCANNER_MEMORY_LIMIT env vars on main container - GOMEMLIMIT via downward API on sidecar container Companion PR: kubescape/kubevuln#335 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ben <ben@armosec.io>
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (9)
pkg/sbomscanner/v1/integration_test.go (1)
77-88: Minor: inconsistent cleanup order compared to other tests.Other tests use
deferfor cleanup, but this test manually callsclient.Close()at the end. This is fine for the test logic but slightly inconsistent with the file's style.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sbomscanner/v1/integration_test.go` around lines 77 - 88, The test TestIntegration_ReadyCheck currently calls client.Close() manually at the end; make cleanup consistent with other tests by deferring client.Close() immediately after obtaining client from startIntegrationServer(t) (and optionally defer srv.Stop() as well) so resources are released via defer rather than an explicit call at the test end.adapters/v1/sidecar_test.go (1)
92-117: Test relies on stateful retry tracking in adapter.The test expects the adapter to track crash retries per-image across separate
CreateSBOMcalls, returningErrScannerCrashedon the first two attempts andTooLargeon the third. This coupling is fine if it matches the adapter's actual implementation.Consider adding a comment clarifying that this test exercises the adapter's per-image retry exhaustion tracking rather than immediate retry logic within a single call.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapters/v1/sidecar_test.go` around lines 92 - 117, TestSidecarSBOMAdapter_CreateSBOM_CrashRetry relies on the adapter tracking crash retries across separate CreateSBOM calls for the same image; update the test to include a short clarifying comment above the test or the sequence that states this explicitly (mentioning NewSidecarSBOMAdapter, CreateSBOM, ErrScannerCrashed, and that the third call should result in helpersv1.TooLarge after retry exhaustion and callCount==3) so readers know the test exercises per-image retry-exhaustion state rather than in-call retry logic.cmd/sbom-scanner/main.go (1)
36-41: Consider adding force-stop on repeated signal for faster shutdown.If a long-running scan is in progress,
GracefulStop()will block until it completes. A second signal during this wait has no effect. Adding a force-stop mechanism improves operational flexibility during deployments.♻️ Optional: Force-stop on second signal
go func() { sig := <-sigCh logger.L().Info("received signal, shutting down", helpers.String("signal", sig.String())) + go func() { + sig := <-sigCh + logger.L().Warning("received second signal, forcing immediate shutdown", helpers.String("signal", sig.String())) + srv.Stop() + }() srv.GracefulStop() os.Remove(socketPath) }()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sbom-scanner/main.go` around lines 36 - 41, The shutdown goroutine currently calls srv.GracefulStop() and waits indefinitely, so a second signal is ignored; modify the logic around sigCh and srv to support a force-stop on a repeated signal by: after receiving the first signal from sigCh and invoking srv.GracefulStop() (and removing socketPath), start a short second-listener that if a second signal arrives calls srv.Stop() (or os.Exit(1) if immediate process termination is preferred) to forcefully terminate the server; reference the existing symbols sigCh, logger.L(), srv.GracefulStop(), srv.Stop(), os.Remove and socketPath when making this change and ensure any cleanup (socket removal) remains idempotent.pkg/sbomscanner/v1/server_test.go (1)
68-95: Consider adding more test cases fornormalizeImageID.The function likely handles more edge cases. Consider adding tests for:
- Short image names without registry (e.g.,
nginx:latest→ implicitdocker.io/library/)- Digest-only imageID without tag
- Invalid or malformed imageID inputs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sbomscanner/v1/server_test.go` around lines 68 - 95, Add more unit cases to TestNormalizeImageID to cover short image names, digest-only IDs, and malformed inputs: include a case where imageID is a short name like "nginx:latest" expecting the implicit "docker.io/library/nginx:latest" (or whatever normalizeImageID's canonicalization produces), a case where imageID is a digest-only reference like "@sha256:abc..." with imageTag empty or set and verify the digest is returned/merged correctly, and a case for malformed inputs (e.g., "not@@valid" or an incomplete digest) asserting that normalizeImageID either returns the input unchanged or an empty string/error according to current behavior; update TestNormalizeImageID to name each scenario clearly so normalizeImageID behavior for these edge cases is covered.cmd/http/main.go (1)
76-79: Document or make configurable the fallback behavior.When
SBOM_SCANNER_SOCKETis explicitly set but connection fails, the code silently falls back to in-process scanning. This is resilient but may mask deployment issues. Consider adding aSBOM_SCANNER_STRICTenv var to fail fast when the sidecar is expected but unavailable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/http/main.go` around lines 76 - 79, Summary: When SBOM_SCANNER_SOCKET is set but connection fails the code silently falls back to in-process Syft; add a strict-mode option to fail fast. Update the connection error handling (the block that logs via logger.L().Warning and assigns sbomAdapter = v1.NewSyftAdapter(...)) to read a new SBOM_SCANNER_STRICT boolean env var (or configuration flag), parse it, and if true return an error or exit immediately instead of falling back; if false keep the existing fallback behavior and log that strict mode is disabled. Ensure references to sbomAdapter and v1.NewSyftAdapter remain unchanged and that the new env var is documented in startup logs and config handling so deployments can enable strict failure behavior when the sidecar is required.pkg/sbomscanner/v1/types.go (2)
25-25: Document the implicit zero-value behavior forTimeout.When
Timeoutis zero-valued, the server defaults to 5 minutes (perserver.golines 132-135). This implicit contract should be documented here so callers understand the default behavior.// ScanRequest contains all parameters needed for a registry-based SBOM scan. type ScanRequest struct { ImageID string ImageTag string Options domain.RegistryOptions MaxImageSize int64 MaxSBOMSize int32 EnableEmbeddedSBOMs bool - Timeout time.Duration + Timeout time.Duration // Zero value defaults to 5 minutes on the server }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sbomscanner/v1/types.go` at line 25, Document the implicit zero-value behavior for the Timeout field: update the comment for the Timeout field in the relevant struct in pkg/sbomscanner/v1/types.go to state that a zero value causes the server to use a default timeout of 5 minutes (as enforced by the server implementation), so callers know the default contract when they omit or set Timeout to 0.
32-32: Consider using a typed constant forStatusto prevent silent mismatches.
Statusis a plainstringthat is compared againsthelpersv1.*constants (e.g.,helpersv1.Learning,helpersv1.TooLarge). If the server sends a value that doesn't match these constants exactly, comparisons will silently fail. A typed constant or enum would provide compile-time safety.This is a low-priority enhancement since the current approach mirrors existing patterns in the codebase.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sbomscanner/v1/types.go` at line 32, Replace the plain string Status in the SBOM scan types with a new dedicated typed constant (e.g., type ScanStatus string) and declare the known constants (e.g., ScanStatusLearning, ScanStatusTooLarge) so comparisons against helpersv1.Learning / helpersv1.TooLarge map to these typed values; update the struct field (the Status field in pkg/sbomscanner/v1/types.go) to use ScanStatus and ensure JSON unmarshalling still works by keeping the underlying type as string (or adding UnmarshalText/UnmarshalJSON if needed), then update any comparisons or assignments that reference helpersv1.* to use the new ScanStatus constants so mismatches are caught at compile time.adapters/v1/sidecar.go (1)
108-114: Consider clearing retry count earlier on any successful response.Currently, retry count is only cleared after a successful
CreateSBOMcall (lines 109-111). However, if the scan returns a non-crash error (line 93), the retry count persists. This is likely fine since the crash scenario is the primary concern, but worth noting for completeness.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapters/v1/sidecar.go` around lines 108 - 114, The retry count in s.retryCount is only cleared after a successful CreateSBOM return; modify the flow so any successful scan/response path clears the retry count earlier (not just the CreateSBOM success branch). Locate the code paths that return successful non-crash responses (the branch that returns a non-crash error/response) and add delete(s.retryCount, imageID) protected by s.mu.Lock()/Unlock() there as well (same mutex usage as the existing clear), so all non-crash successful outcomes reset the retry counter consistently.pkg/sbomscanner/v1/server.go (1)
251-277: Code duplication — remove private function and use exportedNormalizeImageIDfrom adapters/v1.This private
normalizeImageIDfunction duplicates the exportedNormalizeImageIDfromadapters/v1/syft.go. Instead of maintaining two copies, remove the private implementation inpkg/sbomscanner/v1/server.goand import the public function from adapters/v1. This ensures a single source of truth for image ID normalization logic.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sbomscanner/v1/server.go` around lines 251 - 277, The private normalizeImageID in pkg/sbomscanner/v1/server.go duplicates the exported NormalizeImageID in adapters/v1; remove the local normalizeImageID function, import the adapters/v1 package, and replace all local calls to normalizeImageID with adaptersv1.NormalizeImageID (or the actual package alias you import) so the file uses the single exported NormalizeImageID implementation from adapters/v1; ensure any required imports (adapters/v1, plus any digest/name packages if no longer used) are cleaned up.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@adapters/v1/sidecar.go`:
- Line 83: The MaxSBOMSize assignment casts s.maxSBOMSize (an int) to int32
which can overflow on 64-bit systems; update the code that sets MaxSBOMSize
(where MaxSBOMSize: int32(s.maxSBOMSize) is used) to validate s.maxSBOMSize
before casting: either ensure it's within math.MaxInt32 (return an error or
clamp the value) or change the proto/field to use int64 so no downcast is
necessary; reference the MaxSBOMSize field and the s.maxSBOMSize variable when
applying the validation or type change.
In `@cmd/http/main.go`:
- Around line 72-86: The SBOM scanner gRPC client (scannerClient created by
NewSBOMScannerClient) is never closed on shutdown; declare scannerClient in the
outer scope so it is accessible after setup, and ensure you close it during
shutdown (e.g., after controller.Shutdown() or in the registered
shutdown/cleanup path) by checking scannerClient != nil and calling its Close()
method; update the sbom adapter setup around sbomAdapter, NewSidecarSBOMAdapter
and NewSBOMScannerClient to preserve scannerClient for later cleanup.
In `@pkg/sbomscanner/v1/client.go`:
- Around line 44-55: The exponential backoff used in the health check retry
(backoff.Retry called with backoff.WithBackOff(backoff.NewExponentialBackOff()))
can run up to the default 15 minutes; create an ExponentialBackOff instance,
call WithMaxElapsedTime with a shorter duration (e.g., a few seconds or the
existing healthCheckTimeout), and pass that configured backoff to
backoff.WithBackOff so the Health retry in the client.Health loop fails faster
instead of potentially delaying startup for 15 minutes.
In `@pkg/sbomscanner/v1/server.go`:
- Around line 103-109: The retry branch mutates registryOptions.Credentials (in
the block that calls syft.GetSource with ctxWithSize and imageID), causing later
retry logic (e.g., the MANIFEST_UNKNOWN retry that also calls syft.GetSource) to
see cleared credentials; instead, make defensive copies of registryOptions
before any modification and use those copies for the downstream syft.GetSource
calls so the original registryOptions remains unchanged, and reorder/structure
the checks so MANIFEST_UNKNOWN retry (retry with imageTag) is attempted first
and the 401 retry (retry without credentials) uses a copied registryOptions with
Credentials set to nil.
- Around line 225-233: The current code silently returns nil on
json.Marshal(doc) and json.Unmarshal(b, &syftDoc) failures, which hides errors;
change those early returns to propagate the error (e.g., return nil, err) or at
minimum log the error with context using the existing logger before returning,
and ensure the returned error type matches the function signature so callers can
handle it; update the caller at the earlier call site that invokes this logic
(the site referenced in the review) to check and handle the error instead of
assuming a nil result.
- Around line 47-48: The global s.mu.Lock() at the top of CreateSBOM serializes
all scans and creates a throughput bottleneck; either remove the global lock (if
concurrent scans are safe) or replace it with a bounded-concurrency semaphore
(add a field like scanSem to the server, acquire before starting a scan and
release after) or implement per-image locking/duplication suppression (add a
perImageLocks map or use singleflight.Group keyed by image reference to prevent
duplicate scans of the same image) and update CreateSBOM to use the chosen
mechanism instead of s.mu.Lock()/s.mu.Unlock(); if the global lock is
intentional for OOM protection, add a clear comment in server.go explaining that
constraint and why.
- Around line 137-158: The deadline wrapper (dl.Run) does not actually stop
syft's cataloguing because syft.CreateSBOM is called with context.Background()
and Syft's cataloguers ignore cancellation, so a timeout won't free resources;
update the code around dl.Run and the syft.CreateSBOM invocation to avoid
launching unkillable heavy cataloguing work—either (A) do not call
syft.CreateSBOM inside dl.Run on the cancellable path (move/remove the dl.Run
wrapper around CreateSBOM) or (B) reduce work by configuring cfg to use a
minimal set of catalogers (modify cfg from syft.DefaultCreateSBOMConfig() and
avoid adding sbomcataloger.NewCataloger when req.EnableEmbeddedSboms is set, or
explicitly set catalogers to a lightweight list via
pkgcataloging.NewCatalogerReference) and add a comment noting Syft cannot be
cancelled and that long-running cataloguing is an accepted tradeoff or tracked
via a Syft issue; refer to dl.Run, syft.CreateSBOM,
syft.DefaultCreateSBOMConfig, cfg, req.EnableEmbeddedSboms,
pkgcataloging.NewCatalogerReference and sbomcataloger.NewCataloger when making
the change.
- Around line 58-62: The platform variable is computed but never used; update
the three syft.GetSource(...) calls to pass an OCI platform constructed from
req.Platform (or fallback to runtime.GOOS/runtime.GOARCH) so multi-arch images
resolve correctly. Build the platform by parsing req.Platform as "os/arch" when
present (or default to runtime.GOOS and runtime.GOARCH), create an
image.Platform instance (e.g., &image.Platform{OS: os, Architecture: arch,
Variant: variantIfProvided}), and include it in the syft GetSource options (use
the WithPlatform/WithPlatform option expected by syft/source or syft.GetSource)
so the platform is actually applied when calling syft.GetSource in server.go.
---
Nitpick comments:
In `@adapters/v1/sidecar_test.go`:
- Around line 92-117: TestSidecarSBOMAdapter_CreateSBOM_CrashRetry relies on the
adapter tracking crash retries across separate CreateSBOM calls for the same
image; update the test to include a short clarifying comment above the test or
the sequence that states this explicitly (mentioning NewSidecarSBOMAdapter,
CreateSBOM, ErrScannerCrashed, and that the third call should result in
helpersv1.TooLarge after retry exhaustion and callCount==3) so readers know the
test exercises per-image retry-exhaustion state rather than in-call retry logic.
In `@adapters/v1/sidecar.go`:
- Around line 108-114: The retry count in s.retryCount is only cleared after a
successful CreateSBOM return; modify the flow so any successful scan/response
path clears the retry count earlier (not just the CreateSBOM success branch).
Locate the code paths that return successful non-crash responses (the branch
that returns a non-crash error/response) and add delete(s.retryCount, imageID)
protected by s.mu.Lock()/Unlock() there as well (same mutex usage as the
existing clear), so all non-crash successful outcomes reset the retry counter
consistently.
In `@cmd/http/main.go`:
- Around line 76-79: Summary: When SBOM_SCANNER_SOCKET is set but connection
fails the code silently falls back to in-process Syft; add a strict-mode option
to fail fast. Update the connection error handling (the block that logs via
logger.L().Warning and assigns sbomAdapter = v1.NewSyftAdapter(...)) to read a
new SBOM_SCANNER_STRICT boolean env var (or configuration flag), parse it, and
if true return an error or exit immediately instead of falling back; if false
keep the existing fallback behavior and log that strict mode is disabled. Ensure
references to sbomAdapter and v1.NewSyftAdapter remain unchanged and that the
new env var is documented in startup logs and config handling so deployments can
enable strict failure behavior when the sidecar is required.
In `@cmd/sbom-scanner/main.go`:
- Around line 36-41: The shutdown goroutine currently calls srv.GracefulStop()
and waits indefinitely, so a second signal is ignored; modify the logic around
sigCh and srv to support a force-stop on a repeated signal by: after receiving
the first signal from sigCh and invoking srv.GracefulStop() (and removing
socketPath), start a short second-listener that if a second signal arrives calls
srv.Stop() (or os.Exit(1) if immediate process termination is preferred) to
forcefully terminate the server; reference the existing symbols sigCh,
logger.L(), srv.GracefulStop(), srv.Stop(), os.Remove and socketPath when making
this change and ensure any cleanup (socket removal) remains idempotent.
In `@pkg/sbomscanner/v1/integration_test.go`:
- Around line 77-88: The test TestIntegration_ReadyCheck currently calls
client.Close() manually at the end; make cleanup consistent with other tests by
deferring client.Close() immediately after obtaining client from
startIntegrationServer(t) (and optionally defer srv.Stop() as well) so resources
are released via defer rather than an explicit call at the test end.
In `@pkg/sbomscanner/v1/server_test.go`:
- Around line 68-95: Add more unit cases to TestNormalizeImageID to cover short
image names, digest-only IDs, and malformed inputs: include a case where imageID
is a short name like "nginx:latest" expecting the implicit
"docker.io/library/nginx:latest" (or whatever normalizeImageID's
canonicalization produces), a case where imageID is a digest-only reference like
"@sha256:abc..." with imageTag empty or set and verify the digest is
returned/merged correctly, and a case for malformed inputs (e.g., "not@@valid"
or an incomplete digest) asserting that normalizeImageID either returns the
input unchanged or an empty string/error according to current behavior; update
TestNormalizeImageID to name each scenario clearly so normalizeImageID behavior
for these edge cases is covered.
In `@pkg/sbomscanner/v1/server.go`:
- Around line 251-277: The private normalizeImageID in
pkg/sbomscanner/v1/server.go duplicates the exported NormalizeImageID in
adapters/v1; remove the local normalizeImageID function, import the adapters/v1
package, and replace all local calls to normalizeImageID with
adaptersv1.NormalizeImageID (or the actual package alias you import) so the file
uses the single exported NormalizeImageID implementation from adapters/v1;
ensure any required imports (adapters/v1, plus any digest/name packages if no
longer used) are cleaned up.
In `@pkg/sbomscanner/v1/types.go`:
- Line 25: Document the implicit zero-value behavior for the Timeout field:
update the comment for the Timeout field in the relevant struct in
pkg/sbomscanner/v1/types.go to state that a zero value causes the server to use
a default timeout of 5 minutes (as enforced by the server implementation), so
callers know the default contract when they omit or set Timeout to 0.
- Line 32: Replace the plain string Status in the SBOM scan types with a new
dedicated typed constant (e.g., type ScanStatus string) and declare the known
constants (e.g., ScanStatusLearning, ScanStatusTooLarge) so comparisons against
helpersv1.Learning / helpersv1.TooLarge map to these typed values; update the
struct field (the Status field in pkg/sbomscanner/v1/types.go) to use ScanStatus
and ensure JSON unmarshalling still works by keeping the underlying type as
string (or adding UnmarshalText/UnmarshalJSON if needed), then update any
comparisons or assignments that reference helpersv1.* to use the new ScanStatus
constants so mismatches are caught at compile time.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7cb3931f-a87f-4aaa-969f-75b72f510b8d
⛔ Files ignored due to path filters (2)
pkg/sbomscanner/v1/proto/scanner.pb.gois excluded by!**/*.pb.gopkg/sbomscanner/v1/proto/scanner_grpc.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (12)
adapters/v1/sidecar.goadapters/v1/sidecar_test.gobuild/Dockerfilecmd/http/main.gocmd/sbom-scanner/main.gopkg/sbomscanner/v1/client.gopkg/sbomscanner/v1/integration_test.gopkg/sbomscanner/v1/proto/buf.gen.yamlpkg/sbomscanner/v1/proto/scanner.protopkg/sbomscanner/v1/server.gopkg/sbomscanner/v1/server_test.gopkg/sbomscanner/v1/types.go
| ImageTag: imageTag, | ||
| Options: options, | ||
| MaxImageSize: s.maxImageSize, | ||
| MaxSBOMSize: int32(s.maxSBOMSize), |
There was a problem hiding this comment.
Potential integer overflow when casting maxSBOMSize to int32.
s.maxSBOMSize is typed as int, which on 64-bit systems can exceed int32 max value (2,147,483,647). If a very large value is configured, this cast could silently overflow to a negative number.
🛡️ Proposed fix: validate or use int64
Option 1: Add validation in constructor:
func NewSidecarSBOMAdapter(
client sbomscanner.SBOMScannerClient,
scanTimeout time.Duration,
maxImageSize int64,
maxSBOMSize int,
scanEmbeddedSBOMs bool,
memoryLimit string,
) *SidecarSBOMAdapter {
+ if maxSBOMSize > math.MaxInt32 {
+ maxSBOMSize = math.MaxInt32
+ }
return &SidecarSBOMAdapter{Option 2: Change proto field to int64 to match Go's int on 64-bit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@adapters/v1/sidecar.go` at line 83, The MaxSBOMSize assignment casts
s.maxSBOMSize (an int) to int32 which can overflow on 64-bit systems; update the
code that sets MaxSBOMSize (where MaxSBOMSize: int32(s.maxSBOMSize) is used) to
validate s.maxSBOMSize before casting: either ensure it's within math.MaxInt32
(return an error or clamp the value) or change the proto/field to use int64 so
no downcast is necessary; reference the MaxSBOMSize field and the s.maxSBOMSize
variable when applying the validation or type change.
| var sbomAdapter ports.SBOMCreator | ||
| if socketPath := os.Getenv("SBOM_SCANNER_SOCKET"); socketPath != "" { | ||
| logger.L().Info("connecting to SBOM scanner sidecar", helpers.String("socket", socketPath)) | ||
| scannerClient, err := sbomscanner.NewSBOMScannerClient(socketPath) | ||
| if err != nil { | ||
| logger.L().Warning("failed to connect to SBOM scanner sidecar, falling back to in-process Syft", | ||
| helpers.Error(err)) | ||
| sbomAdapter = v1.NewSyftAdapter(c.ScanTimeout, c.MaxImageSize, c.MaxSBOMSize, c.ScanEmbeddedSboms) | ||
| } else { | ||
| memoryLimit := os.Getenv("SCANNER_MEMORY_LIMIT") | ||
| sbomAdapter = v1.NewSidecarSBOMAdapter(scannerClient, c.ScanTimeout, c.MaxImageSize, c.MaxSBOMSize, c.ScanEmbeddedSboms, memoryLimit) | ||
| } | ||
| } else { | ||
| sbomAdapter = v1.NewSyftAdapter(c.ScanTimeout, c.MaxImageSize, c.MaxSBOMSize, c.ScanEmbeddedSboms) | ||
| } |
There was a problem hiding this comment.
Missing cleanup of sidecar client on shutdown.
The scannerClient connection is never closed during graceful shutdown. While the process exits cleanly, it's good practice to close gRPC connections explicitly to ensure proper resource cleanup and observability.
🛡️ Proposed fix to add client cleanup
+ var scannerClient sbomscanner.SBOMScannerClient
var sbomAdapter ports.SBOMCreator
if socketPath := os.Getenv("SBOM_SCANNER_SOCKET"); socketPath != "" {
logger.L().Info("connecting to SBOM scanner sidecar", helpers.String("socket", socketPath))
- scannerClient, err := sbomscanner.NewSBOMScannerClient(socketPath)
+ scannerClient, err = sbomscanner.NewSBOMScannerClient(socketPath)
if err != nil {Then add cleanup before exit:
// After controller.Shutdown()
if scannerClient != nil {
scannerClient.Close()
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/http/main.go` around lines 72 - 86, The SBOM scanner gRPC client
(scannerClient created by NewSBOMScannerClient) is never closed on shutdown;
declare scannerClient in the outer scope so it is accessible after setup, and
ensure you close it during shutdown (e.g., after controller.Shutdown() or in the
registered shutdown/cleanup path) by checking scannerClient != nil and calling
its Close() method; update the sbom adapter setup around sbomAdapter,
NewSidecarSBOMAdapter and NewSBOMScannerClient to preserve scannerClient for
later cleanup.
| _, err = backoff.Retry(context.Background(), func() (struct{}, error) { | ||
| ctx, cancel := context.WithTimeout(context.Background(), healthCheckTimeout) | ||
| defer cancel() | ||
| resp, err := c.client.Health(ctx, &pb.HealthRequest{}) | ||
| if err != nil { | ||
| return struct{}{}, fmt.Errorf("health check failed: %w", err) | ||
| } | ||
| if !resp.Ready { | ||
| return struct{}{}, fmt.Errorf("scanner not ready") | ||
| } | ||
| return struct{}{}, nil | ||
| }, backoff.WithBackOff(backoff.NewExponentialBackOff())) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
cenkalti/backoff/v5 NewExponentialBackOff default MaxElapsedTime
💡 Result:
The default MaxElapsedTime used with NewExponentialBackOff in github.com/cenkalti/backoff/v5 is 15 minutes. This is the value of the constant DefaultMaxElapsedTime, which is applied in the Retry function when no custom MaxElapsedTime is specified via WithMaxElapsedTime. NewExponentialBackOff itself does not set MaxElapsedTime; it configures other defaults like InitialInterval (500ms), RandomizationFactor (0.5), Multiplier (1.5), and MaxInterval (60s).
Citations:
- 1: https://pkg.go.dev/github.com/cenkalti/backoff/v5
- 2: https://github.com/cenkalti/backoff/blob/v5/retry.go
- 3: https://godoc.org/github.com/cenkalti/backoff/v5
🏁 Script executed:
# First, let's check if the repository uses cenkalti/backoff/v5 and find the exact imports
rg "github.com/cenkalti/backoff" --type go | head -20Repository: kubescape/kubevuln
Length of output: 124
🏁 Script executed:
# Check the exact code at lines 44-55 in the client.go file
cat -n pkg/sbomscanner/v1/client.go | sed -n '40,60p'Repository: kubescape/kubevuln
Length of output: 917
🏁 Script executed:
# Check go.mod to verify the backoff version
grep "cenkalti/backoff" go.modRepository: kubescape/kubevuln
Length of output: 163
🌐 Web query:
cenkalti/backoff/v5 WithMaxElapsedTime API ExponentialBackOff
💡 Result:
In github.com/cenkalti/backoff/v5, WithMaxElapsedTime is a RetryOption func that limits the total duration for retry attempts in backoff.Retry. It sets the MaxElapsedTime field in retryOptions. It is used with ExponentialBackOff (created via NewExponentialBackOff, with defaults: InitialInterval=500ms, RandomizationFactor=0.5, Multiplier=1.5, MaxInterval=60s) as follows: result, err := backoff.Retry(ctx, operation, backoff.WithBackOff(backoff.NewExponentialBackOff), backoff.WithMaxElapsedTime(5*time.Minute)) The default MaxElapsedTime is 15 minutes if not specified. Retries stop if elapsed time from start exceeds this or backoff.NextBackOff returns Stop (-1). Example from example_test.go uses WithBackOff(NewExponentialBackOff) without WithMaxElapsedTime, relying on default. Full usage: import "github.com/cenkalti/backoff/v5". Define operation func (T, error), call Retry with options.
Citations:
- 1: https://pkg.go.dev/github.com/cenkalti/backoff/v5
- 2: https://godoc.org/github.com/cenkalti/backoff/v5
- 3: https://github.com/cenkalti/backoff/blob/v5/retry.go
- 4: https://chromium.googlesource.com/external/github.com/cenkalti/backoff/%2B/refs/heads/v5/example_test.go
Unbounded exponential backoff may delay startup significantly.
backoff.NewExponentialBackOff() uses a default max elapsed time of 15 minutes. If the sidecar is slow to start or misconfigured, kubevuln could hang at startup for up to 15 minutes before failing. Set a shorter max elapsed time via WithMaxElapsedTime().
🛡️ Proposed fix to limit backoff duration
- }, backoff.WithBackOff(backoff.NewExponentialBackOff()))
+ }, backoff.WithBackOff(backoff.NewExponentialBackOff()), backoff.WithMaxElapsedTime(2*time.Minute))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _, err = backoff.Retry(context.Background(), func() (struct{}, error) { | |
| ctx, cancel := context.WithTimeout(context.Background(), healthCheckTimeout) | |
| defer cancel() | |
| resp, err := c.client.Health(ctx, &pb.HealthRequest{}) | |
| if err != nil { | |
| return struct{}{}, fmt.Errorf("health check failed: %w", err) | |
| } | |
| if !resp.Ready { | |
| return struct{}{}, fmt.Errorf("scanner not ready") | |
| } | |
| return struct{}{}, nil | |
| }, backoff.WithBackOff(backoff.NewExponentialBackOff())) | |
| _, err = backoff.Retry(context.Background(), func() (struct{}, error) { | |
| ctx, cancel := context.WithTimeout(context.Background(), healthCheckTimeout) | |
| defer cancel() | |
| resp, err := c.client.Health(ctx, &pb.HealthRequest{}) | |
| if err != nil { | |
| return struct{}{}, fmt.Errorf("health check failed: %w", err) | |
| } | |
| if !resp.Ready { | |
| return struct{}{}, fmt.Errorf("scanner not ready") | |
| } | |
| return struct{}{}, nil | |
| }, backoff.WithBackOff(backoff.NewExponentialBackOff(backoff.WithMaxElapsedTime(2*time.Minute)))) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sbomscanner/v1/client.go` around lines 44 - 55, The exponential backoff
used in the health check retry (backoff.Retry called with
backoff.WithBackOff(backoff.NewExponentialBackOff())) can run up to the default
15 minutes; create an ExponentialBackOff instance, call WithMaxElapsedTime with
a shorter duration (e.g., a few seconds or the existing healthCheckTimeout), and
pass that configured backoff to backoff.WithBackOff so the Health retry in the
client.Health loop fails faster instead of potentially delaying startup for 15
minutes.
| if err != nil && strings.Contains(err.Error(), "401 Unauthorized") { | ||
| logger.L().Debug("got 401, retrying without credentials", | ||
| helpers.String("imageID", imageID)) | ||
| registryOptions.Credentials = nil | ||
| src, err = syft.GetSource(ctxWithSize, imageID, | ||
| syft.DefaultGetSourceConfig().WithRegistryOptions(®istryOptions).WithSources("registry")) | ||
| } |
There was a problem hiding this comment.
Credential retry logic has side effects that persist across retry branches.
Line 106 clears registryOptions.Credentials = nil for the 401 retry. However, if this retry also fails with MANIFEST_UNKNOWN, no further retry with the imageTag will occur because the MANIFEST_UNKNOWN check (lines 95-101) has already passed.
The retry order and side effects should be reviewed:
- First check for MANIFEST_UNKNOWN → retry with imageTag
- Then check for 401 → retry without credentials (but credentials are now nil for any subsequent logic)
Consider making defensive copies of registryOptions before modifying:
♻️ Proposed fix
if err != nil && strings.Contains(err.Error(), "401 Unauthorized") {
logger.L().Debug("got 401, retrying without credentials",
helpers.String("imageID", imageID))
- registryOptions.Credentials = nil
+ retryOpts := registryOptions
+ retryOpts.Credentials = nil
src, err = syft.GetSource(ctxWithSize, imageID,
- syft.DefaultGetSourceConfig().WithRegistryOptions(®istryOptions).WithSources("registry"))
+ syft.DefaultGetSourceConfig().WithRegistryOptions(&retryOpts).WithSources("registry"))
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err != nil && strings.Contains(err.Error(), "401 Unauthorized") { | |
| logger.L().Debug("got 401, retrying without credentials", | |
| helpers.String("imageID", imageID)) | |
| registryOptions.Credentials = nil | |
| src, err = syft.GetSource(ctxWithSize, imageID, | |
| syft.DefaultGetSourceConfig().WithRegistryOptions(®istryOptions).WithSources("registry")) | |
| } | |
| if err != nil && strings.Contains(err.Error(), "401 Unauthorized") { | |
| logger.L().Debug("got 401, retrying without credentials", | |
| helpers.String("imageID", imageID)) | |
| retryOpts := registryOptions | |
| retryOpts.Credentials = nil | |
| src, err = syft.GetSource(ctxWithSize, imageID, | |
| syft.DefaultGetSourceConfig().WithRegistryOptions(&retryOpts).WithSources("registry")) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sbomscanner/v1/server.go` around lines 103 - 109, The retry branch
mutates registryOptions.Credentials (in the block that calls syft.GetSource with
ctxWithSize and imageID), causing later retry logic (e.g., the MANIFEST_UNKNOWN
retry that also calls syft.GetSource) to see cleared credentials; instead, make
defensive copies of registryOptions before any modification and use those copies
for the downstream syft.GetSource calls so the original registryOptions remains
unchanged, and reorder/structure the checks so MANIFEST_UNKNOWN retry (retry
with imageTag) is attempted first and the 401 retry (retry without credentials)
uses a copied registryOptions with Credentials set to nil.
| b, err := json.Marshal(doc) | ||
| if err != nil { | ||
| return nil | ||
| } | ||
|
|
||
| var syftDoc *v1beta1.SyftDocument | ||
| if err := json.Unmarshal(b, &syftDoc); err != nil { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Silent nil returns on serialization errors may hide data corruption.
Lines 227-228 and 231-233 return nil when JSON marshal/unmarshal fails, with no logging or error propagation. This could mask issues where SBOM data is silently dropped.
🐛 Proposed fix: Log errors or propagate
func syftToDomain(sbomSBOM sbom.SBOM) (*v1beta1.SyftDocument, error) {
doc := syftjson.ToFormatModel(sbomSBOM, syftjson.EncoderConfig{
Pretty: false,
Legacy: false,
})
b, err := json.Marshal(doc)
if err != nil {
- return nil
+ return nil, fmt.Errorf("failed to marshal syft document: %w", err)
}
var syftDoc *v1beta1.SyftDocument
if err := json.Unmarshal(b, &syftDoc); err != nil {
- return nil
+ return nil, fmt.Errorf("failed to unmarshal to v1beta1: %w", err)
}Then update the call site (line 192) to handle the error.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| b, err := json.Marshal(doc) | |
| if err != nil { | |
| return nil | |
| } | |
| var syftDoc *v1beta1.SyftDocument | |
| if err := json.Unmarshal(b, &syftDoc); err != nil { | |
| return nil | |
| } | |
| b, err := json.Marshal(doc) | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to marshal syft document: %w", err) | |
| } | |
| var syftDoc *v1beta1.SyftDocument | |
| if err := json.Unmarshal(b, &syftDoc); err != nil { | |
| return nil, fmt.Errorf("failed to unmarshal to v1beta1: %w", err) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sbomscanner/v1/server.go` around lines 225 - 233, The current code
silently returns nil on json.Marshal(doc) and json.Unmarshal(b, &syftDoc)
failures, which hides errors; change those early returns to propagate the error
(e.g., return nil, err) or at minimum log the error with context using the
existing logger before returning, and ensure the returned error type matches the
function signature so callers can handle it; update the caller at the earlier
call site that invokes this logic (the site referenced in the review) to check
and handle the error instead of assuming a nil result.
|
Summary:
|
matthyx
left a comment
There was a problem hiding this comment.
fix the platform one, maybe have a look at the 2 Minor issues that I left open
The platform variable was computed but never passed to any of the three syft.GetSource() calls, breaking multi-arch image resolution. Parse the platform string into an image.Platform and pass it via WithPlatform(). Also document the known Syft limitation where cataloguers ignore context cancellation (anchore/syft#3705). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ben <ben@armosec.io>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/sbomscanner/v1/server.go`:
- Around line 133-137: The generic error branch that returns a
pb.CreateSBOMResponse is missing the Status field; update the handler (the case
with err != nil) to set Status to the terminal value for incomplete operations
(e.g., pb.Status_Incomplete or the appropriate enum/constant defined in pb)
alongside ErrorMessage so the response includes a proper terminal status; locate
the CreateSBOMResponse construction in server.go and add the Status field to the
returned struct.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4ea8e11d-a486-4f54-89ac-b8aede1260f7
📒 Files selected for processing (1)
pkg/sbomscanner/v1/server.go
| case err != nil: | ||
| return &pb.CreateSBOMResponse{ | ||
| ErrorMessage: err.Error(), | ||
| }, nil | ||
| } |
There was a problem hiding this comment.
Missing Status field for generic source-fetch errors.
When a generic error occurs (not 401 or TooLarge), the response omits Status. Per the proto definition, valid terminal statuses are "Learning", "Incomplete", "TooLarge", "Unauthorize". Other error cases (lines 126, 130) set appropriate statuses. For consistency, this case should set Incomplete.
🐛 Proposed fix
case err != nil:
return &pb.CreateSBOMResponse{
+ Status: helpersv1.Incomplete,
ErrorMessage: err.Error(),
}, nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sbomscanner/v1/server.go` around lines 133 - 137, The generic error
branch that returns a pb.CreateSBOMResponse is missing the Status field; update
the handler (the case with err != nil) to set Status to the terminal value for
incomplete operations (e.g., pb.Status_Incomplete or the appropriate
enum/constant defined in pb) alongside ErrorMessage so the response includes a
proper terminal status; locate the CreateSBOMResponse construction in server.go
and add the Status field to the returned struct.
|
Summary:
|
|
thanks @slashben you can merge |
Rebase onto main (includes PR #335 sidecar adapter). New classifiers: - ErrScannerCrashed → ReasonScannerOOMKilled (via errors.Is) - context.DeadlineExceeded → ReasonScanTimeout - TooLarge + "scanner OOM" annotation → ReasonScannerOOMKilled (classifySBOMStatusWithAnnotation for sidecar crash-exhausted images) Bump armoapi-go to pick up new reason codes. 19 classifier tests + 6 adapter tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rebase onto main (includes PR #335 sidecar adapter). New classifiers: - ErrScannerCrashed → ReasonScannerOOMKilled (via errors.Is) - context.DeadlineExceeded → ReasonScanTimeout - TooLarge + "scanner OOM" annotation → ReasonScannerOOMKilled (classifySBOMStatusWithAnnotation for sidecar crash-exhausted images) Bump armoapi-go to pick up new reason codes. 19 classifier tests + 6 adapter tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> # Conflicts: # go.mod
Summary
SBOM_SCANNER_SOCKETenv var, kubevuln uses the in-process SyftAdapter as beforeDesign
See kubevuln-sbom-sidecar-design.md — follows the same pattern as node-agent PR #753.
New files
pkg/sbomscanner/v1/proto/scanner.protopkg/sbomscanner/v1/server.gopkg/sbomscanner/v1/client.gocodes.Unavailablepkg/sbomscanner/v1/types.gocmd/sbom-scanner/main.goadapters/v1/sidecar.goSidecarSBOMAdapterimplementingports.SBOMCreatorModified files
cmd/http/main.goSBOM_SCANNER_SOCKETenv varbuild/Dockerfilekubevulnandsbom-scannerbinariesCompanion PR
Helm chart changes: kubescape/helm-charts (PR to follow)
Test plan
go test ./pkg/sbomscanner/v1/...— health, crash detection, context cancellation, normalizeImageIDErrScannerCrashed, readiness togglego test ./adapters/v1/...andgo test ./core/services/...— no regressions🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests
Chores