feat: subprocess SBOM creation using moby/sys/reexec + os.Pipe #333
Conversation
Isolate SBOM generation in a child process to survive OOM kills.
- Use reexec.Register("sbom-worker") in init() for clean dispatch
- IPC via os.Pipe (FD 3/4), keeping stdout/stderr free for logging
- SIGKILL detection → ErrChildOOMKilled sentinel error
- RLIMIT_AS memory limit on child via env var
- SIGTERM propagation: Shutdown() forwards to tracked children
- Temp dir cleanup after child failure (stereoscope* dirs)
- Feature flag: subprocessSBOM (default false, zero overhead when off)
Signed-off-by: kooomix <eranm@armosec.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
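For orientation, a minimal sketch of the reexec dispatch pattern the description refers to. `reexec.Register`, `reexec.Init`, and the "sbom-worker" name come from the description; the request/response types and other function names are illustrative, not the PR's actual code.

```go
package main

import (
	"encoding/json"
	"os"

	"github.com/moby/sys/reexec"
)

// Hypothetical shapes; the PR's real types are sbomWorkerRequest / sbomWorkerResponse.
type workerRequest struct {
	ImageID string `json:"imageID"`
}

type workerResponse struct {
	Error string `json:"error,omitempty"`
}

func init() {
	// Register the child entry point under the name the parent will exec.
	reexec.Register("sbom-worker", sbomWorkerMain)
}

func sbomWorkerMain() {
	// FD 3 = request pipe, FD 4 = response pipe (passed via cmd.ExtraFiles in the parent).
	reqPipe := os.NewFile(3, "req")
	respPipe := os.NewFile(4, "resp")

	var req workerRequest
	if err := json.NewDecoder(reqPipe).Decode(&req); err != nil {
		_ = json.NewEncoder(respPipe).Encode(workerResponse{Error: err.Error()})
		os.Exit(1)
	}

	// ... generate the SBOM here, then write the JSON response to FD 4 ...
	_ = json.NewEncoder(respPipe).Encode(workerResponse{})
}

func main() {
	// If this process was started as a registered worker, run it and exit.
	if reexec.Init() {
		return
	}
	// ... normal kubevuln startup continues here ...
}
```

Because stdout and stderr are untouched, the child can keep logging normally while the JSON protocol stays on its own file descriptors.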
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Adds a subprocess-based SBOM creation path using reexec and IPC (pipes), with per-child temp dirs, RLIMIT_AS memory limits, timeouts, child lifecycle tracking, shutdown, tests, docs, config flags, and integration into HTTP server startup/shutdown.
Sequence Diagram(s)

sequenceDiagram
participant Parent as Parent Process (Server)
participant PipeReq as Pipe FD (req)
participant PipeResp as Pipe FD (resp)
participant Child as Child Process (reexec worker)
participant Syft as SyftAdapter
Parent->>Parent: Marshal sbomWorkerRequest (JSON)
Parent->>Parent: Create temp dir, create two pipes (req/resp)
Parent->>Child: Spawn reexec child (ExtraFiles -> FD, env memlimit)
Parent->>PipeReq: Write request JSON to child's FD (req)
Parent->>Parent: Wait for response with timeout
Child->>Child: Apply RLIMIT_AS from env (optional)
Child->>PipeReq: Read request JSON from FD (req)
Child->>Syft: Call CreateSBOM(...)
alt Success
Syft-->>Child: SBOM result
Child->>PipeResp: Write sbomWorkerResponse JSON to FD (resp)
PipeResp-->>Parent: Parent reads and unmarshals response
Parent->>Parent: Return SBOM to caller
else Child error / non-zero exit / signal
Child-->>Parent: Exit with code/signal
Parent->>Parent: Detect OOM/signal/exit -> map to sentinel error
else Parent timeout
Parent->>Child: Send SIGTERM (then SIGKILL if needed)
Parent->>Parent: Return ErrChildTimeout
end
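A rough parent-side shape consistent with the diagram above, assuming two `os.Pipe` pairs passed via `ExtraFiles` (so they appear as FD 3 and FD 4 in the child) and a goroutine-plus-select timeout. The names, the memory-limit value, and the plain timeout error are illustrative; the PR uses an `ErrChildTimeout` sentinel and its own request/response types.

```go
package sbomworker // illustrative package and function names

import (
	"encoding/json"
	"errors"
	"io"
	"os"
	"time"

	"github.com/moby/sys/reexec"
)

// runSBOMWorker spawns the registered "sbom-worker" child, sends a JSON request
// over FD 3, reads the JSON response from FD 4, and enforces a parent-side timeout.
func runSBOMWorker(req any, timeout time.Duration) ([]byte, error) {
	reqReader, reqWriter, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	respReader, respWriter, err := os.Pipe()
	if err != nil {
		reqReader.Close()
		reqWriter.Close()
		return nil, err
	}
	defer respReader.Close()

	cmd := reexec.Command("sbom-worker")
	// ExtraFiles[0] becomes FD 3 in the child (request), ExtraFiles[1] becomes FD 4 (response).
	cmd.ExtraFiles = []*os.File{reqReader, respWriter}
	cmd.Stdout = os.Stdout // stdout/stderr stay free for the child's logging
	cmd.Stderr = os.Stderr
	cmd.Env = append(os.Environ(), "KUBEVULN_SBOM_WORKER_MEMLIMIT=2147483648") // illustrative value

	if err := cmd.Start(); err != nil {
		reqReader.Close()
		respWriter.Close()
		reqWriter.Close()
		return nil, err
	}
	// The parent no longer needs the child's ends of the pipes.
	reqReader.Close()
	respWriter.Close()

	// Send the request, then close the writer so the child sees EOF.
	if err := json.NewEncoder(reqWriter).Encode(req); err != nil {
		reqWriter.Close()
		return nil, err
	}
	reqWriter.Close()

	type result struct {
		out []byte
		err error
	}
	done := make(chan result, 1)
	go func() {
		out, readErr := io.ReadAll(respReader)
		if waitErr := cmd.Wait(); readErr == nil {
			readErr = waitErr
		}
		done <- result{out, readErr}
	}()

	select {
	case r := <-done:
		return r.out, r.err
	case <-time.After(timeout):
		_ = cmd.Process.Kill() // a real implementation would try SIGTERM first
		<-done                 // reap the child
		return nil, errors.New("sbom worker timed out") // the PR returns ErrChildTimeout here
	}
}
```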
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 5
🧹 Nitpick comments (2)
config/config.go (1)
49-60: Validate `subprocessSBOMMemoryLimit` after unmarshal. A negative value is currently accepted and silently treated as “no limit” later. It’s safer to fail fast on invalid config.
🔧 Proposed patch
 import (
+    "fmt"
     "path/filepath"
     "time"
@@
     var config Config
     err = viper.Unmarshal(&config)
+    if err != nil {
+        return Config{}, err
+    }
+    if config.SubprocessSBOMMemoryLimit < 0 {
+        return Config{}, fmt.Errorf("subprocessSBOMMemoryLimit must be >= 0")
+    }
-    return config, err
+    return config, nil
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/config.go` around lines 49-60: After unmarshalling into the Config struct (the variable config produced by viper.Unmarshal in config/config.go), validate the subprocessSBOMMemoryLimit field and fail fast: if config.SubprocessSBOMMemoryLimit is negative, return a non-nil error (e.g., fmt.Errorf("invalid subprocessSBOMMemoryLimit: %d", config.SubprocessSBOMMemoryLimit)); do this immediately before returning config so callers receive the validation error instead of silently treating negatives as "no limit".

adapters/v1/subprocess_sbom_test.go (1)
175-196: Timeout test bypasses the real subprocess timeout path. This test doesn’t exercise `createSBOMInSubprocess`, so it won’t catch regressions in child timeout enforcement or kill/reap behavior. Add an integration case that drives the real method and asserts `ErrChildTimeout`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapters/v1/subprocess_sbom_test.go` around lines 175 - 196, The test TestHandleChildError_Timeout is only exercising handleChildError directly and thus misses regressions in the real subprocess timeout path; update it (or add a new integration test) to call SubprocessSBOMCreator.createSBOMInSubprocess instead of only handleChildError, arrange a child context that will expire (e.g., very short timeout) while the subprocess work is running, and assert the returned error equals ErrChildTimeout; reference SubprocessSBOMCreator, createSBOMInSubprocess, ErrChildTimeout and keep the existing dummy image/workload such that the test actually drives the subprocess timeout and validates kill/reap behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@adapters/v1/subprocess_sbom.go`:
- Around line 184-185: childCtx is created but not used to enforce timeout: wrap
the blocking io.ReadAll and cmd.Wait in a goroutine and use select on
childCtx.Done() vs a done channel so a hung child triggers timeout handling; on
childCtx.Done() call cmd.Process.Kill() (or cmd.Process.Signal) and return
ErrChildTimeout. Update the code paths that call io.ReadAll and cmd.Wait
(references: childCtx, cancel, cmd, io.ReadAll, cmd.Wait, ErrChildTimeout,
s.timeout, parentTimeoutBuffer) to perform the start/read/wait in a goroutine,
select for completion or timeout, ensure cancel() is still deferred, and
replicate the same pattern for the other similar block noted in the comment.
- Around line 294-311: cleanupOrphanedTempDirs is too broad and may delete other
processes' stereoscope* temp dirs; instead, change createSBOMInSubprocess to
create a unique per-child temp directory (use ioutil.TempDir or os.MkdirTemp),
set TMPDIR in the subprocess env to that directory, and remove only that
specific directory on subprocess exit/failure (do not scan os.TempDir for
stereo* entries). Remove or limit use of cleanupOrphanedTempDirs to only target
directories you explicitly created/tracked by the process (e.g., return/record
the temp path from createSBOMInSubprocess and use that exact path in deferred
cleanup). A minimal sketch of this per-child TMPDIR approach appears after this list.
- Around line 260-262: The current check returns success when resp.SBOM is nil
(the block that returns domain.SBOM{}, nil), which hides protocol/worker bugs;
change this to return a non-nil error instead of an empty SBOM so callers know
the payload is missing. Locate the nil-check for resp.SBOM in subprocess_sbom.go
(the if resp.SBOM == nil branch) and replace the silent success with an error
return (include a clear message like "missing SBOM in subprocess response" and
wrap any underlying context if available) so consumers receive a failure rather
than an empty SBOM.
In `@adapters/v1/SUBPROCESS_SBOM.md`:
- Around line 18-41: The fenced diagram block at the top uses ``` without a
language and triggers MD040; update the opening fence for that ASCII diagram
(the triple-backtick that precedes the "┌────────────────..." block) to include
a language hint such as text (e.g., change ``` to ```text) so markdownlint sees
a language, leaving the closing fence unchanged and preserving the diagram
content (target the fenced code block containing the reexec/child-parent
diagram).
In `@cmd/http/main.go`:
- Around line 144-148: Reorder the shutdown sequence so the controller's queue
is drained before terminating SBOM subprocesses: call controller.Shutdown()
prior to sbomAdapter.Shutdown() to avoid sending SIGTERM to SBOM children while
queued controller jobs may still start; update the shutdown order in the
function where sbomAdapter.Shutdown() and controller.Shutdown() are invoked so
controller finishes draining before SBOM processes are killed.
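Illustrating the per-child temp directory recommendation above (the comment on lines 294-311), a minimal sketch, assuming the worker and stereoscope honor `TMPDIR`; names are illustrative, not the PR's actual code.

```go
package sbomworker // illustrative

import (
	"os"
	"os/exec"

	"github.com/moby/sys/reexec"
)

// startWorkerWithPrivateTmp gives each child its own temp directory and returns
// a cleanup func that removes only that exact directory afterwards, instead of
// scanning os.TempDir for stereoscope* entries owned by other processes.
func startWorkerWithPrivateTmp() (*exec.Cmd, func(), error) {
	tmpDir, err := os.MkdirTemp("", "sbom-worker-*")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() { _ = os.RemoveAll(tmpDir) } // removes only this child's dir

	cmd := reexec.Command("sbom-worker")
	// Point the child's temp usage at the private directory.
	cmd.Env = append(os.Environ(), "TMPDIR="+tmpDir)
	if err := cmd.Start(); err != nil {
		cleanup()
		return nil, nil, err
	}
	return cmd, cleanup, nil
}
```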
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4d4595ab-97ff-4400-acb2-9403129c5920
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (6)
- adapters/v1/SUBPROCESS_SBOM.md
- adapters/v1/subprocess_sbom.go
- adapters/v1/subprocess_sbom_test.go
- cmd/http/main.go
- config/config.go
- go.mod
…eorder shutdown

- Use goroutine+select to enforce parent timeout on hung child (Critical)
- Return error when worker response has nil SBOM (Major)
- Create per-child TMPDIR instead of broad stereoscope* cleanup (Major)
- Reorder shutdown: drain controller queue before killing SBOM workers (Major)
- Add language hint to markdown fenced code block (Minor)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: kooomix <eranm@armosec.io>
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 735f05ac-9229-4d44-9972-d3f054dbdc35
📒 Files selected for processing (3)
- adapters/v1/SUBPROCESS_SBOM.md
- adapters/v1/subprocess_sbom.go
- cmd/http/main.go
🚧 Files skipped from review as they are similar to previous changes (1)
- adapters/v1/SUBPROCESS_SBOM.md
When childCtx.Done() fires, check whether it was DeadlineExceeded (return ErrChildTimeout) or context.Canceled from parent shutdown (return wrapped context.Canceled). This prevents misclassifying graceful shutdown as a scan timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: kooomix <eranm@armosec.io>
🧹 Nitpick comments (3)
adapters/v1/subprocess_sbom.go (3)
390-398: Silent failure on memory limit parse error reduces observability. If the `KUBEVULN_SBOM_WORKER_MEMLIMIT` environment variable contains an invalid value, the function silently returns without applying any limit. While invalid values are unlikely (the parent controls the env var), a warning message would aid debugging misconfiguration scenarios.

🔧 Suggested improvement
 func applyMemoryLimit() {
     limitStr := os.Getenv("KUBEVULN_SBOM_WORKER_MEMLIMIT")
     if limitStr == "" {
         return
     }
     var limit int64
     if _, err := fmt.Sscanf(limitStr, "%d", &limit); err != nil || limit <= 0 {
+        fmt.Fprintf(os.Stderr, "warning: invalid memory limit value %q, skipping\n", limitStr)
         return
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapters/v1/subprocess_sbom.go` around lines 390 - 398, The applyMemoryLimit function silently returns when parsing KUBEVULN_SBOM_WORKER_MEMLIMIT fails; update applyMemoryLimit to log a warning via your logger (or fmt.Println/processLogger) when fmt.Sscanf on KUBEVULN_SBOM_WORKER_MEMLIMIT returns an error or a non-positive limit before returning so invalid env values are observable; keep the existing early-return behavior but ensure the warning references the env var name and the parse error/value for debugging.
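For context, applying an `RLIMIT_AS` limit read from that environment variable, with the warning the comment asks for, typically looks like the sketch below (Linux only; this is the general pattern, not the PR's exact `applyMemoryLimit`):

```go
package worker // illustrative

import (
	"fmt"
	"os"
	"strconv"
	"syscall"
)

// applyMemoryLimitSketch reads the limit from the env var the parent sets and
// applies it as an address-space rlimit, warning on invalid values.
func applyMemoryLimitSketch() {
	limitStr := os.Getenv("KUBEVULN_SBOM_WORKER_MEMLIMIT")
	if limitStr == "" {
		return // no limit requested
	}
	limit, err := strconv.ParseInt(limitStr, 10, 64)
	if err != nil || limit <= 0 {
		fmt.Fprintf(os.Stderr, "warning: invalid KUBEVULN_SBOM_WORKER_MEMLIMIT %q: %v\n", limitStr, err)
		return
	}
	rl := syscall.Rlimit{Cur: uint64(limit), Max: uint64(limit)}
	if err := syscall.Setrlimit(syscall.RLIMIT_AS, &rl); err != nil {
		fmt.Fprintf(os.Stderr, "warning: failed to set RLIMIT_AS: %v\n", err)
	}
}
```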
167-178: Pipe file descriptors are closed twice on the success path. The defers at lines 171 and 178 close `reqReader` and `respWriter`, but lines 229-230 also explicitly close them after `cmd.Start()` succeeds. While `os.File.Close()` is safe to call multiple times (returns `EINVAL` on the second call), this is a code smell that could mask real issues.

🔧 Suggested fix: nil-guard the defers
 reqReader, reqWriter, err := os.Pipe()
 if err != nil {
     return domain.SBOM{}, fmt.Errorf("failed to create request pipe: %w", err)
 }
-defer reqReader.Close()
+defer func() {
+    if reqReader != nil {
+        reqReader.Close()
+    }
+}()

 respReader, respWriter, err := os.Pipe()
 if err != nil {
     reqWriter.Close()
     return domain.SBOM{}, fmt.Errorf("failed to create response pipe: %w", err)
 }
-defer respWriter.Close()
+defer func() {
+    if respWriter != nil {
+        respWriter.Close()
+    }
+}()

Then at lines 229-230:
 // Close the parent's copy of the child's read/write ends.
 reqReader.Close()
+reqReader = nil
 respWriter.Close()
+respWriter = nil

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapters/v1/subprocess_sbom.go` around lines 167 - 178, The pipe file descriptors (reqReader, reqWriter, respReader, respWriter) are being closed twice because there are unconditional defers and then explicit Close calls after cmd.Start(); update the cleanup to avoid double-closing by either (a) changing the defers to guard with nil checks (e.g., defer func() { if reqReader != nil { reqReader.Close() } }()), or (b) keep the defers but set the variables to nil immediately after you explicitly Close() them post cmd.Start() so the deferred closures become no-ops; apply this change around the reqReader/reqWriter and respReader/respWriter usage near the cmd.Start() call in subprocess_sbom.go.
377-380: Response includes SBOM pointer even when there's an error. When `CreateSBOM` fails, `resp.SBOM` is still set to `&sbom` (which may be a zero-value). While the parent correctly checks `resp.Error` first (lines 296-298), it's cleaner to only set `SBOM` on success to make the contract explicit.

🔧 Suggested clarification
-resp := sbomWorkerResponse{SBOM: &sbom}
+var resp sbomWorkerResponse
 if err != nil {
     resp.Error = err.Error()
+} else {
+    resp.SBOM = &sbom
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapters/v1/subprocess_sbom.go` around lines 377 - 380, The response currently sets resp := sbomWorkerResponse{SBOM: &sbom} before checking err, so even on CreateSBOM failure the SBOM pointer is non-nil; change the flow in the sbom worker (where sbomWorkerResponse, resp and sbom are used) to only assign resp.SBOM = &sbom when err == nil (or explicitly set resp.SBOM = nil when err != nil) after the CreateSBOM call and error check, ensuring resp.Error contains the error string and SBOM is only populated on success.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a9aaf533-75ba-4451-9428-3b30d56f0157
📒 Files selected for processing (1)
adapters/v1/subprocess_sbom.go
… subprocess

The parent's childCtx was derived from the HTTP request context. When the handler returned 200, the request context was canceled, immediately killing the child process before it could do any work. Switch to context.Background() so the subprocess has its own independent lifecycle with timeout enforcement via the creator's configured timeout + parentTimeoutBuffer. Graceful shutdown is handled separately via Shutdown().

Add 4 unit tests covering the pre-canceled context bug, own-timeout enforcement, disabled-mode delegation, and generic error fallthrough.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🧹 Nitpick comments (1)
adapters/v1/subprocess_sbom.go (1)
171-178: Double-close of pipe file descriptors on success path.
`reqReader` and `respWriter` are closed twice: once by the defers at lines 171/178, and again explicitly at lines 229/230 after `cmd.Start()` succeeds. While `os.File.Close()` is safe to call multiple times (returns an error on the second call that's discarded), this is a minor code smell. Consider removing the defers and consolidating cleanup in a helper, or restructuring so each close happens exactly once.
♻️ One possible fix pattern
 reqReader, reqWriter, err := os.Pipe()
 if err != nil {
     return domain.SBOM{}, fmt.Errorf("failed to create request pipe: %w", err)
 }
-defer reqReader.Close()

 respReader, respWriter, err := os.Pipe()
 if err != nil {
     reqWriter.Close()
+    reqReader.Close()
     return domain.SBOM{}, fmt.Errorf("failed to create response pipe: %w", err)
 }
-defer respWriter.Close()

Then ensure all early-return error paths close the appropriate pipe ends.
Also applies to: 228-230
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapters/v1/subprocess_sbom.go` around lines 171 - 178, The defers closing reqReader and respWriter cause double-closes because those same descriptors are explicitly closed after cmd.Start() (see reqReader, respWriter, reqWriter and the cmd.Start() success path); remove the redundant defers and consolidate pipe cleanup into a single place (e.g., a small helper or a single defer that runs after successful cmd.Start()) so each os.Pipe end is closed exactly once, and ensure all early-return/error paths still close the appropriate pipe ends (close reqWriter on pipe-creation errors, close respReader/respWriter as needed) to avoid leaks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ba5e622e-c78b-4871-8499-d3d7854dab11
📒 Files selected for processing (2)
- adapters/v1/subprocess_sbom.go
- adapters/v1/subprocess_sbom_test.go
# Subprocess SBOM Creation

## Problem

When kubevuln scans large container images, the Syft SBOM generation step can consume
unbounded memory. If the Go runtime or the kernel OOM-killer terminates the process,
the entire kubevuln pod crashes — losing all in-flight scans and requiring a restart.

## Solution

`SubprocessSBOMCreator` isolates SBOM generation in a **child process** using
[moby/sys/reexec](https://pkg.go.dev/github.com/moby/sys/reexec). The parent
(kubevuln main process) survives even if the child is OOM-killed, and reports a
structured error instead of crashing.

## How It Works

```text
┌─────────────────────────────────────────────────────┐
│ kubevuln (parent)                                    │
│                                                      │
│ 1. Serialize scan request as JSON                    │
```
@kooomix based on this design, will we have multiple child processes at the same time (parallel child-process scans)? I am OK with the option, but to be on the safe side I wouldn't run parallel scans for now (and since it is not time-sensitive).
Signed-off-by: Ben <ben@armosec.io>
Actionable comments posted: 2
🧹 Nitpick comments (2)
oom-killer-fix.md (1)
37-47: Show error-checked examples for `/proc/*/oom_score_adj` writes. The sample code writes to procfs without handling errors. In practice these writes can fail (permissions/capabilities), so the example should model explicit error checks and logging to avoid copy-paste unsafe patterns.
Also applies to: 52-54
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@oom-killer-fix.md` around lines 37-47: The examples call os.WriteFile for "/proc/self/oom_score_adj" and fmt.Sprintf("/proc/%d/oom_score_adj", childPID) without checking errors; update both uses so the return error is captured and handled (e.g., check err != nil and log or return it) instead of discarding it. Locate the os.WriteFile calls (the self-adjustment and the per-child write using childPID) and wrap each write with proper error handling: assign the error, log a descriptive message including the target path and error, and decide whether to continue or fail based on the context (e.g., log.Warn on inability to set child OOM target, log.Error/exit if protecting the orchestrator must be enforced). Ensure you reference the same os.WriteFile calls and childPID formatting when adding the checks.
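A hedged sketch of what the error-checked writes could look like. The adjustment values, and the choice to treat the child write as non-fatal, are assumptions for illustration, not the doc's actual decisions (lowering one's own score below 0 typically requires CAP_SYS_RESOURCE):

```go
package oomadj // illustrative

import (
	"fmt"
	"log"
	"os"
)

// protectParentAdjustChild writes oom_score_adj for the parent and a child PID,
// checking both errors instead of discarding them.
func protectParentAdjustChild(childPID int) error {
	// Make the parent a less attractive OOM target; may fail without CAP_SYS_RESOURCE.
	if err := os.WriteFile("/proc/self/oom_score_adj", []byte("-500"), 0o644); err != nil {
		return fmt.Errorf("cannot lower parent oom_score_adj: %w", err)
	}
	// Make the child the preferred OOM victim; treat failure as a warning here.
	childPath := fmt.Sprintf("/proc/%d/oom_score_adj", childPID)
	if err := os.WriteFile(childPath, []byte("500"), 0o644); err != nil {
		log.Printf("warning: cannot raise oom_score_adj for child %d: %v", childPID, err)
	}
	return nil
}
```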
reproducing-oom.md (1)

32-39: Add a rollback step for memory limit changes. Line 32 modifies deployment resources but the doc does not include a revert command. Adding a short rollback step will prevent accidental long-lived degraded limits in shared environments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@reproducing-oom.md` around lines 32 - 39, Add a brief rollback step after the memory-limit change that tells operators how to revert the kubevuln Deployment in the kubescape namespace back to its original memory limits or to undo the change; reference the kubevuln Deployment and suggest using either kubectl rollout undo for the Deployment or re-patching the /spec/template/spec/containers/0/resources/limits/memory and /requests/memory paths to their prior values so the temporary 512Mi change is not left in shared environments.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9e97ebee-3353-482d-8a72-dc33eeb4e6a4
📒 Files selected for processing (3)
- oom-investigation-findings.md
- oom-killer-fix.md
- reproducing-oom.md
```
Pod-level cgroup: memory.oom.group = 0 (per-process selection)
Container-level cgroup: memory.oom.group = 1 (kills ALL processes together)
```
Specify fence languages for code blocks.
These fenced blocks are missing language identifiers (MD040), which will keep markdownlint failing. Please label them (text, bash, etc.).
Also applies to: 40-48, 56-58, 62-68
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 27-27: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@oom-investigation-findings.md` around lines 27 - 30, The Markdown fenced code
blocks shown (e.g., the block containing "Pod-level cgroup: memory.oom.group =
0" and the other blocks at the ranges called out) lack language identifiers
causing MD040 failures; update each triple-backtick fence to include an
appropriate language tag (for example `text` or `bash`) for the blocks at the
cited ranges (27-30, 40-48, 56-58, 62-68) so markdownlint passes and the blocks
render with correct syntax highlighting.
The Linux kernel's OOM killer **is** correctly killing only the child process with the highest RSS. The pod restart happens because of a chain reaction:

1. Kernel OOM killer targets the sbom-worker (highest `oom_score` due to highest RSS)
2. sbom-worker is killed with SIGKILL
3. kubevuln (PID 1) either: crashes when it detects the child died, exits because it doesn't handle child death gracefully, or propagates the error and terminates
4. kubelet detects PID 1 has exited → restarts the pod per `restartPolicy`

The cgroup itself is **not** being killed as a whole — the orchestrator is simply not surviving the child's death.
Root-cause guidance here contradicts current investigation findings.
This section asserts single-process OOM selection and memory.oom.group default behavior that conflicts with the findings documented in oom-investigation-findings.md (container-level group kill behavior and capability constraints). As written, this can drive engineers toward a fix path that won’t work in the current deployment model.
Also applies to: 31-58, 98-101
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@oom-killer-fix.md` around lines 20 - 27, The paragraph incorrectly states
that the kernel OOM selects a single process and that memory.oom.group defaults
to per-process behavior, which contradicts the investigation in
oom-investigation-findings.md showing container-level group kills and capability
constraints; update oom-killer-fix.md to remove or reword the misleading claims,
explicitly state the correct sequence (kernel kills highest-RSS process like
sbom-worker, but PID 1 behavior—kubevuln—can cause the pod to be restarted and
the cgroup to be reaped by the orchestrator) and reference memory.oom.group
behavior and capability limitations as documented in
oom-investigation-findings.md (mention sbom-worker, kubevuln, PID 1, and
oom_score) so readers are guided to the actual root cause and not toward fixes
that assume per-process OOM behavior.
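For reference, the parent-side pattern for surviving and classifying the SIGKILL described in the chain reaction above is roughly the sketch below. It shows the general `exec.ExitError` / `syscall.WaitStatus` technique, not the PR's actual `handleChildError`; the sentinel error name is borrowed from the PR description.

```go
package parentproc // illustrative

import (
	"errors"
	"os/exec"
	"syscall"
)

// ErrChildOOMKilled mirrors the sentinel the PR description mentions.
var ErrChildOOMKilled = errors.New("sbom worker was OOM-killed")

// classifyWaitError maps a cmd.Wait() error to a structured result so the
// parent can report the OOM kill instead of crashing along with the child.
func classifyWaitError(waitErr error) error {
	if waitErr == nil {
		return nil
	}
	var exitErr *exec.ExitError
	if errors.As(waitErr, &exitErr) {
		if status, ok := exitErr.Sys().(syscall.WaitStatus); ok {
			if status.Signaled() && status.Signal() == syscall.SIGKILL {
				// Most likely the kernel OOM killer (or an explicit kill -9).
				return ErrChildOOMKilled
			}
		}
	}
	return waitErr
}
```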
Signed-off-by: Ben <ben@armosec.io>
superseded by #335
Summary
Isolate Syft SBOM generation in a child process so that OOM kills don't crash the kubevuln pod. Uses `moby/sys/reexec` for clean subprocess dispatch and `os.Pipe` for IPC (keeping stdout/stderr free for logging).

Key improvements over naive self-re-exec:
- `reexec.Register()` in `init()` — no env var checks in `main()`
- `os.Pipe` (FD 3/4) — child can log to stdout/stderr without corrupting the JSON data channel
- `Shutdown()` propagates SIGTERM to active children during graceful shutdown

Feature flag: `subprocessSBOM: false` (default off, zero overhead)

Changes
- `adapters/v1/subprocess_sbom.go`
- `adapters/v1/subprocess_sbom_test.go`
- `adapters/v1/SUBPROCESS_SBOM.md`
- `cmd/http/main.go`: `InitReexec()` at top + `Shutdown()` on graceful exit
- `config/config.go`: `subprocessSBOM` + `subprocessSBOMMemoryLimit` fields
- `go.mod`: `github.com/moby/sys/reexec`

Test plan
- `TestHandleChildError_OOMKilled` — child SIGKILLs itself, parent returns `ErrChildOOMKilled`
- `TestParentSurvivesChildOOMKill` — parent does work + spawns new child after OOM
- `TestHandleChildError_NonZeroExit` — exit code 1 not misclassified as OOM
- `TestHandleChildError_Timeout` — expired context returns `ErrChildTimeout`
- `TestSubprocessSBOMCreator_ChildSucceeds` — valid SBOM response parsed from child
- `TestSubprocessSBOMCreator_Shutdown` — no panic on empty children list
- `TestSubprocessSBOMCreator_TrackUntrack` — child tracking bookkeeping

Jira: SUB-7103
Summary by CodeRabbit
New Features
Configuration
Shutdown / Lifecycle
Tests
Documentation