strip unnecessary fields from SBOM to reduce size by matthyx · Pull Request #327 · kubescape/kubevuln

matthyx · 2026-02-12T07:53:40Z

This is a reimplementation of kubescape/node-agent#720, done in a way that lets us move it to a shared library between node-agent and kubevuln (maybe via storage).
Results on gitlab/gitlab-ee:latest (83M down to 60M, roughly 28% smaller):

-rw-r--r-- 1 mbertschy mbertschy  83M Feb 12 08:42 gitlab.json
-rw-r--r-- 1 mbertschy mbertschy  60M Feb 12 08:43 gitlab-stripped.json

Summary by CodeRabbit

Bug Fixes
- Enhanced SBOM payload handling by stripping unnecessary metadata before processing to improve performance.
Chores
- Upgraded Go runtime to version 1.25.0.
- Updated multiple core dependencies including Kubernetes, Prometheus, and storage libraries to latest versions.

coderabbitai · 2026-02-12T07:53:58Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eba7826 and 73d53c3.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (6)

adapters/v1/syft.go
adapters/v1/testdata/alpine-embedded-sbom.json
adapters/v1/testdata/alpine-sbom.format.json
adapters/v1/testdata/hello-world-sbom.format.json
adapters/v1/testdata/stretch-slim-sbom.format.json
go.mod

🚧 Files skipped from review as they are similar to previous changes (1)

adapters/v1/syft.go

📝 Walkthrough

Walkthrough

This PR adds SBOM payload stripping functionality by invoking v1beta1.StripSBOM() before size calculation in the syft adapter and updates test fixture files to reflect stripped SBOM payloads with minimal metadata. Go version and multiple dependencies are also updated.

Changes

Cohort / File(s)	Summary
SBOM Stripping Implementation `adapters/v1/syft.go`	Adds import for `github.com/kubescape/storage/pkg/apis/softwarecomposition/v1beta1` and invokes `v1beta1.StripSBOM()` on syft SBOM before size calculation to reduce payload.
Test Data Fixtures `adapters/v1/testdata/alpine-embedded-sbom.json`, `adapters/v1/testdata/alpine-sbom.format.json`, `adapters/v1/testdata/hello-world-sbom.format.json`, `adapters/v1/testdata/stretch-slim-sbom.format.json`	Updates test SBOM files to reflect stripped payloads: clears artifact provenance fields (`foundBy`, `accessPath`, `annotations`), sets source metadata to null, removes descriptor configuration blocks, and replaces detailed metadata with null values.
Dependency Updates `go.mod`	Updates Go version from 1.24.1 to 1.25.0; bumps multiple dependencies including `kubescape/storage` (v0.0.220 → v0.0.246), Kubernetes modules (k8s.io/apimachinery, k8s.io/client-go to v0.35.0), Prometheus modules, and various other packages to newer versions.
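
As an illustration of the stripping step summarized above, here is a minimal sketch. It assumes a heavily simplified SBOM model: the `sbom`, `artifact`, and `location` types and the `stripSBOM` and `jsonSize` helpers are hypothetical stand-ins, not the real syft model or the `v1beta1.StripSBOM` implementation. It only shows the idea of clearing provenance fields (`foundBy`, `accessPath`, `annotations`) in place before the serialized size is measured.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified stand-ins for the syft JSON model.
type location struct {
	Path        string            `json:"path"`
	AccessPath  string            `json:"accessPath,omitempty"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

type artifact struct {
	Name      string     `json:"name"`
	Version   string     `json:"version"`
	FoundBy   string     `json:"foundBy,omitempty"`
	Locations []location `json:"locations"`
}

type sbom struct {
	Artifacts []artifact `json:"artifacts"`
}

// stripSBOM clears provenance fields in place. It is nil-safe so callers
// cannot trigger a panic by passing a failed conversion result.
func stripSBOM(s *sbom) {
	if s == nil {
		return
	}
	for i := range s.Artifacts {
		a := &s.Artifacts[i]
		a.FoundBy = ""
		for j := range a.Locations {
			a.Locations[j].AccessPath = ""
			a.Locations[j].Annotations = nil
		}
	}
}

// jsonSize returns the serialized size in bytes, or -1 on a marshal error.
func jsonSize(s *sbom) int {
	b, err := json.Marshal(s)
	if err != nil {
		return -1
	}
	return len(b)
}

func main() {
	s := &sbom{Artifacts: []artifact{{
		Name:    "musl",
		Version: "1.2.4",
		FoundBy: "apk-db-cataloger",
		Locations: []location{{
			Path:        "/lib/apk/db/installed",
			AccessPath:  "/lib/apk/db/installed",
			Annotations: map[string]string{"evidence": "primary"},
		}},
	}}}

	before := jsonSize(s)
	stripSBOM(s) // strip first, then measure, as the walkthrough describes
	after := jsonSize(s)
	fmt.Printf("before=%d bytes, after=%d bytes\n", before, after)
}
```

The identifying fields (`name`, `version`, `path`) are left untouched in this sketch; which fields the real function keeps is defined by the storage library, not by this example.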

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through stripped-down data flows,
Where metadata melts and each payload grows light,
With StripSBOM magic and versions uptight,
The dependencies dance in fresh spring sunlight! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main objective of the PR: stripping unnecessary fields from SBOMs to reduce file size, which aligns with the code changes.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

go.mod (1)
32-32: Storage dependency pinned to a pseudo-version (unreleased commit).

v0.0.246-0.20260223110517-7f69ac32ce07 points to a specific commit rather than a tagged release. This is fine during development but should be updated to a proper tagged release before merging to main, to ensure reproducible builds and clear versioning.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 32, The go.mod entry for github.com/kubescape/storage is
pinned to a pseudo-version (github.com/kubescape/storage
v0.0.246-0.20260223110517-7f69ac32ce07); replace this pseudo-version with the
appropriate tagged release (e.g., v0.0.246 or newer official tag) in go.mod,
then run the module update commands (go get <module>@<tag> and go mod tidy) to
regenerate go.sum so the project uses a proper released version instead of an
unreleased commit.
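
To make the pseudo-version distinction concrete, the following sketch flags versions that point at a commit rather than a tag. The regular expression is an assumption modeled on the pseudo-version forms described in the Go modules reference, and `isPseudoVersion` is a hypothetical helper; real tooling would more likely rely on `module.IsPseudoVersion` from `golang.org/x/mod/module`.

```go
package main

import (
	"fmt"
	"regexp"
)

// pseudoVersionRE is an assumed approximation of the pseudo-version shapes:
//   vX.0.0-yyyymmddhhmmss-abcdefabcdef
//   vX.Y.Z-pre.0.yyyymmddhhmmss-abcdefabcdef
//   vX.Y.(Z+1)-0.yyyymmddhhmmss-abcdefabcdef
var pseudoVersionRE = regexp.MustCompile(
	`^v[0-9]+\.(0\.0-|\d+\.\d+-([^+]*\.)?0\.)\d{14}-[A-Za-z0-9]+(\+[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$`)

// isPseudoVersion reports whether v looks like a commit-based pseudo-version.
func isPseudoVersion(v string) bool {
	return pseudoVersionRE.MatchString(v)
}

func main() {
	versions := []string{
		"v0.0.246-0.20260223110517-7f69ac32ce07", // the version flagged in go.mod
		"v0.0.246",                               // a plain tagged release
	}
	for _, v := range versions {
		fmt.Printf("%s pseudo=%t\n", v, isPseudoVersion(v))
	}
}
```

A check like this could run in CI to warn when a pseudo-version reaches the main branch, which is the concern raised in the nitpick.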

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 852cec9 and 72ddc33.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (9)

adapters/v1/grype.go
adapters/v1/syft.go
adapters/v1/testdata/alpine-cve.format.json
adapters/v1/testdata/alpine-embedded-sbom.json
adapters/v1/testdata/alpine-sbom.format.json
adapters/v1/testdata/hello-world-sbom.format.json
adapters/v1/testdata/nginx-filtered-cve.format.json
adapters/v1/testdata/stretch-slim-sbom.format.json
go.mod

🚧 Files skipped from review as they are similar to previous changes (1)

adapters/v1/testdata/hello-world-sbom.format.json

coderabbitai · 2026-02-23T11:38:36Z

 	s, err := domainToSyft(*sbom.Content)
+	v1beta1.StripSBOM(s)
 	if err != nil {
 		return domain.CVEManifest{}, err
 	}


⚠️ Potential issue | 🔴 Critical

Critical: StripSBOM called before error check — potential nil pointer panic.

domainToSyft returns (s, err). If it fails, s is likely nil, and v1beta1.StripSBOM(s) will dereference a nil pointer, causing a panic. Move the call after the error check.

🐛 Proposed fix

 	s, err := domainToSyft(*sbom.Content)
-	v1beta1.StripSBOM(s)
 	if err != nil {
 		return domain.CVEManifest{}, err
 	}
+	v1beta1.StripSBOM(s)
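
The ordering issue can be reproduced in isolation. This is a minimal sketch with hypothetical stand-ins (`doc`, `convert`, `strip`, `process`) for the real `domainToSyft` and `v1beta1.StripSBOM`; it assumes, as the review does, that the strip step dereferences its argument.

```go
package main

import (
	"errors"
	"fmt"
)

// doc is a hypothetical stand-in for the converted SBOM.
type doc struct {
	Artifacts []string
}

// convert mimics a conversion that returns (nil, err) on failure.
func convert(raw string) (*doc, error) {
	if raw == "" {
		return nil, errors.New("empty SBOM content")
	}
	return &doc{Artifacts: []string{raw}}, nil
}

// strip dereferences d, so calling it with nil would panic.
func strip(d *doc) {
	d.Artifacts = d.Artifacts[:0]
}

// process checks the error before touching the result, so a failed
// conversion is reported as an error instead of crashing.
func process(raw string) (*doc, error) {
	d, err := convert(raw)
	if err != nil {
		return nil, err
	}
	strip(d)
	return d, nil
}

func main() {
	if _, err := process(""); err != nil {
		fmt.Println("error returned without panic:", err)
	}
}
```

Swapping the `strip(d)` call above the `err` check in `process` would turn the failed conversion into a nil pointer dereference, which is the failure mode described in this comment.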

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@adapters/v1/grype.go` around lines 227 - 231, The call to v1beta1.StripSBOM is currently invoked before checking the error from domainToSyft, which can cause a nil dereference if domainToSyft returned an error; move the v1beta1.StripSBOM(s) invocation to after the err check (i.e., call domainToSyft(*sbom.Content) into s, check if err != nil and return the error, then call v1beta1.StripSBOM(s)), updating the code paths that use s afterwards (function names: domainToSyft and v1beta1.StripSBOM).

github-actions · 2026-02-23T11:46:54Z

Summary:

License scan: failure
Credentials scan: failure
Vulnerabilities scan: failure
Unit test: success
Go linting: failure

github-actions · 2026-02-23T20:33:22Z

Summary:

License scan: failure
Credentials scan: failure
Vulnerabilities scan: failure
Unit test: success
Go linting: failure

coderabbitai

Actionable comments posted: 2

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cdbb06 and eba7826.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (7)

adapters/v1/domain_to_armo.go
adapters/v1/syft.go
adapters/v1/testdata/alpine-embedded-sbom.json
adapters/v1/testdata/alpine-sbom.format.json
adapters/v1/testdata/hello-world-sbom.format.json
adapters/v1/testdata/stretch-slim-sbom.format.json
go.mod

🚧 Files skipped from review as they are similar to previous changes (2)

adapters/v1/testdata/stretch-slim-sbom.format.json
adapters/v1/syft.go

coderabbitai · 2026-02-24T07:07:25Z

+		if err := json.Unmarshal(src.Target, &imageMetadata); err != nil {
+			// When source metadata has been stripped (e.g. by StripSBOM),
+			// the target may be a plain string (image name) instead of ImageMetadata.
+			// In that case, return a target with no image metadata — callers
+			// already guard with IsImageTarget() before using it.
+			return target, nil


⚠️ Potential issue | 🟠 Major

Don’t swallow malformed image metadata; only skip when the target is string/null.

Returning nil for any unmarshal error hides corrupted SBOMs and silently drops layer context. Consider falling back only when Target is a plain string (or null), otherwise surface the error.

Proposed fix

-		if err := json.Unmarshal(src.Target, &imageMetadata); err != nil {
-			// When source metadata has been stripped (e.g. by StripSBOM),
-			// the target may be a plain string (image name) instead of ImageMetadata.
-			// In that case, return a target with no image metadata — callers
-			// already guard with IsImageTarget() before using it.
-			return target, nil
-		}
+		if err := json.Unmarshal(src.Target, &imageMetadata); err != nil {
+			// When source metadata has been stripped, the target may be a plain string (or null).
+			var imageName *string
+			if err2 := json.Unmarshal(src.Target, &imageName); err2 == nil {
+				return target, nil
+			}
+			return target, fmt.Errorf("failed to unmarshal image target: %w", err)
+		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@adapters/v1/domain_to_armo.go` around lines 52 - 57, The current code in the conversion function unconditionally swallows any json.Unmarshal error for src.Target into imageMetadata and returns nil, but we should only skip parsing when Target is a plain JSON string or null; change the logic in the function that handles src.Target (where json.Unmarshal and imageMetadata are used) to first inspect src.Target bytes: if src.Target is nil, json.Valid(src.Target) is false but treat as null, or check the first non-whitespace byte to see if it is '"' (string) or 'n' (null) and only then return target, nil; otherwise, return the actual unmarshal error so corrupted/malformed SBOM metadata surfaces. Keep the rest of the flow (the returned target variable and callers using IsImageTarget()) unchanged.

coderabbitai · 2026-02-24T07:07:25Z

 	if err := json.Unmarshal(grypeDocument.Source.Target, &rawManifest); err != nil {
-		return nil, err
+		// Source metadata may have been stripped, return nil manifest
+		return nil, nil
 	}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's verify the code at the specified location
cat -n adapters/v1/domain_to_armo.go | sed -n '325,335p'

Repository: kubescape/kubevuln

Length of output: 480

🏁 Script executed:

# Find all call sites of ParseImageManifest
rg -n "ParseImageManifest\s*\(" -C5

Repository: kubescape/kubevuln

Length of output: 2695

🏁 Script executed:

# Let's also look at the function signature to understand what it returns
rg -n "func.*ParseImageManifest" -A10

Repository: kubescape/kubevuln

Length of output: 1460

🏁 Script executed:

cat -n adapters/v1/backend.go | sed -n '195,230p'

Repository: kubescape/kubevuln

Length of output: 1428

🏁 Script executed:

cat -n adapters/v1/backend.go | sed -n '225,260p'

Repository: kubescape/kubevuln

Length of output: 2171

🏁 Script executed:

# Search for where imageManifest variable is used in backend.go
rg -n "imageManifest" adapters/v1/backend.go -C2

Repository: kubescape/kubevuln

Length of output: 460

🏁 Script executed:

# Find the Summarize function signature and implementation
rg -n "func Summarize" -A20

Repository: kubescape/kubevuln

Length of output: 2055

🏁 Script executed:

# Check if ImageManifest field is dereferenced anywhere in the summary processing
rg -n "ImageManifest" adapters/v1/backend_utils.go -C3

Repository: kubescape/kubevuln

Length of output: 830

🏁 Script executed:

# Check if CommonContainerScanSummaryResult.ImageManifest is dereferenced anywhere
rg -n "\.ImageManifest\s*\." -C2

Repository: kubescape/kubevuln

Length of output: 44

🏁 Script executed:

# Also check the type definition of CommonContainerScanSummaryResult to see if ImageManifest is a pointer
rg -n "type CommonContainerScanSummaryResult" -A50 | head -80

Repository: kubescape/kubevuln

Length of output: 44

🏁 Script executed:

# Find where CommonContainerScanSummaryResult is imported from
rg -n "CommonContainerScanSummaryResult" adapters/v1/backend_utils.go | head -5

Repository: kubescape/kubevuln

Length of output: 436

🏁 Script executed:

# Check if there are any other uses of imageManifest in the codebase
rg -n "imageManifest" --type go -C2

Repository: kubescape/kubevuln

Length of output: 5273

🏁 Script executed:

# Check the test to see what cases are tested for ParseImageManifest
cat -n adapters/v1/backend_test.go | sed -n '206,260p'

Repository: kubescape/kubevuln

Length of output: 1657

🏁 Script executed:

cat -n adapters/v1/domain_to_armo.go | sed -n '323,365p'

Repository: kubescape/kubevuln

Length of output: 1655

🏁 Script executed:

# Check what source.ImageMetadata looks like
rg -n "type ImageMetadata" -A10

Repository: kubescape/kubevuln

Length of output: 44

🏁 Script executed:

# Check if there are test cases with empty Source.Target
rg -n "Source.Target\s*=" -B2 -A2

Repository: kubescape/kubevuln

Length of output: 44

🏁 Script executed:

# Let's run the test to see if it passes or fails
cd /tmp && python3 << 'EOF'
import subprocess
import os

# Try to find and run the specific test
result = subprocess.run(
    ['bash', '-c', 'cd /repo && grep -n "malformed metadata base64 config" adapters/v1/backend_test.go -A30'],
    capture_output=True, text=True, timeout=30
)
print("Test case details:")
print(result.stdout)
print("---")

# Also check what the nginxSBOMMetadata contains to understand what valid input looks like
result2 = subprocess.run(
    ['bash', '-c', 'cd /repo && cat -n adapters/v1/backend_test.go | sed -n "200,210p"'],
    capture_output=True, text=True, timeout=30
)
print("Context around nginxSBOMMetadata:")
print(result2.stdout)
EOF

Repository: kubescape/kubevuln

Length of output: 120

🏁 Script executed:

# Check if Source.Target can be empty/null and how that's handled
rg -n "Source.Target" -B2 -A2 | head -30

Repository: kubescape/kubevuln

Length of output: 730

🏁 Script executed:

# Check source.ImageMetadata structure to understand if first unmarshal would really fail on malformed JSON
rg -n "type ImageMetadata" -B2 -A15

Repository: kubescape/kubevuln

Length of output: 44

🏁 Script executed:

# Let's look at the test data file to understand what valid metadata looks like
find . -name "nginx-document-source.json" -exec head -50 {} \;

Repository: kubescape/kubevuln

Length of output: 13352

🏁 Script executed:

# Let's verify the complete picture - check if imageManifest is ever dereferenced
# by searching for uses with field access
rg -n "imageManifest\." --type go

Repository: kubescape/kubevuln

Length of output: 160

Address silent error masking in ParseImageManifest by distinguishing malformed JSON from stripped metadata.

Returning (nil, nil) when json.Unmarshal fails masks real parse errors and makes it impossible to distinguish between intentionally stripped metadata and actual malformed JSON. The test case "malformed metadata base64 config" expects an error to be returned for malformed JSON, but the current code returns (nil, nil) instead. While nil manifests are safely handled as pointers in struct fields, the error suppression prevents callers and logging from detecting actual data corruption. Consider checking if the target is an empty/null JSON value before returning (nil, nil), and surfacing actual unmarshal errors to aid debugging.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@adapters/v1/domain_to_armo.go` around lines 329 - 332, In ParseImageManifest, don't silently swallow json.Unmarshal errors: inspect grypeDocument.Source.Target before returning nil, nil — if Target is empty or only JSON null/whitespace, return (nil, nil) to indicate stripped metadata; otherwise propagate the unmarshal error so callers can detect malformed JSON. Update the block around json.Unmarshal(grypeDocument.Source.Target, &rawManifest) to check trimmed Target bytes for empty/"null" and only return (nil, nil) in that case; for any other unmarshal failure, return the error (referencing ParseImageManifest, grypeDocument.Source.Target, and rawManifest).

github-actions · 2026-02-24T07:08:47Z

Summary:

License scan: failure
Credentials scan: failure
Vulnerabilities scan: failure
Unit test: success
Go linting: failure

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

slashben reviewed Feb 12, 2026

Comment thread adapters/v1/syft.go Outdated

matthyx added this to KS PRs tracking Feb 23, 2026

matthyx force-pushed the strip branch from 852cec9 to 72ddc33 Compare February 23, 2026 11:34

coderabbitai Bot reviewed Feb 23, 2026

matthyx force-pushed the strip branch from 72ddc33 to 2cdbb06 Compare February 23, 2026 20:21

matthyx force-pushed the strip branch from 2cdbb06 to eba7826 Compare February 24, 2026 06:59

coderabbitai Bot reviewed Feb 24, 2026

strip unnecessary fields from SBOM to reduce size

73d53c3

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

matthyx force-pushed the strip branch from eba7826 to 73d53c3 Compare February 24, 2026 11:49

matthyx added the release Create release label Feb 24, 2026

matthyx merged commit 8debfbc into main Feb 24, 2026
4 of 9 checks passed

matthyx deleted the strip branch February 24, 2026 11:51

matthyx moved this to To Archive in KS PRs tracking Feb 25, 2026

coderabbitai Bot mentioned this pull request Mar 23, 2026

feat: SBOM scanner sidecar for memory-isolated SBOM generation #335

Merged

6 tasks

coderabbitai Bot mentioned this pull request Apr 28, 2026

perf: switch to kubescape/syft v1.32.0-ks.2 + disable file catalogers #355

Merged

5 tasks

2 participants
