Skip to content

strip unnecessary fields from SBOM to reduce size#327

Merged
matthyx merged 1 commit intomainfrom
strip
Feb 24, 2026
Merged

strip unnecessary fields from SBOM to reduce size#327
matthyx merged 1 commit intomainfrom
strip

Conversation

@matthyx
Copy link
Copy Markdown
Contributor

@matthyx matthyx commented Feb 12, 2026

this is the reimplementation of kubescape/node-agent#720 in a way we can move to a shared library between node-agent and kubevuln (maybe via storage)
results on gitlab/gitlab-ee:latest:

-rw-r--r-- 1 mbertschy mbertschy  83M Feb 12 08:42 gitlab.json
-rw-r--r-- 1 mbertschy mbertschy  60M Feb 12 08:43 gitlab-stripped.json

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced SBOM payload handling by stripping unnecessary metadata before processing to improve performance.
  • Chores

    • Upgraded Go runtime to version 1.25.0.
    • Updated multiple core dependencies including Kubernetes, Prometheus, and storage libraries to latest versions.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eba7826 and 73d53c3.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • adapters/v1/syft.go
  • adapters/v1/testdata/alpine-embedded-sbom.json
  • adapters/v1/testdata/alpine-sbom.format.json
  • adapters/v1/testdata/hello-world-sbom.format.json
  • adapters/v1/testdata/stretch-slim-sbom.format.json
  • go.mod
🚧 Files skipped from review as they are similar to previous changes (1)
  • adapters/v1/syft.go

📝 Walkthrough

Walkthrough

This PR adds SBOM payload stripping functionality by invoking v1beta1.StripSBOM() before size calculation in the syft adapter and updates test fixture files to reflect stripped SBOM payloads with minimal metadata. Go version and multiple dependencies are also updated.

Changes

Cohort / File(s) Summary
SBOM Stripping Implementation
adapters/v1/syft.go
Adds import for github.com/kubescape/storage/pkg/apis/softwarecomposition/v1beta1 and invokes v1beta1.StripSBOM() on syft SBOM before size calculation to reduce payload.
Test Data Fixtures
adapters/v1/testdata/alpine-embedded-sbom.json, adapters/v1/testdata/alpine-sbom.format.json, adapters/v1/testdata/hello-world-sbom.format.json, adapters/v1/testdata/stretch-slim-sbom.format.json
Updates test SBOM files to reflect stripped payloads: clears artifact provenance fields (foundBy, accessPath, annotations), sets source metadata to null, removes descriptor configuration blocks, and replaces detailed metadata with null values.
Dependency Updates
go.mod
Updates Go version from 1.24.1 to 1.25.0; bumps multiple dependencies including kubescape/storage (v0.0.220 → v0.0.246), Kubernetes modules (k8s.io/apimachinery, k8s.io/client-go to v0.35.0), Prometheus modules, and various other packages to newer versions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through stripped-down data flows,
Where metadata melts and each payload grows light,
With StripSBOM magic and versions uptight,
The dependencies dance in fresh spring sunlight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the PR: stripping unnecessary fields from SBOMs to reduce file size, which aligns with the code changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch strip

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread adapters/v1/syft.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
go.mod (1)

32-32: Storage dependency pinned to a pseudo-version (unreleased commit).

v0.0.246-0.20260223110517-7f69ac32ce07 points to a specific commit rather than a tagged release. This is fine during development but should be updated to a proper tagged release before merging to main, to ensure reproducible builds and clear versioning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 32, The go.mod entry for github.com/kubescape/storage is
pinned to a pseudo-version (github.com/kubescape/storage
v0.0.246-0.20260223110517-7f69ac32ce07); replace this pseudo-version with the
appropriate tagged release (e.g., v0.0.246 or newer official tag) in go.mod,
then run the module update commands (go get <module>@<tag> and go mod tidy) to
regenerate go.sum so the project uses a proper released version instead of an
unreleased commit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapters/v1/grype.go`:
- Around line 227-231: The call to v1beta1.StripSBOM is currently invoked before
checking the error from domainToSyft, which can cause a nil dereference if
domainToSyft returned an error; move the v1beta1.StripSBOM(s) invocation to
after the err check (i.e., call domainToSyft(*sbom.Content) into s, check if err
!= nil and return the error, then call v1beta1.StripSBOM(s)), updating the code
paths that use s afterwards (function names: domainToSyft and
v1beta1.StripSBOM).

---

Nitpick comments:
In `@go.mod`:
- Line 32: The go.mod entry for github.com/kubescape/storage is pinned to a
pseudo-version (github.com/kubescape/storage
v0.0.246-0.20260223110517-7f69ac32ce07); replace this pseudo-version with the
appropriate tagged release (e.g., v0.0.246 or newer official tag) in go.mod,
then run the module update commands (go get <module>@<tag> and go mod tidy) to
regenerate go.sum so the project uses a proper released version instead of an
unreleased commit.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 852cec9 and 72ddc33.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (9)
  • adapters/v1/grype.go
  • adapters/v1/syft.go
  • adapters/v1/testdata/alpine-cve.format.json
  • adapters/v1/testdata/alpine-embedded-sbom.json
  • adapters/v1/testdata/alpine-sbom.format.json
  • adapters/v1/testdata/hello-world-sbom.format.json
  • adapters/v1/testdata/nginx-filtered-cve.format.json
  • adapters/v1/testdata/stretch-slim-sbom.format.json
  • go.mod
🚧 Files skipped from review as they are similar to previous changes (1)
  • adapters/v1/testdata/hello-world-sbom.format.json

Comment thread adapters/v1/grype.go
Comment on lines 227 to 231
s, err := domainToSyft(*sbom.Content)
v1beta1.StripSBOM(s)
if err != nil {
return domain.CVEManifest{}, err
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: StripSBOM called before error check — potential nil pointer panic.

domainToSyft returns (s, err). If it fails, s is likely nil, and v1beta1.StripSBOM(s) will dereference a nil pointer, causing a panic. Move the call after the error check.

🐛 Proposed fix
 	s, err := domainToSyft(*sbom.Content)
-	v1beta1.StripSBOM(s)
 	if err != nil {
 		return domain.CVEManifest{}, err
 	}
+	v1beta1.StripSBOM(s)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
s, err := domainToSyft(*sbom.Content)
v1beta1.StripSBOM(s)
if err != nil {
return domain.CVEManifest{}, err
}
s, err := domainToSyft(*sbom.Content)
if err != nil {
return domain.CVEManifest{}, err
}
v1beta1.StripSBOM(s)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapters/v1/grype.go` around lines 227 - 231, The call to v1beta1.StripSBOM
is currently invoked before checking the error from domainToSyft, which can
cause a nil dereference if domainToSyft returned an error; move the
v1beta1.StripSBOM(s) invocation to after the err check (i.e., call
domainToSyft(*sbom.Content) into s, check if err != nil and return the error,
then call v1beta1.StripSBOM(s)), updating the code paths that use s afterwards
(function names: domainToSyft and v1beta1.StripSBOM).

@github-actions
Copy link
Copy Markdown

Summary:

  • License scan: failure
  • Credentials scan: failure
  • Vulnerabilities scan: failure
  • Unit test: success
  • Go linting: failure

@github-actions
Copy link
Copy Markdown

Summary:

  • License scan: failure
  • Credentials scan: failure
  • Vulnerabilities scan: failure
  • Unit test: success
  • Go linting: failure

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapters/v1/domain_to_armo.go`:
- Around line 52-57: The current code in the conversion function unconditionally
swallows any json.Unmarshal error for src.Target into imageMetadata and returns
nil, but we should only skip parsing when Target is a plain JSON string or null;
change the logic in the function that handles src.Target (where json.Unmarshal
and imageMetadata are used) to first inspect src.Target bytes: if src.Target is
nil, json.Valid(src.Target) is false but treat as null, or check the first
non-whitespace byte to see if it is '"' (string) or 'n' (null) and only then
return target, nil; otherwise, return the actual unmarshal error so
corrupted/malformed SBOM metadata surfaces. Keep the rest of the flow (the
returned target variable and callers using IsImageTarget()) unchanged.
- Around line 329-332: In ParseImageManifest, don't silently swallow
json.Unmarshal errors: inspect grypeDocument.Source.Target before returning nil,
nil — if Target is empty or only JSON null/whitespace, return (nil, nil) to
indicate stripped metadata; otherwise propagate the unmarshal error so callers
can detect malformed JSON. Update the block around
json.Unmarshal(grypeDocument.Source.Target, &rawManifest) to check trimmed
Target bytes for empty/"null" and only return (nil, nil) in that case; for any
other unmarshal failure, return the error (referencing ParseImageManifest,
grypeDocument.Source.Target, and rawManifest).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cdbb06 and eba7826.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • adapters/v1/domain_to_armo.go
  • adapters/v1/syft.go
  • adapters/v1/testdata/alpine-embedded-sbom.json
  • adapters/v1/testdata/alpine-sbom.format.json
  • adapters/v1/testdata/hello-world-sbom.format.json
  • adapters/v1/testdata/stretch-slim-sbom.format.json
  • go.mod
🚧 Files skipped from review as they are similar to previous changes (2)
  • adapters/v1/testdata/stretch-slim-sbom.format.json
  • adapters/v1/syft.go

Comment thread adapters/v1/domain_to_armo.go Outdated
Comment on lines +52 to +57
if err := json.Unmarshal(src.Target, &imageMetadata); err != nil {
// When source metadata has been stripped (e.g. by StripSBOM),
// the target may be a plain string (image name) instead of ImageMetadata.
// In that case, return a target with no image metadata — callers
// already guard with IsImageTarget() before using it.
return target, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t swallow malformed image metadata; only skip when the target is string/null.

Returning nil for any unmarshal error hides corrupted SBOMs and silently drops layer context. Consider falling back only when Target is a plain string (or null), otherwise surface the error.

Proposed fix
-		if err := json.Unmarshal(src.Target, &imageMetadata); err != nil {
-			// When source metadata has been stripped (e.g. by StripSBOM),
-			// the target may be a plain string (image name) instead of ImageMetadata.
-			// In that case, return a target with no image metadata — callers
-			// already guard with IsImageTarget() before using it.
-			return target, nil
-		}
+		if err := json.Unmarshal(src.Target, &imageMetadata); err != nil {
+			// When source metadata has been stripped, the target may be a plain string (or null).
+			var imageName *string
+			if err2 := json.Unmarshal(src.Target, &imageName); err2 == nil {
+				return target, nil
+			}
+			return target, fmt.Errorf("failed to unmarshal image target: %w", err)
+		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err := json.Unmarshal(src.Target, &imageMetadata); err != nil {
// When source metadata has been stripped (e.g. by StripSBOM),
// the target may be a plain string (image name) instead of ImageMetadata.
// In that case, return a target with no image metadata — callers
// already guard with IsImageTarget() before using it.
return target, nil
if err := json.Unmarshal(src.Target, &imageMetadata); err != nil {
// When source metadata has been stripped, the target may be a plain string (or null).
var imageName *string
if err2 := json.Unmarshal(src.Target, &imageName); err2 == nil {
return target, nil
}
return target, fmt.Errorf("failed to unmarshal image target: %w", err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapters/v1/domain_to_armo.go` around lines 52 - 57, The current code in the
conversion function unconditionally swallows any json.Unmarshal error for
src.Target into imageMetadata and returns nil, but we should only skip parsing
when Target is a plain JSON string or null; change the logic in the function
that handles src.Target (where json.Unmarshal and imageMetadata are used) to
first inspect src.Target bytes: if src.Target is nil, json.Valid(src.Target) is
false but treat as null, or check the first non-whitespace byte to see if it is
'"' (string) or 'n' (null) and only then return target, nil; otherwise, return
the actual unmarshal error so corrupted/malformed SBOM metadata surfaces. Keep
the rest of the flow (the returned target variable and callers using
IsImageTarget()) unchanged.

Comment on lines 329 to 332
if err := json.Unmarshal(grypeDocument.Source.Target, &rawManifest); err != nil {
return nil, err
// Source metadata may have been stripped, return nil manifest
return nil, nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's verify the code at the specified location
cat -n adapters/v1/domain_to_armo.go | sed -n '325,335p'

Repository: kubescape/kubevuln

Length of output: 480


🏁 Script executed:

# Find all call sites of ParseImageManifest
rg -n "ParseImageManifest\s*\(" -C5

Repository: kubescape/kubevuln

Length of output: 2695


🏁 Script executed:

# Let's also look at the function signature to understand what it returns
rg -n "func.*ParseImageManifest" -A10

Repository: kubescape/kubevuln

Length of output: 1460


🏁 Script executed:

cat -n adapters/v1/backend.go | sed -n '195,230p'

Repository: kubescape/kubevuln

Length of output: 1428


🏁 Script executed:

cat -n adapters/v1/backend.go | sed -n '225,260p'

Repository: kubescape/kubevuln

Length of output: 2171


🏁 Script executed:

# Search for where imageManifest variable is used in backend.go
rg -n "imageManifest" adapters/v1/backend.go -C2

Repository: kubescape/kubevuln

Length of output: 460


🏁 Script executed:

# Find the Summarize function signature and implementation
rg -n "func Summarize" -A20

Repository: kubescape/kubevuln

Length of output: 2055


🏁 Script executed:

# Check if ImageManifest field is dereferenced anywhere in the summary processing
rg -n "ImageManifest" adapters/v1/backend_utils.go -C3

Repository: kubescape/kubevuln

Length of output: 830


🏁 Script executed:

# Check if CommonContainerScanSummaryResult.ImageManifest is dereferenced anywhere
rg -n "\.ImageManifest\s*\." -C2

Repository: kubescape/kubevuln

Length of output: 44


🏁 Script executed:

# Also check the type definition of CommonContainerScanSummaryResult to see if ImageManifest is a pointer
rg -n "type CommonContainerScanSummaryResult" -A50 | head -80

Repository: kubescape/kubevuln

Length of output: 44


🏁 Script executed:

# Find where CommonContainerScanSummaryResult is imported from
rg -n "CommonContainerScanSummaryResult" adapters/v1/backend_utils.go | head -5

Repository: kubescape/kubevuln

Length of output: 436


🏁 Script executed:

# Check if there are any other uses of imageManifest in the codebase
rg -n "imageManifest" --type go -C2

Repository: kubescape/kubevuln

Length of output: 5273


🏁 Script executed:

# Check the test to see what cases are tested for ParseImageManifest
cat -n adapters/v1/backend_test.go | sed -n '206,260p'

Repository: kubescape/kubevuln

Length of output: 1657


🏁 Script executed:

cat -n adapters/v1/domain_to_armo.go | sed -n '323,365p'

Repository: kubescape/kubevuln

Length of output: 1655


🏁 Script executed:

# Check what source.ImageMetadata looks like
rg -n "type ImageMetadata" -A10

Repository: kubescape/kubevuln

Length of output: 44


🏁 Script executed:

# Check if there are test cases with empty Source.Target
rg -n "Source.Target\s*=" -B2 -A2

Repository: kubescape/kubevuln

Length of output: 44


🏁 Script executed:

# Let's run the test to see if it passes or fails
cd /tmp && python3 << 'EOF'
import subprocess
import os

# Try to find and run the specific test
result = subprocess.run(
    ['bash', '-c', 'cd /repo && grep -n "malformed metadata base64 config" adapters/v1/backend_test.go -A30'],
    capture_output=True,
    text=True,
    timeout=30
)
print("Test case details:")
print(result.stdout)
print("---")

# Also check what the nginxSBOMMetadata contains to understand what valid input looks like
result2 = subprocess.run(
    ['bash', '-c', 'cd /repo && cat -n adapters/v1/backend_test.go | sed -n "200,210p"'],
    capture_output=True,
    text=True,
    timeout=30
)
print("Context around nginxSBOMMetadata:")
print(result2.stdout)
EOF

Repository: kubescape/kubevuln

Length of output: 120


🏁 Script executed:

# Check if Source.Target can be empty/null and how that's handled
rg -n "Source.Target" -B2 -A2 | head -30

Repository: kubescape/kubevuln

Length of output: 730


🏁 Script executed:

# Check source.ImageMetadata structure to understand if first unmarshal would really fail on malformed JSON
rg -n "type ImageMetadata" -B2 -A15

Repository: kubescape/kubevuln

Length of output: 44


🏁 Script executed:

# Let's look at the test data file to understand what valid metadata looks like
find . -name "nginx-document-source.json" -exec head -50 {} \;

Repository: kubescape/kubevuln

Length of output: 13352


🏁 Script executed:

# Let's verify the complete picture - check if imageManifest is ever dereferenced
# by searching for uses with field access
rg -n "imageManifest\." --type go

Repository: kubescape/kubevuln

Length of output: 160


Address silent error masking in ParseImageManifest by distinguishing malformed JSON from stripped metadata.

Returning (nil, nil) when json.Unmarshal fails masks real parse errors and makes it impossible to distinguish between intentionally stripped metadata and actual malformed JSON. The test case "malformed metadata base64 config" expects an error to be returned for malformed JSON, but the current code returns (nil, nil) instead. While nil manifests are safely handled as pointers in struct fields, the error suppression prevents callers and logging from detecting actual data corruption. Consider checking if the target is an empty/null JSON value before returning (nil, nil), and surfacing actual unmarshal errors to aid debugging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapters/v1/domain_to_armo.go` around lines 329 - 332, In ParseImageManifest,
don't silently swallow json.Unmarshal errors: inspect
grypeDocument.Source.Target before returning nil, nil — if Target is empty or
only JSON null/whitespace, return (nil, nil) to indicate stripped metadata;
otherwise propagate the unmarshal error so callers can detect malformed JSON.
Update the block around json.Unmarshal(grypeDocument.Source.Target,
&rawManifest) to check trimmed Target bytes for empty/"null" and only return
(nil, nil) in that case; for any other unmarshal failure, return the error
(referencing ParseImageManifest, grypeDocument.Source.Target, and rawManifest).

@github-actions
Copy link
Copy Markdown

Summary:

  • License scan: failure
  • Credentials scan: failure
  • Vulnerabilities scan: failure
  • Unit test: success
  • Go linting: failure

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
@matthyx matthyx added the release Create release label Feb 24, 2026
@matthyx matthyx merged commit 8debfbc into main Feb 24, 2026
4 of 9 checks passed
@matthyx matthyx deleted the strip branch February 24, 2026 11:51
@matthyx matthyx moved this to To Archive in KS PRs tracking Feb 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release Create release

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

2 participants