Harden Scorecard artifact extraction#200
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughSummary by CodeRabbit릴리스 노트
WalkthroughOSSF Scorecard SARIF 아티팩트를 Changes
Sequence Diagram(s)sequenceDiagram
participant WF as "Workflow (GitHub Actions)"
participant DL as "actions/download-artifact\n(skip-decompress: true)"
participant EX as "extract_scorecard_artifact.py"
participant NORM as "normalize_scorecard_sarif.py"
participant UP as "actions/upload-sarif-results"
WF->>DL: 요청: ossf-scorecard-results (skip-decompress: true)
DL-->>WF: `scorecard-artifact` (zip 파일)
WF->>EX: 실행: extractor(scorecard-artifact → scorecard-sarif)
EX->>EX: ZIP 검증\n(단일 `results.sarif`, 절대경로/경로순회/심볼릭 차단, 크기 제한)
EX-->>WF: `scorecard-sarif/results.sarif`
WF->>NORM: normalize_scorecard_sarif.py 처리
NORM-->>WF: normalized `results.sarif`
WF->>UP: 업로드 SARIF
UP-->>WF: 업로드 완료
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/extract_scorecard_artifact.py`:
- Around line 51-53: The code currently calls output_dir.mkdir(...) and
target.write_bytes(...), which can follow repository symlinks and overwrite
arbitrary files; change it to refuse symlinked paths and create the target
atomically and only if it does not already exist: verify each component of
output_dir is not a symlink (use os.lstat on each ancestor and on output_dir)
before creating directories, and when writing EXPECTED_MEMBER use low-level
os.open with flags O_CREAT|O_EXCL|O_WRONLY (and O_NOFOLLOW where available) to
get a fresh file descriptor, then write archive.read(member) to that FD and
close it (instead of target.write_bytes), raising on existence or symlink to
avoid following links. Ensure you reference output_dir, target and
EXPECTED_MEMBER in the changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c1bf2b40-a702-4214-a1f0-ad91387a365a
📒 Files selected for processing (5)
.github/workflows/ossf-scorecard.ymlscripts/checks/extract_scorecard_artifact.pyscripts/checks/verify_supply_chain.pyservices/analysis-engine/tests/test_supply_chain_policy.pyskills/ci-warning-root-cause-remediation/SKILL.md
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/extract_scorecard_artifact.py`:
- Around line 14-25: resolve_artifact_zip trusts source and discovered .zip
candidates without symlink checks; call the existing ensure_non_symlink_path
validation for the input path and for each candidate before accepting them.
Specifically, when source is a file, run ensure_non_symlink_path(source) before
returning; when source is a directory, filter/validate candidate paths by
calling ensure_non_symlink_path(candidate) (or skip if it raises) so you only
accept a single non-symlink .zip; raise the same ValueError if no valid
non-symlink candidate remains.
In `@services/analysis-engine/tests/test_supply_chain_policy.py`:
- Around line 1205-1218: The test only covers passing a zip file path directly
to extract_scorecard_artifact and misses the real-world directory-input branch
in resolve_artifact_zip; add three cases in
test_scorecard_artifact_extractor_extracts_expected_sarif: (1) a directory input
that contains exactly one results.sarif.zip (or a single .zip) and assert
extraction succeeds and content matches, (2) a directory containing zero zip
files and assert the function raises the expected error/returns failure, and (3)
a directory containing two zip files and assert the “exactly one zip allowed”
guard triggers the expected exception/failure; reference the
extract_scorecard_artifact helper and resolve_artifact_zip behavior so the new
assertions exercise both the directory-resolution path and the multi-zip
rejection.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dac29018-ffa1-43c6-b935-ba84bc7fe3a0
📒 Files selected for processing (2)
scripts/checks/extract_scorecard_artifact.pyservices/analysis-engine/tests/test_supply_chain_policy.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/extract_scorecard_artifact.py`:
- Around line 61-69: The code currently calls archive.read(member) which loads
the entire uncompressed member into memory; change the extraction to stream via
archive.open(member) and read/write in fixed-size chunks (e.g., 64KB) to write
to write_new_file_without_following_symlinks so memory stays bounded, and while
streaming enforce a maximum uncompressed size per-member and optionally a total
extraction cap (track bytes read and raise an error if it exceeds the limit).
Also update validate_member (or the extraction flow that calls it) to check the
ZipInfo.uncompressed_size when available and reject members that advertise sizes
above the allowed threshold before streaming starts; keep existing path/symlink
checks. Use the ZipFile.open(...) file-like object, loop reading chunks and
increment a counter, and raise a specific exception on size overrun instead of
using archive.read(member).
In `@services/analysis-engine/tests/test_supply_chain_policy.py`:
- Around line 1036-1393: The file fails ruff formatting checks; run the
formatter and commit the changes to satisfy `ruff format --check`. Reformat this
test module (functions such as
test_supply_chain_check_requires_scorecard_download_without_action_decompression,
test_supply_chain_check_rejects_commented_scorecard_decompression_tokens,
test_scorecard_artifact_extractor_extracts_expected_sarif, etc.) by running
`ruff format services/analysis-engine/tests/test_supply_chain_policy.py` (or
`ruff format .` for the repo), review the updated whitespace/line-wrapping
changes, and add/commit the modified file so CI passes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 087263b7-e204-4c88-aeed-cc1a60addbe7
📒 Files selected for processing (2)
scripts/checks/extract_scorecard_artifact.pyservices/analysis-engine/tests/test_supply_chain_policy.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/checks/verify_supply_chain.py (1)
459-486: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win스텝 블록 파싱 로직을 공용 헬퍼로 단일화하세요.
현재 동일 파싱 로직이 두 함수에 복제되어 있어, 우회 케이스 보완 시 한쪽만 수정되는 드리프트 리스크가 큽니다. 보안 정책 경로이므로 공용 함수로 통합하는 편이 안전합니다.
Also applies to: 550-577
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/checks/verify_supply_chain.py` around lines 459 - 486, The step block parsing loop (building step_blocks via step_indent, has_steps_parent, step_lines and appending (index, step_indent, step_lines)) is duplicated; extract it into a single helper function (e.g., parse_step_blocks or extract_step_blocks) that takes lines: list[str] and returns list[tuple[int,int,list[str]]], move the logic from both locations (the shown block and the one around lines 550-577) into that helper, and replace both original inline loops with calls to the new helper to ensure a single source of truth for parsing.
♻️ Duplicate comments (1)
scripts/checks/extract_scorecard_artifact.py (1)
87-87:⚠️ Potential issue | 🟠 Major | ⚡ Quick win비신뢰 ZIP 멤버를 한 번에 읽어 메모리 고갈을 유발할 수 있습니다.
Line 87의
archive.read(member)는 압축 해제 결과를 통째로 메모리에 올립니다. 비신뢰 아티팩트 입력에서는 스트리밍 복사 + 최대 허용 크기 검증으로 fail-closed 해야 합니다.🔧 제안 수정
@@ EXPECTED_MEMBER = "results.sarif" +MAX_SARIF_BYTES = 10 * 1024 * 1024 # 10 MiB @@ -def write_new_file_without_following_symlinks(target: Path, data: bytes) -> None: - """Write ``data`` to a new file without following an existing symlink.""" +def write_new_file_without_following_symlinks(target: Path, source_file: zipfile.ZipExtFile) -> None: + """Stream-write to a new file without following an existing symlink.""" @@ - with os.fdopen(fd, "wb") as target_file: - target_file.write(data) + written = 0 + with os.fdopen(fd, "wb") as target_file: + while True: + chunk = source_file.read(64 * 1024) + if not chunk: + break + written += len(chunk) + if written > MAX_SARIF_BYTES: + raise ValueError("artifact member too large") + target_file.write(chunk) @@ - write_new_file_without_following_symlinks(target, archive.read(member)) + if member.file_size > MAX_SARIF_BYTES: + raise ValueError("artifact member too large") + with archive.open(member) as source_file: + write_new_file_without_following_symlinks(target, source_file) return target#!/bin/bash # Verify current implementation still performs full in-memory ZIP reads rg -n "archive\.read\(member\)|def write_new_file_without_following_symlinks|MAX_SARIF_BYTES|file_size" \ scripts/checks/extract_scorecard_artifact.py -C2 # Verify regression tests for oversized ZIP member are present/absent rg -n "too large|MAX_SARIF_BYTES|zip bomb|artifact member" \ services/analysis-engine/tests/test_supply_chain_policy.py -C2As per coding guidelines, "Treat files, URLs, metadata, model artifacts, and project files as untrusted input".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/checks/extract_scorecard_artifact.py` at line 87, The code currently calls archive.read(member) which loads a ZIP member fully into memory; change this to stream the member via archive.open(member) or .extractfile and write to disk in fixed-size chunks while tracking cumulative bytes against a configured MAX_SARIF_BYTES limit and aborting (fail-closed) if exceeded; update the caller path that invokes write_new_file_without_following_symlinks to accept a file-like stream or first stream-validate+save to a temp file before calling write_new_file_without_following_symlinks, and ensure you close the archive stream on errors and log/raise a clear exception when the size limit is hit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@scripts/checks/verify_supply_chain.py`:
- Around line 459-486: The step block parsing loop (building step_blocks via
step_indent, has_steps_parent, step_lines and appending (index, step_indent,
step_lines)) is duplicated; extract it into a single helper function (e.g.,
parse_step_blocks or extract_step_blocks) that takes lines: list[str] and
returns list[tuple[int,int,list[str]]], move the logic from both locations (the
shown block and the one around lines 550-577) into that helper, and replace both
original inline loops with calls to the new helper to ensure a single source of
truth for parsing.
---
Duplicate comments:
In `@scripts/checks/extract_scorecard_artifact.py`:
- Line 87: The code currently calls archive.read(member) which loads a ZIP
member fully into memory; change this to stream the member via
archive.open(member) or .extractfile and write to disk in fixed-size chunks
while tracking cumulative bytes against a configured MAX_SARIF_BYTES limit and
aborting (fail-closed) if exceeded; update the caller path that invokes
write_new_file_without_following_symlinks to accept a file-like stream or first
stream-validate+save to a temp file before calling
write_new_file_without_following_symlinks, and ensure you close the archive
stream on errors and log/raise a clear exception when the size limit is hit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 05b5914f-4b8d-46b8-b34c-b0509708b5af
📒 Files selected for processing (3)
scripts/checks/extract_scorecard_artifact.pyscripts/checks/verify_supply_chain.pyservices/analysis-engine/tests/test_supply_chain_policy.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/verify_supply_chain.py`:
- Around line 560-601: Duplicate logic exists in workflow_job_content and
step_run_command and is also copied into
scorecard_sarif_upload_normalization_violations; extract the shared behavior
into two module-level helpers (e.g., workflow_job_content_for_step(lines:
list[str], line_index: int) -> str and step_run_command_from_block(step_lines:
list[str], step_indent: int) -> str) implementing the same logic currently in
workflow_job_content and step_run_command, then replace the bodies of
workflow_job_content and step_run_command to call these helpers and update
scorecard_sarif_upload_normalization_violations to use them as well; keep the
same parameter names and return types, preserve comment-stripping, indent checks
and return values, and run existing tests or add a small unit test ensuring
identical outputs before/after refactor.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 284c26cd-ca38-4ab6-940e-bba10d54fe0a
📒 Files selected for processing (3)
scripts/checks/extract_scorecard_artifact.pyscripts/checks/verify_supply_chain.pyservices/analysis-engine/tests/test_supply_chain_policy.py
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/extract_scorecard_artifact.py`:
- Around line 25-35: The candidate selection currently only checks suffix and
symlinks via ensure_non_symlink_path but can still accept directories or special
files like "results.zip/" or device nodes; update the selection to require a
regular file: when iterating candidates from source.iterdir() (the variable
candidate/path in this block), after calling ensure_non_symlink_path(path,
path_kind="artifact path") also verify path.is_file() (or equivalent S_ISREG
check) and skip/ignore non-regular entries so only true regular .zip files are
appended to candidates before the len(candidates) check and subsequent call to
zipfile.ZipFile().
In `@scripts/checks/verify_supply_chain.py`:
- Around line 129-136: The loop that searches for a parent "steps:" wrongly
breaks on a blank line and thus can miss a "steps:" ancestor; update the search
in verify_supply_chain.py to skip empty or comment-only lines instead of
breaking: inside the loop that uses
previous_line/previous_stripped/previous_indent/step_indent, add a check like
"if previous_stripped == '': continue" (or treat comment-only lines as empty)
before the existing indent/steps checks so the loop only breaks once a
non-empty, less-indented line is found; this will fix how
scorecard_sarif_upload_normalization_violations() and
scorecard_artifact_download_decompression_violations() detect step blocks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 51535d55-c8e9-4d6c-a926-1e71cad9e163
📒 Files selected for processing (3)
scripts/checks/extract_scorecard_artifact.pyscripts/checks/verify_supply_chain.pyservices/analysis-engine/tests/test_supply_chain_policy.py
|
@coderabbitai resolve |
✅ Actions performedComments resolved and changes approved. |
Summary
actions/download-artifactlegacy ZIP decompression path for OSSF Scorecard SARIF by downloading the artifact archive withskip-decompress: true.results.sarifand supply-chain guards that reject missing extraction, comment spoofing, and echo/non-executing command spoofing.Verification
uv run --project services/analysis-engine ruff check services/analysis-engine/tests/test_supply_chain_policy.py scripts/checks/verify_supply_chain.py scripts/checks/extract_scorecard_artifact.pyuv run --project services/analysis-engine pytest services/analysis-engine/tests/test_supply_chain_policy.py -qpython3 scripts/checks/verify_supply_chain.pypython3 scripts/checks/security_gates.pypython3 /Users/seonghobae/.agents/skills/skill-creator/scripts/quick_validate.py skills/ci-warning-root-cause-remediationgit diff --checkSecurity Notes
results.sarifmember.Closes #199.
Refs #194.