feat: implement enforced Semgrep scanning with clean baseline (#2984)#3406
feat: implement enforced Semgrep scanning with clean baseline (#2984)#3406kasya merged 19 commits intoOWASP:mainfrom
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThis PR integrates Semgrep security scanning into the CI/CD pipeline, adds defensive HTML sanitization to frontend components using DOMPurify, hardens Django admin rendering with format_html, and annotates existing code with nosemgrep directives to suppress specific security lint warnings. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @.github/workflows/semgrep.yml:
- Around line 24-30: Update the "Run Semgrep" step so it no longer pulls
:latest; either pin the Docker image to the specific version used by pre-commit
(change returntocorp/semgrep to returntocorp/semgrep:v1.104.0) or replace the
step with the official action (uses: semgrep/semgrep-action@v1) which provides
caching and SARIF support; modify the step name "Run Semgrep" and its run/uses
invocation accordingly and ensure the same config entries (p/owasp-top-ten,
p/python, p/javascript) are preserved.
In @.pre-commit-config.yaml:
- Around line 90-95: Update the semgrep hook rev from v1.104.0 to v1.148.0 in
the pre-commit configuration: locate the semgrep repo block (repo:
https://github.com/semgrep/semgrep) and change the rev field to v1.148.0 so
local pre-commit scans use the latest rulesets referenced by the args
(--config=p/owasp-top-ten, --config=p/python, --config=p/javascript, --error).
🧹 Nitpick comments (1)
.github/workflows/semgrep.yml (1)
16-19: Add a timeout to prevent hung jobs.Consider adding
timeout-minutesto prevent indefinitely running jobs if Semgrep hangs on complex files.Suggested change
jobs: semgrep: name: Run Semgrep Security Scan runs-on: ubuntu-latest + timeout-minutes: 15 steps:
|
All changes addressed. Switched to official action, pinned to SHA hashes for security, and added the requested timeout. |
arkid15r
left a comment
There was a problem hiding this comment.
That's a good start, you need to shape it properly now.
.github/workflows/semgrep.yml
Outdated
There was a problem hiding this comment.
We need it to be part of the main ci/cd pipeline
There was a problem hiding this comment.
Understood. I'll integrate the Semgrep job directly into the main CI/CD workflow i.e. run-ci-cd.yaml as a required dependency.
.pre-commit-config.yaml
Outdated
There was a problem hiding this comment.
Please use consistent syntax
There was a problem hiding this comment.
Let's also use centralized configuration file for both pre-commit and ci/cd workflow.
There was a problem hiding this comment.
Okay, we can create a centralized .semgrep.yml file in the root directory to act as the single source of truth. I’ll then refactor the .pre-commit-config.yaml and the CI/CD workflow to point to it so the rules stay consistent everywhere. I'll get started on this refactor and push the changes for your review!
.semgrepignore
Outdated
There was a problem hiding this comment.
This needs to be fixed -- not ignored. What's the exact issue with this file?
There was a problem hiding this comment.
The issue Semgrep flagged in layout.tsx was a potential Client-Side Injection vulnerability. Because the organizationKey is a dynamic route parameter, Semgrep identified it as untrusted user input that was being used in a context (like a metadata field or a direct UI element) where it could lead to XSS if not handled carefully. I used .semgrepignore to keep the initial CI setup clean, but I'm now looking at implementing a proper fix either by ensuring the input is strictly validated or by using a safe React rendering pattern to neutralize the risk.
There was a problem hiding this comment.
I am thinking of using a strict regex validation for organizationKey prop or is there a centralized utility in the Nest codebase you'd prefer I use for sanitizing these route parameters?
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.github/workflows/run-ci-cd.yaml:
- Around line 63-86: The semgrep job has duplicate step names, redundant
targets, an unpinned semgrep install, and an inconsistent checkout SHA; fix by
(1) keeping only one security scan step (prefer running the CI-focused make
target security-scan-ci and remove the security-scan step) and give that step a
unique name like "Run Semgrep CI Scan", (2) pin the Semgrep install by changing
the pip install call to install the same version used by pre-commit (e.g., pip
install semgrep==1.148.0), and (3) standardize the actions/checkout reference to
the same SHA used in other jobs (replace `@b4ffde65f463`... with
`@8e8c483db84b`...). Update the semgrep job block and step names accordingly.
In `@Makefile`:
- Around line 99-100: Add the missing target name "security-scan-ci" to the
.PHONY declaration so the Makefile's phony targets list includes
security-scan-ci and avoids conflicts with any file of that name; update the
existing .PHONY line (the declaration containing build clean check pre-commit
prune run scan-images security-scan test update) to also include
security-scan-ci.
🧹 Nitpick comments (6)
proxy/production.conf (1)
82-83: Minor: Comment text is slightly inaccurate for redirect context.The comment mentions "reverse proxy configuration" and "Host header forwarding," but this is an HTTP→HTTPS redirect, not a proxy pass. The suppression itself is valid since
server_name nest.owasp.org;(line 74) ensures only matching hosts reach this block.Consider using the literal hostname for defense-in-depth:
Suggested improvement
- # Reason: Standard reverse proxy configuration; Host header forwarding is required for routing. - return 301 https://$host$request_uri; # nosemgrep: generic.nginx.security.request-host-used.request-host-used + # Reason: Redirect HTTP to HTTPS; server_name ensures only valid hosts reach this block. + return 301 https://nest.owasp.org$request_uri;Using the literal hostname eliminates the need for the nosemgrep suppression entirely and removes any theoretical Host header manipulation concern.
frontend/src/components/StructuredDataScript.tsx (1)
10-16: DOMPurify may corrupt JSON-LD data; consider JSON-specific escaping instead.While the security intent is correct and the static analysis warnings are false positives (sanitization is applied), DOMPurify is designed to sanitize HTML, not JSON. It may strip or modify valid JSON content that resembles HTML tags (e.g., a description containing
<strong>or similar).For JSON-LD in
<script type="application/ld+json">tags,JSON.stringify()already safely escapes content. The primary injection concern is</script>sequences, which can be handled more precisely:♻️ Consider targeted JSON escaping instead
-import DOMPurify from 'isomorphic-dompurify' import { ProfilePageStructuredData } from 'types/profilePageStructuredData' interface StructuredDataScriptProps { data: ProfilePageStructuredData } const StructuredDataScript: React.FC<StructuredDataScriptProps> = ({ data }) => { - const cleanData = DOMPurify.sanitize(JSON.stringify(data, null, 2)) + // Escape </script> sequences to prevent breaking out of script context + const cleanData = JSON.stringify(data, null, 2).replace(/<\/script/gi, '<\\/script') return (Based on learnings, DOMPurify is the established pattern in this project for
dangerouslySetInnerHTML. If the team prefers consistency over precision, the current approach is acceptable—just be aware it may alter data containing HTML-like strings.frontend/src/app/organizations/[organizationKey]/layout.tsx (1)
112-114: Same DOMPurify-on-JSON consideration applies here.The static analysis warnings (lines 128-129) are false positives since sanitization is applied. However, the same concern from
StructuredDataScript.tsxapplies: DOMPurify may alter valid JSON containing HTML-like strings. Consider a consistent approach across both files—either DOMPurify for uniformity with project patterns, or targeted</script>escaping for precision.docker/cspell/Dockerfile (1)
19-24: Clarify the USER directive placement.The
USER nodedirective on line 24 appears afterENTRYPOINT, which means the entrypoint still runs as root (the default). If the intent is to run cspell as thenodeuser, move theUSERdirective beforeENTRYPOINT. If root is truly required (as the comment suggests), consider removing theUSER nodeline to avoid confusion.Option A: If cspell should run as node user
WORKDIR /nest -# Reason: This is a dev-tool linter, not a production service. Root access is required for IO operations. -# nosemgrep: dockerfile.security.missing-user-entrypoint.missing-user-entrypoint -ENTRYPOINT ["/opt/node/node_modules/.bin/cspell"] USER node + +ENTRYPOINT ["/opt/node/node_modules/.bin/cspell"]Option B: If root is required, remove misleading USER directive
WORKDIR /nest # Reason: This is a dev-tool linter, not a production service. Root access is required for IO operations. # nosemgrep: dockerfile.security.missing-user-entrypoint.missing-user-entrypoint ENTRYPOINT ["/opt/node/node_modules/.bin/cspell"] - -USER node.pre-commit-config.yaml (1)
91-109: Configuration drift: missing timeout settings.The Makefile's
SEMGREP_CONFIGSincludes--timeout 10and--timeout-threshold 3, but this pre-commit hook configuration lacks these settings. This could cause timeouts during pre-commit runs that wouldn't occur withmake security-scan.Additionally, per the past discussion, consider extracting these configs into a centralized
.semgrep.ymlfile to maintain consistency.Add timeout settings for consistency
- --config - p/secrets - --error + - --timeout + - '10' + - --timeout-threshold + - '3'Makefile (1)
70-77: Success message may be misleading when findings exist.When
semgrepfinds issues and--erroris used, the command exits non-zero, so the "Scan Complete" echo won't execute. However, users may expect the findings file to always be created. Consider adjusting the flow to always save findings regardless of exit code:Proposed alternative to always capture findings
security-scan: `@echo` "🛡️ Running Centralized Security Scan..." - semgrep $(SEMGREP_CONFIGS) --error --output findings.txt . - `@echo` "✅ Scan Complete. Detailed findings saved to findings.txt" + semgrep $(SEMGREP_CONFIGS) --output findings.txt . || true + `@echo` "✅ Scan Complete. Findings saved to findings.txt" + `@semgrep` $(SEMGREP_CONFIGS) --error . > /dev/null 2>&1Alternatively, keep the current behavior if you want local scans to fail fast on findings.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@Makefile`:
- Around line 8-16: The SEMGREP_CONFIGS variable in the Makefile ends with a
trailing backslash which causes the following build: target to be folded into
the variable; remove the final backslash after "--timeout-threshold 3" so
SEMGREP_CONFIGS is a complete variable value (leave the preceding lines and
backslashes intact) and ensure the subsequent build: target starts on its own
line outside the variable definition.
♻️ Duplicate comments (1)
.github/workflows/run-ci-cd.yaml (1)
70-71: Align checkout action SHA with other jobs.
Line 71 uses a differentactions/checkoutpin than the rest of the workflow; please standardize on the same SHA for consistency and supply-chain tracking.🛠️ Proposed fix
- uses: actions/checkout@b4ffde65f463366882ddc3bfe93e684e7eeb5e40 + uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
There was a problem hiding this comment.
We need to add artifacts for the tool out put. In the .pre-commit-config.yaml file under the semgrep job at a step for save the file to an artifact. Here is an example:
- name: Archive Semgrep Project Report
# 'always' ensures the report is saved even if Semgrep finds vulnerabilities
if: always()
uses: actions/upload-artifact@v4
with:
# This is the name of the ZIP file you download from GitHub
name: semgrep-results-run-${{ github.run_number }}
# This MUST match the filename created in the Makefile above
path: semgrep-security-report.txt
# Optional: Keep the report for only 14 days to save space
retention-days: 14
here is the documentation for GitHub artifacts
Improve security scan implementation. Co-authored-by: kart-u <kart-u@users.noreply.github.com> Co-authored-by: Noland Crane <noland-crane@users.noreply.github.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3406 +/- ##
==========================================
- Coverage 85.60% 85.59% -0.02%
==========================================
Files 461 461
Lines 14222 14224 +2
Branches 1894 1894
==========================================
Hits 12175 12175
- Misses 1678 1680 +2
Partials 369 369
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/run-ci-cd.yaml:
- Around line 418-424: The Archive Semgrep Project Report workflow step
currently pins actions/upload-artifact to a different SHA
(actions/upload-artifact@65c86da0...) than the other usages; update the pinned
reference in the "Archive Semgrep Project Report" step (the step named "Archive
Semgrep Project Report" that uses actions/upload-artifact) to the same SHA used
elsewhere (actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f) so
all upload-artifact references are consistent for supply-chain integrity.
🧹 Nitpick comments (1)
Makefile (1)
61-83: Good implementation of the security scan target.The target correctly mounts the repo, applies multiple rule packs, and fails on findings with
--error. Minor observation: ifdocker/semgrep/Dockerfiledoesn't exist or lacks the expectedFROM semgrep/semgrep:line, line 66 will produce an empty string, causing a cryptic Docker error.Consider adding a guard:
Optional hardening
security-scan: `@echo` "Running Security Scan..." + `@test` -f docker/semgrep/Dockerfile || { echo "Error: docker/semgrep/Dockerfile not found"; exit 1; } `@docker` run --rm \ -v "$(PWD):/src" \ -w /src \
|



Proposed change
Resolves #2984
This PR centralizes the Semgrep security pipeline into the Makefile to ensure identical security enforcement for both local development and CI/CD.
Key Changes
1. Centralized Architecture (Makefile)
2. CI/CD Integration
3. Resolved Security Findings
💻 Local Verification Command
Standard Check:
pre-commit run --all-filesDetailed Report:
make security-scanChecklist
make check-testlocally and all checks pass except the frontend tests whict are there in main branch