t1032.2: Add Codacy collector with paginated API polling and severity mapping #1384
marcusquinn merged 2 commits into main
Conversation
Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request delivers a new Codacy collector script that integrates Codacy's static analysis findings into the unified audit pipeline. The script polls the Codacy API with pagination and robust error handling, standardizes the collected data by mapping severities and categories, and stores it in a central SQLite database. This allows consistent analysis and reporting of code quality issues across different tools within the existing framework.

Highlights
Changelog
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics

Generated on: Fri Feb 13 15:15:42 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request introduces a comprehensive codacy-collector-helper.sh script, which is a solid addition to the audit pipeline. The script is well-structured, handling API pagination, rate limiting, and error retries effectively. My review focuses on improving robustness and security. I've suggested changes to ensure proper temporary file cleanup as per the style guide, enhance security in SQL query construction by validating inputs, and make the script more resilient by providing default values for externally defined variables.
Walkthrough

This PR introduces a comprehensive Codacy collector helper script that integrates with the Codacy API to fetch repository findings, manage pagination and rate limiting, persist results in SQLite with deduplication, and provide CLI commands for querying, summarizing, exporting, and reporting audit data.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Script as Codacy Collector<br/>(Helper Script)
    participant API as Codacy API
    participant DB as SQLite DB
    User->>Script: cmd_collect [org] [repo]
    Script->>Script: Initialize DB schema & WAL mode
    Script->>Script: Create audit_run record
    loop Cursor-based Pagination
        Script->>API: codacy_api_request(with retry & backoff)
        API-->>Script: findings batch (or 429/5xx)
        Script->>Script: Handle rate limits & exponential backoff
        Script->>Script: Map & escape findings
        Script->>DB: INSERT findings (run_id, severity, path, etc.)
    end
    Script->>DB: Deduplicate findings within run
    Script->>Script: Mark audit_run complete
    Script-->>User: Collection summary
    User->>Script: cmd_query [severity] [category]
    Script->>DB: SELECT findings with filters
    DB-->>Script: Filtered results
    Script-->>User: Text or JSON output
    User->>Script: cmd_export [format]
    Script->>DB: SELECT findings + metadata
    DB-->>Script: Complete dataset
    Script-->>User: JSON or CSV file
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Add Codacy collector with paginated API polling and severity mapping (t1032.2)
Implements codacy-collector-helper.sh — a dedicated Codacy API collector that:
- Polls POST /analysis/organizations/gh/{org}/repositories/{repo}/issues/search
- Handles cursor-based pagination (100 items/page, max 50 pages)
- Retries with exponential backoff on server errors (3 attempts)
- Waits 60s on rate limit (429) responses
- Classifies Codacy severity (Error/Warning/Info) to unified levels
- Maps Codacy categories (Security/ErrorProne/Performance/etc.) to audit categories
- Stores findings in shared audit_findings SQLite table (source='codacy')
- Deduplicates within-run findings by file:line
- Loads credentials from env vars or code-audit-config.json
- Provides query, summary, status, and export subcommands
Follows existing patterns from coderabbit-collector-helper.sh and
code-audit-helper.sh (t1032.1). Zero ShellCheck violations.
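The severity and category classification described above can be sketched as a pair of case statements. The unified level and category names below are illustrative assumptions; the actual mapping lives in codacy-collector-helper.sh.

```shell
# Hypothetical mapping sketch — function names and the unified
# level/category labels are assumptions, not taken from the script.
map_codacy_severity() {
  case "$1" in
    Error)   echo "high" ;;
    Warning) echo "medium" ;;
    Info)    echo "low" ;;
    *)       echo "unknown" ;;
  esac
}

map_codacy_category() {
  case "$1" in
    Security)    echo "security" ;;
    ErrorProne)  echo "correctness" ;;
    Performance) echo "performance" ;;
    *)           echo "style" ;;
  esac
}
```

A table-driven mapping (e.g., an associative array) would work equally well; the case form keeps the script POSIX-friendly if needed.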
- Add input validation for --severity and --category to prevent SQL injection (line 634)
- Add default value for MAX_RETRIES variable (line 248)
- Refactor temp file cleanup to use trap pattern for robustness (line 253)
- Add default value for ERROR_UNKNOWN_COMMAND variable (line 1070)
Force-pushed b0c77ce to 26154ea
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In @.agents/scripts/codacy-collector-helper.sh:
- Around line 611-618: The --limit handler currently assigns user input directly
to the variable limit which is later interpolated into SQL as LIMIT $limit;
validate that limit is a positive integer immediately after parsing (for example
using a regex like ^[1-9][0-9]*$ or by testing with arithmetic) and
reject/return error if it fails, keeping the existing log_error "Missing value
for --limit" behavior but adding a new log_error "Invalid value for --limit"
path; ensure this validation runs before any SQL construction that uses LIMIT
$limit so only a numeric value is ever interpolated.
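The validation the comment asks for can be sketched as below; the log_error stub is illustrative, since the real script defines its own logger.

```shell
# Minimal sketch of the suggested --limit validation; log_error is
# stubbed here — the real script already provides one.
log_error() { echo "ERROR: $*" >&2; }

validate_limit() {
  # Accept only positive integers so the value is safe to interpolate
  # into "... LIMIT $limit" later.
  if [[ "$1" =~ ^[1-9][0-9]*$ ]]; then
    return 0
  fi
  log_error "Invalid value for --limit: $1"
  return 1
}
```

Running the check immediately after parsing guarantees only a numeric value ever reaches the SQL string.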
- Around line 406-412: The request_body is built by interpolating the cursor
directly into a JSON string which breaks if cursor contains JSON-special
characters; instead generate the JSON using jq to ensure proper escaping: use jq
-n with --arg for cursor and --argjson or --arg for CODACY_PAGE_SIZE to
construct an object with "limit" and conditionally include "cursor" only when
$cursor is non-empty, then assign the jq output back to request_body
(referencing the cursor variable, CODACY_PAGE_SIZE, and request_body in the
script).
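A sketch of the jq-based construction, using the variable names the comment references; the sample cursor value is made up to show the escaping problem being solved.

```shell
# Build the request body with jq so a cursor containing JSON-special
# characters is properly escaped. CODACY_PAGE_SIZE and cursor mirror
# the names in the review comment; the cursor value is illustrative.
CODACY_PAGE_SIZE=100
cursor='abc"def'

request_body=$(jq -cn \
  --arg cursor "$cursor" \
  --argjson limit "$CODACY_PAGE_SIZE" \
  '{limit: $limit} + (if $cursor != "" then {cursor: $cursor} else {} end)')
```

With an empty cursor the `if` branch drops the key entirely, so the first page request sends only `{"limit":100}`.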
- Around line 43-44: AUDIT_CONFIG and AUDIT_CONFIG_TEMPLATE are defined as
relative paths which break when the script is run from a non-repo CWD; change
their definitions to be anchored to the script/repo directory (use the existing
SCRIPT_DIR or computed repo root variable) so the values become absolute (e.g.,
prefix with "$SCRIPT_DIR/") and replace the current AUDIT_CONFIG and
AUDIT_CONFIG_TEMPLATE symbols with these anchored paths so config loading no
longer depends on the current working directory.
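One way to anchor the paths, assuming the standard SCRIPT_DIR idiom; the template filename below is a placeholder, not taken from the repo.

```shell
# Anchor config paths to the script's own directory so they resolve
# regardless of the caller's CWD. The template filename is an
# assumption for illustration.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)"
AUDIT_CONFIG="$SCRIPT_DIR/code-audit-config.json"
AUDIT_CONFIG_TEMPLATE="$SCRIPT_DIR/code-audit-config.template.json"
```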
- Around line 255-319: The 429 handling currently consumes a retry attempt and
references ${MAX_RETRIES} without a default; update the 429 branch so rate-limit
waits do not consume the main retry budget by decrementing attempt
(attempt=$((attempt - 1))) before continuing or otherwise using a separate
counter, keep using CODACY_RATE_LIMIT_WAIT for sleep, and change the log message
to use the safe default ${MAX_RETRIES:-3} (and any other occurrences of
${MAX_RETRIES} without a default) so the script won’t fail under set -u; modify
the log_warn call and the branch for HTTP 429 accordingly (symbols: attempt,
MAX_RETRIES, CODACY_RATE_LIMIT_WAIT, log_warn).
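The "refund the attempt" idea can be sketched as follows; the function wrapper and defaults are assumptions, and the sleep is omitted so the sketch stays testable.

```shell
# Sketch of 429 handling that does not consume the retry budget.
# log_warn is stubbed; MAX_RETRIES/CODACY_RATE_LIMIT_WAIT defaults
# shown are illustrative.
log_warn() { echo "WARN: $*" >&2; }

handle_rate_limit() {
  local http_code="$1" attempt="$2"
  if [ "$http_code" -eq 429 ]; then
    log_warn "Rate limited; waiting ${CODACY_RATE_LIMIT_WAIT:-60}s (attempt ${attempt}/${MAX_RETRIES:-3})"
    # sleep "${CODACY_RATE_LIMIT_WAIT:-60}" would go here; omitted in this sketch
    # Refund the attempt so rate-limit waits don't exhaust the retries
    attempt=$((attempt - 1))
  fi
  echo "$attempt"
}
```

The `${MAX_RETRIES:-3}` expansion also keeps the log line safe under `set -u` when the variable is unset.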
- Around line 249-253: The script currently creates tmp files in the system temp
dir and allocates an unused tmp_headers file; change mktemp calls to create
files inside the agent workspace tmp directory (ensure that directory exists
before creating files) for tmp_response, remove the unused tmp_headers
allocation and its cleanup entry (remove references to tmp_headers in
push_cleanup and _run_cleanups), and also remove the corresponding curl argument
(-D "$tmp_headers") so headers are not written to a dead file; keep the existing
trap/_save_cleanup_scope and only push_cleanup for the actual tmp_response file.
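A minimal sketch of the workspace-scoped temp file with trap cleanup; the tmp directory location and variable names are assumptions, and the script's own push_cleanup machinery is replaced here by a plain trap.

```shell
# Create temp files inside a workspace tmp directory (assumed path)
# and register a single cleanup trap. No tmp_headers file is
# allocated, matching the comment's suggestion.
tmp_dir="${AGENT_TMP_DIR:-${TMPDIR:-/tmp}/agent-workspace}"
mkdir -p "$tmp_dir"

tmp_response=$(mktemp "$tmp_dir/codacy-response.XXXXXX")
trap 'rm -f "$tmp_response"' EXIT
```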
- Around line 509-532: The page-level batch INSERTs are being executed as
individual autocommits and counted from the SQL file size; modify the execution
to wrap all generated INSERTs in a single transaction and compute the number of
successful inserts from the database, not the file. Concretely: when writing
$sql_file, prepend "BEGIN;" and append "COMMIT;" (or alter the jq output to emit
those statements) so db "$AUDIT_DB" <"$sql_file" runs the page as one
transaction; then replace the wc -l count with a DB-derived count by querying
total_changes() (e.g. run db "$AUDIT_DB" "SELECT total_changes();" immediately
after import and use that value for the count variable) so $count reflects
actual committed rows; keep using the existing db invocation and $sql_file and
the count variable names.
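The transaction-plus-count idea can be sketched with a toy table. One refinement over the comment's wording: total_changes() is connection-scoped, so it must be selected in the same sqlite3 session that runs the import — a separate db invocation would open a new connection and report zero.

```shell
# Sketch: wrap the page's INSERTs in one transaction and derive the
# committed-row count from total_changes() in the same session.
# Table and file names here are illustrative.
db_file=$(mktemp)
sql_file=$(mktemp)
sqlite3 "$db_file" "CREATE TABLE audit_findings (path TEXT, line INTEGER);"

{
  echo "BEGIN;"
  echo "INSERT INTO audit_findings VALUES ('src/a.sh', 10);"
  echo "INSERT INTO audit_findings VALUES ('src/b.sh', 20);"
  echo "COMMIT;"
  echo "SELECT total_changes();"
} > "$sql_file"

count=$(sqlite3 "$db_file" < "$sql_file")
rm -f "$db_file" "$sql_file"
```

Counting lines in the SQL file would report attempted inserts; total_changes() reports what the database actually applied.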
🧹 Nitpick comments (3)
.agents/scripts/codacy-collector-helper.sh (3)
79-86: sql_escape pre-processing is confusing and potentially lossy.

Lines 81–82 strip backslash-escaped quotes (\' → ', \" → ") before the actual SQL escaping on line 83. This is non-standard: if input legitimately contains a literal \' sequence (e.g., from a log message or file path), it silently mutates the data. The only escaping needed for SQLite string literals is doubling single quotes.

Proposed simplification:

```diff
 sql_escape() {
   local val="$1"
-  val="${val//\\\'/\'}"
-  val="${val//\\\"/\"}"
   val="${val//\'/\'\'}"
   echo "$val"
   return 0
 }
```
418-422: Entire API response passed as a positional argument, which may exceed ARG_MAX for large payloads.

insert_findings "$run_id" "$response" passes potentially hundreds of KB of JSON as $2. On Linux, ARG_MAX is typically ~2 MB, but on some systems or with many environment variables it can be tighter. A safer pattern is to write the response to a temp file and pass the path.

Sketch:

```diff
 local response
-response=$(codacy_api_request "$endpoint" "POST" "$request_body") || {
+local response_file
+response_file=$(mktemp "${tmp_dir}/codacy-page.XXXXXX")
+codacy_api_request "$endpoint" "POST" "$request_body" > "$response_file" || {
   log_warn "Failed to fetch page ${page} — stopping pagination"
+  rm -f "$response_file"
   break
 }
 ...
-page_count=$(insert_findings "$run_id" "$response")
+page_count=$(insert_findings "$run_id" "$response_file")
```

Then in insert_findings, read from the file instead of echo "$response".

Also applies to: 477-537
510-512: Dedup key file:line will collapse distinct findings at the same location.

Two different rules flagging the same file and line will share a dedup_key (e.g., src/foo.js:42), causing one to be marked as a duplicate. Consider including the rule_id in the key:

```diff
-($path + ":" + $line) as $dedup_key |
+($path + ":" + $line + ":" + ((.patternInfo.id // "") | tostring)) as $dedup_key |
```



Summary
- codacy-collector-helper.sh: a dedicated Codacy API collector for the unified audit pipeline
- Polls POST /analysis/organizations/gh/{org}/repositories/{repo}/issues/search with cursor-based pagination (100 items/page, max 50 pages)
- Stores findings in the shared audit_findings SQLite table with source='codacy', compatible with the t1032.1 orchestrator
- Loads credentials from the CODACY_API_TOKEN env var or code-audit-config.json
- Provides collect, query, summary, status, and export subcommands following the same pattern as coderabbit-collector-helper.sh

Testing
- bash -n syntax check: PASS
- help command: PASS
- status command: PASS (detects dependencies, token config, DB stats)
- summary command: PASS (handles empty results gracefully)
- query --format json command: PASS
- Stores findings in the shared audit_findings table

Task
Closes t1032.2 (blocked-by: t1032.1)
ref: GH#1364