
t1032.2: Add Codacy collector with paginated API polling and severity mapping #1384

Merged
marcusquinn merged 2 commits into main from feature/t1032.2
Feb 13, 2026

Conversation

@marcusquinn
Owner

@marcusquinn marcusquinn commented Feb 13, 2026

Summary

  • Implements codacy-collector-helper.sh — a dedicated Codacy API collector for the unified audit pipeline
  • Polls POST /analysis/organizations/gh/{org}/repositories/{repo}/issues/search with cursor-based pagination (100 items/page, max 50 pages)
  • Handles rate limits (429 → 60s wait), server errors (5xx → exponential backoff), and auth errors (401/403 → immediate fail)
  • Maps Codacy severity levels (Error/Warning/Info) and categories (Security/ErrorProne/Performance/CodeStyle/etc.) to the unified audit schema
  • Stores findings in the shared audit_findings SQLite table with source='codacy', compatible with the t1032.1 orchestrator
  • Loads credentials from CODACY_API_TOKEN env var or code-audit-config.json
  • Provides collect, query, summary, status, and export subcommands following the same pattern as coderabbit-collector-helper.sh
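The status handling described in the summary can be sketched as a small classifier. This is a minimal sketch, not the script's actual code: the function names `classify_status` and `backoff_delay`, the 2-second backoff base, and the treatment of any other status code as non-retryable are all assumptions made for illustration.

```shell
# Hypothetical helper: decide what to do with an HTTP status code.
# Prints one of: ok, wait, retry, fail.
classify_status() {
	case "$1" in
		2??) echo "ok" ;;
		429) echo "wait" ;;       # rate limited: fixed wait (60s per the summary)
		401 | 403) echo "fail" ;; # auth error: fail immediately, never retry
		5??) echo "retry" ;;      # server error: retry with exponential backoff
		*) echo "fail" ;;         # assumption: anything else is non-retryable
	esac
}

# Hypothetical helper: backoff delay in seconds for a 1-based attempt number.
# Yields 2, 4, 8, ... (the 2-second base is an assumption).
backoff_delay() {
	echo $((1 << $1))
}
```

A caller would sleep for `backoff_delay "$attempt"` seconds on `retry`, sleep for the fixed rate-limit wait on `wait`, and return an error straight away on `fail`.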

Testing

  • bash -n syntax check: PASS
  • ShellCheck: zero violations
  • help command: PASS
  • status command: PASS (detects dependencies, token config, DB stats)
  • summary command: PASS (handles empty results gracefully)
  • query --format json command: PASS
  • Unknown command handling: PASS (exit code 1)
  • Schema compatibility: verified against existing audit_findings table

Task

Closes t1032.2 (blocked-by: t1032.1)
ref: GH#1364

Summary by CodeRabbit

Release Notes

  • New Features
    • Added code quality audit collection system with persistent storage and comprehensive querying capabilities.
    • Filter audit findings by severity and category.
    • Export findings in JSON or CSV formats for external analysis.
    • Generate detailed audit summaries with statistics, affected files, and historical tracking.
    • Status reports for audit dependencies and metrics.

@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a new Codacy collector script that integrates Codacy's static analysis findings into the unified audit pipeline. The script efficiently polls the Codacy API with pagination and robust error handling, standardizes the collected data by mapping severities and categories, and stores it in a central SQLite database. This enhancement allows for consistent analysis and reporting of code quality issues across different tools within the existing framework.

Highlights

  • New Codacy Collector Script: Introduced codacy-collector-helper.sh, a new shell script designed to collect static analysis findings from the Codacy API into a unified audit pipeline.
  • Paginated API Polling: Implemented cursor-based pagination for the Codacy API's /issues/search endpoint, fetching 100 items per page with a maximum of 50 pages to ensure comprehensive data collection.
  • Robust API Error Handling: Incorporated handling for various API response codes, including automatic retries with a 60-second wait for rate limits (429), exponential backoff for server errors (5xx), and immediate failure for authentication issues (401/403).
  • Unified Severity and Category Mapping: Mapped Codacy's severity levels (Error, Warning, Info) and categories (Security, ErrorProne, Performance, CodeStyle, etc.) to a standardized unified audit schema for consistent reporting.
  • SQLite Database Integration: Stores all collected findings in the shared audit_findings SQLite table, marking the source as 'codacy', ensuring compatibility with the existing audit orchestrator.
  • Flexible Credential Management: Supports loading Codacy API tokens from either the CODACY_API_TOKEN environment variable or a code-audit-config.json file, prioritizing environment variables.
  • Comprehensive Command-Line Interface: Provides a suite of subcommands including collect, query, summary, status, and export, mirroring the functionality of other collectors in the pipeline.
  • Finding Deduplication: Added logic to deduplicate findings within each collection run based on a generated dedup_key to prevent redundant entries.
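The severity and category mapping mentioned in the highlights could look roughly like the sketch below. The Codacy-side names (Error, Warning, Info, Security, ErrorProne, Performance, CodeStyle) come from the PR description; the unified names on the right-hand side (`high`, `medium`, `low`, `security`, `bug`, and so on) and both function names are assumptions, since the PR text does not spell out the unified schema.

```shell
# Hypothetical mapping from Codacy severity to a unified level.
# The output names are assumed, not taken from the PR.
map_severity() {
	case "$1" in
		Error) echo "high" ;;
		Warning) echo "medium" ;;
		Info) echo "low" ;;
		*) echo "info" ;; # assumption: unknown severities fall back to "info"
	esac
}

# Hypothetical mapping from Codacy category to a unified category.
map_category() {
	case "$1" in
		Security) echo "security" ;;
		ErrorProne) echo "bug" ;;
		Performance) echo "performance" ;;
		CodeStyle) echo "style" ;;
		*) echo "general" ;; # assumption: catch-all for categories not listed
	esac
}
```

Keeping the mapping in `case` statements means an unrecognised value from the API degrades to a fallback rather than producing an empty column.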

Changelog

  • .agents/scripts/codacy-collector-helper.sh
    • Added a new shell script to collect and process Codacy API findings, including pagination, error handling, and data mapping to a unified audit schema.
    • Implemented collect, query, summary, status, and export subcommands for managing Codacy findings.
    • Integrated SQLite database for storing audit findings with deduplication logic.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 21 code smells

[INFO] Recent monitoring activity:
Fri Feb 13 15:15:39 UTC 2026: Code review monitoring started
Fri Feb 13 15:15:40 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 21

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 21
  • VULNERABILITIES: 0

Generated on: Fri Feb 13 15:15:42 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive codacy-collector-helper.sh script, which is a solid addition to the audit pipeline. The script is well-structured, handling API pagination, rate limiting, and error retries effectively. My review focuses on improving robustness and security. I've suggested changes to ensure proper temporary file cleanup as per the style guide, enhance security in SQL query construction by validating inputs, and make the script more resilient by providing default values for externally defined variables.

@coderabbitai
Contributor

coderabbitai bot commented Feb 13, 2026

Walkthrough

This PR introduces a comprehensive Codacy collector helper script that integrates with the Codacy API to fetch repository findings, manage pagination and rate limiting, persist results in SQLite with deduplication, and provides CLI commands for querying, summarizing, exporting, and reporting audit data.

Changes

Cohort / File(s): Codacy Collector Helper (.agents/scripts/codacy-collector-helper.sh)
Summary: New 1096-line bash script implementing full Codacy API integration with exponential backoff retry logic, SQLite schema initialization with WAL mode, cursor-based pagination, deduplication framework, and five CLI commands (collect, query, summary, status, export) with colorized logging and error handling.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Script as Codacy Collector<br/>(Helper Script)
    participant API as Codacy API
    participant DB as SQLite DB

    User->>Script: cmd_collect [org] [repo]
    Script->>Script: Initialize DB schema & WAL mode
    Script->>Script: Create audit_run record
    loop Cursor-based Pagination
        Script->>API: codacy_api_request(with retry & backoff)
        API-->>Script: findings batch (or 429/5xx)
        Script->>Script: Handle rate limits & exponential backoff
        Script->>Script: Map & escape findings
        Script->>DB: INSERT findings (run_id, severity, path, etc.)
    end
    Script->>DB: Deduplicate findings within run
    Script->>Script: Mark audit_run complete
    Script-->>User: Collection summary

    User->>Script: cmd_query [severity] [category]
    Script->>DB: SELECT findings with filters
    DB-->>Script: Filtered results
    Script-->>User: Text or JSON output

    User->>Script: cmd_export [format]
    Script->>DB: SELECT findings + metadata
    DB-->>Script: Complete dataset
    Script-->>User: JSON or CSV file

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🔍 A Codacy collector now takes flight,
Polling findings through the API's night,
SQLite stores each issue with care,
Deduplication removes the spare,
Query, export, and status shine bright! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Merge Conflict Detection: ⚠️ Warning. Unable to check for merge conflicts: Failed to fetch base branch: From https://github.com/marcusquinn/aidevops
    ! [rejected] main -> main (non-fast-forward)
    + 3e25491...aea54a2 main -> origin/main (forced update)

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main change: adding a Codacy collector script with paginated API polling and severity mapping, which directly aligns with the comprehensive script implementation detailed in the changeset.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 21 code smells

[INFO] Recent monitoring activity:
Fri Feb 13 15:39:26 UTC 2026: Code review monitoring started
Fri Feb 13 15:39:27 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 21

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 21
  • VULNERABILITIES: 0

Generated on: Fri Feb 13 15:39:29 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

…pping (t1032.2)

Implements codacy-collector-helper.sh — a dedicated Codacy API collector that:
- Polls POST /analysis/organizations/gh/{org}/repositories/{repo}/issues/search
- Handles cursor-based pagination (100 items/page, max 50 pages)
- Retries with exponential backoff on server errors (3 attempts)
- Waits 60s on rate limit (429) responses
- Classifies Codacy severity (Error/Warning/Info) to unified levels
- Maps Codacy categories (Security/ErrorProne/Performance/etc.) to audit categories
- Stores findings in shared audit_findings SQLite table (source='codacy')
- Deduplicates within-run findings by file:line
- Loads credentials from env vars or code-audit-config.json
- Provides query, summary, status, and export subcommands

Follows existing patterns from coderabbit-collector-helper.sh and
code-audit-helper.sh (t1032.1). Zero ShellCheck violations.

- Add input validation for --severity and --category to prevent SQL injection (line 634)
- Add default value for MAX_RETRIES variable (line 248)
- Refactor temp file cleanup to use trap pattern for robustness (line 253)
- Add default value for ERROR_UNKNOWN_COMMAND variable (line 1070)

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 22 code smells

[INFO] Recent monitoring activity:
Fri Feb 13 15:50:48 UTC 2026: Code review monitoring started
Fri Feb 13 15:50:48 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 22

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 22
  • VULNERABILITIES: 0

Generated on: Fri Feb 13 15:50:50 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In @.agents/scripts/codacy-collector-helper.sh:
- Around line 611-618: The --limit handler currently assigns user input directly
to the variable limit which is later interpolated into SQL as LIMIT $limit;
validate that limit is a positive integer immediately after parsing (for example
using a regex like ^[1-9][0-9]*$ or by testing with arithmetic) and
reject/return error if it fails, keeping the existing log_error "Missing value
for --limit" behavior but adding a new log_error "Invalid value for --limit"
path; ensure this validation runs before any SQL construction that uses LIMIT
$limit so only a numeric value is ever interpolated.
- Around line 406-412: The request_body is built by interpolating the cursor
directly into a JSON string which breaks if cursor contains JSON-special
characters; instead generate the JSON using jq to ensure proper escaping: use jq
-n with --arg for cursor and --argjson or --arg for CODACY_PAGE_SIZE to
construct an object with "limit" and conditionally include "cursor" only when
$cursor is non-empty, then assign the jq output back to request_body
(referencing the cursor variable, CODACY_PAGE_SIZE, and request_body in the
script).
- Around line 43-44: AUDIT_CONFIG and AUDIT_CONFIG_TEMPLATE are defined as
relative paths which break when the script is run from a non-repo CWD; change
their definitions to be anchored to the script/repo directory (use the existing
SCRIPT_DIR or computed repo root variable) so the values become absolute (e.g.,
prefix with "$SCRIPT_DIR/") and replace the current AUDIT_CONFIG and
AUDIT_CONFIG_TEMPLATE symbols with these anchored paths so config loading no
longer depends on the current working directory.
- Around line 255-319: The 429 handling currently consumes a retry attempt and
references ${MAX_RETRIES} without a default; update the 429 branch so rate-limit
waits do not consume the main retry budget by decrementing attempt
(attempt=$((attempt - 1))) before continuing or otherwise using a separate
counter, keep using CODACY_RATE_LIMIT_WAIT for sleep, and change the log message
to use the safe default ${MAX_RETRIES:-3} (and any other occurrences of
${MAX_RETRIES} without a default) so the script won’t fail under set -u; modify
the log_warn call and the branch for HTTP 429 accordingly (symbols: attempt,
MAX_RETRIES, CODACY_RATE_LIMIT_WAIT, log_warn).
- Around line 249-253: The script currently creates tmp files in the system temp
dir and allocates an unused tmp_headers file; change mktemp calls to create
files inside the agent workspace tmp directory (ensure that directory exists
before creating files) for tmp_response, remove the unused tmp_headers
allocation and its cleanup entry (remove references to tmp_headers in
push_cleanup and _run_cleanups), and also remove the corresponding curl argument
(-D "$tmp_headers") so headers are not written to a dead file; keep the existing
trap/_save_cleanup_scope and only push_cleanup for the actual tmp_response file.
- Around line 509-532: The page-level batch INSERTs are being executed as
individual autocommits and counted from the SQL file size; modify the execution
to wrap all generated INSERTs in a single transaction and compute the number of
successful inserts from the database, not the file. Concretely: when writing
$sql_file, prepend "BEGIN;" and append "COMMIT;" (or alter the jq output to emit
those statements) so db "$AUDIT_DB" <"$sql_file" runs the page as one
transaction; then replace the wc -l count with a DB-derived count by querying
total_changes() (e.g. run db "$AUDIT_DB" "SELECT total_changes();" immediately
after import and use that value for the count variable) so $count reflects
actual committed rows; keep using the existing db invocation and $sql_file and
the count variable names.
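
Two of the fixes requested above, validating --limit and building the request body with jq, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code that was merged: the function names `is_positive_int` and `build_request_body` are hypothetical, and the body shape (a "limit" key plus an optional "cursor" key) is taken only from the review comment's description.

```shell
# Hypothetical helper: accept only positive integers (same set of values as
# the regex ^[1-9][0-9]*$), so only a numeric value can reach "LIMIT $limit".
is_positive_int() {
	case "$1" in
		'' | 0* | *[!0-9]*) return 1 ;; # empty, leading zero, or any non-digit
		*) return 0 ;;
	esac
}

# Hypothetical helper: build the search request body with jq so the cursor is
# JSON-escaped. The "cursor" key is emitted only when a cursor is present.
# Requires jq on PATH.
build_request_body() {
	local page_size="$1" cursor="${2:-}"
	jq -cn --argjson limit "$page_size" --arg cursor "$cursor" \
		'if $cursor == "" then {limit: $limit} else {limit: $limit, cursor: $cursor} end'
}
```

In the option parser this would be used as `is_positive_int "$limit" || { log_error "Invalid value for --limit"; return 1; }`, placed before any SQL string is assembled.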

🧹 Nitpick comments (3)
.agents/scripts/codacy-collector-helper.sh (3)

79-86: sql_escape pre-processing is confusing and potentially lossy.

Lines 81–82 strip backslash-escaped quotes (\' and \") before the actual SQL escaping on line 83. This is non-standard: if the input legitimately contains a literal \' sequence (e.g., from a log message or file path), the data is silently mutated. The only escaping needed for SQLite string literals is doubling single quotes.

Proposed simplification
 sql_escape() {
 	local val="$1"
-	val="${val//\\\'/\'}"
-	val="${val//\\\"/\"}"
 	val="${val//\'/\'\'}"
 	echo "$val"
 	return 0
 }
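
To illustrate the point of the simplification, here is a small standalone sketch of quote-doubling only. It uses `sed` instead of the bash parameter expansion in the diff, purely so the snippet also runs under a plain POSIX shell; that substitution is a choice made for this example, not something the review proposes.

```shell
# Sketch of SQL string escaping for SQLite: double every single quote and
# leave all other characters, including backslashes, untouched.
sql_escape() {
	printf '%s\n' "$1" | sed "s/'/''/g"
}

# A plain apostrophe is doubled.
sql_escape "it's"
# A literal backslash-quote sequence keeps its backslash; only the quote doubles.
sql_escape "C:\\'tmp"
```

With the pre-processing lines removed, the second input keeps its backslash, which is the behaviour the review comment argues for.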

418-422: Entire API response passed as a positional argument — may exceed ARG_MAX for large payloads.

insert_findings "$run_id" "$response" passes potentially hundreds of KB of JSON as $2. On Linux, ARG_MAX is typically ~2 MB, but on some systems or with many environment variables it can be tighter. A safer pattern is to write the response to a temp file and pass the path.

Sketch
 		local response
-		response=$(codacy_api_request "$endpoint" "POST" "$request_body") || {
+		local response_file
+		response_file=$(mktemp "${tmp_dir}/codacy-page.XXXXXX")
+		codacy_api_request "$endpoint" "POST" "$request_body" > "$response_file" || {
 			log_warn "Failed to fetch page ${page} — stopping pagination"
+			rm -f "$response_file"
 			break
 		}
 		...
-		page_count=$(insert_findings "$run_id" "$response")
+		page_count=$(insert_findings "$run_id" "$response_file")

Then in insert_findings, read from the file instead of echo "$response".

Also applies to: 477-537


510-512: Dedup key file:line will collapse distinct findings at the same location.

Two different rules flagging the same file and line will share a dedup_key (e.g., src/foo.js:42), causing one to be marked as a duplicate. Consider including the rule_id in the key:

-($path + ":" + $line) as $dedup_key |
+($path + ":" + $line + ":" + ((.patternInfo.id // "") | tostring)) as $dedup_key |
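
The effect of adding the rule id can be shown with a tiny shell sketch. The function name `dedup_key` and the rule ids `rule-a` and `rule-b` are hypothetical, and the real script builds its key inside jq as shown in the diff above; this only illustrates why the extra component keeps distinct findings apart.

```shell
# Hypothetical helper: build a dedup key from path, line, and rule id.
dedup_key() {
	printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Two different rules on the same file and line now yield different keys,
# so neither finding is marked as a duplicate of the other.
key_a=$(dedup_key "src/foo.js" 42 "rule-a")
key_b=$(dedup_key "src/foo.js" 42 "rule-b")
[ "$key_a" != "$key_b" ] && echo "distinct"
```

With the original file:line key both findings would collapse to src/foo.js:42.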

@marcusquinn marcusquinn merged commit c9f9a78 into main Feb 13, 2026
19 checks passed
@marcusquinn marcusquinn deleted the feature/t1032.2 branch February 13, 2026 17:40