GH#3179: fix critical review feedback on ai-judgment-helper.sh (PR #2914)#4236

Merged
alex-solovyev merged 3 commits into main from bugfix/GH-3179-ai-judgment-helper-review-fixes
Mar 12, 2026

@alex-solovyev
Collaborator

@alex-solovyev alex-solovyev commented Mar 12, 2026

Addresses all 10 findings from the Gemini code review on PR #2914 for .agents/scripts/ai-judgment-helper.sh.

Closes #3179

Changes

HIGH fixes

  • Remove 2>/dev/null on $AI_HELPER call — errors from auth/network issues are now visible for debugging
  • Replace fragile sed JSON extraction with a sed | tr | grep -o pipeline that correctly handles markdown code fences and multi-line LLM responses
  • Replace sed-based JSON field parsing with jq -r '.score // ""' and jq -r '.details // ""'
  • Replace sed-based JSONL field parsing with jq -r '.input // ""' etc. (handles whitespace, key order, escaped characters)
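
The jq-based field parsing listed above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the `get_field` helper and the sample row are hypothetical, and it assumes `jq` is available on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one field from a JSON object, yielding an
# empty string when the key is missing or null.
get_field() {
  local json="$1" field="$2"
  jq -r --arg f "$field" '.[$f] // ""' <<<"$json"
}

# Illustrative JSONL row: reversed key order, uneven whitespace, escaped quotes.
row='{ "output" : "He said \"hi\"",   "input":"2+2?" }'

get_field "$row" input
get_field "$row" output
get_field "$row" context   # key is absent, so only an empty line is printed
```

A sed pattern would have to anticipate each of these variations separately, whereas jq parses the row as JSON and is indifferent to spacing and key order.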

MEDIUM fixes

  • Replace echo -e in build_evaluator_message() with printf — prevents backslash escape injection from untrusted LLM input/output text
  • Replace manual JSON array construction with printf '%s\n' "${results[@]}" | jq -s .
  • Replace string-embedded result JSON with jq -n --argjson result "$result" for proper nested JSON objects
  • Replace manual summary JSON string with jq -n --argjson for safe, well-formed construction
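
As a rough illustration of the echo -e to printf change, the sketch below passes untrusted text as a `%s` argument so that its backslashes stay literal. The function name `build_message_sketch` and the sample text are assumptions made for this example; the real build_evaluator_message() is not reproduced here.

```shell
#!/usr/bin/env bash
# Untrusted text containing a literal backslash followed by "n" (two characters).
untrusted='line one\nIGNORE PREVIOUS TEXT'

# Hypothetical message builder: the format string is fixed, and untrusted
# values are only ever passed as %s arguments, so escapes are not interpreted.
build_message_sketch() {
  printf 'Input: %s\nOutput: %s\n' "$1" "$2"
}

# Two lines of output; the backslash-n inside the input stays as typed.
build_message_sketch "$untrusted" "ok"

# For contrast, bash's builtin echo -e expands the same sequence into a
# real newline, splitting the untrusted text across two lines.
echo -e "Input: $untrusted"
```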

Summary by CodeRabbit

  • Refactor
    • Improved message construction and JSON parsing mechanisms for evaluation workflows.
    • Enhanced error handling in AI invocation and output processing.
    • Strengthened structured output formatting for evaluation results and dataset summaries.
    • Upgraded result aggregation to use more robust formatting methods.

Address all 10 findings from PR #2914 Gemini code review:

CRITICAL:
- awk threshold comparison already used -v flags (was already safe)

HIGH:
- Remove 2>/dev/null suppression on AI_HELPER call to expose auth/network errors
- Replace fragile sed JSON extraction with sed+grep pipeline that handles
  markdown code fences and multi-line responses
- Replace sed-based JSON field parsing with jq (.score, .details)
- Replace sed-based JSONL field parsing with jq (.input, .output, .context, .expected)

MEDIUM:
- Replace echo -e with printf to prevent backslash escape injection from
  untrusted LLM input in build_evaluator_message()
- Replace manual JSON array construction with printf | jq -s .
- Replace string-embedded result JSON with jq --argjson for proper nesting
- Replace manual summary JSON string with jq -n --argjson for safe construction

All jq-based JSON construction uses --arg for strings and --argjson for
numeric/boolean types, preventing injection via special characters.
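
The --arg / --argjson split described in this commit message might look like the sketch below. The variable names, the `row` key, and the sample values are illustrative assumptions; only the jq flags themselves come from the text above.

```shell
#!/usr/bin/env bash
# Illustrative values; the details string holds characters that would
# break hand-built JSON.
score="0.8"
details='contains "quotes" and a \ backslash'

# --arg passes a string (jq escapes it); --argjson passes a JSON value,
# so the score stays a number instead of becoming a quoted string.
result=$(jq -cn --argjson score "$score" --arg details "$details" \
  '{score: $score, details: $details}')

# Nest the already-built object as real JSON rather than as an embedded string.
row_json=$(jq -cn --argjson row 1 --argjson result "$result" \
  '{row: $row, result: $result}')

# Collect several per-evaluator results into one JSON array.
results=("$result" '{"score":0.5,"details":"second"}')
array_json=$(printf '%s\n' "${results[@]}" | jq -cs .)

printf '%s\n' "$row_json" "$array_json"
```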
@github-actions github-actions bot added the bug Auto-created from TODO.md tag label Mar 12, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the .agents/scripts/ai-judgment-helper.sh script to address critical feedback from a code review. The primary goal was to enhance the script's robustness, security, and correctness, particularly in its handling of JSON data and untrusted inputs. By migrating from sed-based parsing to jq and replacing echo -e with printf, the changes ensure more reliable JSON processing, prevent backslash escape injection from untrusted text, and improve overall script stability.

Highlights

  • Improved Error Visibility: Removed 2>/dev/null from $AI_HELPER calls, making authentication and network errors visible for debugging.
  • Robust JSON Extraction: Replaced fragile sed-based JSON extraction with a sed | tr | grep -o pipeline to correctly handle markdown code fences and multi-line LLM responses.
  • Enhanced JSON Parsing: Replaced sed-based JSON field parsing with jq -r '.score // ""' and jq -r '.details // ""' for more reliable extraction of scores and details.
  • Secure JSONL Field Parsing: Updated sed-based JSONL field parsing to use jq -r '.input // ""' and similar commands, improving handling of whitespace, key order, and escaped characters.
  • Prevented Escape Injection: Replaced echo -e with printf in build_evaluator_message() to prevent backslash escape injection from untrusted LLM input/output text.
  • Safe JSON Array Construction: Replaced manual JSON array construction with printf '%s\n' "${results[@]}" | jq -s . for safer and more robust array generation.
  • Proper Nested JSON Objects: Replaced string-embedded result JSON with jq -n --argjson result "$result" for proper construction of nested JSON objects.
  • Well-Formed Summary JSON: Replaced manual summary JSON string construction with jq -n --argjson for safe and well-formed output.

Changelog

  • .agents/scripts/ai-judgment-helper.sh
    • Replaced echo -e with printf in build_evaluator_message to prevent backslash escape injection.
    • Removed 2>/dev/null from $AI_HELPER call to expose potential errors during AI helper execution.
    • Updated JSON extraction logic from raw AI responses to use a sed | tr | grep -o pipeline, improving handling of markdown code blocks and multi-line responses.
    • Migrated score and details parsing from sed to jq -r for increased robustness.
    • Added a check to ensure score is not 'null' before processing.
    • Modified result_json construction to use jq -cn --argjson for safer and more accurate JSON object creation.
    • Replaced manual JSON array construction for multiple evaluators with printf '%s\n' "${results[@]}" | jq -s ..
    • Updated JSONL field parsing (input, output, context, expected) to use jq -r for improved reliability.
    • Changed the method for combining row number and result into JSON using jq -n --argjson.
    • Updated aggregate statistics extraction (score, passed) to use jq -r.
    • Refactored the summary output to use jq -n --argjson for robust and well-formed JSON construction.

@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

Warning

Rate limit exceeded

@alex-solovyev has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 43 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d1dd96d0-20bf-4256-b9e6-172308161179

📥 Commits

Reviewing files that changed from the base of the PR and between 80384a8 and f9e675e.

📒 Files selected for processing (1)
  • .agents/scripts/ai-judgment-helper.sh

Walkthrough

This PR hardens the AI judgment helper script by replacing string-based message and JSON construction with printf and jq respectively. It removes stderr suppression, implements robust JSON extraction from AI responses, and ensures safe handling of untrusted LLM output through proper escaping and validation mechanisms.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Message Construction: .agents/scripts/ai-judgment-helper.sh | Switches build_evaluator_message from echo-based string assembly to printf for safer output handling. Replaces all conditional message branches to emit formatted output directly via printf, eliminating backslash escape interpretation issues and removing echo -e usage. |
| JSON Parsing & Extraction: .agents/scripts/ai-judgment-helper.sh | Replaces naive sed-based JSON extraction with robust multi-step parsing: strips markdown fences, removes newlines, extracts first valid JSON object with "score" field, then validates presence before jq extraction. Uses jq to construct result JSON safely. |
| Error Handling & AI Invocation: .agents/scripts/ai-judgment-helper.sh | Removes stderr redirection (2>/dev/null) from AI helper calls to surface errors, while preserving fallback behavior when invocation fails. |
| JSONL & Batch Processing: .agents/scripts/ai-judgment-helper.sh | Upgrades JSONL field parsing to use jq instead of fragile sed patterns. Replaces manual JSON array construction with jq -s . for creating arrays from individual evaluator results. Updates row result formatting to use jq for proper structuring. |
| Dataset Summary Construction: .agents/scripts/ai-judgment-helper.sh | Switches from string concatenation to jq-based JSON object emission for final summary. Includes computed fields (rows, evaluations, average score, pass rate, counts) with proper type handling via jq rather than manual printf formatting. |
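
A summary object of the kind described in the last row could be assembled along these lines. This is only a sketch under stated assumptions: the key names (`avg_score`, `pass_rate`, and so on) and the counter values are invented for illustration, and the script's real field names may differ.

```shell
#!/usr/bin/env bash
# Illustrative counters; a real run would accumulate these per row.
rows=2
evaluations=4
passed=2
total_score="3.0"

# Hand shell values to awk with -v instead of splicing them into the program text.
avg_score=$(awk -v t="$total_score" -v n="$evaluations" \
  'BEGIN { printf "%.2f", (n > 0 ? t / n : 0) }')
pass_rate=$(awk -v p="$passed" -v n="$evaluations" \
  'BEGIN { printf "%.2f", (n > 0 ? p / n : 0) }')

# --argjson keeps every field numeric in the emitted object.
summary=$(jq -cn \
  --argjson rows "$rows" \
  --argjson evaluations "$evaluations" \
  --argjson avg_score "$avg_score" \
  --argjson pass_rate "$pass_rate" \
  --argjson passed "$passed" \
  --argjson failed "$((evaluations - passed))" \
  '{rows: $rows, evaluations: $evaluations, avg_score: $avg_score,
    pass_rate: $pass_rate, passed: $passed, failed: $failed}')

printf '%s\n' "$summary"
```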

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

needs-review-fixes

Poem

🛡️ Strings transform to structured streams,
Printf flows where echo once dreamed,
jq guards each JSON seam,
No more sed's fragile schemes—
Security and safety supreme! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly references the PR objective (fixing GH#3179 review feedback on ai-judgment-helper.sh), accurately summarizing the main change. |
| Linked Issues check | ✅ Passed | The changes comprehensively address all 9 coding requirements from GH#3179: remove stderr suppression, replace sed JSON extraction with jq, use jq for field parsing, replace echo -e with printf, use jq for JSON construction, and implement robust error handling. |
| Out of Scope Changes check | ✅ Passed | All changes in ai-judgment-helper.sh are scoped to fixing the review findings in GH#3179; no unrelated modifications or scope creep detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@github-actions
Contributor

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 402 code smells

[INFO] Recent monitoring activity:
Thu Mar 12 20:08:22 UTC 2026: Code review monitoring started
Thu Mar 12 20:08:22 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 402

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 402
  • VULNERABILITIES: 0

Generated on: Thu Mar 12 20:08:25 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly improves the robustness of ai-judgment-helper.sh by addressing several critical and high-severity issues. The replacement of fragile sed-based parsing with jq for JSON manipulation is a major step forward, as is the switch from echo -e to printf for security. While the changes are excellent, a remaining point of fragility in the JSON extraction from the LLM response has been identified, emphasizing the need for robust JSON parsing tools like jq. Additionally, a minor code simplification is suggested, aligning with best practices for variable checks. Overall, this is a very positive set of fixes.

Comment on lines +896 to +900
json_result=$(echo "$raw_result" |
  sed 's/^```[a-zA-Z]*//;s/^```//' |
  tr -d '\n' |
  grep -o '{[^{}]*"score"[^{}]*}' |
  head -1)

high

While this pipeline for extracting JSON is an improvement, the grep regex {[^{}]*"score"[^{}]*} is still fragile. It assumes that JSON values (like the details string) will not contain { or } characters. If the LLM returns a details string containing a code snippet or other structured text with these characters, the extraction will fail, and the valid response will be discarded.

For example, this valid response would fail to parse:

{"score": 0.8, "details": "The code has a bug in function foo() { ... }"}

This brittleness could lead to the script failing on valid LLM outputs. A fully robust solution in pure shell is difficult, but this limitation should be acknowledged. Consider either further constraining the LLM prompt to not use { or } in the details field, or using a tool with more advanced regex capabilities if portability allows. For robust JSON parsing, especially when content might contain special characters, jq is generally preferred over fragile grep regexes.

References
  1. To reliably parse JSON, especially when content might contain special characters, using jq is more robust than fragile grep regexes. While grep is preferred for simple tasks in portable scripts, complex JSON parsing often warrants a dedicated tool like jq.
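
The failure mode described in this comment can be reproduced with a short sketch. The variable names `by_grep` and `by_jq` are hypothetical; the pattern and the sample response are taken from the comment itself.

```shell
#!/usr/bin/env bash
# The reviewer's sample response: the details string contains braces.
response='{"score": 0.8, "details": "The code has a bug in function foo() { ... }"}'

# Brace-limited pattern from the diff under review: [^{}]* cannot cross
# the inner braces, so nothing matches and the variable stays empty.
by_grep=$(printf '%s' "$response" | tr -d '\n' |
  grep -o '{[^{}]*"score"[^{}]*}' | head -1)

# Parsing the same text with jq is unaffected by braces inside string values.
by_jq=$(printf '%s' "$response" | jq -r '.score // ""')

printf 'grep: [%s]\njq:   [%s]\n' "$by_grep" "$by_jq"
```

The empty grep result is what the review calls a valid response being discarded.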

details=$(echo "$json_result" | jq -r '.details // ""')

-if [[ -n "$score" ]]; then
+if [[ -n "$score" && "$score" != "null" ]]; then

medium

The condition "$score" != "null" is redundant and can be removed. The preceding command score=$(echo "$json_result" | jq -r '.score // ""') ensures that $score will be an empty string if the .score field is null or missing in the JSON; it will never be the literal string "null". Therefore, checking [[ -n "$score" ]] is sufficient to determine if a score was successfully parsed. This also aligns with the practice of adding guard clauses to check if a variable is empty before use.

Suggested change
-if [[ -n "$score" && "$score" != "null" ]]; then
+if [[ -n "$score" ]]; then
References
  1. In shell scripts, it's important to add guard clauses to check if a variable is empty before using it. This comment refines how to effectively check for an empty variable when its value is derived from jq using the // "" idiom, ensuring robust handling of potentially missing or null JSON fields.
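
The behaviour this comment relies on can be checked with a small sketch. The `parse_score` wrapper and the sample inputs are hypothetical; the jq expression is the one quoted in the comment.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the expression quoted in the review.
parse_score() {
  jq -r '.score // ""' <<<"$1"
}

for sample in '{"score": 0.8}' '{"score": null}' '{"details": "no score"}'; do
  score=$(parse_score "$sample")
  if [[ -n "$score" ]]; then
    printf 'parsed score: %s\n' "$score"
  else
    printf 'no score in: %s\n' "$sample"
  fi
done
```

Note that jq's `//` operator falls back only on null and false, so a legitimate score of 0 still comes through as a non-empty string.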

Collaborator Author

Fixed in f9e675e — removed the redundant && "$score" != "null" check. jq -r '.score // ""' already coerces null to an empty string, so the literal-string guard was dead code. [[ -n "$score" ]] is now the sole condition.

@marcusquinn
Owner

Dispatching remediation worker.

  • Model: default alternating (headless-runtime-helper)
  • Branch: bugfix/pr-4236-remediation
  • Scope: Repair failing checks on PR #4236 and close critical review loop.
  • Attempt: 1 of 1
  • Direction: Resolve CI/review blockers only; avoid unrelated refactors.

@github-actions
Contributor

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 402 code smells

[INFO] Recent monitoring activity:
Thu Mar 12 20:49:46 UTC 2026: Code review monitoring started
Thu Mar 12 20:49:47 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 402

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 402
  • VULNERABILITIES: 0

Generated on: Thu Mar 12 20:49:50 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@marcusquinn
Owner

Addressed the critical review loop in commit 26220d8:

  • Replaced fragile brace-limited JSON extraction with a jq-based parse path that handles fenced output and details containing braces.
  • Simplified redundant score guard (non-empty check only) per review feedback.
  • Verified checks: Codacy, CodeFactor, qlty, SonarCloud, Framework Validation all passing on latest commit.

If helpful, I can also resolve the two Gemini review threads manually in the UI.

@marcusquinn
Owner

Follow-up clarification: the score guard now uses a single non-empty check only (no literal null comparison), matching the review suggestion.

jq -r '.score // ""' already coerces null to empty string, so
'&& "$score" != "null"' can never be false. Remove the dead condition.

Addresses gemini-code-assist review comment on PR #4236.
@github-actions
Contributor

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 402 code smells

[INFO] Recent monitoring activity:
Thu Mar 12 21:06:28 UTC 2026: Code review monitoring started
Thu Mar 12 21:06:28 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 402

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 402
  • VULNERABILITIES: 0

Generated on: Thu Mar 12 21:06:31 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@alex-solovyev alex-solovyev merged commit bc517a2 into main Mar 12, 2026
17 checks passed
@alex-solovyev alex-solovyev deleted the bugfix/GH-3179-ai-judgment-helper-review-fixes branch March 12, 2026 21:41
alex-solovyev added a commit that referenced this pull request Mar 13, 2026
… details

The greedy capture("(?s)(?<json>\{.*\})") regex in the jq pipeline failed
when the LLM response contained multiple JSON objects (e.g. wrapper text with
its own braces before the score object). The greedy .* captured from the first
{ to the last }, producing invalid JSON that fromjson? discarded.

Add a second extraction strategy: after stripping fences, try each line as
JSON before falling back to the greedy capture. This handles the common case
where the LLM wraps the JSON object with prose on separate lines.

Strategy order:
  1. Parse whole response as JSON (plain response, no wrapper)
  2. Strip fences → try whole stripped text as JSON
  3. Strip fences → try each line as JSON (new — handles multi-object responses)
  4. Strip fences → greedy capture (last resort, validated by fromjson?)

All four strategies are validated by fromjson? so invalid captures are
silently discarded. Verified with 9 edge-case tests including braces in
details strings, fenced blocks, wrapper text, and multiple JSON objects.

Closes #4277 (HIGH finding from PR #4236 Gemini review)
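
The strategy order above might be sketched as a single jq filter, as below. This is not the code from the commit: the `extract_json` name, the requirement that the object carry a "score" key, and the choice to collect all candidates in an array and keep the first are assumptions made for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the four-step fallback; assumes jq with regex support.
extract_json() {
  jq -Rsc '
    # Keep a candidate only if it parses as an object with a "score" key.
    def obj: fromjson? | select(type == "object" and has("score"));
    # Drop markdown fence lines (three backticks, optional language tag).
    (split("\n") | map(select(test("^\\s*`{3}") | not))) as $lines
    | ($lines | join("\n")) as $stripped
    | [ obj,                     # 1. whole response
        ($stripped | obj),       # 2. fences stripped
        ($lines[] | obj),        # 3. each line on its own
        ($stripped | capture("(?s)(?<json>\\{.*\\})") | .json | obj)  # 4. greedy
      ]
    # Candidates are in strategy order, so the first one wins.
    | .[0] // empty
  ' <<<"$1"
}

# Wrapper prose with its own braces, then the score object on a separate line.
extract_json 'Note {see below}:
{"score": 0.3, "details": "bug in foo() { ... }"}
Done.'
```

Collecting every candidate before picking one is simpler than short-circuiting but does more work than strictly needed; for short LLM responses that trade-off seems acceptable.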
alex-solovyev added a commit that referenced this pull request Mar 13, 2026
… details (#4320)

Labels

bug Auto-created from TODO.md tag needs-review-fixes

Development

Successfully merging this pull request may close these issues.

quality-debt: .agents/scripts/ai-judgment-helper.sh — PR #2914 review feedback (critical)

2 participants