t3227: fix output-format duplication in paddleocr-helper.sh inline Python script (#4685)
Conversation
…ython script

Normalize both new API (`.predict()`) and legacy API (`.ocr()`) paths into a common `entries` list before handling json/tsv/plain output. This removes the triplicated output-format logic that existed in both branches, addressing the code-duplication finding from the PR #2671 review. False-positive date findings (2026-03-01, Jan 2026) dismissed with explanatory comments (bot training cutoff predates 2026).

Closes #3227
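The consolidation described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual script: the result field names (`rec_texts`, `rec_scores`, `rec_polys`) and the legacy line layout are assumptions about the two PaddleOCR result shapes, and the real helper may differ.

```python
import json


def normalize_new(res):
    """Normalize a new-API (.predict()) style result into common entries."""
    return [
        {"text": t, "confidence": s, "box": b}
        for t, s, b in zip(res["rec_texts"], res["rec_scores"], res["rec_polys"])
    ]


def normalize_legacy(lines):
    """Normalize a legacy-API (.ocr()) style result into the same shape."""
    return [{"text": t, "confidence": s, "box": b} for b, (t, s) in lines]


def render(entries, output_format):
    """Single output-format handler shared by both API paths."""
    if output_format == "json":
        return json.dumps(entries, ensure_ascii=False)
    if output_format == "tsv":
        rows = ["text\tconfidence\tx1\ty1\tx2\ty2\tx3\ty3\tx4\ty4"]
        for e in entries:
            coords = "\t".join(f"{p[0]:.0f}\t{p[1]:.0f}" for p in e["box"])
            rows.append(f"{e['text']}\t{e['confidence']:.4f}\t{coords}")
        return "\n".join(rows)
    return "\n".join(e["text"] for e in entries)  # plain
```

With this shape, each API branch only builds `entries`; the three output formats are rendered in exactly one place instead of six.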
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the maintainability and readability of the inline Python script in paddleocr-helper.sh.

Highlights
Changelog
Activity
Walkthrough

This refactor consolidates duplicated OCR result processing logic from two API response paths (new and legacy formats) into a unified entries structure, eliminating parallel formatting branches and centralizing output rendering.

Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 5 passed
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics
Generated on: Sat Mar 14 07:09:11 UTC 2026 · Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request effectively refactors the inline Python script in paddleocr-helper.sh to remove significant code duplication. By normalizing the data from both new and legacy API paths into a common list before formatting the output, the code is now much more maintainable. The refactoring is well-executed. I have one security-related suggestion to further improve the robustness and integrity of the TSV output by sanitizing the text field.
```python
elif output_format == "tsv":
    print("text\tconfidence\tx1\ty1\tx2\ty2\tx3\ty3\tx4\ty4")
    for entry in entries:
        text = entry.get("text", "")
```
To ensure the integrity of the TSV output, it's important to sanitize the text field by removing any tab or newline characters. These characters can break the column structure of the TSV format, leading to parsing errors downstream.
Suggested change:

```python
text = entry.get("text", "").replace("\t", " ").replace("\n", " ")
```
References
- Employ a defense-in-depth strategy for handling data. Sanitize data at each point of use, especially before outputting it in a specific format like TSV, to prevent parsing errors and maintain integrity.
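The sanitization suggested above can be factored into a small helper. This is a minimal sketch; the function name `sanitize_tsv_field` is hypothetical, and whether to also strip carriage returns is a design choice the review comment does not address.

```python
def sanitize_tsv_field(text: str) -> str:
    """Replace tab and newline characters that would break TSV row/column structure."""
    return text.replace("\t", " ").replace("\n", " ")
```

With this in place, a row like `f"{sanitize_tsv_field(text)}\t{confidence:.4f}"` always contains exactly one tab per column boundary, regardless of what the OCR engine returned in `text`.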
🧹 Nitpick comments (1)
.agents/scripts/paddleocr-helper.sh (1)
515-523: Stabilize TSV column count for non-quad polygons.

The header is fixed to 4 points, but coordinate emission currently depends on the runtime polygon length. Consider normalizing to exactly 4 points (pad/truncate) before printing to keep downstream TSV parsing deterministic.
Proposed hardening
```diff
 elif output_format == "tsv":
     print("text\tconfidence\tx1\ty1\tx2\ty2\tx3\ty3\tx4\ty4")
     for entry in entries:
         text = entry.get("text", "")
         confidence = entry.get("confidence", 0.0)
         box = entry.get("box")
         if box:
-            coords = "\t".join(f"{p[0]:.0f}\t{p[1]:.0f}" for p in box)
+            pts = list(box)[:4]
+            while len(pts) < 4:
+                pts.append((0, 0))
+            coords = "\t".join(f"{p[0]:.0f}\t{p[1]:.0f}" for p in pts)
         else:
             coords = "\t".join(["0"] * 8)
         print(f"{text}\t{confidence:.4f}\t{coords}")
```

As per coding guidelines, "Automation scripts - focus on: Reliability and robustness".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/paddleocr-helper.sh around lines 515 - 523, The TSV output can emit a variable number of coordinate columns because box may have fewer or more than 4 points; normalize the polygon to exactly 4 points before building coords in the loop over entries by padding with (0,0) points or truncating extra points as needed so coords is always 8 values; update the code around the entries loop (variables: entries, entry, box, coords) to create a normalized_box of length 4 and then join its coordinates to produce a stable header-aligned TSV row.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.agents/scripts/paddleocr-helper.sh:
- Around line 515-523: The TSV output can emit a variable number of coordinate
columns because box may have fewer or more than 4 points; normalize the polygon
to exactly 4 points before building coords in the loop over entries by padding
with (0,0) points or truncating extra points as needed so coords is always 8
values; update the code around the entries loop (variables: entries, entry, box,
coords) to create a normalized_box of length 4 and then join its coordinates to
produce a stable header-aligned TSV row.
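The pad/truncate normalization the prompt describes can be written as a standalone helper. A minimal sketch, assuming points are `(x, y)` pairs; the name `normalize_box` is hypothetical:

```python
def normalize_box(box, n=4):
    """Pad with (0, 0) or truncate so the polygon has exactly n points."""
    pts = list(box)[:n]
    pts.extend([(0, 0)] * (n - len(pts)))
    return pts
```

Joining the coordinates of `normalize_box(box)` then always yields 8 values, matching the fixed 10-column TSV header whatever the runtime polygon length.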
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 50189040-41c8-42bc-9212-1b9f8fd1244b
📒 Files selected for processing (1)
.agents/scripts/paddleocr-helper.sh



Summary
- Refactors the inline Python script in `paddleocr-helper.sh` to normalize both API paths into a common `entries` list before output formatting
- Removes the duplicated output-format logic from the new API (`.predict()`) and legacy API (`.ocr()`) branches

Changes

- `.agents/scripts/paddleocr-helper.sh` (inline Python script refactor):
  - Before: output-format handling (json/tsv/plain) was duplicated inside both `if use_new_api:` and `else:` branches (6 separate output blocks).
  - After: both API paths populate a common `entries = []` list, then a single set of output-format handlers runs once. Net: -24 lines, zero behaviour change.

Review Findings Addressed

- `2026-03-01` flagged as a future date: false positive, dismissed with an explanatory comment (bot training cutoff predates 2026)
- `Jan 2026` flagged as a future date: false positive, dismissed with an explanatory comment

Closes #3227