Skip to content

chore(codex): bootstrap PR for issue #1407#1409

Merged
stranske merged 35 commits intomainfrom
codex/issue-1407
Feb 9, 2026
Merged

chore(codex): bootstrap PR for issue #1407#1409
stranske merged 35 commits intomainfrom
codex/issue-1407

Conversation

@stranske
Copy link
Copy Markdown
Owner

@stranske stranske commented Feb 9, 2026

Source: Issue #1407

Automated Status Summary

Scope

PR #1403 addressed issue #1402, but verification returned a CONCERNS verdict due to scope creep (unrelated file edits), non-standard glob matching, incomplete connector-side filtering for excluded paths (notably .agents/**), and insufficiently testable/configurable bot-comment dismissal behavior. This follow-up closes those gaps with a tighter scope, standard glob semantics, connector-side pre-filtering, improved CLI/env configuration, and stronger unit + mocked E2E coverage.

Context for Agent

Related Issues/PRs

Tasks

Scope Cleanup

  • Revert requirements.txt to its pre-follow-up version from the main branch
  • Restore scripts/langchain/structured_output.py to match the base branch version
  • Reset tests/test_fallback_chain_provider.py to its original state before PR chore(codex): bootstrap PR for issue #1402 #1403
  • Revert tests/test_structured_output.py to the main branch version
  • Restore agents/codex-1395.md to its pre-follow-up content

Bot Comment Dismissal Configuration

  • Update bot-comment-dismiss.js to accept maxAgeSeconds as an explicit input via CLI args and/or environment variable
  • Wire parsed maxAgeSeconds value into the dismissal logic
  • Add unit tests covering maxAgeSeconds parsing and defaulting behavior

Connector-Side File Filtering

  • Implement connector-side file filtering in chatgpt-codex-connector/ so that file selection excludes .agents/** before review comment construction
  • Add connector filtering tests that simulate repository file lists and assert excluded paths are ignored while included paths are processed

Standard Glob Matching

  • Add minimatch as an explicit dependency in package.json and update the lockfile
  • Replace custom regex-based glob matching with minimatch library calls in connector code
  • Remove or deprecate the custom regex converter implementation from the codebase
  • Add unit tests for character class glob patterns using minimatch semantics
  • Add unit tests for brace expansion glob patterns with multiple extensions
  • Add unit tests for escaped metacharacter handling in glob patterns

Timestamp Handling

  • Implement a timestamp picker helper function that prefers updated_at over created_at with snake_case and camelCase support
  • Integrate the timestamp picker into the age-filtering logic for comment dismissal
  • Add unit tests covering created_at timestamp field for age calculation
  • Add unit tests covering createdAt camelCase timestamp field for age calculation
  • Add unit tests covering updated_at timestamp field for age calculation
  • Add unit tests covering updatedAt camelCase timestamp field for age calculation
  • Add unit test for precedence when both created and updated timestamps are present

End-to-End Testing

  • Add an end-to-end style test using mocked GitHub API interactions that runs the dismissal flow and asserts no non-dismissed review comments remain for paths matching .agents/**

Acceptance criteria

Scope Cleanup Verification

  • The following files contain no diff relative to their pre-follow-up (main/base) versions: requirements.txt, scripts/langchain/structured_output.py, tests/test_fallback_chain_provider.py, tests/test_structured_output.py, agents/codex-1395.md.
    Verification: git diff --exit-code origin/main...HEAD -- requirements.txt scripts/langchain/structured_output.py tests/test_fallback_chain_provider.py tests/test_structured_output.py agents/codex-1395.md exits 0.

Bot Comment Dismissal Configuration Verification

  • bot-comment-dismiss.js accepts --maxAgeSeconds <int> and uses it to set the dismissal age threshold.
    Verification: unit test passes argv --maxAgeSeconds 3600 and asserts the value used by dismissal logic equals 3600 (not default).
  • When both CLI and environment are provided, CLI --maxAgeSeconds takes precedence over MAX_AGE_SECONDS (or the chosen env var name).
    Verification: unit test sets MAX_AGE_SECONDS=10 and argv --maxAgeSeconds 20; assert final maxAgeSeconds === 20.
  • If neither CLI nor environment specifies maxAgeSeconds, the script uses the documented default value (asserted in tests).
    Verification: unit test invokes parser with no env/argv; assert maxAgeSeconds === <default>.
  • Invalid maxAgeSeconds values are rejected: non-integer (abc), negative (-1), and zero (if not explicitly supported) cause a non-zero exit code or a thrown error in the parsing function.
    Verification: unit tests assert throw/non-zero and error message mentions parameter name and/or invalid value.

Connector-Side File Filtering Verification

  • Connector-side file selection excludes any paths under .agents/** before review comment construction, such that no review comments are constructed for excluded files.
    Verification: unit test inputs file list including .agents/issue-test-ledger.yml and src/app.ts and asserts .agents/... is absent from comment-construction inputs while src/app.ts remains included.
  • Connector-side filtering is applied even when an excluded file would otherwise match inclusion patterns; .agents/** remains excluded regardless of other include globs.
    Verification: unit test uses include globs like **/* and asserts .agents/... still excluded while src/app.ts included.
  • A connector filtering test suite exists and includes at least one test that simulates a repository file list and validates include/exclude behavior using concrete examples (.agents excluded, src included).
    Verification: test under chatgpt-codex-connector/**/__tests__/** (or existing test location) runs in normal JS test command and asserts those outcomes.

Standard Glob Matching Verification

  • Custom regex-based glob matching is removed from connector glob-matching code paths and replaced with a standard glob library (minimatch or equivalent) that is imported and used by call sites.
    Verification: code review confirms minimatch usage; prior regex converter deleted or unreferenced under chatgpt-codex-connector/**.
  • Glob matching supports character classes consistent with minimatch semantics (e.g., src/[ab].ts matches src/a.ts and src/b.ts but not src/c.ts).
    Verification: unit tests assert pass/fail for these examples.
  • Glob matching supports brace expansion (e.g., src/*.{ts,tsx} matches src/app.ts and src/view.tsx but not src/app.js).
    Verification: unit tests assert match behavior for these examples.
  • Glob matching correctly supports escaping of glob metacharacters (e.g., pattern docs/\[draft\].md matches literal filename docs/[draft].md).
    Verification: unit tests assert it matches exactly and does not match docs/draft.md.
  • The JS dependency for the chosen glob library is present in package.json (and lockfile updated accordingly) and npm test / pnpm test can resolve it without relying on transitive-only availability.
    Verification: dependency listed explicitly; CI install + tests pass without module-not-found errors.

Timestamp Handling Verification

  • Age-filtering logic treats created_at/createdAt and updated_at/updatedAt equivalently for determining comment age, using updated_* when present (or clearly defined precedence).
    Verification: unit tests cover four fixtures (created_at, createdAt, updated_at, updatedAt) with fixed now and maxAgeSeconds, asserting chosen timestamp and include/exclude decision.
  • If both created_* and updated_* are present on the same comment object, the logic uses the updated_* timestamp for age comparison (or documented precedence), covered by a unit test.
    Verification: fixture with both fields where created_* and updated_* straddle threshold; assert decision matches precedence.

End-to-End Testing Verification

  • An end-to-end style test exists that runs the dismissal flow with a mocked GitHub API client and verifies that, after execution, no non-dismissed review comments remain for paths matching .agents/**.
    Verification: mocked API seeded with mixed comments; test asserts dismissals were called for .agents/... and final state/list contains zero non-dismissed .agents/** comments.
  • The end-to-end dismissal test does not require live network access and fails if any real GitHub API request is attempted.
    Verification: explicit mock client and/or network blocking (e.g., nock disable-net-connect) is used; test fails on any unexpected network call.

Copilot AI review requested due to automatic review settings February 9, 2026 10:03
@stranske stranske added agent:codex Agent-created issues from Codex agents:keepalive Use to initiate keepalive functionality with agents autofix Opt-in automated formatting & lint remediation labels Feb 9, 2026
@stranske-keepalive
Copy link
Copy Markdown
Contributor

stranske-keepalive bot commented Feb 9, 2026

🤖 Keepalive Loop Status

PR #1409 | Agent: Codex | Iteration 5+6 🚀 extended

Current State

Metric Value
Iteration progress [##########] 5/5 5 base + 6 extended = 11 total
Action fix (fix-unknown)
Gate failure
Tasks 38/41 complete
Timeout 45 min (default)
Timeout usage 15m elapsed (34%, 30m remaining)
Keepalive ✅ enabled
Autofix ❌ disabled

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

🧠 Task Analysis

| Provider | ℹ️ anthropic |
| Model | claude-sonnet-4-5-20250929 |
| Confidence | 90% |

⚠️ Primary provider (GitHub Models) was unavailable; used anthropic instead.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the standard Codex bootstrap markdown file for issue #1407 to integrate with the repo’s agent automation conventions.

Changes:

  • Create agents/codex-1407.md bootstrap marker for issue #1407.

@stranske-keepalive stranske-keepalive bot deleted a comment from agents-workflows-bot bot Feb 9, 2026
…tall

The minimatch dependency was declared as "file:node_modules/minimatch"
which causes npm install to fail with ENOENT when it tries to resolve
the local path as a tarball. Since minimatch is already vendored in
node_modules/minimatch/, require('minimatch') finds it automatically
without needing a package.json dependency entry.

Remove the broken file: reference and the lockfile that tracked it.
@stranske stranske merged commit a7fb6f8 into main Feb 9, 2026
@stranske stranske deleted the codex/issue-1407 branch February 9, 2026 17:12
@stranske stranske added the verify:compare Compare multiple LLM evaluations label Feb 9, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 9, 2026

Provider Comparison Report

Provider Summary

Provider Model Verdict Confidence Summary
openai gpt-5.2 CONCERNS 78% The PR largely implements the requested behaviors: connector-side filtering now excludes .agents/** early via shared helpers (filterPaths/filterFileNodes), include-pattern support is added, u...
anthropic claude-sonnet-4-5-20250929 CONCERNS 85% The PR addresses most acceptance criteria but has significant gaps in the standard glob matching requirement. The implementation adds a custom glob-to-regex converter in bot-comment-dismiss.js rath...
📋 Full Provider Details (click to expand)

openai

  • Model: gpt-5.2
  • Verdict: CONCERNS
  • Confidence: 78%
  • Scores:
    • Correctness: 6.0/10
    • Completeness: 7.0/10
    • Quality: 5.0/10
    • Testing: 8.0/10
    • Risks: 4.0/10
  • Summary: The PR largely implements the requested behaviors: connector-side filtering now excludes .agents/** early via shared helpers (filterPaths/filterFileNodes), include-pattern support is added, unit tests cover character classes/brace expansion/escaped metacharacters, and bot-comment dismissal adds --maxAgeSeconds parsing with precedence and positive-int validation plus timestamp precedence favoring updated_* fields. The E2E-style mocked GitHub dismissal tests also satisfy the no-network requirement. However, the acceptance criteria around adopting the real minimatch library and removing custom regex glob matching are not fully met: a local minimatch reimplementation is committed, and bot-comment-dismiss.js still uses a custom regex-based glob engine. These are significant enough to warrant CONCERNS.
  • Concerns:
    • Standard glob matching acceptance requires replacing custom regex-based glob matching with the minimatch library. In .github/scripts/pr-context-graphql.js and .github/scripts/merge_manager.js, the code does require('minimatch'), but the repository adds a custom, local reimplementation under .github/scripts/node_modules/minimatch/index.js (version 0.0.0-local). This is not the real minimatch library and may diverge from minimatch semantics beyond the covered tests, undermining the stated acceptance criterion.
    • There is still custom regex-based glob matching logic in .github/scripts/bot-comment-dismiss.js (functions like globToRegExp, expandBraces, and a local minimatch() implementation). The acceptance criteria specifically called for replacing custom regex-based glob matching with minimatch and removing/deprecating the custom converter implementation. This script is central to the PR scope (dismissal behavior) and appears to retain the custom approach rather than using the same minimatch dependency.
    • Scope cleanup acceptance criterion expects no diff vs base for: requirements.txt, scripts/langchain/structured_output.py, tests/test_fallback_chain_provider.py, tests/test_structured_output.py, agents/codex-1395.md. These files are not shown in the diff excerpt, but given the PR changes are very broad (52 files, many unrelated docs/workflow/python scripts), the stated scope-tightening objective is questionable. Without explicit evidence in the diff that these are identical, this acceptance item cannot be confirmed from the provided changeset view.
    • Risk/maintainability: committing a vendored pseudo-node_modules/minimatch inside .github/scripts/ is nonstandard and may confuse dependency resolution, tooling, and future updates; it also risks subtle behavior differences compared to upstream minimatch (e.g., extglob, globstar edge cases, brace escaping nuances).
    • Path normalization and escaping behavior differs across components: pr-context-graphql.js uses minimatch options { dot: true, nocase: true, nocomment: true, nonegate: true }, while bot-comment-dismiss.js lowercases paths and patterns and uses a homegrown matcher. This inconsistency could cause include/ignore decisions to differ between connector-side filtering and dismissal logic.

anthropic

  • Model: claude-sonnet-4-5-20250929
  • Verdict: CONCERNS
  • Confidence: 85%
  • Scores:
    • Correctness: 7.0/10
    • Completeness: 6.0/10
    • Quality: 8.0/10
    • Testing: 7.0/10
    • Risks: 6.0/10
  • Summary: The PR addresses most acceptance criteria but has significant gaps in the standard glob matching requirement. The implementation adds a custom glob-to-regex converter in bot-comment-dismiss.js rather than using the minimatch library directly as required. While minimatch is added to package.json and used in merge_manager.js, bot-comment-dismiss.js implements its own pattern matching logic (lines 18-146). The acceptance criteria explicitly state 'Custom regex-based glob matching is removed from connector glob-matching code paths and replaced with a standard glob library (minimatch or equivalent) that is imported and used by call sites' and 'code review confirms minimatch usage; prior regex converter deleted or unreferenced'. The current implementation fails this requirement. Additionally, the timestamp handling logic prioritizes updated_at over created_at, which may cause incorrect age-based filtering of edited comments. Connector-side filtering is implemented but tests don't fully verify the comment construction flow. The maxAgeSeconds configuration, E2E testing, and most other requirements are properly implemented with good test coverage. Code quality is generally high with clear structure and comprehensive unit tests.
  • Concerns:
    • Standard glob matching verification incomplete: Acceptance criteria require unit tests for character classes, brace expansion, and escaped metacharacters using minimatch semantics, but the implementation uses a custom glob-to-regex converter instead of the minimatch library directly
    • Custom regex-based glob matching NOT removed: The acceptance criteria explicitly require 'Custom regex-based glob matching is removed from connector glob-matching code paths and replaced with a standard glob library (minimatch or equivalent)', but bot-comment-dismiss.js (lines 18-146) implements a custom globToRegExp function with expandBraces and escapeRegExp helpers
    • Minimatch library not used in bot-comment-dismiss.js: While minimatch is added to package.json and used in merge_manager.js, the bot-comment-dismiss.js file implements its own glob matching logic rather than importing and using minimatch
    • Connector-side filtering tests incomplete: While connector-exclusion-smoke.test.js adds tests for glob patterns, the acceptance criteria require verification that '.agents/**' is excluded 'before review comment construction', but the test file doesn't verify the comment construction flow
    • Timestamp handling uses updated_at with precedence over created_at: The implementation in bot-comment-dismiss.js line 249 uses 'updated_at || updatedAt || created_at || createdAt', but acceptance criteria state 'uses updated_* when present (or clearly defined precedence)' - this may cause age filtering to incorrectly dismiss recently edited old comments
    • End-to-end test network blocking incomplete: The E2E test at line 235 sets globalThis.fetch to throw, but doesn't verify that no real network calls occur - it only checks the mock was used

Agreement

  • Verdict: CONCERNS (all providers)
  • Correctness: scores within 1 point (avg 6.5/10, range 6.0-7.0)
  • Completeness: scores within 1 point (avg 6.5/10, range 6.0-7.0)
  • Testing: scores within 1 point (avg 7.5/10, range 7.0-8.0)

Disagreement

Dimension openai anthropic
Quality 5.0/10 8.0/10
Risks 4.0/10 6.0/10

Unique Insights

  • openai: Standard glob matching acceptance requires replacing custom regex-based glob matching with the minimatch library. In .github/scripts/pr-context-graphql.js and .github/scripts/merge_manager.js, the code does require('minimatch'), but the repository adds a custom, local reimplementation under .github/scripts/node_modules/minimatch/index.js (version 0.0.0-local). This is not the real minimatch library and may diverge from minimatch semantics beyond the covered tests, undermining the stated acceptance criterion.; There is still custom regex-based glob matching logic in .github/scripts/bot-comment-dismiss.js (functions like globToRegExp, expandBraces, and a local minimatch() implementation). The acceptance criteria specifically called for replacing custom regex-based glob matching with minimatch and removing/deprecating the custom converter implementation. This script is central to the PR scope (dismissal behavior) and appears to retain the custom approach rather than using the same minimatch dependency.; Scope cleanup acceptance criterion expects no diff vs base for: requirements.txt, scripts/langchain/structured_output.py, tests/test_fallback_chain_provider.py, tests/test_structured_output.py, agents/codex-1395.md. These files are not shown in the diff excerpt, but given the PR changes are very broad (52 files, many unrelated docs/workflow/python scripts), the stated scope-tightening objective is questionable. Without explicit evidence in the diff that these are identical, this acceptance item cannot be confirmed from the provided changeset view.; Risk/maintainability: committing a vendored pseudo-node_modules/minimatch inside .github/scripts/ is nonstandard and may confuse dependency resolution, tooling, and future updates; it also risks subtle behavior differences compared to upstream minimatch (e.g., extglob, globstar edge cases, brace escaping nuances).; Path normalization and escaping behavior differs across components: pr-context-graphql.js uses minimatch options { dot: true, nocase: true, nocomment: true, nonegate: true }, while bot-comment-dismiss.js lowercases paths and patterns and uses a homegrown matcher. This inconsistency could cause include/ignore decisions to differ between connector-side filtering and dismissal logic.
  • anthropic: Standard glob matching verification incomplete: Acceptance criteria require unit tests for character classes, brace expansion, and escaped metacharacters using minimatch semantics, but the implementation uses a custom glob-to-regex converter instead of the minimatch library directly; Custom regex-based glob matching NOT removed: The acceptance criteria explicitly require 'Custom regex-based glob matching is removed from connector glob-matching code paths and replaced with a standard glob library (minimatch or equivalent)', but bot-comment-dismiss.js (lines 18-146) implements a custom globToRegExp function with expandBraces and escapeRegExp helpers; Minimatch library not used in bot-comment-dismiss.js: While minimatch is added to package.json and used in merge_manager.js, the bot-comment-dismiss.js file implements its own glob matching logic rather than importing and using minimatch; Connector-side filtering tests incomplete: While connector-exclusion-smoke.test.js adds tests for glob patterns, the acceptance criteria require verification that '.agents/**' is excluded 'before review comment construction', but the test file doesn't verify the comment construction flow; Timestamp handling uses updated_at with precedence over created_at: The implementation in bot-comment-dismiss.js line 249 uses 'updated_at || updatedAt || created_at || createdAt', but acceptance criteria state 'uses updated_* when present (or clearly defined precedence)' - this may cause age filtering to incorrectly dismiss recently edited old comments; End-to-end test network blocking incomplete: The E2E test at line 235 sets globalThis.fetch to throw, but doesn't verify that no real network calls occur - it only checks the mock was used

@stranske
Copy link
Copy Markdown
Owner Author

🛑 Follow-up chain depth limit reached (depth 2/2)
This PR has been through multiple verification → follow-up cycles. To prevent diminishing-returns automation, needs-human has been applied instead of creating another follow-up issue.

A human should review whether the remaining concerns warrant manual action or can be accepted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent:codex Agent-created issues from Codex agents:keepalive Use to initiate keepalive functionality with agents autofix Opt-in automated formatting & lint remediation needs-human Requires human intervention or review verify:compare Compare multiple LLM evaluations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants