test: full autofix pipeline validation #87

Merged
stranske merged 9 commits into main from test/autofix-full-pipeline-validation
Dec 29, 2025

@stranske
Owner

Purpose

This PR tests the complete autofix system including the new auto-escalation to Codex feature.

Intentional Errors

Cosmetic (Basic Autofix Should Fix)

  • Missing spaces around operators
  • Multiple imports on one line
  • Trailing whitespace
  • Inconsistent quotes
  • Unused imports

Non-Cosmetic (Requires Codex Escalation)

  • Type errors (mypy):
    • Wrong return type (str instead of int)
    • Incompatible type assignment in dict
    • Missing return statement
  • Test failures (pytest):
    • Intentional assertion failures
    • Type mismatch assertions

Expected Autofix Behavior

  1. ✅ Basic autofix runs, fixes formatting issues
  2. ✅ Gate still fails (mypy/pytest errors remain)
  3. 🆕 Auto-escalation detects autofix ran but Gate still fails
  4. 🆕 Adds agent:codex and autofix:escalated labels
  5. 🆕 Codex is dispatched to fix remaining issues
  6. Codex should fix the type errors and test failures
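
The escalation rule in steps 2 to 4 can be sketched as a small decision function. This is only an illustration of the rule as described above, not the workflow's actual implementation: `PipelineState` and `escalation_labels` are hypothetical names, and only the two label strings come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineState:
    """Hypothetical snapshot of a PR after the basic autofix step."""

    autofix_ran: bool
    gate_passed: bool
    labels: frozenset[str] = frozenset()


def escalation_labels(state: PipelineState) -> frozenset[str]:
    """Return the labels to add when autofix ran but the Gate still fails."""
    if not state.autofix_ran or state.gate_passed:
        # Nothing to escalate: autofix never ran, or the Gate is already green.
        return frozenset()
    wanted = frozenset({"agent:codex", "autofix:escalated"})
    # Only report labels that are not already on the PR.
    return wanted - state.labels
```

Returning only the missing labels keeps the step idempotent, so a re-run on an already escalated PR adds nothing.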

Testing

This validates PR stranske/Workflows#277 (auto-escalation feature).

stranske and others added 6 commits December 29, 2025 03:32
This PR intentionally includes:
- Black formatting violations (bad spacing, missing newlines)
- Ruff lint errors (unused imports, line too long, bad comparisons)
- Mypy type errors (wrong return types)
- Intentionally failing tests (assertion, exception, type errors)
- Actual useful coverage tests for adapters/base.py

Purpose: Validate the full autofix pipeline handles all CI failure modes
including quick autofix for lint/black, Codex dispatch for harder issues,
mypy fixes, and test failure resolution.

This file contains intentional errors to test the complete autofix system:

1. Formatting issues (cosmetic - ruff/black should fix):
   - Missing spaces around operators
   - Multiple imports on one line
   - Trailing whitespace
   - Inconsistent quotes

2. Type errors (non-cosmetic - needs Codex escalation):
   - Wrong return type (str instead of int)
   - Incompatible type assignment in dict
   - Missing return statement

3. Test failures (non-cosmetic - needs Codex escalation):
   - Intentional assertion failures
   - Type mismatch assertions

Expected autofix behavior:
1. Basic autofix runs, fixes formatting issues
2. Gate still fails (mypy/pytest errors)
3. Auto-escalation adds agent:codex label
4. Codex dispatched to fix remaining issues

Copilot AI review requested due to automatic review settings December 29, 2025 06:48

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +29 to +32
# Type error 1: wrong type assignment
def get_count() -> int:
    return "not an int" # mypy error: returning str instead of int

P1 Badge Return int from get_count

get_count is annotated to return int but returns the literal string "not an int", so mypy will flag the function and test_type_mismatch always fails when the tests run because the returned value is a str rather than an int; it needs to return an integer to keep typing and the test suite green.
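
A minimal sketch of the fix this comment asks for. The value `0` is a placeholder assumption, since the PR does not say which count the function is meant to return.

```python
def get_count() -> int:
    # Placeholder value (assumption): any int satisfies the annotation.
    return 0
```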

Useful? React with 👍 / 👎.

Comment on lines +34 to +38
def process_items(items: List[str]) -> Dict[str, int]:
    result: Dict[str, int] = {}
    for item in items:
        result[item] = "count" # mypy error: assigning str to int
    return result

P1 Badge Ensure process_items outputs int values

process_items declares it returns Dict[str, int] but assigns the string "count" to each entry, which triggers mypy’s incompatible types error and causes test_list_processing to fail because the returned values are str instead of int; the function should compute and return numeric counts.
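
One way to "compute and return numeric counts" is sketched below. It assumes the intended count is the number of times each item occurs in the input, which the PR does not state.

```python
from collections import Counter


def process_items(items: list[str]) -> dict[str, int]:
    # Assumption: the count for an item is its number of occurrences.
    return dict(Counter(items))
```

For example, `process_items(["a", "b", "a"])` maps `"a"` to 2 and `"b"` to 1, so every value is an `int` as the annotation requires.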

Useful? React with 👍 / 👎.

Comment on lines +41 to +43
def calculate_total(values: List[int]) -> int:
    total = sum(values)
    # Missing return statement - mypy should catch this

P1 Badge Add return in calculate_total

calculate_total is annotated to return an int but falls off the end without returning, so mypy reports a missing return and any caller would receive None instead of the summed total when this code path is executed; return total to satisfy the contract.
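
The runtime effect described here can be seen by placing the two variants side by side. `calculate_total_broken` is a hypothetical name used only to keep both versions in one snippet.

```python
def calculate_total_broken(values: list[int]) -> int:
    total = sum(values)
    # Falls off the end: Python returns None implicitly.


def calculate_total(values: list[int]) -> int:
    # Returning the sum satisfies the declared int return type.
    return sum(values)
```

Calling the broken variant raises no error at runtime; it returns `None`, which is why the problem only shows up through mypy or a test that checks the value.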

Useful? React with 👍 / 👎.

Comment on lines +51 to +53
expected = 42
actual = 41 # Wrong value
assert actual == expected, f"Expected {expected}, got {actual}"

P1 Badge Fix hardcoded failing assertion in test_intentional_failure

test_intentional_failure sets expected = 42 but actual = 41, making the assertion fail unconditionally whenever the test suite runs, independent of any underlying functionality; align the expected and actual values so the test can pass when behavior is correct.

Useful? React with 👍 / 👎.

@github-actions github-actions bot added the autofix label (Triggers autofix on PR) on Dec 29, 2025
@github-actions
Contributor

github-actions bot commented Dec 29, 2025

Status | ✅ no new diagnostics
History points | 0
Timestamp | 2025-12-29 07:03:55 UTC
Report artifact | autofix-report-pr-87
Remaining | ∅
New | ∅
No additional artifacts

@github-actions
Contributor

Autofix updated these files:

  • tests/test_autofix_validation.py

@github-actions
Contributor

github-actions bot commented Dec 29, 2025

🤖 Keepalive Loop Status

PR #87 | Agent: Codex | Iteration 0/5

Current State

Metric | Value
Iteration progress | [----------] 0/5
Action | stop (no-checklists)
Gate | success
Tasks | 0/0 complete
Keepalive | ✅ enabled
Autofix | ❌ disabled

🔍 Failure Classification

| Error type | infrastructure |
| Error category | unknown |
| Suggested recovery | Capture logs and context; retry once and escalate if the issue persists. |

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a test file with intentional errors to validate the autofix pipeline's auto-escalation feature, which should detect when basic autofix (ruff/black) runs but CI still fails, then escalate to Codex for more complex fixes.

  • Adds test file with formatting issues (spacing, imports, quotes) that ruff/black should fix
  • Introduces type errors and test failures that require Codex intervention
  • Tests the workflow described in stranske/Workflows#277

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File | Description
tests/test_autofix_validation.py | New test file containing intentional formatting errors, type errors, and failing tests to validate the autofix-to-Codex escalation pipeline
manager_database.egg-info/SOURCES.txt | Updates package metadata to include the new test file in alphabetical order

Comment on lines +6 to +15

from adapters.base import connect_db, get_adapter
# Formatting issue 1: missing spaces around operators
x = 1 + 2 + 3

Copilot AI Dec 29, 2025

The imports os and sys appear to be unintentionally unused. These were not mentioned in the PR description's list of intentional errors. If these are meant to test unused import detection, they should be documented in the module docstring or the comment on line 45 should be updated to reflect all unused imports being tested.
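
For readers unfamiliar with how an unused import is detected, here is a simplified sketch using the standard `ast` module. It is not how ruff implements its check; it ignores `__all__`, string annotations, and scoping, and `unused_imports` is a hypothetical helper name.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in the module."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds only the top-level name "a".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Attribute access such as sys.argv still visits Name("sys").
            used.add(node.id)
    return sorted(imported - used)
```

Run against a module that imports `os`, `sys`, `json`, and `Optional` but only references `sys`, it reports the other three names.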

Copilot uses AI. Check for mistakes.
"""

from adapters.base import connect_db, get_adapter
# Formatting issue 1: missing spaces around operators

Copilot AI Dec 29, 2025

The Optional type is imported but never used in the file. This appears to be an unintentional unused import that was not mentioned in the PR description's list of intentional errors.

Comment on lines +29 to +43
    return "not an int" # mypy error: returning str instead of int

def poorly_formatted_function(arg1: int, arg2: int, arg3: int) -> int:
    """Simple helper to keep formatting coverage."""
    result = arg1 + arg2 + arg3
    if result > 10:
        return result
    return result * 2

# Type error 2: incompatible types
def process_items(items: list[str]) -> dict[str, int]:
    result: dict[str, int] = {}
    for item in items:
        result[item] = "count" # mypy error: assigning str to int
    return result

class BadlyFormattedClass:
    """Class with formatting issues."""

    def __init__(self, name: str, value: int):
        self.name = name
        self.value = value
# Type error 3: missing return
def calculate_total(values: list[int]) -> int:
    total = sum(values)
    # Missing return statement - mypy should catch this

Copilot AI Dec 29, 2025

Critical discrepancy: The PR description states that mypy errors should cause the Gate to fail and trigger Codex escalation. However, the Gate workflow (.github/workflows/pr-00-gate.yml line 47) has typecheck: false, meaning mypy is not run as part of the Gate check. While mypy runs in the separate CI workflow, it doesn't block PR merging through the Gate. This means the intentional type errors in this test file (lines 30-43) will not cause Gate failure and won't test the auto-escalation behavior as intended. Either the Gate workflow needs to enable typechecking, or the PR description needs to be updated to reflect what's actually being tested.
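
The discrepancy can be modelled in a few lines. This is a hypothetical model of the Gate, not the contents of `pr-00-gate.yml`: `gate_fails` is an invented name, and only the `typecheck` flag comes from the comment above; the `tests` key is an assumption.

```python
def gate_fails(failed_checks: set[str], gate_config: dict[str, bool]) -> bool:
    """Hypothetical model: a failed check blocks the Gate only if it is enabled."""
    return any(gate_config.get(check, False) for check in failed_checks)


# Assumed config shape; "typecheck": False mirrors the setting cited above.
config = {"typecheck": False, "tests": True}
only_mypy_failed = gate_fails({"typecheck"}, config)
mypy_and_tests_failed = gate_fails({"typecheck", "tests"}, config)
```

Under this model a mypy-only failure leaves the Gate green, so escalation would be triggered by some other enabled check, if at all.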

# Type error 3: missing return
def calculate_total(values: list[int]) -> int:
    total = sum(values)
    # Missing return statement - mypy should catch this

Copilot AI Dec 29, 2025

Variable total is not used.

Suggested change
    # Missing return statement - mypy should catch this
    return total

    def compute(self, multiplier: int) -> int:
        return self.value * multiplier

# Unused import (ruff will catch this)

Copilot AI Dec 29, 2025

Import of 'json' is not used.

@github-actions
Contributor

✅ Codex Completion Checkpoint

Commit: cec0e1a
Recorded: 2025-12-29T07:03:27.313Z

No new completions recorded this round.

About this comment

This comment is automatically generated to track task completions.
The Automated Status Summary reads these checkboxes to update PR progress.
Do not edit this comment manually.

@stranske-automation-bot
Collaborator

@codex Please address the 9 bot review comment(s) on this PR.

Focus on:

  1. Implementing suggested fixes that improve the code
  2. Skipping suggestions that don't apply (note why in your response)

The bot comment handler workflow has prepared context for you.

@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to GitHub.

@stranske stranske merged commit 782942c into main Dec 29, 2025
24 of 25 checks passed
@stranske stranske deleted the test/autofix-full-pipeline-validation branch December 29, 2025 07:16

Labels

autofix:escalated
autofix (Triggers autofix on PR)

3 participants