test: autofix system validation with intentional failures #84
Conversation
This PR intentionally includes:

- Black formatting violations (bad spacing, missing newlines)
- Ruff lint errors (unused imports, line too long, bad comparisons)
- Mypy type errors (wrong return types)
- Intentionally failing tests (assertion, exception, type errors)
- Actual useful coverage tests for adapters/base.py

Purpose: Validate that the full autofix pipeline handles all CI failure modes, including quick autofix for lint/Black issues, Codex dispatch for harder issues, mypy fixes, and test failure resolution.
💡 Codex Review
Here are some automated review suggestions for this pull request.
    def test_intentional_failure_assertion():
        """This test intentionally fails with an assertion error."""
        expected = 42
        actual = 41
        assert actual == expected, f"Expected {expected} but got {actual}"

Remove unconditional failing tests blocking suite

The three “intentional failure” tests here all fail unconditionally (41 == 42, followed by a KeyError and a TypeError in the two tests immediately below), and none of them is marked xfail or skip. As written, every pytest run will report failures from this block, keeping CI permanently red; these tests should be guarded or deleted if the goal is a stable test run.
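One way to guard the block, as the comment suggests, is a strict xfail marker. This is a sketch applied to the quoted test; the suite stays green while the failure is expected:

```python
import pytest

# strict=True keeps the guard honest: if the test ever starts passing,
# pytest reports it as XPASS(strict) and fails the run.
@pytest.mark.xfail(reason="Intentional failure for autofix validation", strict=True)
def test_intentional_failure_assertion():
    expected = 42
    actual = 41
    assert actual == expected, f"Expected {expected} but got {actual}"
```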
🤖 Keepalive Loop Status

PR #84 | Agent: Codex | Iteration 0/5

Current State

🔍 Failure Classification

Error type: infrastructure
Pull request overview
This PR adds a validation test file that intentionally includes multiple categories of CI failures (Black formatting, Ruff lint errors, Mypy type errors, and failing tests) to validate the autofix system's ability to detect and fix various issue types. The file also includes some legitimate test coverage for adapters/base.py functions.
Key changes:
- Adds test file with intentional Black/Ruff violations for autofix validation
- Includes tests for connect_db() and get_adapter() functions
- Contains three intentionally failing tests to trigger autofix
    import collections

    # E501: line too long
    VERY_LONG_STRING_THAT_VIOLATES_LINE_LENGTH = "This is a very long string that definitely exceeds the maximum line length limit of 88 characters that ruff and black enforce by default"
The PR description lists E501 (line too long) as an intentional Ruff violation that should be caught, but the project's ruff configuration in pyproject.toml explicitly ignores E501 with ignore = ["E501"]. This intentional violation will not trigger a lint error and therefore will not validate the autofix system's handling of E501 errors.
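The relevant configuration fragment would look something like this (a sketch of the described pyproject.toml setting; the actual file may organize its ruff section differently):

```toml
# Assumed shape of the project's ruff config; E501 being in `ignore`
# means the long-string line above never produces a lint error.
[tool.ruff.lint]
ignore = ["E501"]
```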
    def bad_type_annotation(x: int) -> str:
        return x  # Returns int, claims str
The PR description lists Mypy type errors as intentional issues that should trigger Codex dispatch, but the project's mypy configuration in pyproject.toml excludes the tests directory and also has overrides that ignore errors in adapters. Since this test file imports from adapters.base, mypy type checking may not catch these errors as expected, preventing validation of the autofix system's handling of type errors.
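Likewise, a sketch of the mypy configuration the comment describes (section names and values assumed; check the real pyproject.toml):

```toml
# Assumed shape: tests are excluded from mypy, and adapter modules have
# errors suppressed, so the wrong return type above goes unchecked.
[tool.mypy]
exclude = "tests/"

[[tool.mypy.overrides]]
module = "adapters.*"
ignore_errors = true
```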
    def test_intentional_failure_assertion():
        """This test intentionally fails with an assertion error."""
        expected = 42
        actual = 41
        assert actual == expected, f"Expected {expected} but got {actual}"

This test intentionally fails by asserting 41 == 42. While this is documented as an intentional failure for validation purposes, it will fail the suite on every run. Consider marking it with a pytest marker such as @pytest.mark.skip(reason="Intentional failure for autofix validation") until the autofix system is validated, to avoid breaking the main test suite.
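Applied to the quoted test, the suggested guard would look like this sketch:

```python
import pytest

# Parked rather than deleted: the skip reason documents why the test
# exists, and removing the decorator re-arms the intentional failure.
@pytest.mark.skip(reason="Intentional failure for autofix validation")
def test_intentional_failure_assertion():
    expected = 42
    actual = 41
    assert actual == expected, f"Expected {expected} but got {actual}"
```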
    def test_intentional_failure_exception():
        """This test intentionally raises an exception."""
        data = {"key": "value"}
        # This will raise KeyError
        result = data["nonexistent_key"]
This test intentionally raises a KeyError. While this is documented as an intentional failure, when executed it will cause the test suite to fail. The same concern applies as with the other intentionally failing tests - consider using pytest markers to skip these until the autofix system has been validated, to prevent disrupting the normal development workflow.
    old_url = os.environ.pop("DB_URL", None)
    old_path = os.environ.pop("DB_PATH", None)
The test manipulates environment variables directly with os.environ.pop(), but other tests in the codebase use pytest's monkeypatch fixture for environment variable management (see tests/test_adapter_base.py:16, tests/test_embeddings.py:31). Using monkeypatch ensures automatic cleanup and prevents test pollution. Consider accepting monkeypatch as a parameter and using monkeypatch.delenv() or monkeypatch.setenv() instead.
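A sketch of the monkeypatch-based alternative (variable names taken from the quoted lines; the test body beyond the environment setup is assumed):

```python
import os
import pytest

def test_connect_without_db_env(monkeypatch: pytest.MonkeyPatch) -> None:
    # raising=False makes delenv a no-op when the variable is absent,
    # and monkeypatch undoes both deletions automatically at teardown.
    monkeypatch.delenv("DB_URL", raising=False)
    monkeypatch.delenv("DB_PATH", raising=False)
    assert "DB_URL" not in os.environ
    assert "DB_PATH" not in os.environ
```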
tests/test_autofix_validation.py (outdated)

    # --- BLACK VIOLATION: Bad formatting ---
    import os,sys,time
    from typing import Dict,List,Optional,Any

Import of 'Dict' is not used.
Import of 'List' is not used.
Import of 'Optional' is not used.
Import of 'Any' is not used.

Suggested change:

    - from typing import Dict,List,Optional,Any
tests/test_autofix_validation.py (outdated)

    # --- BLACK VIOLATION: Bad formatting ---
    import os,sys,time
    from typing import Dict,List,Optional,Any
    from adapters.base import connect_db,get_adapter,tracked_call

Import of 'tracked_call' is not used.

Suggested change:

    - from adapters.base import connect_db,get_adapter,tracked_call
    + from adapters.base import connect_db,get_adapter
tests/test_autofix_validation.py (outdated)

    # --- RUFF VIOLATIONS ---
    # F401: unused import
    import json

Import of 'json' is not used.
tests/test_autofix_validation.py (outdated)

    # --- RUFF VIOLATIONS ---
    # F401: unused import
    import json
    import re

Import of 're' is not used.

Suggested change:

    - import re
tests/test_autofix_validation.py (outdated)

    # F401: unused import
    import json
    import re
    import collections

Import of 'collections' is not used.

Suggested change:

    - import collections
🔍 Autofix System Evaluation Complete

This PR was created to test the autofix pipeline. Findings are documented in stranske/Workflows#266.

Bugs Found
What Happened
Proposed Fix

Auto-dispatch Codex when quick autofix partially succeeds but can't complete. See stranske/Workflows#266 for the implementation plan. This PR will remain open as a test case until the fixes in #266 are deployed.
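The proposed escalation rule can be sketched as a small decision function (hypothetical names; the real pipeline lives in the Workflows repo):

```python
# Hypothetical model of the proposed fix: escalate to Codex whenever the
# quick autofix ran but could not clear every failure, instead of stalling.
def next_action(quick_autofix_ran: bool, failures_remaining: bool) -> str:
    if not quick_autofix_ran:
        return "run-quick-autofix"
    if failures_remaining:
        return "dispatch-codex"  # partial success: hand off the rest
    return "done"
```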
Status: ✅ no new diagnostics

Clean-mode cosmetic autofix (
Closing to resubmit with fresh CI triggers after autofix label race condition fix (PR #274). |
Purpose
This PR validates the full autofix system by intentionally including multiple types of CI failures:
Intentional Issues
Expected Autofix Behavior
Quick autofix should handle:
Codex dispatch should handle:
Actual Coverage Added
The file also includes real tests for adapters/base.py:

- test_connect_db_sqlite_default - Tests SQLite connection
- test_connect_db_with_timeout - Tests timeout parameter
- test_get_adapter_* - Tests adapter loading

Evaluation Criteria
Monitor the autofix pipeline for: