diff --git a/.github/workflows/gh-aw-test-improvement.lock.yml b/.github/workflows/gh-aw-test-improvement.lock.yml
index 5a74b9fa..a1d67faf 100644
--- a/.github/workflows/gh-aw-test-improvement.lock.yml
+++ b/.github/workflows/gh-aw-test-improvement.lock.yml
@@ -41,7 +41,7 @@
 #
 # inlined-imports: true
 #
-# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"268cfc3d5e16d7108c5458825a159d4d643f02a8f574fb79afcf1ee8aba3e945"}
+# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"0cb0e71f5f360f41d4858c9db3ca1aa5c20d667d07d3204f5c50947d71e43a0b"}
 
 name: "Test Improver"
 "on":
@@ -335,7 +335,21 @@ jobs:
  - Run the most relevant test command(s). **All tests — new and existing — must pass.** If the full suite is too heavy, run targeted tests.
  - If required commands, tests, or coverage cannot be run, call `noop`. Do not open a PR with untested test code.
 
- ## Step 5: Quality Gate — Test Value Check
+ ## Step 5: Stability check — run new tests repeatedly
+
+ New tests that pass once may still be flaky. Before filing a PR, verify stability by running each new or modified test multiple times.
+
+ 1. Run each new or modified test **at least 5 times** in sequence and confirm every run passes.
+    - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest).
+    - If no built-in mechanism exists, use a simple shell loop: `for i in $(seq 1 5); do <test command> || exit 1; done`
+ 2. If any run fails intermittently, investigate the root cause before proceeding. Common sources of flakiness:
+    - Reliance on timing, sleep, or wall-clock assertions
+    - Shared mutable state between test cases
+    - Non-deterministic iteration order (e.g., map/set ordering)
+    - Dependence on external services or network
+ 3. If the test cannot be made reliably stable, do not include it in the PR. Call `noop` if no stable tests remain.
+
+ ## Step 6: Quality Gate — Test Value Check
 
  Before creating the PR, evaluate each new test:
 
@@ -347,7 +361,7 @@ jobs:
 
  If the tests don't pass this bar, call `noop`. Low-value tests are worse than no tests — they create maintenance burden and false confidence.
 
- ## Step 6: Create the PR
+ ## Step 7: Create the PR
 
  1. Commit the changes locally.
  2. Call `create_pull_request` with:
diff --git a/.github/workflows/gh-aw-test-improver.lock.yml b/.github/workflows/gh-aw-test-improver.lock.yml
index 1784aabd..6f54c8a9 100644
--- a/.github/workflows/gh-aw-test-improver.lock.yml
+++ b/.github/workflows/gh-aw-test-improver.lock.yml
@@ -36,7 +36,7 @@
 #
 # inlined-imports: true
 #
-# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"268cfc3d5e16d7108c5458825a159d4d643f02a8f574fb79afcf1ee8aba3e945"}
+# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"0cb0e71f5f360f41d4858c9db3ca1aa5c20d667d07d3204f5c50947d71e43a0b"}
 
 name: "Test Improver"
 "on":
@@ -330,7 +330,21 @@ jobs:
  - Run the most relevant test command(s). **All tests — new and existing — must pass.** If the full suite is too heavy, run targeted tests.
  - If required commands, tests, or coverage cannot be run, call `noop`. Do not open a PR with untested test code.
 
- ## Step 5: Quality Gate — Test Value Check
+ ## Step 5: Stability check — run new tests repeatedly
+
+ New tests that pass once may still be flaky. Before filing a PR, verify stability by running each new or modified test multiple times.
+
+ 1. Run each new or modified test **at least 5 times** in sequence and confirm every run passes.
+    - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest).
+    - If no built-in mechanism exists, use a simple shell loop: `for i in $(seq 1 5); do <test command> || exit 1; done`
+ 2. If any run fails intermittently, investigate the root cause before proceeding. Common sources of flakiness:
+    - Reliance on timing, sleep, or wall-clock assertions
+    - Shared mutable state between test cases
+    - Non-deterministic iteration order (e.g., map/set ordering)
+    - Dependence on external services or network
+ 3. If the test cannot be made reliably stable, do not include it in the PR. Call `noop` if no stable tests remain.
+
+ ## Step 6: Quality Gate — Test Value Check
 
  Before creating the PR, evaluate each new test:
 
@@ -342,7 +356,7 @@ jobs:
 
  If the tests don't pass this bar, call `noop`. Low-value tests are worse than no tests — they create maintenance burden and false confidence.
 
- ## Step 6: Create the PR
+ ## Step 7: Create the PR
 
  1. Commit the changes locally.
  2. Call `create_pull_request` with:
diff --git a/.github/workflows/gh-aw-test-improver.md b/.github/workflows/gh-aw-test-improver.md
index d62fd1d1..a83ebb5d 100644
--- a/.github/workflows/gh-aw-test-improver.md
+++ b/.github/workflows/gh-aw-test-improver.md
@@ -126,7 +126,21 @@ Identify under-tested code paths, add focused tests, and remove or consolidate d
 - Run the most relevant test command(s). **All tests — new and existing — must pass.** If the full suite is too heavy, run targeted tests.
 - If required commands, tests, or coverage cannot be run, call `noop`. Do not open a PR with untested test code.
 
-## Step 5: Quality Gate — Test Value Check
+## Step 5: Stability check — run new tests repeatedly
+
+New tests that pass once may still be flaky. Before filing a PR, verify stability by running each new or modified test multiple times.
+
+1. Run each new or modified test **at least 5 times** in sequence and confirm every run passes.
+   - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest).
+   - If no built-in mechanism exists, use a simple shell loop: `for i in $(seq 1 5); do <test command> || exit 1; done`
+2. If any run fails intermittently, investigate the root cause before proceeding. Common sources of flakiness:
+   - Reliance on timing, sleep, or wall-clock assertions
+   - Shared mutable state between test cases
+   - Non-deterministic iteration order (e.g., map/set ordering)
+   - Dependence on external services or network
+3. If the test cannot be made reliably stable, do not include it in the PR. Call `noop` if no stable tests remain.
+
+## Step 6: Quality Gate — Test Value Check
 
 Before creating the PR, evaluate each new test:
 
@@ -138,7 +152,7 @@ Before creating the PR, evaluate each new test:
 
 If the tests don't pass this bar, call `noop`. Low-value tests are worse than no tests — they create maintenance burden and false confidence.
 
-## Step 6: Create the PR
+## Step 7: Create the PR
 
 1. Commit the changes locally.
 2. Call `create_pull_request` with: