From 6105b62b6db41fc8f997e775f6c302897534c8b5 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 26 Feb 2026 13:48:23 +0000
Subject: [PATCH 1/3] Initial plan

From 6127ba84a71eaa4e097cdf45a7bbf105ece7a076 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 26 Feb 2026 13:54:16 +0000
Subject: [PATCH 2/3] Add flaky test stability check step to test-improver workflow
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add Step 5: Stability check — run new tests repeatedly, which instructs
the test-improver agent to run each new or modified test at least 5 times
before filing a PR. Provides language-agnostic guidance with examples for
Go, Python, JavaScript, and Ruby test frameworks, plus a shell loop
fallback. Renumber subsequent steps accordingly.

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
---
 .../workflows/gh-aw-test-improvement.lock.yml | 20 ++++++++++++++++---
 .../workflows/gh-aw-test-improver.lock.yml    | 20 ++++++++++++++++---
 .github/workflows/gh-aw-test-improver.md      | 18 +++++++++++++++--
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/.github/workflows/gh-aw-test-improvement.lock.yml b/.github/workflows/gh-aw-test-improvement.lock.yml
index 5a74b9fa..c2efe17f 100644
--- a/.github/workflows/gh-aw-test-improvement.lock.yml
+++ b/.github/workflows/gh-aw-test-improvement.lock.yml
@@ -41,7 +41,7 @@
 #
 # inlined-imports: true
 #
-# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"268cfc3d5e16d7108c5458825a159d4d643f02a8f574fb79afcf1ee8aba3e945"}
+# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"83801e7e9acc5954b0b6a3ac5842532c89300a09ed7a9ab7138af32251c990ec"}
 
 name: "Test Improver"
 "on":
@@ -335,7 +335,21 @@ jobs:
           - Run the most relevant test command(s). **All tests — new and existing — must pass.** If the full suite is too heavy, run targeted tests.
           - If required commands, tests, or coverage cannot be run, call `noop`. Do not open a PR with untested test code.
 
-          ## Step 5: Quality Gate — Test Value Check
+          ## Step 5: Stability check — run new tests repeatedly
+
+          New tests that pass once may still be flaky. Before filing a PR, verify stability by running each new or modified test multiple times.
+
+          1. Run each new or modified test **at least 5 times** in sequence and confirm every run passes.
+             - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest, `rspec --bisect` or loop in RSpec).
+             - If no built-in mechanism exists, use a simple shell loop: `for i in $(seq 1 5); do <test-command> || exit 1; done`
+          2. If any run fails intermittently, investigate the root cause before proceeding. Common sources of flakiness:
+             - Reliance on timing, sleep, or wall-clock assertions
+             - Shared mutable state between test cases
+             - Non-deterministic iteration order (e.g., map/set ordering)
+             - Dependence on external services or network
+          3. If the test cannot be made reliably stable, do not include it in the PR. Call `noop` if no stable tests remain.
+
+          ## Step 6: Quality Gate — Test Value Check
 
           Before creating the PR, evaluate each new test:
 
@@ -347,7 +361,7 @@ jobs:
 
           If the tests don't pass this bar, call `noop`. Low-value tests are worse than no tests — they create maintenance burden and false confidence.
 
-          ## Step 6: Create the PR
+          ## Step 7: Create the PR
 
           1. Commit the changes locally.
           2. Call `create_pull_request` with:
diff --git a/.github/workflows/gh-aw-test-improver.lock.yml b/.github/workflows/gh-aw-test-improver.lock.yml
index 1784aabd..860b7c89 100644
--- a/.github/workflows/gh-aw-test-improver.lock.yml
+++ b/.github/workflows/gh-aw-test-improver.lock.yml
@@ -36,7 +36,7 @@
 #
 # inlined-imports: true
 #
-# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"268cfc3d5e16d7108c5458825a159d4d643f02a8f574fb79afcf1ee8aba3e945"}
+# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"83801e7e9acc5954b0b6a3ac5842532c89300a09ed7a9ab7138af32251c990ec"}
 
 name: "Test Improver"
 "on":
@@ -330,7 +330,21 @@ jobs:
           - Run the most relevant test command(s). **All tests — new and existing — must pass.** If the full suite is too heavy, run targeted tests.
           - If required commands, tests, or coverage cannot be run, call `noop`. Do not open a PR with untested test code.
 
-          ## Step 5: Quality Gate — Test Value Check
+          ## Step 5: Stability check — run new tests repeatedly
+
+          New tests that pass once may still be flaky. Before filing a PR, verify stability by running each new or modified test multiple times.
+
+          1. Run each new or modified test **at least 5 times** in sequence and confirm every run passes.
+             - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest, `rspec --bisect` or loop in RSpec).
+             - If no built-in mechanism exists, use a simple shell loop: `for i in $(seq 1 5); do <test-command> || exit 1; done`
+          2. If any run fails intermittently, investigate the root cause before proceeding. Common sources of flakiness:
+             - Reliance on timing, sleep, or wall-clock assertions
+             - Shared mutable state between test cases
+             - Non-deterministic iteration order (e.g., map/set ordering)
+             - Dependence on external services or network
+          3. If the test cannot be made reliably stable, do not include it in the PR. Call `noop` if no stable tests remain.
+
+          ## Step 6: Quality Gate — Test Value Check
 
           Before creating the PR, evaluate each new test:
 
@@ -342,7 +356,7 @@ jobs:
 
           If the tests don't pass this bar, call `noop`. Low-value tests are worse than no tests — they create maintenance burden and false confidence.
 
-          ## Step 6: Create the PR
+          ## Step 7: Create the PR
 
           1. Commit the changes locally.
           2. Call `create_pull_request` with:
diff --git a/.github/workflows/gh-aw-test-improver.md b/.github/workflows/gh-aw-test-improver.md
index d62fd1d1..d90b4fce 100644
--- a/.github/workflows/gh-aw-test-improver.md
+++ b/.github/workflows/gh-aw-test-improver.md
@@ -126,7 +126,21 @@ Identify under-tested code paths, add focused tests, and remove or consolidate d
 - Run the most relevant test command(s). **All tests — new and existing — must pass.** If the full suite is too heavy, run targeted tests.
 - If required commands, tests, or coverage cannot be run, call `noop`. Do not open a PR with untested test code.
 
-## Step 5: Quality Gate — Test Value Check
+## Step 5: Stability check — run new tests repeatedly
+
+New tests that pass once may still be flaky. Before filing a PR, verify stability by running each new or modified test multiple times.
+
+1. Run each new or modified test **at least 5 times** in sequence and confirm every run passes.
+   - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest, `rspec --bisect` or loop in RSpec).
+   - If no built-in mechanism exists, use a simple shell loop: `for i in $(seq 1 5); do <test-command> || exit 1; done`
+2. If any run fails intermittently, investigate the root cause before proceeding. Common sources of flakiness:
+   - Reliance on timing, sleep, or wall-clock assertions
+   - Shared mutable state between test cases
+   - Non-deterministic iteration order (e.g., map/set ordering)
+   - Dependence on external services or network
+3. If the test cannot be made reliably stable, do not include it in the PR. Call `noop` if no stable tests remain.
+
+## Step 6: Quality Gate — Test Value Check
 
 Before creating the PR, evaluate each new test:
 
@@ -138,7 +152,7 @@ Before creating the PR, evaluate each new test:
 
 If the tests don't pass this bar, call `noop`. Low-value tests are worse than no tests — they create maintenance burden and false confidence.
 
-## Step 6: Create the PR
+## Step 7: Create the PR
 
 1. Commit the changes locally.
 2. Call `create_pull_request` with:

From 40b7fc02ea51ece28c1602ca85bb02b28bfc2dad Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 26 Feb 2026 13:55:10 +0000
Subject: [PATCH 3/3] Remove incorrect rspec --bisect example from stability check step

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
---
 .github/workflows/gh-aw-test-improvement.lock.yml | 4 ++--
 .github/workflows/gh-aw-test-improver.lock.yml    | 4 ++--
 .github/workflows/gh-aw-test-improver.md          | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/.github/workflows/gh-aw-test-improvement.lock.yml b/.github/workflows/gh-aw-test-improvement.lock.yml
index c2efe17f..a1d67faf 100644
--- a/.github/workflows/gh-aw-test-improvement.lock.yml
+++ b/.github/workflows/gh-aw-test-improvement.lock.yml
@@ -41,7 +41,7 @@
 #
 # inlined-imports: true
 #
-# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"83801e7e9acc5954b0b6a3ac5842532c89300a09ed7a9ab7138af32251c990ec"}
+# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"0cb0e71f5f360f41d4858c9db3ca1aa5c20d667d07d3204f5c50947d71e43a0b"}
 
 name: "Test Improver"
 "on":
@@ -340,7 +340,7 @@ jobs:
           New tests that pass once may still be flaky. Before filing a PR, verify stability by running each new or modified test multiple times.
 
           1. Run each new or modified test **at least 5 times** in sequence and confirm every run passes.
-             - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest, `rspec --bisect` or loop in RSpec).
+             - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest).
              - If no built-in mechanism exists, use a simple shell loop: `for i in $(seq 1 5); do <test-command> || exit 1; done`
           2. If any run fails intermittently, investigate the root cause before proceeding. Common sources of flakiness:
              - Reliance on timing, sleep, or wall-clock assertions
diff --git a/.github/workflows/gh-aw-test-improver.lock.yml b/.github/workflows/gh-aw-test-improver.lock.yml
index 860b7c89..6f54c8a9 100644
--- a/.github/workflows/gh-aw-test-improver.lock.yml
+++ b/.github/workflows/gh-aw-test-improver.lock.yml
@@ -36,7 +36,7 @@
 #
 # inlined-imports: true
 #
-# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"83801e7e9acc5954b0b6a3ac5842532c89300a09ed7a9ab7138af32251c990ec"}
+# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"0cb0e71f5f360f41d4858c9db3ca1aa5c20d667d07d3204f5c50947d71e43a0b"}
 
 name: "Test Improver"
 "on":
@@ -335,7 +335,7 @@ jobs:
           New tests that pass once may still be flaky. Before filing a PR, verify stability by running each new or modified test multiple times.
 
           1. Run each new or modified test **at least 5 times** in sequence and confirm every run passes.
-             - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest, `rspec --bisect` or loop in RSpec).
+             - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest).
              - If no built-in mechanism exists, use a simple shell loop: `for i in $(seq 1 5); do <test-command> || exit 1; done`
           2. If any run fails intermittently, investigate the root cause before proceeding. Common sources of flakiness:
              - Reliance on timing, sleep, or wall-clock assertions
diff --git a/.github/workflows/gh-aw-test-improver.md b/.github/workflows/gh-aw-test-improver.md
index d90b4fce..a83ebb5d 100644
--- a/.github/workflows/gh-aw-test-improver.md
+++ b/.github/workflows/gh-aw-test-improver.md
@@ -131,7 +131,7 @@ Identify under-tested code paths, add focused tests, and remove or consolidate d
 New tests that pass once may still be flaky. Before filing a PR, verify stability by running each new or modified test multiple times.
 
 1. Run each new or modified test **at least 5 times** in sequence and confirm every run passes.
-   - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest, `rspec --bisect` or loop in RSpec).
+   - Use the test framework's built-in repeat/count flag when available (e.g., `go test -count=5`, `pytest -x --count 5` with `pytest-repeat`, `--repeat 5` in Jest/Vitest).
    - If no built-in mechanism exists, use a simple shell loop: `for i in $(seq 1 5); do <test-command> || exit 1; done`
 2. If any run fails intermittently, investigate the root cause before proceeding. Common sources of flakiness:
    - Reliance on timing, sleep, or wall-clock assertions