Parse JUnit XML (pytest.xml) from each integration test job and aggregate results into a markdown trend report showing per-test pass/fail/skip status across the last 5 runs.

Changes:
- Add python/scripts/flaky_report/ package (JUnit XML parser + trend report generator following the sample_validation pattern)
- Add upload-artifact steps to all 6 integration test jobs in both python-merge-tests.yml and python-integration-tests.yml
- Add python-flaky-test-report aggregation job with history caching
- Add --junitxml=pytest.xml to integration-tests.yml jobs (already present in merge-tests.yml)
- Fix Cosmos job --junitxml path (use absolute path since uv run --directory changes cwd)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Guard against missing reports directory in load_current_run()
- Only run report job when at least one integration test job completed (skip when all jobs are skipped, e.g. on pull_request events)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
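The directory guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in aggregate.py: the helper name `find_report_files` and the `*.xml` glob are assumptions.

```python
from pathlib import Path


def find_report_files(reports_dir: str) -> list[Path]:
    """Return JUnit XML files under reports_dir, or [] if the directory is absent.

    Hypothetical helper: the real load_current_run() may be structured differently.
    """
    root = Path(reports_dir)
    # When every integration job was skipped, no artifacts are downloaded and
    # the directory may not exist at all, so return early instead of raising.
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.xml"))
```

Returning an empty list lets the caller produce an empty report rather than failing the job.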
- Use explicit provider name mapping in _derive_provider() so OpenAI renders correctly instead of 'Openai'
- Fix operator precedence in workflow if-expressions by wrapping success/failure checks in parentheses

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
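To illustrate why an explicit mapping helps, here is a small sketch. The slugs, the display names other than OpenAI, and the function name `derive_provider` are assumptions for illustration; the actual _derive_provider() may take different input.

```python
# Hypothetical slug-to-label table; the real keys in aggregate.py may differ.
PROVIDER_NAMES = {
    "openai": "OpenAI",
    "azure-openai": "Azure OpenAI",
}


def derive_provider(slug: str) -> str:
    """Map a provider slug to a display name, falling back to title case."""
    key = slug.lower()
    # str.title() alone would turn "openai" into "Openai", so known
    # providers are looked up explicitly before falling back.
    return PROVIDER_NAMES.get(key, key.replace("-", " ").title())
```

Unknown slugs still get a readable label through the title-case fallback.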
Pull request overview
Adds a Python-based “flaky test trend” report that aggregates per-job JUnit XML outputs from CI, persists a short history via GitHub Actions cache, and posts a consolidated markdown report to the workflow job summary.
Changes:
- Introduces python/scripts/flaky_report to parse multiple pytest.xml artifacts, merge them, and generate a markdown trend report.
- Updates Python CI workflows to produce --junitxml=pytest.xml per provider job and upload those XML files as artifacts.
- Adds a new “Flaky Test Report” job to download artifacts, restore/save history cache, and publish the unified report.
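The history and trend step can be sketched as below. This is a minimal illustration under stated assumptions: the run dictionary layout (`run_id`, `results`), the function names, and the plain-text status labels are hypothetical and are not taken from aggregate.py. Only the five-run window comes from the PR.

```python
MAX_RUNS = 5  # the PR reports status across the last 5 runs

# Assumed text labels; the real report may render statuses differently.
LABELS = {"passed": "pass", "failed": "FAIL", "skipped": "skip"}


def append_run(history: list[dict], run: dict, max_runs: int = MAX_RUNS) -> list[dict]:
    """Return a new history with run appended, keeping only the newest max_runs."""
    return (history + [run])[-max_runs:]


def render_trend(history: list[dict]) -> str:
    """Render one markdown table row per test and one column per run."""
    tests = sorted({name for run in history for name in run["results"]})
    lines = [
        "| Test | " + " | ".join(run["run_id"] for run in history) + " |",
        "|---|" + "---|" * len(history),
    ]
    for name in tests:
        # A test missing from a run (for example, newly added) shows as "-".
        cells = [LABELS.get(run["results"].get(name, ""), "-") for run in history]
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

A test that alternates between pass and FAIL across the columns is the pattern such a table is meant to surface.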
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| python/scripts/flaky_report/aggregate.py | Implements JUnit XML aggregation, history persistence, and markdown trend report generation. |
| python/scripts/flaky_report/__main__.py | Adds python -m scripts.flaky_report ... entry point that dispatches to the aggregator CLI. |
| python/scripts/flaky_report/__init__.py | Documents the flaky report package purpose and usage. |
| .github/workflows/python-merge-tests.yml | Uploads per-job pytest.xml artifacts and adds a downstream aggregation/report job with caching. |
| .github/workflows/python-integration-tests.yml | Uploads per-job pytest.xml artifacts and adds the same downstream aggregation/report job with caching. |
- Add File column showing module name (e.g., test_openai_chat_client) to disambiguate tests with the same function name across files
- Detect pytest xfail tests in JUnit XML (type=pytest.xfail) and show them with a distinct warning emoji instead of skip emoji
- Update legend to include xfail explanation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
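The xfail detection relies on pytest marking expected failures as `<skipped type="pytest.xfail">` in its JUnit XML. A minimal sketch of that classification follows. The function name, the status strings, the sample test names, and the choice to count `<error>` as a failure are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative JUnit XML in the shape pytest emits; the test names are made up.
SAMPLE = """<testsuite>
  <testcase classname="pkg.test_a" name="test_ok"/>
  <testcase classname="pkg.test_a" name="test_bad"><failure message="boom"/></testcase>
  <testcase classname="pkg.test_a" name="test_skip"><skipped type="pytest.skip" message="no key"/></testcase>
  <testcase classname="pkg.test_a" name="test_known"><skipped type="pytest.xfail" message="known bug"/></testcase>
</testsuite>"""


def classify(testcase: ET.Element) -> str:
    """Return 'passed', 'failed', 'skipped', or 'xfail' for one <testcase>."""
    if testcase.find("failure") is not None or testcase.find("error") is not None:
        return "failed"
    skipped = testcase.find("skipped")
    if skipped is not None:
        # pytest writes xfail as a <skipped> element with type="pytest.xfail".
        return "xfail" if skipped.get("type") == "pytest.xfail" else "skipped"
    return "passed"


statuses = {tc.get("name"): classify(tc) for tc in ET.fromstring(SAMPLE).iter("testcase")}
```

Keeping xfail separate from skip lets the report distinguish a known, expected failure from a test that never ran.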
When a test is inside a class, pytest writes the classname as e.g. 'pkg.test_file.TestClass'. The previous rsplit logic extracted 'TestClass' instead of 'test_file'. Now detect uppercase-starting segments as class names and use the preceding segment instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
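The uppercase-segment heuristic can be sketched as follows. The function name `module_from_classname` is hypothetical, and dropping more than one trailing class segment (for nested classes) is an assumption that goes slightly beyond what the commit message states.

```python
def module_from_classname(classname: str) -> str:
    """Extract the test module name from a dotted JUnit classname.

    Trailing segments that start with an uppercase letter are treated as
    class names and dropped, so the module segment is returned instead.
    """
    parts = classname.split(".")
    while len(parts) > 1 and parts[-1][:1].isupper():
        parts.pop()
    return parts[-1]
```

For 'pkg.test_file.TestClass' this yields 'test_file', and a classname without a class segment is returned unchanged at its last segment.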
…ocstring

- Use datetime.now(timezone.utc) for accurate UTC timestamps
- Catch ET.ParseError per-file so corrupt XML doesn't crash the report
- Remove separate 'error' key from summary (errors folded into 'failed')
- Fix _short_name docstring to show actual dotted classname::name format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
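The per-file error handling and the UTC timestamp can be sketched like this. The function names and the choice to return the skipped paths are assumptions for illustration; the real script may simply log a warning.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path


def parse_reports(paths: list[Path]) -> tuple[list[ET.Element], list[Path]]:
    """Parse each JUnit XML file, collecting unparsable files instead of raising."""
    roots: list[ET.Element] = []
    skipped: list[Path] = []
    for path in paths:
        try:
            roots.append(ET.parse(path).getroot())
        except ET.ParseError:
            # One corrupt artifact should not discard results from the other jobs.
            skipped.append(path)
    return roots, skipped


def utc_timestamp() -> str:
    """Return a timezone-aware UTC timestamp in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()
```

Catching the error inside the loop, rather than around it, is what keeps the remaining files in the report.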
TaoChenOSU
approved these changes
Apr 21, 2026
chetantoshniwal
approved these changes
Apr 22, 2026
This was referenced Apr 24, 2026
Open
Motivation and Context
As part of CI/CD hardening, we need better visibility into flaky integration tests to reduce noise and improve signal quality. Currently, there's no easy way to see which tests are intermittently failing across CI runs, which makes it hard to prioritize fixes or identify regressions.
This PR adds automated flaky test trend reporting and enables additional integration tests in CI.
Description
Flaky Test Trend Report
Adds a new CI job (python-flaky-test-report) that runs after all integration test jobs complete. It follows the same artifact-upload → aggregate → Job Summary pattern used by scripts/sample_validation/.

How it works:
- Each integration test job writes JUnit XML (pytest.xml) and uploads it as an artifact

Report features:
CI Workflow Changes
- Add --junitxml=pytest.xml to all integration test jobs in both workflow files
- Add actions/upload-artifact steps to upload JUnit XML from each job
- Add python-flaky-test-report aggregation job to both python-merge-tests.yml and python-integration-tests.yml
- Fix Cosmos job --junitxml path (pre-existing bug: uv run --directory wrote XML to the wrong location)
- Add Foundry embedding env vars (FOUNDRY_MODELS_ENDPOINT, FOUNDRY_MODELS_API_KEY, FOUNDRY_EMBEDDING_MODEL, FOUNDRY_EMBEDDING_DIMENSIONS) to python-merge-tests.yml to enable the Foundry embedding integration test in CI

Files Changed
- python/scripts/flaky_report/__init__.py: New package
- python/scripts/flaky_report/__main__.py: CLI entry point
- python/scripts/flaky_report/aggregate.py: JUnit XML parser, history management, trend report generator
- .github/workflows/python-merge-tests.yml: Artifact uploads, report job, Cosmos path fix, Foundry embedding env vars
- .github/workflows/python-integration-tests.yml: Artifact uploads, report job, --junitxml flag

Contribution Checklist