Reduce integration test CI time from ~30 min to ~16 min #2562
Use pytest-split to distribute integration tests into 2 groups that run as separate GitHub Actions matrix jobs. Each group gets its own Postgres container, so there are no shared-state conflicts.

Changes:
- Add split-group [1, 2] dimension to Run-Integration-Tests matrix
- Pass PYTEST_SPLITS/PYTEST_SPLIT_GROUP env vars through to pytest
- Update coverage artifact names to include split group
- Add .test_durations file with uniform weights for bootstrapping
- integration.sh conditionally adds --splits/--group flags (no-op when env vars are unset, preserving local dev behavior)

This roughly halves wall-clock time per Airflow version by running ~half the tests in each parallel job. The .test_durations file can be refreshed with real timings via --store-durations after the first CI run.

Related: #2302
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add speedup-integration-tests-run to the push trigger so the workflow changes (pytest-split matrix) are picked up from this branch instead of main. Must be removed before merging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR reduces integration test wall-clock time in CI by splitting the integration pytest run into two parallel GitHub Actions matrix jobs using pytest-split, while keeping local developer runs unchanged when split env vars are not set.
Changes:
- Add a `split-group: [1, 2]` dimension to the `Run-Integration-Tests` workflow matrix and pass `PYTEST_SPLITS`/`PYTEST_SPLIT_GROUP` into the test step.
- Update integration coverage artifact names to include the split group so both halves can be uploaded and later combined.
- Add a bootstrap `.test_durations` file and update `scripts/test/integration.sh` to conditionally add `pytest-split` CLI args when running in CI.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `scripts/test/integration.sh` | Conditionally appends pytest-split arguments based on CI env vars to distribute tests across parallel jobs. |
| `.test_durations` | Adds an initial durations map (uniform weights) for pytest-split to use in least-duration splitting. |
| `.github/workflows/test.yml` | Expands the integration test matrix to run two parallel split groups and updates coverage artifact naming accordingly. |
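The conditional behaviour described for `scripts/test/integration.sh` can be sketched as follows. The real script is shell and is not shown in this conversation, so this is only a Python rendering of the same idea; the function name `split_args` is hypothetical. `--splits` and `--group` are the pytest-split flags named in the PR.

```python
import os
from typing import Mapping


def split_args(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return pytest-split flags, or an empty list when the env vars are unset."""
    splits = env.get("PYTEST_SPLITS", "")
    group = env.get("PYTEST_SPLIT_GROUP", "")
    if not (splits and group):
        # Local developer run: neither variable is exported, so add nothing.
        return []
    return ["--splits", splits, "--group", group]


# Local run: no extra arguments are added to the pytest command line.
print(split_args({}))
# CI matrix job for group 1 of 2.
print(split_args({"PYTEST_SPLITS": "2", "PYTEST_SPLIT_GROUP": "1"}))
```

Returning an empty list when either variable is missing is what keeps local runs identical to the pre-split behaviour.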
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #2562 +/- ##
=======================================
Coverage 98.04% 98.04%
=======================================
Files 103 103
Lines 7586 7586
=======================================
Hits 7438 7438
Misses 148 148
Update .test_durations with actual timings averaged from CI run 24495658555, replacing the uniform 1.0 bootstrap weights. The file now has 184 tests with real durations, achieving a balanced split across three groups (~390s each). Increase split-group from [1, 2] to [1, 2, 3] to further reduce wall-clock time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
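The commit does not say how the averaging was done. One way to do it, assuming each matrix job produced a JSON object mapping test IDs to seconds (the format assumed here for `.test_durations`), is sketched below. The function name and the sample test IDs are hypothetical.

```python
import json
from collections import defaultdict


def average_durations(runs):
    """Average per-test durations across several {test_id: seconds} maps."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for run in runs:
        for test_id, seconds in run.items():
            totals[test_id] += seconds
            counts[test_id] += 1
    # Sort keys so the resulting file produces stable diffs between refreshes.
    return {t: round(totals[t] / counts[t], 3) for t in sorted(totals)}


# Illustrative data only; real IDs and timings would come from CI artifacts.
runs = [
    {"tests/test_a.py::test_one": 10.0, "tests/test_b.py::test_two": 4.0},
    {"tests/test_a.py::test_one": 14.0},
]
merged = average_durations(runs)
print(json.dumps(merged, indent=2))
```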
The push-triggered run on the branch took ~16 min: https://github.com/astronomer/astronomer-cosmos/actions/runs/24496792084?pr=2562
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
pankajastro left a comment
Looks good. I was just wondering if it makes sense to document how to refresh the .test_durations file in the contributing docs.
tatiana left a comment
Amazing results, @pankajkoti! Really happy with how you were able to get from the original 1h to 16 min.
Two minor pieces of feedback:
- Would it be worth changing the PR title to represent the ultimate goal of this work (reduce integration tests from X to Y, or by Z%) instead of the how?
- WDYT of creating a follow-up ticket to automate the generation of .test_durations?
Thanks for the reviews @pankajastro and @tatiana. I’ve created a follow-up ticket to automate updates to the |
The `pre_condition` task group in `cosmos_manifest_selectors_example`
used `select=["+customers"]`, which left the DAG dependent on state
leaked from other tests. This made the integration test
`test_example_dag[cosmos_manifest_selectors_example]` flaky: it passed
only when `pytest-split` happened to order another jaffle_shop DAG
before it in the same split (which pre-populated the required tables in
Postgres).
**Root cause**
Two gaps in the `+customers` selection:
1. Orphan seeds. In `altered_jaffle_shop`, `stg_orders` and
`stg_payments` read their data via `source('postgres_db', 'raw_orders')`
and `source('postgres_db', 'raw_payments')` respectively. The
corresponding seeds (`raw_orders`, `raw_payments`) are orphans in the
manifest: nothing references them, so `+` traversal skips them and they
never get loaded. `raw_customers` is pulled in because each staging
model has a `force_seed_dep` CTE that does
`select * from {{ ref('raw_customers') }}`.
2. Missing `orders` model. The
`relationships_orders_customer_id__customer_id__ref_customers_` test is
attached to both `customers` and `orders` and queries `public.orders`.
`+customers` pulls the test in (it's a child of `customers`) but doesn't
build the `orders` model, so `customers.test` fails with
`relation "public.orders" does not exist`. This also matters because the
downstream `local_example` / `aws_s3_example` / `gcp_gs_example` /
`azure_abfs_example` task groups all run the `critical_path` selector,
which is the union of `customers` and `orders`, so `pre_condition` needs
to leave both models present.
**Fix**
Change the pre_condition selector to:
`select=["+customers", "+orders", "raw_orders", "raw_payments"]`
- `+customers` / `+orders` build both final models and their upstream
`stg_*` models (and pull in `raw_customers` via `ref`)
- `raw_orders`, `raw_payments` explicitly seed the two orphan seeds so
the `source()` reads in `stg_orders` / `stg_payments` resolve
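
The gap and the fix can be illustrated with a toy graph traversal. This is a simplified sketch, not dbt's or Cosmos's selector implementation: the edge list is a hypothetical reduction of `altered_jaffle_shop` based only on the description above, and `with_ancestors` / `resolve` are illustrative helpers.

```python
# Hypothetical, simplified edges: node -> its direct upstream nodes.
PARENTS = {
    "customers": ["stg_customers", "stg_orders", "stg_payments"],
    "orders": ["stg_orders", "stg_payments"],
    "stg_customers": ["raw_customers"],
    "stg_orders": ["raw_customers", "source.raw_orders"],
    "stg_payments": ["raw_customers", "source.raw_payments"],
    "raw_customers": [],
    "raw_orders": [],    # orphan seed: nothing lists it as a parent
    "raw_payments": [],  # orphan seed: nothing lists it as a parent
    "source.raw_orders": [],
    "source.raw_payments": [],
}


def with_ancestors(node):
    """Return the node plus everything reachable through upstream edges."""
    selected, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in selected:
            selected.add(current)
            stack.extend(PARENTS[current])
    return selected


def resolve(selectors):
    """Union of selectors; a leading '+' means 'include all ancestors'."""
    selected = set()
    for selector in selectors:
        if selector.startswith("+"):
            selected |= with_ancestors(selector[1:])
        else:
            selected.add(selector)
    return selected


required = {"orders", "raw_orders", "raw_payments"}
before = resolve(["+customers"])
after = resolve(["+customers", "+orders", "raw_orders", "raw_payments"])
print("missing before:", sorted(required - before))
print("missing after:", sorted(required - after))
```

Because the seeds are reachable only through `source()` nodes in this toy graph, no amount of `+` traversal from `customers` or `orders` reaches them, which is why they are listed explicitly in the fixed selector.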
related: #2562
related: #2592
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Summary
Use pytest-split to distribute integration tests into 3 groups that run as separate GitHub Actions matrix jobs. Each group gets its own Postgres container, so there are no shared-state conflicts.
Changes:
- Add `split-group: [1, 2, 3]` dimension to `Run-Integration-Tests` matrix
- Pass `PYTEST_SPLITS`/`PYTEST_SPLIT_GROUP` env vars through to pytest
- Add `.test_durations` file with real timings from CI (184 tests, balanced ~390s per group)
- `integration.sh` conditionally adds `--splits`/`--group` flags (no-op when env vars are unset, preserving local dev behavior)

Results (bottleneck job wall-clock):
How it works:
- pytest-split reads `.test_durations` and uses the `least_duration` algorithm to bin-pack tests into balanced groups
- Tests missing from `.test_durations` get assigned to the lightest group automatically
- Refresh the file with `pytest --store-durations` periodically or when we see the splits are not balanced and some of them are taking longer

closes: #2302
related: #2547
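
For readers unfamiliar with the bin-packing step under "How it works", here is a minimal greedy sketch: each test, longest first, goes to whichever group currently has the smallest total. This is not pytest-split's actual implementation. The function name is hypothetical, and `default_seconds` for tests without a recorded duration is an assumption made for this sketch.

```python
import heapq


def least_duration_split(test_ids, durations, splits, default_seconds=1.0):
    """Greedily assign each test to the group with the smallest running total."""
    groups = [[] for _ in range(splits)]
    totals = [0.0] * splits
    # Min-heap of (current total seconds, group index).
    heap = [(0.0, index) for index in range(splits)]
    weighted = [(durations.get(t, default_seconds), t) for t in test_ids]
    # Longest tests first; ties broken by test ID so the result is deterministic.
    for seconds, test_id in sorted(weighted, key=lambda item: (-item[0], item[1])):
        _, index = heapq.heappop(heap)
        groups[index].append(test_id)
        totals[index] += seconds
        heapq.heappush(heap, (totals[index], index))
    return groups, totals


# Illustrative timings only; "t_new" has no recorded duration.
durations = {"t1": 300.0, "t2": 200.0, "t3": 200.0, "t4": 100.0, "t5": 100.0, "t6": 100.0}
groups, totals = least_duration_split(list(durations) + ["t_new"], durations, splits=2)
for number, (tests, total) in enumerate(zip(groups, totals), start=1):
    print(f"group {number}: {total:.0f}s {tests}")
```

A stale `.test_durations` file still yields a valid split under this scheme, only a less balanced one, which matches the advice to refresh the timings when some groups start taking longer.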