Reduce integration test CI time from ~30 min to ~16 min #2562

Merged: pankajkoti merged 4 commits into main from speedup-integration-tests-run on Apr 16, 2026

@pankajkoti pankajkoti commented Apr 16, 2026

Summary

Use pytest-split to distribute integration tests into 3 groups that run as separate GitHub Actions matrix jobs. Each group gets its own Postgres container, so there are no shared-state conflicts.

Changes:

  • Add split-group: [1, 2, 3] dimension to Run-Integration-Tests matrix
  • Pass PYTEST_SPLITS/PYTEST_SPLIT_GROUP env vars through to pytest
  • Update coverage artifact names to include split group
  • Add .test_durations file with real timings from CI (184 tests, balanced ~390s per group)
  • integration.sh conditionally adds --splits/--group flags (no-op when env vars are unset, preserving local dev behavior)
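The conditional-flag behaviour in the last bullet can be sketched as follows. This is an illustration in Python, not the contents of `integration.sh` (which is a shell script); the function name `split_args` is hypothetical, while `PYTEST_SPLITS`, `PYTEST_SPLIT_GROUP`, `--splits` and `--group` come from the PR description.

```python
import os
from typing import Mapping


def split_args(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return pytest-split flags, or an empty list when splitting is not configured.

    Hypothetical helper mirroring the logic described for integration.sh.
    """
    splits = env.get("PYTEST_SPLITS", "")
    group = env.get("PYTEST_SPLIT_GROUP", "")
    # Both variables must be set; otherwise pytest runs the full suite as before.
    if not (splits and group):
        return []
    return ["--splits", splits, "--group", group]


# Local development: no env vars, so no extra flags are added.
print(split_args({}))
# CI matrix job 2 of 3.
print(split_args({"PYTEST_SPLITS": "3", "PYTEST_SPLIT_GROUP": "2"}))
```

Requiring both variables keeps a half-configured environment from silently running only part of the suite.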

Results (bottleneck job wall-clock):

| Before splitting | 2-way split | 3-way split (this PR) |
| --- | --- | --- |
| ~30 min (Airflow 3.1) | ~22 min | ~16 min |

How it works:

  • pytest-split reads .test_durations and uses the least_duration algorithm to bin-pack tests into balanced groups
  • Each matrix job gets its own GitHub Actions runner and Postgres service container — no shared state
  • New tests not in .test_durations get assigned to the lightest group automatically
  • The file can be refreshed with real timings via pytest --store-durations, either periodically or when the splits become unbalanced and some groups take noticeably longer than others
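The balancing idea in the bullets above can be illustrated with a greedy "heaviest test goes to the lightest group" sketch. This is a simplified approximation and not pytest-split's actual code. The test names and timings are made up, and giving an unknown test the average known duration is an assumption of this sketch.

```python
import heapq


def least_duration_split(tests, durations, n_groups):
    """Greedy sketch: place each test, heaviest first, into the lightest group."""
    known = [durations[t] for t in tests if t in durations]
    # Assumption: tests without a recorded duration get the average known duration.
    default = sum(known) / len(known) if known else 1.0
    weighted = sorted(((durations.get(t, default), t) for t in tests), key=lambda p: -p[0])

    heap = [(0.0, i) for i in range(n_groups)]  # (running total seconds, group index)
    groups = [[] for _ in range(n_groups)]
    for seconds, test in weighted:
        total, idx = heapq.heappop(heap)  # lightest group so far
        groups[idx].append(test)
        heapq.heappush(heap, (total + seconds, idx))
    return groups


# Made-up timings; "new" stands for a test missing from .test_durations.
durations = {"a": 9, "b": 7, "c": 5, "d": 4, "e": 2}
groups = least_duration_split(["a", "b", "c", "d", "e", "new"], durations, 3)
print(groups)
```

With these made-up numbers the three groups total 11 s, 11 s and 10.4 s, which is the kind of balance the `.test_durations` file is meant to provide.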

closes: #2302
related: #2547

Use pytest-split to distribute integration tests into 2 groups that
run as separate GitHub Actions matrix jobs. Each group gets its own
Postgres container, so there are no shared-state conflicts.

Changes:
- Add split-group [1, 2] dimension to Run-Integration-Tests matrix
- Pass PYTEST_SPLITS/PYTEST_SPLIT_GROUP env vars through to pytest
- Update coverage artifact names to include split group
- Add .test_durations file with uniform weights for bootstrapping
- integration.sh conditionally adds --splits/--group flags (no-op
  when env vars are unset, preserving local dev behavior)

This roughly halves wall-clock time per Airflow version by running
~half the tests in each parallel job. The .test_durations file can
be refreshed with real timings via --store-durations after the
first CI run.

Related: #2302

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add speedup-integration-tests-run to the push trigger so the
workflow changes (pytest-split matrix) are picked up from this
branch instead of main. Must be removed before merging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR reduces integration test wall-clock time in CI by splitting the integration pytest run into two parallel GitHub Actions matrix jobs using pytest-split, while keeping local developer runs unchanged when split env vars are not set.

Changes:

  • Add a split-group: [1, 2] dimension to the Run-Integration-Tests workflow matrix and pass PYTEST_SPLITS / PYTEST_SPLIT_GROUP into the test step.
  • Update integration coverage artifact names to include the split group so both halves can be uploaded and later combined.
  • Add a bootstrap .test_durations file and update scripts/test/integration.sh to conditionally add pytest-split CLI args when running in CI.
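How a `split-group` matrix dimension multiplies into parallel jobs can be sketched as below. The Airflow version list and the artifact naming pattern are hypothetical placeholders, not values taken from `test.yml`; only the `split-group` values and the two env var names come from the PR. The merged PR later uses `[1, 2, 3]`.

```python
from itertools import product

airflow_versions = ["3.0", "3.1"]  # hypothetical matrix values
split_groups = [1, 2]              # split-group dimension at this stage of the PR

jobs = [
    {
        "airflow_version": version,
        "env": {
            "PYTEST_SPLITS": str(len(split_groups)),
            "PYTEST_SPLIT_GROUP": str(group),
        },
        # Hypothetical name: including the group keeps uploads from colliding.
        "coverage_artifact": f"coverage-integration-{version}-group{group}",
    }
    for version, group in product(airflow_versions, split_groups)
]

for job in jobs:
    print(job["coverage_artifact"], job["env"])
```

Each existing matrix combination is duplicated once per split group, so every job needs a distinct artifact name for the coverage files to be combined later.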

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| scripts/test/integration.sh | Conditionally appends pytest-split arguments based on CI env vars to distribute tests across parallel jobs. |
| .test_durations | Adds an initial durations map (uniform weights) for pytest-split to use in least-duration splitting. |
| .github/workflows/test.yml | Expands the integration test matrix to run two parallel split groups and updates coverage artifact naming accordingly. |

codecov Bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.04%. Comparing base (f4fb470) to head (4db8cd2).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2562   +/-   ##
=======================================
  Coverage   98.04%   98.04%           
=======================================
  Files         103      103           
  Lines        7586     7586           
=======================================
  Hits         7438     7438           
  Misses        148      148           

Update .test_durations with actual timings averaged from CI run
24495658555, replacing the uniform 1.0 bootstrap weights. The file
now has 184 tests with real durations, achieving a balanced split
across three groups (~390s each).

Increase split-group from [1, 2] to [1, 2, 3] to further reduce
wall-clock time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
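The averaging step mentioned in the commit message above could look roughly like this. It is a sketch under assumptions: it treats each duration record as a JSON object mapping a test node ID to seconds, and the function name, sample test IDs, and the idea of averaging across several matrix jobs are illustrative guesses about how the numbers were produced.

```python
import json
from collections import defaultdict


def average_durations(runs):
    """Average per-test durations across several runs (each a dict of test id -> seconds)."""
    samples = defaultdict(list)
    for run in runs:
        for test_id, seconds in run.items():
            samples[test_id].append(seconds)
    # Sort keys so the resulting file has a stable, diff-friendly order.
    return {test_id: sum(vals) / len(vals) for test_id, vals in sorted(samples.items())}


# Hypothetical timings collected from two jobs of the same CI run.
run_a = {"tests/test_x.py::test_one": 10.0, "tests/test_x.py::test_two": 2.0}
run_b = {"tests/test_x.py::test_one": 14.0}

merged = average_durations([run_a, run_b])
print(json.dumps(merged, indent=4))
```

Averaging smooths out runner-to-runner noise, so a single slow job does not skew the split.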

pankajkoti commented Apr 16, 2026

Results of the push-triggered run on the branch, which took ~16 min: https://github.com/astronomer/astronomer-cosmos/actions/runs/24496792084?pr=2562

[Screenshot 2026-04-16 at 12:55:26 PM]

Comment thread .github/workflows/test.yml Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Comment thread scripts/test/integration.sh

@pankajastro pankajastro left a comment

Looks good. I was just wondering if it makes sense to document how to refresh the .test_durations file in the contributing docs.


@tatiana tatiana left a comment

Amazing results, @pankajkoti! Really happy with how you were able to get from the original 1h to 16min.

Two minor pieces of feedback:

Would it be worth changing the PR title to represent the ultimate goal of this work (reduce integration tests from X to Y, or by Z%) - instead of the how?

WDYT of creating a follow-up ticket to automate the generation of .test_durations?

@pankajkoti pankajkoti changed the title from "Split integration tests across parallel CI jobs via pytest-split" to "Reduce integration test CI time from ~30 min to ~16 min" on Apr 16, 2026
@pankajkoti

Thanks for the reviews @pankajastro and @tatiana. I’ve created a follow-up ticket to automate updates to the .test_durations file (I think this shouldn’t become urgent unless the distribution becomes noticeably imbalanced as we add more integration tests).

@pankajkoti pankajkoti merged commit aa4c770 into main Apr 16, 2026
89 checks passed
@pankajkoti pankajkoti deleted the speedup-integration-tests-run branch April 16, 2026 12:46
pankajkoti added a commit that referenced this pull request Apr 23, 2026
The `pre_condition` task group in `cosmos_manifest_selectors_example`
used `select=["+customers"]`, which left the DAG dependent on state
leaked from other tests. This made the integration test
`test_example_dag[cosmos_manifest_selectors_example]` flaky: it passed
only when `pytest-split` happened to order another jaffle_shop DAG
before it in the same split (which pre-populated the required tables in
Postgres).

**Root cause**

Two gaps in the `+customers` selection:
1. Orphan seeds. In `altered_jaffle_shop`, `stg_orders` and
`stg_payments` read their data via `source('postgres_db', 'raw_orders' |
'raw_payments')`. The corresponding seeds (`raw_orders`,
`raw_payments`) are orphans in the manifest: nothing references them, so
`+` traversal skips them and they never get loaded. `raw_customers` is
pulled in because each staging model has a `force_seed_dep` CTE that does
`select * from {{ ref('raw_customers') }}`.
2. Missing `orders` model. The
`relationships_orders_customer_id__customer_id__ref_customers_` test is
attached to both `customers` and `orders` and queries `public.orders`.
`+customers` pulls the test in (it's a child of `customers`) but doesn't
build the `orders` model, so `customers.test` fails with
`relation "public.orders" does not exist`. This also matters because the
downstream `local_example` / `aws_s3_example` / `gcp_gs_example` /
`azure_abfs_example` task groups all run the `critical_path` selector,
which is the union of `customers` and `orders`, so `pre_condition` needs
to leave both models present.

**Fix**

 Change the pre_condition selector to:

`select=["+customers", "+orders", "raw_orders", "raw_payments"]`

- `+customers` / `+orders` build both final models and their upstream
`stg_*` models (and pull in `raw_customers` via `ref`)
- `raw_orders`, `raw_payments` explicitly seed the two orphan seeds so
the `source()` reads in `stg_orders` / `stg_payments` resolve
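The selection gap and the fix can be illustrated with a toy ancestor traversal. This is a sketch, not dbt's selector implementation: it ignores tests and sources, and the edges from `customers` and `orders` to the staging models are assumed from the usual jaffle_shop layout rather than read from the manifest.

```python
# Toy dependency graph: node -> nodes it references via ref().
# raw_orders / raw_payments have no incoming ref() edges, so they are orphans.
parents = {
    "raw_customers": [],
    "raw_orders": [],
    "raw_payments": [],
    "stg_customers": ["raw_customers"],
    "stg_orders": ["raw_customers"],    # only via the force_seed_dep CTE
    "stg_payments": ["raw_customers"],  # only via the force_seed_dep CTE
    "customers": ["stg_customers", "stg_orders", "stg_payments"],  # assumed edges
    "orders": ["stg_orders", "stg_payments"],                      # assumed edges
}


def resolve(selector):
    """Resolve 'name' to {name} and '+name' to name plus all of its ancestors."""
    if not selector.startswith("+"):
        return {selector}
    selected, stack = set(), [selector[1:]]
    while stack:
        node = stack.pop()
        if node not in selected:
            selected.add(node)
            stack.extend(parents[node])
    return selected


def select(selectors):
    """Union of all selectors, like a list passed to select=[...]."""
    return set().union(*(resolve(s) for s in selectors))


before = select(["+customers"])
after = select(["+customers", "+orders", "raw_orders", "raw_payments"])
print(sorted(set(parents) - before))  # nodes the old selector never built
print(sorted(set(parents) - after))   # nothing left out after the fix
```

In this toy graph the old selector leaves out `orders`, `raw_orders` and `raw_payments`, which matches the two gaps described under the root cause.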


related: #2562 
related: #2592

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tatiana tatiana added this to the Cosmos 1.15.0 milestone May 28, 2026
Development

Successfully merging this pull request may close these issues.

Identify what is leading integration tests run duration to be so long and fix it, if possible

4 participants