Skip to content

Conversation

srilman
Copy link
Contributor

@srilman srilman commented Sep 3, 2025

Changes Made

Right now, PR CI can have up ~30 active jobs at a time. Our plan limits GitHub Actions concurrency to 60, so we're starting to see throttling when there are multiple PRs at the same time. Furthermore, we're limited to 5 macOS runners at a time, and each PR CI needs 3.

A lot of the integration tests jobs are short and do a lot of repetitive work, so I combined them. In addition, I trimmed down the unit tests to only test 3.9 (the oldest) and 3.12 (the newest that all libraries support) Python versions. Lastly, I removed macOS tests for the old Ray runner.

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

@github-actions github-actions bot added the ci label Sep 3, 2025
@colin-ho
Copy link
Contributor

colin-ho commented Sep 3, 2025

Top 10 Slowest Test Files by Total Time (minus tests/ai):

io/delta_lake/test_table_read_pushdowns.py - 130.14s total (748 tests, avg 0.17s each)
expressions/test_legacy_udf.py - 69.63s total (72 tests, avg 0.97s each)
test_context.py - 55.37s total (18 tests, avg 3.08s each)
io/delta_lake/test_table_read.py - 28.98s total (197 tests, avg 0.15s each)
window/test_running_agg.py - 14.37s total (72 tests, avg 0.20s each)
recordbatch/test_tokenize.py - 12.76s total (48 tests, avg 0.27s each)
dataframe/test_joins.py - 10.47s total (1,037 tests, avg 0.01s each)
window/test_order_by_only.py - 6.60s total (108 tests, avg 0.06s each)
io/delta_lake/test_table_write.py - 4.56s total (30 tests, avg 0.15s each)
io/test_s3_credentials_refresh.py - 4.41s total (1 test, 4.41s each)

Thoughts on moving io/delta_lake/test_table_read_* to integration/deltalake?

Potentially also moving tests/ai out?

@srilman
Copy link
Contributor Author

srilman commented Sep 3, 2025

I'm ok with it, adding

@srilman srilman marked this pull request as ready for review September 4, 2025 00:27
@srilman
Copy link
Contributor Author

srilman commented Sep 4, 2025

The failing test is the flaky one

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR optimizes the GitHub Actions CI pipeline to reduce concurrent job count from approximately 30 jobs to a much smaller number, addressing GitHub Actions concurrency limits (60 total jobs, 5 macOS runners). The changes include several key optimizations:

CI Workflow Restructuring: The main workflow file (.github/workflows/pr-test-suite.yml) has been significantly refactored to consolidate repetitive integration test jobs. Previously separate jobs for iceberg, unity, and delta-lake tests have been combined into a single integration-test-catalogs job that runs tests sequentially rather than in parallel matrices. Similarly, TPCH and SQL integration tests have been converted from matrix-based execution to sequential execution with hardcoded configurations.

Python Version Matrix Reduction: Unit tests now only run on Python 3.9 (oldest supported) and 3.12 (newest widely supported) instead of the previous 3.9, 3.10, and 3.11, reducing the test matrix size while maintaining coverage of the critical version boundaries.

Test Directory Reorganization: Delta Lake integration tests have been moved from tests/io/delta_lake/ to tests/integration/delta_lake/ to better categorize them as integration tests and support the consolidated CI job structure. This includes moving conftest.py, test_table_read.py, test_table_write.py, test_table_read_pushdowns.py, and utils.py, along with creating the necessary __init__.py package marker.

Infrastructure Improvements: The Unity Catalog test configuration now supports configurable ports through the UNITY_CATALOG_PORT environment variable, enabling parallel test execution without port conflicts. The workflow also includes fixes to skip check logic and updates to Slack notification conditions.

Platform-Specific Optimizations: macOS testing has been removed for the old Ray runner (flotilla: 0) to reduce resource consumption on the limited macOS runner pool.

These changes maintain essential test coverage while dramatically reducing CI resource consumption, allowing multiple PRs to run simultaneously without hitting GitHub Actions throttling limits.

Confidence score: 4/5

  • This PR appears safe to merge with careful monitoring of CI performance
  • Score reflects the complexity of CI workflow changes that could potentially affect test coverage or execution
  • Pay close attention to the workflow file changes and test reorganization to ensure no tests are accidentally dropped

8 files reviewed, 1 comment

Edit Code Review Bot Settings | Greptile

Comment on lines 85 to +86
- daft-runner: native
flotilla: 1
flotilla: 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The exclusion logic appears inverted - this excludes native with flotilla: 0, but line 96 excludes macOS with flotilla: 0. Should native be excluded with flotilla: 1 instead?

Suggested change
- daft-runner: native
flotilla: 1
flotilla: 0
- daft-runner: native
flotilla: 1

Copy link

codecov bot commented Sep 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.88%. Comparing base (81c2e1a) to head (e70d37f).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5122      +/-   ##
==========================================
- Coverage   76.54%   71.88%   -4.66%     
==========================================
  Files         953      951       -2     
  Lines      130653   130473     -180     
==========================================
- Hits       100006    93797    -6209     
- Misses      30647    36676    +6029     

see 149 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@srilman srilman merged commit 927e22b into main Sep 4, 2025
72 of 75 checks passed
@srilman srilman deleted the slade/truncate-pr-ci branch September 4, 2025 03:07
venkateshdb pushed a commit to venkateshdb/Daft that referenced this pull request Sep 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants