Improve reliability of certain managed python tests on Windows CI#17177
Looks like it's only half working now...? (after the change to using
Original results, with extra columns for max-threads and for serialising all the python_install tests:
Some of this could be noise, but some tests seem to be slow when run alongside a large number of other tests, and some seem to be very slow only when run alongside other python_install tests. Most seem to fall into the latter group. As a bonus, in a full run with no filters there seems to be almost no overhead to running them all serialised. (Updated with change %)
Unfortunately this will miss, e.g., your new test case in #17218. We should think about how we can do better in the future. Maybe we should even write a tool that generates the nextest config from feature-flagged tests, if that's feasible?
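A minimal sketch of what such a generator might emit, assuming some scanner has already collected each test's nextest name and the cargo features it is gated behind. `GatedTest`, `render_override`, the feature name and the test names are all hypothetical placeholders, not anything in uv today; the output uses nextest's `test(=name)` exact-match filter and a `threads-required` override.

```rust
/// A test as a hypothetical scanner might report it: its nextest name plus
/// the cargo features it is gated behind (e.g. via `#[cfg(feature = "...")]`).
struct GatedTest {
    name: &'static str,
    features: &'static [&'static str],
}

/// Hypothetical helper: render one nextest override covering every test gated
/// behind `feature`, so newly added tests are picked up without hand edits.
fn render_override(tests: &[GatedTest], feature: &str, threads_required: u32) -> Option<String> {
    let terms: Vec<String> = tests
        .iter()
        .filter(|t| t.features.iter().any(|f| *f == feature))
        // `test(=...)` matches the full test name exactly.
        .map(|t| format!("test(={})", t.name))
        .collect();
    if terms.is_empty() {
        // Nothing gated behind this feature: emit no override at all.
        return None;
    }
    Some(format!(
        "[[profile.default.overrides]]\nfilter = '{}'\nthreads-required = {}\n",
        terms.join(" | "),
        threads_required
    ))
}

fn main() {
    // Illustrative names only; not taken from the uv test suite.
    let tests = [
        GatedTest { name: "python_install::install_a", features: &["python-managed"] },
        GatedTest { name: "python_install::install_b", features: &["python-managed"] },
        GatedTest { name: "lock::lock_simple", features: &[] },
    ];
    if let Some(toml) = render_override(&tests, "python-managed", 4) {
        print!("{toml}");
    }
}
```

Generating the filter from the feature gates, rather than from a hand-maintained list of names, is what would let a new gated test be covered automatically.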

Summary
Results:
Comparing https://github.com/astral-sh/uv/actions/runs/20342580132/job/58446553156?pr=17177 against https://github.com/astral-sh/uv/actions/runs/20338690535/job/58431797575.
Overall test time went down from 279.163s to 258.427s. This is presumably unrelated, since the change should make things marginally slower, not faster. Here are some local runs (on Linux):
Filtered run (45 tests selected):
No override: Summary [ 34.452s] 45 tests run: 44 passed (12 slow), 1 failed, 3314 skipped
threads-required = "num-test-threads": Summary [ 117.991s] 45 tests run: 44 passed (1 slow), 1 failed, 3314 skipped
threads-required = 4: Summary [ 48.793s] 45 tests run: 44 passed (4 slow), 1 failed, 3314 skipped

Full run (3357 tests):
No override: Summary [ 371.911s] 3357 tests run: 3356 passed (49 slow), 1 failed, 2 skipped
threads-required = "num-test-threads": Summary [ 451.323s] 3357 tests run: 3356 passed (26 slow), 1 failed, 2 skipped
threads-required = 4: Summary [ 386.213s] 3357 tests run: 3356 passed (31 slow), 1 failed, 2 skipped

(The failed test is unrelated.)
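To put these wall times in relative terms, here is a small sketch computing the percentage change for each pair. It assumes the summary line without a `threads-required` label is the baseline in each group; `percent_change` is just an illustrative helper.

```rust
/// Relative change from `before` to `after`, as a percentage.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // (label, baseline seconds, comparison seconds), copied from the summaries above.
    let runs = [
        ("CI job", 279.163, 258.427),
        ("45 tests, num-test-threads", 34.452, 117.991),
        ("45 tests, threads-required = 4", 34.452, 48.793),
        ("full run, num-test-threads", 371.911, 451.323),
        ("full run, threads-required = 4", 371.911, 386.213),
    ];
    for (label, before, after) in runs {
        // `{:+.1}` prints an explicit sign and one decimal place.
        println!("{label}: {:+.1}%", percent_change(before, after));
    }
}
```

Worked out, the full-run cost is roughly +21% for `"num-test-threads"` versus roughly +4% for a fixed 4, while the CI job went down by about 7%.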
Conclusion
I think this is a good way to make these tests more reliable on all platforms, but I am not certain that we should be setting this override in the config. Realistically, I think you'd want something more like num-test-threads / 3 rather than a fixed 4.
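As far as I know, nextest's `threads-required` accepts a fixed integer or the strings `"num-cpus"` and `"num-test-threads"`, but not arithmetic, so a ratio like this would have to be computed outside nextest (e.g. by a script that writes the config). A minimal sketch, assuming the test-thread count equals the number of logical CPUs (nextest's default) and using a hypothetical `test(python_install)` filter:

```rust
/// Hypothetical helper: roughly a third of the test threads, never less than 1.
fn threads_required(num_test_threads: usize) -> usize {
    (num_test_threads / 3).max(1)
}

fn main() {
    // Stand-in for nextest's test-thread count; falls back to 1 if unknown.
    let num_test_threads = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let n = threads_required(num_test_threads);
    // Print an override fragment that could be written into the nextest config.
    println!("[[profile.default.overrides]]");
    println!("filter = 'test(python_install)'");
    println!("threads-required = {n}");
}
```

Unlike a fixed 4, this scales with the runner: a 12-thread machine gets 4, an 8-thread machine gets 2, and small machines are clamped to 1.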