[ci][dask][gpu] Run Dask tests with LightGBM GPU version #5292
@jgiannuzzi I just updated this branch with the changes from latest
Thanks for experimenting with this and pushing it forward! The Dask tests are the most comprehensive tests on distributed training that this project has, so enabling them on more environments and installation types is really valuable to the project.
Happy to say that this is working!
I think that the timeout observed was actually not about the tests taking too long to run, but was related to some combination of:
- the library-loading issues described in "OSError: dlopen: cannot load any more object with static TLS" (#5584), fixed by upgrading the Linux_* jobs to an environment with a newer GLIBC in "[ci] switch to manylinux_2_28 for Linux artifacts (fixes #5514, fixes #5589)" (#5580)
- the network-cleanup strategy in the Python package (#5510 (comment)), fixed by "[ci] prefer CPython in Windows test environment and use safer approach for cleaning up network (fixes #5509)" (#5510)
Given that, I pushed 06c2d3b removing the proposed timeouts for Azure DevOps from this PR.
The GPU jobs are taking less than 30 mins (build link).
Thanks very much for improving the test coverage @jgiannuzzi !
This pull request has been automatically locked since there has not been any recent activity since it was closed.
As suggested in #5282 (comment) (previous attempt: #5285).
Using findings from #5282 (comment):