pytorch v2.7.0 #383
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/14306414081. Examine the logs at this URL for more detail. |
…nda-forge-pinning 2025.04.07.04.26.55
|
First thing we'll have to take care of is that there's no more support for pytorch-cpu-feedstock/recipe/build.sh Lines 12 to 14 in 46274b8 |
FWICS it only controls copying libomp files on Darwin. I suppose we could either send a small patch upstream to add another switch for that, or remove the files again after build. |
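The second option above (removing the files again after the build) could look roughly like the sketch below. It is only an illustration: the `libomp*` glob pattern, the function name `remove_bundled_libomp` and the directory layout in the demo are assumptions, not taken from the recipe or from upstream.

```python
import tempfile
from pathlib import Path


def remove_bundled_libomp(root, pattern="libomp*"):
    """Delete files matching `pattern` anywhere under `root`.

    The pattern is an assumption about how the copied OpenMP files are
    named; adjust it to whatever the build actually installs.
    Returns the list of removed paths.
    """
    removed = []
    # Collect matches first (sorted() consumes the generator), then delete.
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file() or path.is_symlink():
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up file names.
    with tempfile.TemporaryDirectory() as tmp:
        lib = Path(tmp) / "torch" / "lib"
        lib.mkdir(parents=True)
        (lib / "libomp.dylib").touch()
        (lib / "libtorch.dylib").touch()
        print("removed:", [p.name for p in remove_bundled_libomp(tmp)])
        print("left:", sorted(p.name for p in lib.iterdir()))
```

Cleaning up after the build keeps the recipe free of an extra patch, at the cost of having to keep the pattern in sync with what upstream copies.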
|
Looks like upstream broke some system libs logic in rc9. I'm going to look into that now. |
…nda-forge-pinning 2025.04.13.21.56.35
|
Hi! This is the friendly automated conda-forge-linting service. I was trying to look for recipes to lint for you, but it appears we have a merge conflict. Please try to merge or rebase with the base branch to resolve this conflict. Please ping the 'conda-forge/core' team (using the |
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/14776421980. Examine the logs at this URL for more detail. |
|
Okay, so I see the Windows build failed because we aren't using system nccl there, and OSX builds are failing because we are not setting |
|
Ok, Darwin is easy to fix — we just need to explicitly set The remaining question is nccl on Windows. |
|
Looks like something changed in one of the files we were rebasing on triton (i.e. the diff we're adding was using |
|
Thanks for all the help with this one @mgorny! 🙏 |
|
Thanks! |
|
Ugh, such a pointless failure... the Windows build failed to upload after uploading 100% of the 474MB artefact each time. Any thoughts on what's happening here @danpetry @bkentropy? CC @jezdez Details: I removed only the irrelevant progress lines from the upload part and kept the artefact sizes from the inspection step for clarity. |
|
argh, it failed again in the same way 😑 |
|
Sigh, and aarch+CUDA failed again as well (though unrelated to conda), by hanging in the test suite. Curiously, this passed for 9ea420c, though I fail to see what conda-forge/triton-feedstock@d9a1100 could have broken there. Perhaps some other ambient change. @mgorny, can I interest you in investigating? (BTW, this kind of situation is why I try to keep the timeouts such that they pass in 99% of the builds, but not looser than that: 222e167) |
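As a rough illustration of the "pass in 99% of the builds, but not looser" idea, a threshold like that can be read off past durations with a nearest-rank percentile. This is a sketch under assumptions: the function name `timeout_for_pass_percent` and the sample durations are made up, and the thread does not say how the feedstock's timeouts are actually chosen.

```python
import math


def timeout_for_pass_percent(durations, pass_percent=99):
    """Smallest observed duration that at least `pass_percent` percent
    of past runs stayed within (nearest-rank percentile)."""
    if not durations:
        raise ValueError("need at least one observed duration")
    ordered = sorted(durations)
    # 1-based nearest rank, clamped so tiny percentages still pick a sample.
    rank = max(1, math.ceil(pass_percent * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical test-suite durations in minutes, not real CI data.
    durations = list(range(1, 101))
    print("timeout (min):", timeout_for_pass_percent(durations))
```

A tight threshold like this makes a hang show up as a timeout quickly, which is the trade-off described above; a looser one would hide regressions in test duration.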
|
Windows upload failures are weird indeed. My first thought was to check if filenames are correct, but they are, and there is definitely no existing Was the previous AArch64 failure at the same percentage, roughly? |
|
At least AArch64 seems to have passed now. Crossing fingers for Windows. |
|
Sigh, I guess we can't expect the win64 upload to fix itself… |
|
Any help here with the windows upload issues? 🙏 @danpetry @bkentropy @jezdez |
|
Sorry, I wasn't following what was going on... I've raised an incident report to the anaconda.org team internally |
|
Is it because the packages already exist? What happens if you delete existing packages? |
|
This smells like a race condition during the upload or some other state issue, I'd also delete and try again uploading those Windows builds (apologies for late notice due to endless conda-forge GH notifications). If in doubt, don't hesitate to open a ticket on https://github.com/conda/infrastructure! |
That's not an issue (or never has been AFAIK). If it exists in staging already, it just gets re-uploaded. But the upload then fails after getting to 100%. The error message makes me think it's perhaps somehow taking too long, and running into a timeout on the connection (on the server side). If you look at the error logs, further up, the error occurs just a few seconds more than 5 minutes after the start of the upload every time. |
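The timeout hypothesis can be checked mechanically by comparing the gap between upload start and failure across attempts. The sketch below assumes ISO-formatted timestamps and uses placeholder values; the function names, the 300 second limit and the 15 second slack are illustrative assumptions, not values from the actual logs or from anaconda.org.

```python
from datetime import datetime

# Placeholder (upload start, failure) timestamps; NOT from the real CI logs.
ATTEMPTS = [
    ("2000-01-01T10:00:00", "2000-01-01T10:05:04"),
    ("2000-01-01T12:30:10", "2000-01-01T12:35:13"),
    ("2000-01-01T15:45:00", "2000-01-01T15:50:06"),
]


def elapsed_seconds(start, end):
    """Seconds between two ISO-8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


def looks_like_fixed_timeout(attempts, limit_s=300.0, slack_s=15.0):
    """True if every attempt failed within `slack_s` seconds after `limit_s`."""
    return all(
        0 <= elapsed_seconds(start, end) - limit_s <= slack_s
        for start, end in attempts
    )


if __name__ == "__main__":
    for start, end in ATTEMPTS:
        print(f"{start} -> {end}: {elapsed_seconds(start, end):.0f}s")
    print("consistent with a ~5 min limit:", looks_like_fixed_timeout(ATTEMPTS))
```

If the gaps cluster just above one value regardless of artefact size, a fixed connection or server-side limit is a more likely cause than a race on existing files.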
|
@wolf @baszalmstra, could you check on the health of the Windows server perhaps? For the last few days, any restart that I triggered for CI on main failed to pick up a runner. |
|
FYI, I re-ran the failed job for Windows with CUDA 12.6 out of curiosity to see if the upload issue would be automatically resolved. It actually did resolve itself. However, I'm uncertain about what triggered this resolution or if it was just a one-time success. Nevertheless, I wanted to share this update. https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/14808030316/job/43078398670 (9th attempt) |
|
OMG finally! 🥳 |
Try to build release candidates; complicated by a combination of lack of upstream tarballs (pytorch/pytorch#150649) and the fact that conda cannot patch submodules when using git_url.
Also picking up some useful changes from #378