pytorch 2.7.1; switch label on windows; turn on artefact persistence #391
…nda-forge-pinning 2025.05.28.11.40.02
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/15680385882. Examine the logs at this URL for more detail. |
|
I'm all for it if it works. Presumably we'll get the artifact from the pull request CI runs, correct? If so, perhaps we could limit CI to just the problematic Windows build. |
|
@wolfv @baszalmstra, sorry for the ping, just checking the health of the windows server again as the two windows jobs here didn't start (there's one windows job running in another job here, but in the past it was possible to have up to 4 concurrent jobs - perhaps there are other long-running jobs elsewhere that I cannot see? 🤔) |
|
Seems that on the 9th try, the CI from the merge of #383 actually ran through 🥳 |
|
Kinda surprised they didn't bump the Triton pin, but I guess the new version didn't have any significant changes. These 3 generic+CUDA failures don't look important, but mkl+CUDA looks significant: I'm guessing something went really wrong somewhere. |
|
Looks like it was a flake after all. |
|
Unfortunately we still have a bunch of test failures, including a few that look very related to triton - perhaps pointing at some interaction with conda-forge/triton-feedstock#51 |
|
We still have a problem when testing the CUDA builds: The stacktrace looks very much triton related, but then again, we're pulling in the same version (3.3.0, not 3.3.1) as for the last passing build. |
|
Looking closer, these may be genuine triton bugs |
|
OK, triton 3.3.1 makes the problems worse. We go from 3 test failures to |
|
Uh, perhaps triton-lang/triton#6928 can help? Not sure about the necessity of |
|
@wolfv @baszalmstra quick question: did the windows server get resized recently? We used to have up to 4 concurrent jobs, though in recent weeks it seems we're down to at most one job at the same time. |
|
We didn't touch anything |
|
Thanks for the response! In that case, what I imagine might have happened is that there are some dead jobs from some failure somewhere along the line that are clogging up the queue. Is there a server dashboard that I could perhaps get access to? Being able to delete stale jobs would help a lot (if you trust me to be careful with that responsibility... FWIW, I have the same kind of access for https://github.com/Quansight/open-gpu-server, including deleting the occasional stale job). |
|
OK, let's get this in. Any follow-up discussions can be addressed in #393 |
|
Sigh, now the |
These are each intended as a work-around for conda/infrastructure#1159. Let's hope at least one works out.