Horovod 0.26.1 for PyTorch 1.11.0 #17580
absrocks wants to merge 9 commits into easybuilders:develop
@absrocks: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/4471729272 bleep, bloop, I'm just a bot (boegelbot v20200716.01)
@absrocks: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/4473775953 bleep, bloop, I'm just a bot (boegelbot v20200716.01)
easybuild/easyconfigs/h/Horovod/Horovod-0.26.1-foss-2021b-CUDA-11.4.1-PyTorch-1.11.0.eb
easybuild/easyconfigs/p/PyTorch/PyTorch-1.11.0-foss-2021b-CUDA-11.4.1.eb
    ('GMP', '6.2.1'),
    ('numactl', '2.0.14'),
    ('FFmpeg', '4.3.2'),
    ('Pillow', '8.2.0'),  # 8.3.2
Note that Pillow 8.3.2 is used as a dependency elsewhere in this toolchain. I see you put that in a comment; was there a particular reason not to use it here?
I initially tried Pillow 8.3.2, but it caused a compatibility error: the configuration requires a lower Pillow version. Hence I used Pillow 8.2.0, the same version that was used in PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb.
Ok, in that case two things will need to be done:
- A clear comment should be put in the EasyConfig explaining why the version is different. Typically that is a one-sentence mention of the error you'd get if you use a newer Pillow, plus a reference link to an issue describing the error more extensively (if needed). Feel free to just open an issue for that on easybuild-easyconfigs, describe the error with a full trace of the output/error you get, and close your own issue with a remark that the solution is to downgrade to Pillow 8.2.0.
- An exception needs to be made for the CI to accept that a different version of Pillow will be used here than in other parts of the toolchain. You can do that by adding Pillow to the dictionary here. The key should be Pillow and the value should be a list of tuples, where the first element is the version of Pillow that should be accepted (i.e. '8.2.0' in this case) and the second should be a list of the packages that are allowed to use this dependency version of Pillow, i.e. in your case r'PyTorch-1\.11\.0-. These are regular expressions, which is why the . should be escaped there (there are plenty of examples in that file).
By the way, I see the CI is also tripping over the typing-extensions version, for the same reason (another version of typing-extensions is already in use in this toolchain). The same two points as above apply to that, i.e. (1) justify and document why we deviate from the version already used in the toolchain, and (2) add it as an exception to the CI.
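For illustration, the exception structure described above (dependency name as key, list of (version, allowed-package patterns) tuples as value) could look roughly like the sketch below. This is only a sketch under assumptions: the names ALLOWED_ALT_DEP_VERSIONS and is_allowed_exception are hypothetical and are not taken from the easybuild-easyconfigs test suite, the version is compared as a plain string, and the closing of the PyTorch pattern is assumed. The real check in the CI may be structured differently.

```python
import re

# Hypothetical name for the exceptions dictionary; the real one lives in the
# easybuild-easyconfigs test suite and may be organised differently.
ALLOWED_ALT_DEP_VERSIONS = {
    # dependency name -> list of (accepted version, [regexes for allowed easyconfigs])
    'Pillow': [('8.2.0', [r'PyTorch-1\.11\.0-'])],
}


def is_allowed_exception(dep_name, dep_version, easyconfig_filename,
                         exceptions=ALLOWED_ALT_DEP_VERSIONS):
    """Return True if this easyconfig may use this alternative dependency version."""
    for version, parent_patterns in exceptions.get(dep_name, []):
        # Assumption: exact string match on the version; patterns are searched
        # anywhere in the easyconfig filename.
        if version == dep_version and any(
                re.search(pattern, easyconfig_filename) for pattern in parent_patterns):
            return True
    return False
```

With this sketch, PyTorch-1.11.0-foss-2021b-CUDA-11.4.1.eb would be accepted with Pillow 8.2.0, while any other easyconfig using that Pillow version would still be flagged. Because the dot is escaped in the pattern, a filename such as PyTorch-1x11x0-... would not match.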
easybuild/easyconfigs/p/PyTorch/PyTorch-1.11.0-foss-2021b-CUDA-11.4.1.eb
@absrocks: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/4474495793 bleep, bloop, I'm just a bot (boegelbot v20200716.01)
@absrocks: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/4525965297 bleep, bloop, I'm just a bot (boegelbot v20200716.01)
@absrocks: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/4526928056 bleep, bloop, I'm just a bot (boegelbot v20200716.01)
@absrocks: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/4527424069 bleep, bloop, I'm just a bot (boegelbot v20200716.01)
Wait a second... this is the same PyTorch as in https://github.com/easybuilders/easybuild-easyconfigs/pull/16385/files, right? Because there, it does use a newer Pillow. Then, there is also #17272. As PyTorch is rather difficult to get merged (lots of issues pop up, with varying systems), I'd like to focus on getting either of those two merged, and simply build this Horovod on top of that...
@absrocks If you want to continue with this PR, might I suggest updating the PyTorch dependency to use the (already merged) EasyConfig from #17272? You can sync this PR with the develop branch using the […]. If you're not up for it, feel free to close the PR. Someone else can use your Horovod EasyConfig as a starting point to make one on top of #17272.
I'm going to close this PR. If you want to build a Horovod on top of #17272 as I previously suggested, I think the best course of action is to open a new PR. That way, there's a better chance that a reviewer sees it and picks it up :)
No description provided.