{math}[GCCcore/13.2.0] ArmComputeLibrary v23.08 #21309
migueldiascosta wants to merge 4 commits into easybuilders:develop
Conversation
```python
buildopts = "os=linux arch=armv8a build=native multi_isa=1 "
buildopts += "Werror=0 debug=0 neon=1 opencl=0 embed_kernels=0 "
buildopts += "fixed_format_kernels=1 openmp=1 cppthreads=0 "
```
In particular, `arch=armv8a multi_isa=1` should be more generic without losing functionality/performance: benchmarks on a64fx compared to `arch=armv8.2-a-sve` didn't show any difference.
Test report by @migueldiascosta
…asyconfigs into 20240904163603_new_pr_ArmComputeLibrary2308
@migueldiascosta Is it worth still merging this now that we have a newer version merged? See
@boegel That one is for a different toolchain. Do you suggest updating the version for the toolchain used in this PR?
I'm ok with closing this PR, since we're not likely to enable ACL for PyTorch/2.3-foss-2023b (the one this was originally targeted at).
Why not? Given the huge performance difference I'd actually update all PyTorch easyconfigs to use imkl on x86 and ACL on Arm, maybe starting at 2023a, as 2022b is the oldest active one. As for versions, I'd use the PyPI PyTorch packages as reference.
@Flamefire Just thought we would likely not bother. Let me fix the shared library extension in this PR then, same as in the merged one.
I expect that fewer tests rather than more will fail, so changing those ECs will be little work with high gain, which makes it worth going back as far as reasonably possible.
I can create those ACL PRs, yes. For PyTorch-2.1.2-foss-2023a.eb (GCCcore 12.3.0), though, I'm not sure which ACL version to use; there was no
From https://github.com/pytorch/pytorch/blob/v2.1.2/cmake/public/ComputeLibrary.cmake it looks like anything higher than ACL 21.02 should be OK, but it's probably safer to use exactly ACL 21.02 for PyTorch 2.1.2.
I found a way: Extract the wheel and run
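The actual command is elided in the comment above; purely as an illustration, a hypothetical check of whether a PyPI wheel's libtorch links in ACL might look like this (the package name, wheel layout, and `arm_compute` symbol prefix are all assumptions, not what the author necessarily ran):

```shell
# Hypothetical check (the command actually used is not shown in the comment):
# unpack the PyPI wheel and grep the CPU library for Arm Compute Library symbols
pip download torch --no-deps -d /tmp/torch-wheel
unzip -q /tmp/torch-wheel/torch-*.whl -d /tmp/torch-wheel/unpacked
strings /tmp/torch-wheel/unpacked/torch/lib/libtorch_cpu.so | grep -i arm_compute
```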
(created using `eb --new-pr`)

The motivation for this easyconfig was that on Arm (at least on a64fx, but probably also on other Arm processors) a pip-installed PyTorch was multiple times faster than an easybuilt one, and an analysis with `perf` showed that ACL was being used (also a recent OpenBLAS with support for ARM_SVE, but that should be taken care of by using PyTorch with a more recent toolchain and OpenBLAS, e.g. #20489).

This is not the most recent version of ACL, but PyTorch 2.3 (the one in #20489) says that the maximum supported version is 23.08.
Using this with PyTorch 2.3 requires setting `USE_MKLDNN=ON`, `USE_MKLDNN_ACL=ON`, `USE_MKLDNN_CBLAS=ON`, and applying a patch derived from Ryo-not-rio/oneDNN@ca60ff4 to the bundled oneDNN.
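As a sketch, assuming one builds PyTorch 2.3 from source rather than via an easyblock, the flags mentioned above would be exported before invoking the build (the `ACL_ROOT_DIR` path and the `setup.py` invocation are placeholders/assumptions):

```shell
# Sketch: build-time environment for PyTorch 2.3 with oneDNN + ACL
# (flag names from the text above; ACL_ROOT_DIR path is an assumed install prefix)
export USE_MKLDNN=ON
export USE_MKLDNN_ACL=ON
export USE_MKLDNN_CBLAS=ON
export ACL_ROOT_DIR=/path/to/ArmComputeLibrary  # assumption: where ACL was installed
python setup.py develop  # or however the build is normally driven
```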