gfx1102 : import torchaudio : Caught signal 11 (Segmentation fault: address not mapped to object at address 0x41b40) #74
Comments
Can confirm the same happens for me on a wildly different configuration: Manjaro, ROCm 6.1.2, Python 3.11.9, gfx1030 (RX 6800 XT), gcc 14.1.1. |
I need to check if latest changes/torch update broke something.
And will print:
Another thing I tested: whisper was able to extract the lyrics from an mp3 song. source /opt/rocm_sdk_612/bin/env_rocm.sh |
FWIW
A stack trace gives no interesting information, and while I've tried to build |
Can confirm |
Just verified that a rebuilt, up-to-date torchaudio still works for me on a Mageia 9/6900HS laptop. Which Linux distro are you using? |
Ubuntu 22.04 with Linux Kernel 6.10-rc2 |
Thanks for confirming, I will try to reproduce this. In the meantime I re-tested with some audio examples from https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html; they still work for me on the latest build and give results similar to those in the tutorial. |
Update: I tried After I ran
I tried |
Good find, |
Oh excellent! What did you do to block |
Obviously, don't try this at home except for troubleshooting purposes, this is not a stable solution. |
Per this, there are knobs to twist to influence the ffmpeg dependency. Setting Of course none of this is a solution to people who need the For completeness & comparison with other systems, the version information as provided by ffmpeg:
Also, installing vanilla |
I am still not able to reproduce this on Ubuntu 22.04.4 or on Mageia 9. Then I followed this tutorial:
If I run the attached torch_audio_play.py, it works and shows the same diagrams of the audio as the tutorial.
I have these libtorio libs installed under rocm_sdk_612
|
@jrl290 Would it help to rebuild pytorch, pytorch vision and pytorch audio on the SDK after you have installed the additional packages (ffmpeg, etc.)? You can rebuild them by removing their directories from builddir |
So, I think I've found the cause and a possible solution. The main problem appears to be that torchaudio downloads statically built versions of FFmpeg to build and link against, but at runtime, the system's FFmpeg is dynloaded. As a result, any version discrepancy has the potential to cause things to blow up. The packaged FFmpeg links against
As luck (?) would have it, as of today, Arch/Manjaro have upgraded to FFmpeg 7. As a result torchaudio no longer crashes, but that's because it can't load FFmpeg at all anymore. There appears to be no work ongoing in the repo to include FFmpeg 7 support yet. I understand FFmpeg has some LGPL/GPL weirdness going on in the licensing that encourages this kind of stuff. It may be necessary or at least desirable to build our own FFmpeg specifically to avoid these issues, though I don't know if torchaudio should be leading on that.
I was able to make torchaudio work again by simply copying the FFmpeg libs it uses to the ROCm directory:
curl https://pytorch.s3.amazonaws.com/torchaudio/ffmpeg/2023-07-06/linux_x86_64/6.0.tar.gz | tar x
cp -a ffmpeg/lib/lib* /opt/rocm_sdk_612/lib
The URL is taken from
After this, torchaudio imports and confirms that FFmpeg is supported:
>>> import torchaudio
>>> torchaudio.list_audio_backends()
['ffmpeg', 'soundfile']
(If this code only outputs
We may be able to integrate this into the build, since CMake ends up downloading this file at some point, but I'm no CMake wizard so I don't know how exactly this should be done.
Note that I haven't tested if there's any potential conflicts with an existing FFmpeg yet, since I can't, since my system has no FFmpeg 6 anymore. :P I have also not tested if the library files torchaudio downloads are in any way optimized or usable across distros; if they're bare-bones unoptimized it again may be necessary to do a custom optimized build to include in the ROCm SDK itself (if possible), as at least Arch has seen fit to make the jump to FFmpeg 7 and other distros may follow. |
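For anyone wanting to compare the FFmpeg that the dynamic loader picks up with the 6.0 build torchaudio links against, here is a minimal Python sketch. It assumes a Linux system where `ctypes.util.find_library` can locate libavutil; the helper names are made up for illustration and are not part of torchaudio. It relies on libavutil's `avutil_version()` function, which returns the version packed into one integer, and on the libavutil major numbers of the FFmpeg 4 to 7 release series. Note that it reports what the loader finds in this process, which may or may not be the copy torchaudio ends up loading.

```python
import ctypes
import ctypes.util

# libavutil major version shipped with each FFmpeg release series.
AVUTIL_MAJOR_TO_FFMPEG = {56: "4.x", 57: "5.x", 58: "6.x", 59: "7.x"}


def decode_version(packed):
    """Split FFmpeg's packed version integer into (major, minor, micro)."""
    return (packed >> 16, (packed >> 8) & 0xFF, packed & 0xFF)


def system_avutil_version():
    """Return (major, minor, micro) of the libavutil the loader finds, or None."""
    name = ctypes.util.find_library("avutil")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # avutil_version() is declared as returning an unsigned int.
    lib.avutil_version.restype = ctypes.c_uint
    return decode_version(lib.avutil_version())


if __name__ == "__main__":
    version = system_avutil_version()
    if version is None:
        print("no libavutil found by the dynamic loader")
    else:
        series = AVUTIL_MAJOR_TO_FFMPEG.get(version[0], "unknown")
        print("libavutil", version, "-> FFmpeg", series)
```

If the reported series is not 6.x while torchaudio was built against the downloaded 6.0 archive, that would be consistent with the mismatch described in this comment, though it does not prove it is the cause of a crash.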
Important note: simply removing the builddir is not sufficient to get these to rebuild cleanly. They each leave intermediates in their |
To resolve the cleanup issue, I have patched the Python projects to have a preconfig_*.sh script. In the pytorch case, the preconfig script now calls Python's setup.py clean.
Earlier I had "rm -rf build" there, but thought it was not needed, as pytorch's setup.py seemed to implement a clean method. Do you think it should be changed to just call rm -rf "build"? |
Oh, that may be my bad -- maybe I just remembered the problem from earlier builds and did not recheck. Indeed, if I try that now it appears to work correctly. |
I am checking the CMake files now for ffmpeg support and I can see at least 3 possible ways to solve the issue.
This option is selectable in pytorch_audio's root CMakeLists.txt
Option (3) is the easiest to implement, but it has its own problems, because the .so files extracted from 6.0.tar.gz may not be compatible with all distributions. (They are in any case linked against other files that are expected to be on the distro.) I think we should still add support for option 3 first, and then check how to build ffmpeg from source and use that version for all packages. |
Yeah, the codec support may be an issue for real use cases and may be motivation to find some way to make things work in a stable way with the system-provided FFmpeg in all cases, but that seems to be difficult without overhauling the way pytorch does things. It's not entirely clear to me why pytorch does things the way it does, with the weird mix of linking against its own copies but then hoping that the version on the system will be compatible with it while dynamically loading. Either go full static or full dynamic, not this weird mix with potential ABI problems. But then I have no experience developing against FFmpeg so maybe this is just my ignorance showing. :P |
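To see which FFmpeg copy a process really ended up with under this build-static, load-dynamic mix, one option on Linux is to read `/proc/self/maps` after FFmpeg has been loaded (for example after the `torchaudio.list_audio_backends()` call used earlier in the thread). A rough sketch follows; the function names are hypothetical, and the `libav` prefix is an assumption that also matches unrelated libraries with that prefix.

```python
import os


def mapped_libraries(maps_text, name_prefix="libav"):
    """Return sorted, de-duplicated paths of mapped files whose basename starts with name_prefix."""
    paths = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path).startswith(name_prefix):
                paths.add(path)
    return sorted(paths)


def libraries_in_this_process(name_prefix="libav"):
    """Linux only: list matching libraries mapped into the current process."""
    with open("/proc/self/maps") as handle:
        return mapped_libraries(handle.read(), name_prefix)
```

Calling `libraries_in_this_process()` in the same interpreter right after FFmpeg has been loaded should show whether the libraries come from the system directories or from the ROCm SDK directory.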
I just checked with In fact now that Arch/Manjaro have FFmpeg 7 I should probably do a full rebuild to see if things even work for other packages anymore; audio will get by with no support but I'm not sure about the rest. There is not yet a compatibility package for FFmpeg 6 in the Arch repos, though I suspect one will pop up in AUR before too long (as there are packages for FFmpeg 4 and 5). |
In Mageia 9, copying the .so files from the pytorch_audio dir actually breaks torchaudio.
causes
|
For completely up-to-date versions of Arch/Manjaro, where FFmpeg is now at version 7, a workaround for now appears to be to install the |
So Arch and Manjaro do have ffmpeg 4, 5 and 7 but not 6? There is also option (4) to test So file needs to look like this:
You need to remove the builddir/039_04_pytorch_audio and rocm_sdk_612/lib/libav* files you may have copied. If you want to debug this, you can add this line temporarily to src_projects/pytorch_audio/third_party/ffmpeg/single/CMakeLists.txt It will then stop there and print out whether it found the ffmpeg libraries. |
Arch/Manjaro have a package for FFmpeg 4 for compatibility. There is an unofficial, user-maintained package for FFmpeg 5. There is not yet any package to provide FFmpeg 6, because, up until yesterday, that was the official version used before upgrading to FFmpeg 7. So if you're on Arch or Manjaro unstable and you're up to date, you currently have no way of building torch with FFmpeg support (unless you deliberately downgrade the package, which is not generally recommended as it can easily break things on a rolling distro). Arch packages generally provide headers without the need for a separate development package and This may be an option for Ubuntu or other systems if they offer devel packages for FFmpeg though, so that's something that could be explored for the OP. The issue might need to be split between "how to make things compatible for distros that offer FFmpeg 4-5-6 but might have a minor incompatibility" vs. "how to support FFmpeg 7", as the latter is a much bigger thing. Eventually upstream should get to that, though it might take a while. :P |
- by default the pytorch_audio is build against the prebuild ffmpeg that is downloaded by the third_party/ffmpeg/multi/CMakeLists.txt
- those libraries are however not used during the runtime and in some linux distributions they are incompatible with the distro version of ffmpeg that is loaded on runtime and that can cause segfaults
- ffmpeg-devel dependencies needs to be added also to install_deps.sh file for all supported linux distributions
- fixes #74
Signed-off-by: Mika Laitio <[email protected]>
- by default the pytorch_audio is build against the prebuild ffmpeg that is downloaded by the third_party/ffmpeg/multi/CMakeLists.txt
- prebuild ffmpeg libraries are however not used during the runtime and in some linux distributions they are incompatible with the distro version of ffmpeg that is loaded on runtime and that can cause segfaults
- add also a patch to search the ffmpeg-devel headers from /usr/include/ffmpeg in addition of /usr/include because fedora 40 installs them there
- ffmpeg-devel dependencies needs to be added also to install_deps.sh file for all supported linux distributions
- fixes #74
Signed-off-by: Mika Laitio <[email protected]>
- by default the pytorch_audio is build against the prebuild ffmpeg that is downloaded by the third_party/ffmpeg/multi/CMakeLists.txt
- prebuild ffmpeg libraries are however not used during the runtime and in some linux distributions they are incompatible with the distro version of ffmpeg that is loaded on runtime and that can cause segfaults
- add patch to search the ffmpeg-devel headers from /usr/include/ffmpeg in addition of /usr/include
- add patch to search ffmpeg devel headers and libraries from the ubuntu 22.04 specific location under /usr
- ffmpeg-devel dependencies needs to be added also to install_deps.sh file for all supported linux distributions
- fixes #74
Signed-off-by: Mika Laitio <[email protected]>
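Before rebuilding against the distro's FFmpeg, it can help to check whether the development headers are present in either of the two locations these commits mention. A small sketch, assuming `libavcodec/avcodec.h` is a reasonable marker for an installed ffmpeg-devel package; the function name and the default list are illustrative only, and the Ubuntu-specific location is left out because the commit message does not spell it out.

```python
from pathlib import Path

# The two include roots named in the commit messages; other distros may differ.
DEFAULT_INCLUDE_ROOTS = ("/usr/include", "/usr/include/ffmpeg")


def include_roots_with_ffmpeg(roots=DEFAULT_INCLUDE_ROOTS, marker="libavcodec/avcodec.h"):
    """Return the include roots that actually contain the marker header."""
    return [str(root) for root in roots if (Path(root) / marker).is_file()]


if __name__ == "__main__":
    found = include_roots_with_ffmpeg()
    print("FFmpeg headers found in:", found if found else "none of the default roots")
```

An empty result suggests the devel package is missing or installed somewhere else, in which case the build would presumably fall back to (or fail without) the prebuilt archive.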
Bad news (possibly): after manually extracting the FFmpeg 6 libraries from the archived package into a separate directory and building torchaudio against it, the resulting package still fails with a segfault, even though the package now only contains a module for the system-specific FFmpeg (the FFmpeg 6 libraries have been copied to the ROCm dir). This means my original find may have been a red herring and I'm back to square one figuring out what's going wrong. |
Well, it took a while, but with gdb by my side I finally narrowed it down: the segfault happens when @jrl290: it would be nice if you could test if removing/renaming this symlink fixes things on your end too, to see if it's the same problem. |
The option that seems to cause offense (easy to find since it smelled suspiciously like something that could cause loading problems) is |
@jeroen-mostert You beat me to it, nice catch! I was finally able to reproduce @jrl290's exact segfault at address 0x41b40, and according to strace it happened on an mprotect call. I also tested building against Ubuntu's ffmpeg headers and libraries, and that did fix the problem. I was just doing a torchaudio debug build to trace the torio_ffmpeg library when I read your message. As far as I understand, the --enable-dynamic-dispatcher option packages the code for all the different targets into the same library, and loading that library now somehow fails on Ubuntu. I removed the --enable-dynamic-dispatcher option from all 4 amd-fftw builds and it fixed the issue on Ubuntu 24.4 for me as well. Do you want to make a pull request with a patch that removes the "--enable-dynamic-dispatcher" option from all of these? 020_01_amd_fftw_single_precision.binfo |
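To find which build recipes still carry the flag, something along these lines could be used. This is only a sketch: the directory to scan is whatever folder holds the `.binfo` files in your checkout (passed in by the caller, not hard-coded here), and the function name is made up for illustration.

```python
from pathlib import Path


def files_using_flag(directory, flag="--enable-dynamic-dispatcher", pattern="*.binfo"):
    """Return sorted names of files in directory matching pattern that mention flag."""
    hits = []
    for path in sorted(Path(directory).glob(pattern)):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        if flag in path.read_text(errors="replace"):
            hits.append(path.name)
    return hits
```

Running it before and after editing the recipes gives a quick check that the option is gone from every amd-fftw build and not just the single-precision one.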
Finally got through a fresh install, removed --enable-dynamic-dispatcher, and rebuilt And it seems this problem is solved. Thank you! Unfortunately, the GPU is still unstable during processing. The original hope for trying a gfx1102 build instead of gfx1100 (which AMD provides directly) was to correct the random GPU hangs that occur. Specifically:
This is no doubt a larger problem. Not at all related to rocm_sdk_builder. But be forewarned that it has been occurring for me on the Ryzen 7840U. And any debugging tips would be welcome. I'm not quite as advanced in this area as you guys |
I have no specific advice, but it sounds like that should probably be a separate issue, with steps to reproduce if possible. Even if there's no (easy) fix it will confirm for others with a 7840U that they're not alone. |
Add inline copies of ifunc resolvers to fix the segfault. This allows us to turn --enable-dynamic-dispatcher back on (in case it ever does anyone any good). Signed-off-by: Jeroen Mostert <[email protected]>
Built rocm_sdk_builder on Ubuntu 22.04 with Linux Kernel 6.10-rc2
Simple
import torchaudio
yielded
Upgraded torchaudio with:
Import afterward resulted in:
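For others trying to confirm the same crash without losing their interactive session, a hedged sketch: run the import in a child interpreter and inspect how it exited. The helper names are hypothetical; the only assumption about the failure is that the child process is terminated by signal 11 (SIGSEGV), as the issue title suggests.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child died from that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def probe_import(module):
    """Import module in a fresh interpreter so a crash cannot take this one down."""
    result = subprocess.run(
        [sys.executable, "-c", "import " + module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_exit(result.returncode)


if __name__ == "__main__":
    print("torchaudio import:", probe_import("torchaudio"))
```

A result of "killed by SIGSEGV" would match the behaviour reported here, while "exited with status 1" more likely means an ordinary Python exception such as the module not being installed.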