Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable ROCm RNN-T Loss #2485

Closed
wants to merge 17 commits into from
Closed

Enable ROCm RNN-T Loss #2485

wants to merge 17 commits into from

Conversation

jpvillam-amd
Copy link
Contributor

@jpvillam-amd jpvillam-amd commented Jun 14, 2022

Added HIPIFY code and small changes for ROCm. Targeting RNN-T loss.

@mthrok
Copy link
Collaborator

mthrok commented Jun 15, 2022

Hi @jpvillam-amd

Thanks for the contribution. Can you rebase on the latest main branch?
We had a bunch of CI breakage but it's been fixed on main branch.

@jpvillam-amd
Copy link
Contributor Author

Hi @jpvillam-amd

Thanks for the contribution. Can you rebase on the latest main branch? We had a bunch of CI breakage but it's been fixed on main branch.

Sure thing

@mthrok
Copy link
Collaborator

mthrok commented Jun 15, 2022

CircleCI is not running. Let me close and reopen to send the web hook again.

@mthrok mthrok closed this Jun 15, 2022
@mthrok mthrok reopened this Jun 15, 2022
Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please guard USE_ROCM with CMake version check (as TorchAudio should be builder on systems with CMake older than 3.21)

Also, can you please add an explanation why explicit #ifdef __HIP_PLATFORM_AMD__ must be explicitly added .cu files? I.e. why can't hipify module take care of those mechanistically changes?

@@ -85,6 +85,10 @@ if(USE_CUDA)
enable_language(CUDA)
endif()

if(USE_ROCM)
enable_language(HIP)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIP language is only available since CMake-3.21(see https://cmake.org/cmake/help/v3.21/command/enable_language.html), so please make USE_ROCM conditional on 3.21 CMake version

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need for enable_language(HIP) was removed in the latest commit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the commit that refactored the HIP cmake integration -- will revisit in later PR.

Comment on lines 25 to 26
#ifdef USE_ROCM
// the stream to launch kernels in when using GPU.
hipStream_t stream_;
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we just add #define cudaStream_t hipStream_t under first USE_ROCM and then we would not need to duplicate the code here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typedef was added. Please re-review.

@jpvillam-amd jpvillam-amd force-pushed the rocm_rnnt branch 2 times, most recently from 1cbcda2 to eeb8367 Compare June 23, 2022 15:46
@jpvillam-amd jpvillam-amd force-pushed the rocm_rnnt branch 2 times, most recently from 332e298 to 45ec17a Compare July 6, 2022 00:00
Added check for rocm language and CMake 3.21
Combined stream types between CUDA and HIP.
@mthrok mthrok self-assigned this Jul 19, 2022
@mthrok mthrok requested a review from malfet December 13, 2022 16:44
@facebook-github-bot
Copy link
Contributor

@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jithunnair-amd
Copy link
Contributor

jithunnair-amd commented Mar 7, 2023

@malfet @mthrok It's not clear to me at all if these CI failures need any action from our end, since the failures are CUDA related. Can you please let us know what the next step should be here? Do we need another rebase?

@nateanl
Copy link
Member

nateanl commented Mar 7, 2023

cc @mthrok can you review this pr?

@jithunnair-amd
Copy link
Contributor

@mthrok @malfet Can you please review?

@facebook-github-bot
Copy link
Contributor

@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mthrok
Copy link
Collaborator

mthrok commented Jul 13, 2023

@jeffdaily
Copy link
Collaborator

@jithunnair-amd

Should we be concerned about ROCm build job failures??

libtorchaudio.so: undefined symbol: __kmpc_fork_call https://github.com/pytorch/audio/actions/runs/5539782179/jobs/10111098244?pr=2485 https://github.com/pytorch/audio/actions/runs/5539782179/jobs/10111098349?pr=2485

Yes, we need to fix those before we can land this PR.

@mthrok
Copy link
Collaborator

mthrok commented Jul 14, 2023

@jithunnair-amd
Should we be concerned about ROCm build job failures??
libtorchaudio.so: undefined symbol: __kmpc_fork_call https://github.com/pytorch/audio/actions/runs/5539782179/jobs/10111098244?pr=2485 https://github.com/pytorch/audio/actions/runs/5539782179/jobs/10111098349?pr=2485

Yes, we need to fix those before we can land this PR.

@jeffdaily @jithunnair-amd Got it. Would you be able to look into them?

@mthrok
Copy link
Collaborator

mthrok commented Aug 16, 2023

@jeffdaily @jithunnair-amd

Seems like there was ROCm CI migration. Could you rebase or merge upstream? If possible, I would like to include this PR for the upcoming release.

@jeffdaily
Copy link
Collaborator

@jeffdaily @jithunnair-amd

Seems like there was ROCm CI migration. Could you rebase or merge upstream? If possible, I would like to include this PR for the upcoming release.

Done. @mthrok

@jeffdaily
Copy link
Collaborator

Note that the CMakeLists.txt refactor was reverted. Let's first focus on landing this PR that adds RNN-T for ROCm, then a follow-up PR that modernizes the CMake files.

@jeffdaily
Copy link
Collaborator

Please guard USE_ROCM with CMake version check (as TorchAudio should be builder on systems with CMake older than 3.21)

Also, can you please add an explanation why explicit #ifdef __HIP_PLATFORM_AMD__ must be explicitly added .cu files? I.e. why can't hipify module take care of those mechanistically changes?

Would you be okay with accepting this PR as-is and then a follow-up PR that revisits the hipify strategy here? It would be nice to get the feature in. The hipify and CMakeLists.txt refactor was causing more trouble than expected, so it was backed out of this PR.

@mthrok
Copy link
Collaborator

mthrok commented Aug 18, 2023

Please guard USE_ROCM with CMake version check (as TorchAudio should be builder on systems with CMake older than 3.21)
Also, can you please add an explanation why explicit #ifdef __HIP_PLATFORM_AMD__ must be explicitly added .cu files? I.e. why can't hipify module take care of those mechanistically changes?

Would you be okay with accepting this PR as-is and then a follow-up PR that revisits the hipify strategy here? It would be nice to get the feature in. The hipify and CMakeLists.txt refactor was causing more trouble than expected, so it was backed out of this PR.

@jeffdaily Yes, build CI jobs were green, so I will merge this.

@facebook-github-bot
Copy link
Contributor

@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mthrok merged this pull request in c593961.

@jithunnair-amd
Copy link
Contributor

@mthrok This PR introduces a dependency on libomp.so which is available with a standard ROCm installation in /opt/rocm/llvm/lib/libomp.so. However, since it's not packaged with the torchaudio wheel currently, and is also not packaged with the torch wheels, it results in an import-time failure for torchaudio:

>>> import torchaudio
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/torchaudio/__init__.py", line 1, in <module>
    from . import (  # noqa: F401
  File "/usr/local/lib/python3.8/dist-packages/torchaudio/_extension/__init__.py", line 45, in <module>
    _load_lib("libtorchaudio")
  File "/usr/local/lib/python3.8/dist-packages/torchaudio/_extension/utils.py", line 64, in _load_lib
    torch.ops.load_library(path)
  File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 839, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libomp.so: cannot open shared object file: No such file or directory

I'm investigating how to package the libomp.so with the torchaudio wheel for ROCm. If there are any existing instances where you do the same for any build of torchaudio (ROCm or otherwise), please point me to it, so I can re-use the same flow.

cc @jeffdaily @atalman

@atalman
Copy link
Contributor

atalman commented Aug 29, 2023

@jeffdaily @jithunnair-amd how critical this PR is ? maybe we can just rever it ?

atalman added a commit to atalman/audio that referenced this pull request Aug 30, 2023
facebook-github-bot pushed a commit that referenced this pull request Aug 30, 2023
Summary:
This reverts commit c593961.

Unblock 2.1.0 rc

Pull Request resolved: #3586

Reviewed By: osalpekar

Differential Revision: D48842032

Pulled By: atalman

fbshipit-source-id: bbdf9e45c9aa5fde00f315a2ff491ed050bc1707
atalman added a commit that referenced this pull request Aug 31, 2023
Summary:
This reverts commit c593961.

Unblock 2.1.0 rc

Pull Request resolved: #3586

Reviewed By: osalpekar

Differential Revision: D48842032

Pulled By: atalman

fbshipit-source-id: bbdf9e45c9aa5fde00f315a2ff491ed050bc1707
@mthrok
Copy link
Collaborator

mthrok commented Sep 7, 2023

Hi @jeffdaily @jithunnair-amd - So this PR was reverted due to the packaging issue. I wanted to have this in the upcoming release and I do welcome the re-attempt. As to make the process faster for the future, I invited @jeffdaily for write permission. I hope we can continue and iterate on this feature. Regards

pruthvistony added a commit to ROCm/audio that referenced this pull request Nov 4, 2023
pruthvistony added a commit to ROCm/audio that referenced this pull request Nov 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants