[BUG] apply_model fails on multi-GPU due to hardcoded CUDA device

Describe the problem
On a multi-GPU machine, using apply_model() with a HF transformers model raises a runtime error if the model has been moved to a GPU other than the default one.

Code to reproduce issue
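The embedded repro snippet did not survive extraction; below is a minimal sketch of the reported setup. The quickstart zoo dataset and the google/vit-base-patch16-224 checkpoint are illustrative assumptions, not necessarily the original repro.

```python
import fiftyone.zoo as foz
import transformers

# Load a small dataset (illustrative choice)
dataset = foz.load_zoo_dataset("quickstart", max_samples=5)

# Load a HF image classification model (illustrative checkpoint)
# and move it to a non-default GPU
model = transformers.AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224"
)
model = model.to("cuda:1")

# apply_model() converts the raw HF model into a FiftyOne Model;
# the conversion hardcodes "cuda" (i.e. cuda:0), hence the crash below
dataset.apply_model(model, label_field="predictions")
```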
Running this gives a runtime error due to a device mismatch with the model preprocessor:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
System information
OS Platform and Distribution (e.g., Linux Ubuntu 22.04): Ubuntu 24.04 LTS
Python version (python --version): Python 3.12.7
FiftyOne version (fiftyone --version): FiftyOne v1.1.0, Voxel51, Inc.
FiftyOne installed from (pip or source): pip (via rye)
Other info/logs
After model conversion (into a FiftyOne Model), there are three occurrences of a hardcoded "cuda" device like this:

fiftyone/fiftyone/utils/transformers.py, lines 454 to 456 in e7f3edd
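The inlined snippet was lost in extraction; the hardcoded occurrences are presumably of this shape (a reconstruction, not the verbatim source):

```python
# Reconstruction of the hardcoded device selection in the wrapper's
# constructor; "cuda" resolves to cuda:0 regardless of where the
# wrapped model actually lives
self.device = "cuda" if torch.cuda.is_available() else "cpu"
```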
And then, at predict time, self.device is used to move the preprocessed inputs onto the GPU:

fiftyone/fiftyone/utils/transformers.py, lines 702 to 705 in e7f3edd
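Again the inlined snippet is missing; a plausible reconstruction of the predict-time move (hypothetical names):

```python
# Reconstruction: preprocessed inputs follow self.device (cuda:0),
# while the model's weights may live on cuda:1
inputs = self.processor(imgs, return_tensors="pt").to(self.device)
output = self.model(**inputs)
```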
=> Hence the mismatch when the model has been moved to a GPU other than cuda:0. This could be replaced by self.model.device and/or, at CTOR-time, storing the attribute as self.device = self.model.device.
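A minimal sketch of the suggested direction, assuming a transformers model (whose .device property reports where its parameters live):

```python
import torch
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224"
).to("cuda:1")

# Hardcoded (current behavior): plain "cuda" aliases cuda:0
hardcoded = torch.device("cuda")

# Suggested: derive the device from the wrapped model itself, e.g.
# self.device = self.model.device at CTOR-time, so inputs are moved
# to wherever the model's weights actually are
derived = model.device

print(hardcoded)  # cuda (i.e. cuda:0)
print(derived)    # cuda:1
```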
Willingness to contribute
The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?
Yes. I can contribute a fix for this bug independently
Yes. I would be willing to contribute a fix for this bug with guidance from the FiftyOne community
cc @brimoor