Fix the device inconsistency error in yolov7 training #397
Conversation
zhubochao seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you signed the CLA already but the status is still pending? Let us recheck it.
@zhubochao please fix the lint errors and sign the CLA.
pre-commit has fixed the lint errors.
Hi @zhubochao, thanks for your kind PR. Could you please click the badge to sign the CLA so that we can merge this PR?
Motivation
When training on custom data with YOLOv7, training fails with `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)`. The same problem was reported on Stack Overflow and in the official YOLOv7 issue #1224.
The cause seems to be that in `mmyolo/models/task_modules/assigners/batch_yolov7_assigner.py`, line 309, `_from_which_layer = _from_which_layer[fg_mask_inboxes]`, the index tensor `fg_mask_inboxes` and `_from_which_layer` are not on the same device (a minimal sketch is included below, after the environment info).

sys.platform: linux
Python: 3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2: Tesla T4
CUDA_HOME: /home/zbc/miniconda3/envs/test
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
PyTorch: 1.13.1
PyTorch compiling details: PyTorch built with:
TorchVision: 0.14.1
OpenCV: 4.6.0
MMEngine: 0.3.2
MMCV: 2.0.0rc3
MMDetection: 3.0.0rc4
MMYOLO: 0.2.0+27487fd
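
For reference, a minimal standalone sketch (not mmyolo code, just an assumed illustration) that reproduces the same RuntimeError when the index tensor and the indexed tensor live on different devices:

```python
import torch

# requires a CUDA-capable GPU, like the Tesla T4s listed above
assert torch.cuda.is_available()

_from_which_layer = torch.tensor([0, 1, 2, 0])  # indexed tensor stays on the CPU
fg_mask_inboxes = torch.tensor([True, False, True, True], device='cuda')  # index lives on the GPU

try:
    # raises: RuntimeError: indices should be either on cpu or on the same
    # device as the indexed tensor (cpu)
    _from_which_layer = _from_which_layer[fg_mask_inboxes]
except RuntimeError as err:
    print(err)

# moving the index to the indexed tensor's device avoids the error
fg_mask_inboxes = fg_mask_inboxes.to(_from_which_layer.device)
print(_from_which_layer[fg_mask_inboxes])  # tensor([0, 2, 0])
```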
In the comments of this blog, someone mentions that simply downgrading PyTorch and CUDA from pytorch1.13 cu117 to pytorch1.7 cu110 could solve this problem.

Modification
Move `matching_matrix` to the same device as `_from_which_layer`.
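
A rough sketch of the idea behind the change (the variable setup below is hypothetical and only mimics how `fg_mask_inboxes` is derived from `matching_matrix` in the assigner; the real edit lives in `batch_yolov7_assigner.py`):

```python
import torch

# Assumed setup for illustration: matching_matrix is built on the GPU while
# _from_which_layer stays on the CPU; shapes and values are made up.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
_from_which_layer = torch.tensor([0, 0, 1, 2])
matching_matrix = torch.zeros(3, 4, device=device)
matching_matrix[0, 0] = 1.0
matching_matrix[2, 3] = 1.0

# The fix: bring matching_matrix onto _from_which_layer's device before the
# mask derived from it is used as an index.
matching_matrix = matching_matrix.to(_from_which_layer.device)
fg_mask_inboxes = matching_matrix.sum(0) > 0.0
_from_which_layer = _from_which_layer[fg_mask_inboxes]  # devices now match
print(_from_which_layer)  # tensor([0, 2])
```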