
CUDA error when using dynamic margin #3

Open
huylgia opened this issue Feb 5, 2023 · 0 comments
huylgia commented Feb 5, 2023

Hey bro,
I designed this "ArcFace Dynamic Margin" for my task, referring to your module, but I hit a CUDA error. If you know what causes it, can you help me?

ArcFace Dynamic Margin

import math

import numpy as np
import torch


class ArcFace(torch.nn.Module):
    """ArcFace (https://arxiv.org/pdf/1801.07698v1.pdf).

    `margin` is either a float (static margin) or a numpy array of
    per-class margins (dynamic margin).
    """
    def __init__(self, s=64.0, margin=0.5):
        super(ArcFace, self).__init__()
        self.scale  = s
        self.margin = margin

        if isinstance(self.margin, float):
            self.cos_m = math.cos(margin)
            self.sin_m = math.sin(margin)
            self.theta = math.cos(math.pi - margin)
            self.sinmm = math.sin(math.pi - margin) * margin

        self.easy_margin = False

    def set_margin(self, labels):
        margin = self.margin[labels.cpu().numpy()]

        self.cos_m = torch.from_numpy(np.squeeze(np.cos(margin))).float().cuda()
        self.sin_m = torch.from_numpy(np.squeeze(np.sin(margin))).float().cuda()
        self.theta = torch.from_numpy(np.squeeze(np.cos(math.pi - margin))).float().cuda()
        self.sinmm = torch.from_numpy(np.squeeze(np.sin(math.pi - margin) * margin)).float().cuda()
            
    def forward(self, logits: torch.Tensor, labels: torch.Tensor):
        index = torch.where(labels != -1)[0]
        target_logit = logits[index, labels[index].view(-1)]

        # set up dynamic margin
        if not isinstance(self.margin, float):
            self.set_margin(labels[index])
            
        sin_theta = torch.sqrt(1.0 - torch.pow(target_logit, 2))
        cos_theta_m = target_logit * self.cos_m - sin_theta * self.sin_m  # cos(target+margin)

        if self.easy_margin:
            final_target_logit = torch.where(
                target_logit > 0, cos_theta_m, target_logit)
        else:
            final_target_logit = torch.where(
                target_logit > self.theta, cos_theta_m, target_logit - self.sinmm)

        logits[index, labels[index].view(-1)] = final_target_logit
        logits = logits * self.scale
        return logits
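For reference, the `cos_theta_m` line in `forward` is just the angle-addition identity cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m) applied to the target logit. A quick standalone check with arbitrary values (not taken from the training code):

```python
import math

# cos_theta_m in forward() computes cos(theta + m) from cos(theta) via the
# angle-addition identity. Verify with arbitrary angle/margin values:
theta, m = 0.8, 0.5
target_logit = math.cos(theta)
sin_theta = math.sqrt(1.0 - target_logit ** 2)
cos_theta_m = target_logit * math.cos(m) - sin_theta * math.sin(m)
assert abs(cos_theta_m - math.cos(theta + m)) < 1e-12
```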

CUDA error:

Traceback (most recent call last):
  File "trainv3.py", line 356, in <module>
    main(args)
  File "trainv3.py", line 293, in main
    flag = callback_verification(global_step, backbone)
  File "/home/ctv.thanhly/FaceNet/ArcFacePytorch/utils/utils_callbacks.py", line 79, in __call__
    flag = self.ver_test(backbone, num_update)
  File "/home/ctv.thanhly/FaceNet/ArcFacePytorch/utils/utils_callbacks.py", line 35, in ver_test
    self.ver_list[i], backbone, 10, 10)
  File "/home/ctv.thanhly/FaceNet/ArcFacePytorch/eval/verification.py", line 431, in test
    net_out: torch.Tensor = backbone(img)
  File "/home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ctv.thanhly/FaceNet/ArcFacePytorch/backbones/efficientformerv2Custome.py", line 650, in forward
    x = self.feature(self.head_dropout(x))
  File "/home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 178, in forward
    self.eps,
  File "/home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/nn/functional.py", line 2282, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:95 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f6259e24a22 in /home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x132 (0x7f62ff23d342 in /home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x50 (0x7f62ff23efa0 in /home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #3: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x11c (0x7f62ff23f9bc in /home/ctv.thanhly/miniconda3/envs/arcface2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0xd6de4 (0x7f62ffe06de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x9609 (0x7f63061c0609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f6305f8d293 in /lib/x86_64-linux-gnu/libc.so.6)
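Not part of the original report, but illegal memory accesses in indexing code like `logits[index, labels[index]]` are often triggered by labels that fall outside the logits matrix. A hypothetical pre-flight check (plain-Python stand-in for the label tensor; all names and values are illustrative):

```python
num_classes = 10            # hypothetical class count (logits.shape[1])
labels = [3, 7, -1, 9]      # hypothetical batch; -1 means "ignore this sample"
valid = [l for l in labels if l != -1]
# A label >= num_classes (or < 0 other than the -1 sentinel) would index past
# the logits matrix on the GPU, which typically surfaces later as
# "an illegal memory access was encountered".
assert all(0 <= l < num_classes for l in valid), "label out of range"
print("labels OK")
```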