Conversation

@stas00 stas00 commented Dec 9, 2020

Now that pytorch/pytorch#48891 has been merged, I have a better understanding of how the arch code is generated.

It looks like your original code wasn't producing it correctly: -gencode=arch=compute_{cc},code=compute_{cc} generates PTX code, not the normal compute code. Looking at

https://github.com/pytorch/pytorch/blob/4434c07a2c0ba4debc6330063546f600aee8deb3/torch/utils/cpp_extension.py#L1556-L1563

it's clear it needed to be -gencode=arch=compute_{cc},code=sm_{cc} in your code in the first place. Notice that the last bit is code=sm_ and not code=compute_.

So, as a follow-up to #578, this PR:

  • replaces s/code=compute_/code=sm_/ for normal archs
  • adds +PTX for the highest arch in jit mode
  • correctly handles +PTX (especially when it comes via TORCH_CUDA_ARCH_LIST, which my previous PR missed); a sketch of the resulting flags follows below
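
To make that concrete, here is a minimal sketch, not DeepSpeed's exact code, of how a capability list such as '6.0 7.0+PTX' maps to nvcc flags under this scheme:

```python
# Minimal sketch (not DeepSpeed's exact code): turn a compute-capability
# list such as "6.0 7.0+PTX" into nvcc -gencode flags.
def get_cuda_arch_flags(arch_list):
    args = []
    for arch in arch_list.split():
        cc = arch.replace('+PTX', '').replace('.', '')  # '7.0+PTX' -> '70'
        # real machine code (SASS) for this exact architecture
        args.append(f'-gencode=arch=compute_{cc},code=sm_{cc}')
        # a '+PTX' suffix additionally embeds PTX, which a newer GPU
        # can JIT-compile at runtime
        if arch.endswith('+PTX'):
            args.append(f'-gencode=arch=compute_{cc},code=compute_{cc}')
    return args

print(get_cuda_arch_flags('6.0 7.0+PTX'))
# ['-gencode=arch=compute_60,code=sm_60',
#  '-gencode=arch=compute_70,code=sm_70',
#  '-gencode=arch=compute_70,code=compute_70']
```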

Do note that I'm new to all this, so please kindly verify that I'm doing the right thing. I did compile and test the resulting binary, but only against my own code.

I copied the weird PTX syntax from pytorch (link above); https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#just-in-time-compilation explains that this is how PTX is encoded in nvcc flags. Quote:

By specifying a virtual code architecture instead of a real GPU, nvcc postpones the assembly of PTX code until application runtime, at which the target GPU is exactly known. For instance, the command below allows generation of exactly matching GPU binary code, when the application is launched on an sm_50 or later architecture.

nvcc x.cu --gpu-architecture=compute_50 --gpu-code=compute_50

So -gencode=arch=compute_{cc},code=compute_{cc} tells nvcc to enable PTX for that arch. Very confusing.
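
For reference, my reading of the nvcc docs is that the long-form options in the quote are equivalent to the -gencode shorthand used in the build code, so both of these commands request PTX-only output for compute capability 5.0:

nvcc x.cu --gpu-architecture=compute_50 --gpu-code=compute_50
nvcc x.cu -gencode=arch=compute_50,code=compute_50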

Thanks.

@stas00 stas00 changed the title from "fix arch flags, add PTX" to "[build] fix compute capability arch flags, add PTX, handle PTX" on Dec 9, 2020
-args.append(f'-gencode=arch=compute_{cc},code=compute_{cc}')
+args.append(f'-gencode=arch=compute_{cc},code=sm_{cc}')
+if cc.endswith('+PTX'):
+    args.append(f'-gencode=arch=compute_{cc},code=compute_{cc}')
@RezaYazdaniAminabadi commented Dec 9, 2020

Shouldn't we use the compute-capability (cc) number alone, rather than concatenated with '+PTX'?
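
(A quick illustration of the problem being flagged, with made-up values, for anyone following along:)

```python
# Illustration only: if `cc` still carries the '+PTX' suffix at this
# point, the emitted flag is malformed.
cc = '70+PTX'
print(f'-gencode=arch=compute_{cc},code=compute_{cc}')
# -gencode=arch=compute_70+PTX,code=compute_70+PTX   <- nvcc rejects this
num = cc.replace('+PTX', '')
print(f'-gencode=arch=compute_{num},code=compute_{num}')
# -gencode=arch=compute_70,code=compute_70           <- intended
```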

@stas00 (Author) commented Dec 9, 2020

Edit: I misunderstood what you said; you were pointing at a bug, which is fixed now. Thank you!

But I will leave what I originally wrote below, in case it helps to explain things in general.


There are two unrelated things:

  1. The code you quoted above does the right thing (edit: after I fixed the bug below) if a user passes a TORCH_CUDA_ARCH_LIST that contains +PTX. My previous PR missed that part (I had no clue how +PTX was converted to compile flags at the time, but now I do).

  2. The question is whether or not to add +PTX for jit_mode here (a sketch of both points follows this list):
    https://github.com/microsoft/DeepSpeed/blob/03111cef09b33a25792246785ab0ded68be1733c/op_builder/builder.py#L246
    I added it to stay in sync with pytorch's CUDAExtension implementation, which is how it will behave once pytorch-1.8 is released. But you can choose not to include it; I have no idea how and when jit_mode is used, so I can't tell whether it's needed.
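
Here is a small sketch of both points together; the names and the detected-arch values are illustrative, not the actual builder.py identifiers:

```python
import os

# (1) honor a user-supplied TORCH_CUDA_ARCH_LIST, which may carry '+PTX'
arch_list = os.environ.get('TORCH_CUDA_ARCH_LIST', '')
if arch_list:
    ccs = arch_list.replace(';', ' ').split()
else:
    # (2) jit-mode fallback: take the detected capabilities and append
    # '+PTX' to the highest one, mirroring pytorch's CUDAExtension
    ccs = ['6.0', '7.0']   # pretend these were detected on this machine
    ccs[-1] += '+PTX'

print(ccs)  # e.g. ['6.0', '7.0+PTX']
```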

So, to conclude: the code you quoted above does the right thing if any of the ccs contains '+PTX', which may come from a user-defined TORCH_CUDA_ARCH_LIST, so that part is a must. But wrt (2) you can choose not to include it.
If you choose not to, please let me know and I will remove https://github.com/microsoft/DeepSpeed/blob/03111cef09b33a25792246785ab0ded68be1733c/op_builder/builder.py#L246

@stas00 (Author) commented Dec 9, 2020

Oh, I see what you meant: there is a bug in my code. Sorry, I forgot to check the results of the test runs. Thank you for flagging it, @RezaYazdaniAminabadi.

@RezaYazdaniAminabadi commented

Thanks @stas00 for the fix. 👍

@jeffra jeffra merged commit 8a184b6 into deepspeedai:master Dec 11, 2020
@stas00 stas00 deleted the correct-archs branch December 11, 2020 18:27