Enable verbose asm #4528
@ravil-mobile Thank you for enabling this.
aa27c02 to 76b664b
Hi @zhanglx13. Checked. Everything works for the NV backend.
How does this impact the PTX and SASS dumped on the NV path?
Hi @ThomasRaoux. Regarding PTX:

```
	.reg .pred 	%p<13>;
	.reg .b32 	%r<54>;
	.reg .f32 	%f<32>;
	.reg .b64 	%rd<10>;
	.loc	1 15 0                          // softmax.py:15:0
$L__func_begin0:
	.loc	1 15 0                          // softmax.py:15:0
// %bb.0:
	ld.param.u64 	%rd3, [softmax_kernel_param_0];
	ld.param.u32 	%r20, [softmax_kernel_param_1];
$L__tmp0:
	.loc	1 16 22                         // softmax.py:16:22
	// begin inline asm
	mov.u32 %r1, %ctaid.x;
	// end inline asm
	.loc	1 17 30                         // softmax.py:17:30
	mul.lo.s32 	%r21, %r1, %r20;
	ld.param.u64 	%rd4, [softmax_kernel_param_2];
	ld.param.u32 	%r22, [softmax_kernel_param_3];
	.loc	1 17 24                         // softmax.py:17:24
	mul.wide.s32 	%rd5, %r21, 4;
	add.s64 	%rd6, %rd3, %rd5;
	ld.param.u32 	%r23, [softmax_kernel_param_4];
	.loc	1 18 27                         // softmax.py:18:27
	mov.u32 	%r24, %tid.x;
	and.b32  	%r25, %r24, 31;
	and.b32  	%r26, %r24, 63;
	.loc	1 19 58                         // softmax.py:19:58
```

I can also change the code to enable verbose output only for the AMD backend.
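For anyone who wants to check which source lines a verbose PTX dump refers to, here is a minimal sketch that collects the `// file:line:col` annotations from `.loc` directives. The regex and the helper name `source_refs` are illustrative assumptions based only on the comment format visible in the dump above; they are not part of this PR or of Triton.

```python
import re
from collections import Counter

# Matches verbose `.loc` lines such as: `.loc 1 15 0 // softmax.py:15:0`
# (format assumed from the dump in this thread).
LOC_RE = re.compile(
    r"\.loc\s+\d+\s+\d+\s+\d+\s*//\s*(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+)"
)

def source_refs(ptx_text):
    """Return (file, line, col) tuples in order of appearance."""
    return [
        (m["file"], int(m["line"]), int(m["col"]))
        for m in LOC_RE.finditer(ptx_text)
    ]

# A few lines copied from the dump above.
sample = """\
.loc 1 16 22 // softmax.py:16:22
mov.u32 %r1, %ctaid.x;
.loc 1 17 30 // softmax.py:17:30
mul.lo.s32 %r21, %r1, %r20;
.loc 1 17 24 // softmax.py:17:24
mul.wide.s32 %rd5, %r21, 4;
"""

refs = source_refs(sample)
# Count how many `.loc` directives point at each source line.
lines_hit = Counter(line for _, line, _ in refs)
```

Lines without the trailing comment (that is, non-verbose output) simply produce no matches, so the same script can be used to confirm whether verbose mode is active.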
76b664b to 5889d97
antiagainst left a comment
LG to have uniformly.
This PR adds verbosity to assembly code after LLVM backend passes. This adds references to the source code for both NV and AMD. Additionally, it adds `Kernel Info` at the end of the dump for AMD. For example:

```
; Kernel info:
; codeLenInByte = 7732
; NumSgprs: 24
; NumVgprs: 154
; NumAgprs: 128
; TotalNumVgprs: 284
...
```
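The `Kernel info` trailer is easy to consume from a script, for example to track register usage across builds. The following is a rough sketch, assuming the trailer keeps the `; Name: value` / `; name = value` shape shown in the example; `parse_kernel_info` is a hypothetical helper, not something this PR provides.

```python
import re

# Matches comment lines like `; NumSgprs: 24` and `; codeLenInByte = 7732`.
INFO_RE = re.compile(r"^\s*;\s*(\w+)\s*[:=]\s*(\d+)\s*$", re.MULTILINE)

def parse_kernel_info(asm_text):
    """Collect integer fields that follow the `; Kernel info:` marker."""
    start = asm_text.find("; Kernel info:")
    if start == -1:
        return {}  # no trailer, e.g. non-AMD or non-verbose output
    return {name: int(value) for name, value in INFO_RE.findall(asm_text[start:])}

# Trailer copied from the example in the commit message.
sample = """\
; Kernel info:
; codeLenInByte = 7732
; NumSgprs: 24
; NumVgprs: 154
; NumAgprs: 128
; TotalNumVgprs: 284
"""

info = parse_kernel_info(sample)
```

If a dump contains several kernels, later values overwrite earlier ones in this sketch; splitting the text per kernel first would be needed in that case.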
This PR adds verbosity to assembly code after LLVM backend passes.
For the AMDGPU backend, the PR leaves references to the source code in the assembly. For example:
Additionally, the PR adds `Kernel Info` at the end of the file. For example: