Merged
2 changes: 1 addition & 1 deletion csrc/transformer/general_kernels.cu
@@ -43,7 +43,7 @@ __global__ void column_sum_reduce(const T* __restrict__ inp,

if (threadIdx.x == 0) {
int pos = blockIdx.x * TILE_DIM + threadIdx.y;
- if (pos < (rows * width)) out[pos] = sum;
+ if (pos < width) out[pos] = sum;
Contributor

@RezaYazdaniAminabadi RezaYazdaniAminabadi Feb 28, 2021

Thanks for fixing this! It still worked when the hidden dimension was divisible by 32, but it would have caused an out-of-bounds write when the hidden dimension is not divisible by 32!

Contributor Author

yes! thanks for your approval!!

}
}
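
To illustrate the point made in the review comment, here is a minimal host-side sketch (plain C++, which also compiles inside a `.cu` file) that replays the kernel's `pos` arithmetic on the CPU. It rests on several assumptions not shown in this hunk: `TILE_DIM` is 32, the kernel is launched with one block per 32 columns, each block has 32 values of `threadIdx.y`, and `out` holds exactly `width` elements. The helper name `out_of_bounds_writes` is hypothetical and is not part of the repository.

```cpp
#include <vector>

// Assumption: the tile size is 32, matching the "divisible by 32" remark.
constexpr int TILE_DIM = 32;

// Hypothetical helper: lists every `pos` that passes the guard but lies
// outside an output buffer of `width` elements.
std::vector<int> out_of_bounds_writes(int rows, int width, bool fixed_guard) {
    std::vector<int> bad;
    // Assumption: the grid covers the columns with ceil(width / TILE_DIM) blocks.
    int num_blocks = (width + TILE_DIM - 1) / TILE_DIM;
    for (int block = 0; block < num_blocks; ++block) {
        for (int ty = 0; ty < TILE_DIM; ++ty) {
            // Same index computation as in the diff above.
            int pos = block * TILE_DIM + ty;
            // Old guard: pos < rows * width. New guard: pos < width.
            bool passes = fixed_guard ? (pos < width) : (pos < rows * width);
            if (passes && pos >= width) bad.push_back(pos);
        }
    }
    return bad;
}
```

Under these assumptions, with `rows = 4` and `width = 40`, the old guard lets positions 40 through 63 through, while the new guard lets none through. With `width = 64`, both guards behave the same, which is consistent with the observation that the old code worked when the hidden dimension was a multiple of 32.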
