cub::DeviceSpmv: Various fixes and tests #352
Conversation
gpuCI: NVIDIA/thrust#1500
DVS CL: 30247358
The code is moving towards production quality, thank you!
A few quite complex issues still make this algorithm unreliable. I'm afraid we can't merge the tests until these issues are addressed, because they would lead to random test failures. Please let me know if I've misunderstood anything.
tile_carry.key += tile_start_coord.x;
if (tile_carry.key >= spmv_params.num_rows)
I've been thinking about this fix for too long 😄 My main concern can be formalized as follows. The tile key never exceeds num_rows. It can be equal to num_rows only when tile_idx == num_merge_tiles, which is never the case for this kernel. Only the main thread of each tile writes results here, so it was unclear to me how we could exceed num_rows.
tile_carry is actually a block aggregate, with tile_carry.key standing for the row number in tile-local indexation [0, tile_num_rows). The line above changes the indexation from tile-local to matrix indexation by adding the index of the first row of the tile: tile_carry.key += tile_start_coord.x;. tile_num_rows, in its turn, is calculated as tile_end_coord.x - tile_start_coord.x. The sum of these differences can't exceed num_rows, so the issue has to be somewhere else.
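To make the indexation argument concrete, here is a minimal host-side sketch (hypothetical tile boundaries, not the actual kernel code) of the tile-local to matrix mapping; because consecutive tile row ranges partition [0, num_rows), the shifted keys stay in range:
#include <cassert>

int main()
{
    const int num_rows = 10;

    // Hypothetical row coordinates of consecutive merge-path tiles:
    // tile i covers rows [tile_start_x[i], tile_start_x[i + 1]).
    const int tile_start_x[] = {0, 4, 7, 10};
    const int num_tiles = 3;

    for (int tile = 0; tile < num_tiles; ++tile)
    {
        const int tile_num_rows = tile_start_x[tile + 1] - tile_start_x[tile];

        // tile_carry.key is a tile-local row index in [0, tile_num_rows).
        for (int local_key = 0; local_key < tile_num_rows; ++local_key)
        {
            const int global_key = local_key + tile_start_x[tile]; // the line under review
            assert(global_key < num_rows); // the telescoping sum keeps this in range
        }
    }
    return 0;
}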
tile_carry is computed as:
BlockScanT(temp_storage.aliasable.scan).ExclusiveScan(scan_item, scan_item, scan_op, tile_carry);
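For reference, the reduce-by-key operator combines key-value pairs roughly as in the sketch below (a simplified host-side stand-in, not the actual CUB implementation):
#include <cstdio>

struct KeyValuePair { int key; double value; };

// Simplified stand-in for cub::ReduceByKeyOp<cub::Sum>: the result always
// takes the second operand's key, and only sums the values when keys match.
KeyValuePair reduce_by_key(KeyValuePair first, KeyValuePair second)
{
    KeyValuePair result = second;
    if (first.key == second.key)
        result.value = first.value + second.value;
    return result;
}

int main()
{
    // Folding a few items left to right: the last item's key "wins",
    // which is why the block aggregate (tile_carry) ends up carrying
    // the key of the last thread in the block.
    KeyValuePair items[] = {{2, 1.0}, {2, 2.0}, {3, 4.0}, {3, 8.0}};
    KeyValuePair aggregate = items[0];
    for (int i = 1; i < 4; ++i)
        aggregate = reduce_by_key(aggregate, items[i]);

    std::printf("key = %d, value = %f\n", aggregate.key, aggregate.value); // key = 3, value = 12
    return 0;
}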
Application of the scan_op (ReduceByKeyOp<cub::Sum>) causes the key value (row number) of the last thread in the thread block to be broadcast to the whole thread block as tile_carry.key. So the issue has to appear in the last threads of the last tile. The issue can be found here:
for (int ITEM = 0; ITEM < ITEMS_PER_THREAD; ++ITEM)
{
if (tile_nonzero_indices[thread_current_coord.y] < row_end_offset)
{
// Move down (accumulate)
scan_segment[ITEM].value = nonzero;
running_total += nonzero;
++thread_current_coord.y;
nonzero = s_tile_nonzeros[thread_current_coord.y];
}
else
{
// Move right (reset)
scan_segment[ITEM].value = 0.0;
running_total = 0.0;
++thread_current_coord.x;
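// Note: in the last tile, thread_current_coord.x can keep growing past the
// initialized portion of s_tile_row_end_offsets (see the discussion below).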
row_end_offset = s_tile_row_end_offsets[thread_current_coord.x];
}
scan_segment[ITEM].key = thread_current_coord.x;
}
Up to tile_num_rows elements of s_tile_row_end_offsets were filled, but thread_current_coord.x grows without limit, so we read some random data into row_end_offset, which affects most of the subsequent results. Shared memory isn't zero-filled before each kernel execution, so this is unsafe. I've filled the shared memory with an arbitrary value and all tests are failing now:
// ---- DEBUG CODE ----
CTA_SYNC();
for (int item = threadIdx.x; item < TILE_ITEMS; item += BLOCK_THREADS)
{
s_tile_row_end_offsets[item] = 42;
}
CTA_SYNC();
// ---- DEBUG CODE ----
#pragma unroll 1
for (int item = threadIdx.x; item <= tile_num_rows; item += BLOCK_THREADS)
{
const OffsetT offset =
(cub::min)(static_cast<OffsetT>(tile_start_coord.x + item),
static_cast<OffsetT>(spmv_params.num_rows - 1));
s_tile_row_end_offsets[item] = wd_row_end_offsets[offset];
}
I'm afraid I don't know how to fix this right now. I don't think we should mask the data race with this fix, though. It's also unclear to me what the consequence of ignoring the last tile's results is.
Just to note: the debugging code you added is overwriting the partial results in the s_tile_nonzeros array, since the shared memory layout is:
OffsetT* s_tile_row_end_offsets = &temp_storage.aliasable.merge_items[0].row_end_offset;
ValueT* s_tile_nonzeros = &temp_storage.aliasable.merge_items[tile_num_rows + ITEMS_PER_THREAD].nonzero;
This is why the tests fail with that modification. The results are still correct if we only initialize the valid tile_num_rows + ITEMS_PER_THREAD entries in s_tile_row_end_offsets with random data.
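A minimal host-side sketch of that aliasing (hypothetical sizes and types; the real layout lives in the agent's shared temp_storage): the offsets view and the nonzeros view share one buffer, so initializing every slot of the offsets view also overwrites the region the nonzeros view points into.
#include <cstdio>

using OffsetT = int;   // both 4 bytes here, so flat OffsetT / ValueT
using ValueT  = float; // indexing lines up with the union slots

union MergeItem
{
    OffsetT row_end_offset; // view used by s_tile_row_end_offsets
    ValueT  nonzero;        // view used by s_tile_nonzeros
};

int main()
{
    const int items_per_thread = 2;  // hypothetical
    const int tile_items       = 16; // hypothetical
    const int tile_num_rows    = 4;  // hypothetical

    MergeItem merge_items[tile_items + items_per_thread];

    OffsetT *s_tile_row_end_offsets = &merge_items[0].row_end_offset;
    ValueT  *s_tile_nonzeros =
        &merge_items[tile_num_rows + items_per_thread].nonzero;

    s_tile_nonzeros[0] = 3.5f; // pretend this is a partial result

    // Filling ALL tile_items offset slots (as the debug code did) ...
    for (int i = 0; i < tile_items; ++i)
        s_tile_row_end_offsets[i] = 42;

    // ... lands on slots the nonzeros view aliases, so the partial result
    // is gone. Initializing only tile_num_rows + items_per_thread offsets
    // would have left it intact.
    std::printf("%f\n", s_tile_nonzeros[0]); // no longer 3.5: overwritten by the offset fill
    return 0;
}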
This was only used for SMs < 3.5, which are no longer supported.
Addressed the review feedback, aside from the final issue you pointed out. We'll need to figure out a better solution for that bit.
This extends the solution in PR NVIDIA#160.
This fix was suggested in PR NVIDIA#160.
I extended the solution from PR NVIDIA#160. Also updated the comment on the hackaround for the fixup kernel's invalid memory access. I believe this is harmless, since it just drops the carry-over from the final tile in the input, which should be ignored anyway.
gpuCI: NVIDIA/thrust#1500
@senior-zero I'd like to merge this for 1.14. Can you re-review?
Thank you for these fixes!
Fixes #139 and #285. Includes #160.
This is blocking #276, since we need tests to validate the if-target update for DeviceSpmv.