Commit 4ff45ec

Jingwen Chen authored and committed
drm/amd/amdgpu: fix corner case in SRIOV tdr
[Why]
In SRIOV multi-VF mode, after switching to an ordered workqueue for TDR, a ring timeout can repeatedly cause an innocent ring to time out as well.

[How]
1. Use advanced TDR mode as the default in SRIOV.
2. Use mdelay in the FLR work so that the wait does not exceed the ring timeout.

Signed-off-by: Jingwen Chen <[email protected]>
Acked-by: Alex Deucher <[email protected]>
1 parent 87e71bc commit 4ff45ec

3 files changed, +6 -2 lines changed


Diff for: drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c

@@ -63,6 +63,10 @@ void amdgpu_virt_init_setting(struct amdgpu_device *adev)
 #endif
 	adev->cg_flags = 0;
 	adev->pg_flags = 0;
+
+	/*use advance recovery mode for SRIOV*/
+	if (amdgpu_gpu_recovery)
+		amdgpu_gpu_recovery = 2;
 }
 
 void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
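
The hunk above overrides the recovery setting with a plain truthiness test, so every non-zero value ends up as 2 and only 0 is left alone. A minimal userspace sketch of that mapping follows. The function name `sriov_gpu_recovery` is hypothetical, and the comments assume the usual meaning of the `amdgpu_gpu_recovery` module parameter (-1 auto, 0 disabled, 1 enabled, 2 advanced TDR mode), which the commit itself does not spell out.

```c
/*
 * Hypothetical userspace stand-in for the SRIOV override added to
 * amdgpu_virt_init_setting(). The argument plays the role of the
 * amdgpu_gpu_recovery module parameter; it is not driver code.
 *
 * Assumed parameter meanings: -1 = auto, 0 = disabled, 1 = enabled,
 * 2 = advanced TDR mode.
 */
int sriov_gpu_recovery(int gpu_recovery)
{
	/* Any non-zero setting, including -1 (auto), selects mode 2. */
	if (gpu_recovery)
		gpu_recovery = 2;

	/* 0 falls through unchanged, so recovery stays disabled. */
	return gpu_recovery;
}
```

Under these assumptions, a user who explicitly disabled recovery keeps that choice, while the auto and enabled settings are both promoted to advanced TDR mode on a virtual function.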

Diff for: drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c

@@ -265,7 +265,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 		if (xgpu_ai_mailbox_peek_msg(adev) == IDH_FLR_NOTIFICATION_CMPL)
 			goto flr_done;
 
-		msleep(10);
+		mdelay(10);
 		timeout -= 10;
 	} while (timeout > 1);

Diff for: drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c

@@ -294,7 +294,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 		if (xgpu_nv_mailbox_peek_msg(adev) == IDH_FLR_NOTIFICATION_CMPL)
 			goto flr_done;
 
-		msleep(10);
+		mdelay(10);
 		timeout -= 10;
 	} while (timeout > 1);
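
Both FLR work loops share one shape: check the mailbox, wait one step, subtract the step from a budget. The budget only tracks real elapsed time if each wait takes about as long as the step it is charged for. A sleeping wait such as msleep may return later than requested, whereas mdelay busy-waits, which is presumably why the commit switches to it. The sketch below reproduces the loop shape in userspace C with injected callbacks. All names (`poll_with_budget`, `fake_mailbox`, `fake_poll`, `fake_delay`) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical mailbox stand-in; not part of the amdgpu driver. */
struct fake_mailbox {
	int polls;        /* number of completion checks made */
	int delays;       /* number of delay steps taken */
	int ready_after;  /* poll count at which completion is reported */
};

/* Reports completion once it has been called ready_after times. */
bool fake_poll(void *ctx)
{
	struct fake_mailbox *mb = ctx;

	mb->polls++;
	return mb->polls >= mb->ready_after;
}

/* Counts delay steps instead of actually waiting. */
void fake_delay(unsigned int ms, void *ctx)
{
	struct fake_mailbox *mb = ctx;

	(void)ms;
	mb->delays++;
}

/*
 * Same do/while shape as the FLR work loops in the diff: check,
 * delay one step, charge the step against the budget.
 * Returns true if completion was observed before the budget ran out.
 */
bool poll_with_budget(bool (*poll)(void *), void (*delay)(unsigned int, void *),
		      void *ctx, int timeout_ms, int step_ms)
{
	if (step_ms <= 0)
		return false;	/* a non-positive step would never terminate */

	do {
		if (poll(ctx))
			return true;
		delay((unsigned int)step_ms, ctx);
		timeout_ms -= step_ms;
	} while (timeout_ms > 1);

	return false;
}
```

With a 100 ms budget and 10 ms steps, the loop checks at most ten times. If each delay overshoots its step, the wall-clock wait grows while the budget accounting stays the same, which is the case the commit message describes as waiting longer than the ring timeout.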
