Commit 6ac6a32

Jingwen Chen authored and Asher Song committed
drm/amd/amdgpu: fix flr_work corner case
[Why]
In an SRIOV multi-VF environment, flr_work can be entered even after the TDR thread has entered recovery. This can lead to a GMC TLB flush with SDMA during full access while SDMA is not initialized.

[How]
1. flr_work should take the write lock; otherwise there may be hardware access during VF FLR.
2. (amdgpu_in_reset(adev) || !down_write_trylock(&adev->reset_sem)) is the correct criterion for flr_work to return directly.

Acked-by: Christian König <[email protected]>
Signed-off-by: Jingwen Chen <[email protected]>
1 parent a4669ca commit 6ac6a32

2 files changed, +4 -2 lines changed

Diff for: drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c

+2 -1

@@ -259,7 +259,8 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 	 * otherwise the mailbox msg will be ruined/reseted by
 	 * the VF FLR.
 	 */
-	if (atomic_cmpxchg(&adev->reset_domain->in_gpu_reset, 0, 1) != 0)
+	if (amdgpu_in_reset(adev) ||
+	    atomic_cmpxchg(&adev->reset_domain->in_gpu_reset, 0, 1) != 0)
 		return;
 
 	down_write(&adev->reset_domain->sem);

Diff for: drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c

+2 -1

@@ -292,7 +292,8 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	 * otherwise the mailbox msg will be ruined/reseted by
 	 * the VF FLR.
 	 */
-	if (atomic_cmpxchg(&adev->reset_domain->in_gpu_reset, 0, 1) != 0)
+	if (amdgpu_in_reset(adev) ||
+	    atomic_cmpxchg(&adev->reset_domain->in_gpu_reset, 0, 1) != 0)
 		return;
 
 	down_write(&adev->reset_domain->sem);
