Commit 5001ef3

drm/xe: Fix tlb invalidation when wedging

If GuC fails to load, the driver wedges, but in the process it touches state that may not be initialized yet. This moves xe_gt_tlb_invalidation_init() to be done earlier: as its own doc says, it's a software-only initialization and should have been named with the _early() suffix. Move it to be called by xe_gt_init_early(), so the locks and seqno are initialized, avoiding a NULL ptr deref when wedging:

	xe 0000:03:00.0: [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01
	xe 0000:03:00.0: [drm] *ERROR* GT0: firmware signature verification failed
	xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged.
	...
	BUG: kernel NULL pointer dereference, address: 0000000000000000
	#PF: supervisor read access in kernel mode
	#PF: error_code(0x0000) - not-present page
	PGD 0 P4D 0
	Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
	CPU: 9 UID: 0 PID: 3908 Comm: modprobe Tainted: G U W 6.13.0-rc4-xe+ #3
	Tainted: [U]=USER, [W]=WARN
	Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DDR5 UDIMM CRB, BIOS ADLSFWI1.R00.3275.A00.2207010640 07/01/2022
	RIP: 0010:xe_gt_tlb_invalidation_reset+0x75/0x110 [xe]

This can be easily triggered by poking the GuC binary to force a signature failure. There will still be an extra message,

	xe 0000:03:00.0: [drm] *ERROR* GT0: GuC mmio request 0x4100: no reply 0x4100

but that's better than a NULL ptr deref.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3956
Fixes: 7dbe8af ("drm/xe: Wedge the entire device")
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

1 parent 88fca61 commit 5001ef3

3 files changed: +8 -7 lines changed

drivers/gpu/drm/xe/xe_gt.c

+4 -4

@@ -387,6 +387,10 @@ int xe_gt_init_early(struct xe_gt *gt)
 	xe_force_wake_init_gt(gt, gt_to_fw(gt));
 	spin_lock_init(&gt->global_invl_lock);
 
+	err = xe_gt_tlb_invalidation_init_early(gt);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -588,10 +592,6 @@ int xe_gt_init(struct xe_gt *gt)
 		xe_hw_fence_irq_init(&gt->fence_irq[i]);
 	}
 
-	err = xe_gt_tlb_invalidation_init(gt);
-	if (err)
-		return err;
-
 	err = xe_gt_pagefault_init(gt);
 	if (err)
 		return err;

drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c

+2 -2

@@ -106,15 +106,15 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
 }
 
 /**
- * xe_gt_tlb_invalidation_init - Initialize GT TLB invalidation state
+ * xe_gt_tlb_invalidation_init_early - Initialize GT TLB invalidation state
  * @gt: graphics tile
  *
  * Initialize GT TLB invalidation state, purely software initialization, should
  * be called once during driver load.
  *
  * Return: 0 on success, negative error code on error.
  */
-int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
+int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt)
 {
 	gt->tlb_invalidation.seqno = 1;
 	INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);

drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h

+2 -1

@@ -14,7 +14,8 @@ struct xe_gt;
 struct xe_guc;
 struct xe_vma;
 
-int xe_gt_tlb_invalidation_init(struct xe_gt *gt);
+int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt);
+
 void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
 int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt);
 int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
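
The ordering problem described in the commit message can be illustrated with a small userspace sketch. This is not the driver code: every name below (`fake_gt`, `list_node`, `fake_tlb_invalidation_init_early`, `fake_tlb_invalidation_reset`, `fake_load_and_wedge`) is hypothetical, and the assumption that the reset path walks a pending-fence list is inferred from the `INIT_LIST_HEAD()` call visible in the diff rather than confirmed by the source. Where the kernel would dereference NULL, the sketch returns `false` instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical, heavily simplified model of the GT TLB invalidation state. */
struct fake_gt {
	int seqno;
	struct list_node pending_fences;
};

/* Software-only setup: touches no hardware or firmware, so it can run early. */
static int fake_tlb_invalidation_init_early(struct fake_gt *gt)
{
	gt->seqno = 1;
	gt->pending_fences.next = &gt->pending_fences;
	gt->pending_fences.prev = &gt->pending_fences;
	return 0;
}

/*
 * Assumed shape of the reset path reached when the device is wedged.
 * Returns false at the point where kernel code would dereference NULL.
 */
static bool fake_tlb_invalidation_reset(struct fake_gt *gt)
{
	struct list_node *pos = gt->pending_fences.next;

	if (!pos)
		return false;	/* zeroed list head: init never ran */

	while (pos != &gt->pending_fences)
		pos = pos->next;	/* a driver would signal each fence here */

	return true;
}

/* Model a firmware load failure that wedges the device and triggers reset. */
bool fake_load_and_wedge(bool init_early)
{
	struct fake_gt gt = { 0 };

	if (init_early)
		fake_tlb_invalidation_init_early(&gt);

	/* Firmware load fails here, before any later init step would run. */
	return fake_tlb_invalidation_reset(&gt);
}
```

Calling `fake_load_and_wedge(false)` models the old ordering, where the reset path sees a zeroed list head; `fake_load_and_wedge(true)` models the ordering after this commit, where the list is a valid empty list by the time the device can wedge.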
