X11 unusably slow with DRM 5.15 and 6.1's amdgpu on a RX 800 #302

Open
OlCe2 opened this issue May 21, 2024 · 8 comments
Labels
amdgpu (amdgpu related problems), bug (Something isn't working)

@OlCe2
Member

OlCe2 commented May 21, 2024

Title: X11 unusably slow with DRM 5.15 and 6.1's amdgpu on a RX 800

Description and reproduction

With DRM 5.15 running on an AMD RX 800 card, after a few minutes to hours in an X11 session, just clicking on a program in the task bar to switch to it, or using Alt-Tab, can freeze the whole display for several seconds. Generally speaking, any kind of desktop effect (such as the application thumbnails displayed when hovering over the task bar) is slow. As uptime increases, freezes tend to last longer (I've measured a few that lasted almost 10 minutes).

DRM 6.1 has the same problem in a slightly milder form: it takes more uptime for the problem to start manifesting, and freezes are initially shorter. However, they lengthen over time to the point that the desktop eventually becomes almost unusable, as with 5.15.

DRM 5.10 works correctly. This problem also doesn't show up on a laptop using Intel Gen10 integrated graphics (driver i915) with DRM 5.15 and DRM 6.1 (although another problem shows up there, to be reported separately).

Tested mostly with KDE/KWin, but Xfce has similar problems. Turning off compositing in KWin doesn't solve the problem (it makes an almost imperceptible difference).

System Information

FreeBSD version

FreeBSD 14.1-STABLE #0 n267671-9a8a26aefb36: Mon May 13 13:39:56 CEST 2024 MYCONFIG 1401500 1401500
Kernel MYCONFIG is a stripped-down version of GENERIC close to MINIMAL.

Same problem on an older version:
FreeBSD 14.0-STABLE #1 n266865-245844372d7e: Thu Feb 22 11:11:45 CET 2024

PCI Info

vgapci0@pci0:8:0:0: class=0x030000 rev=0xe7 hdr=0x00 vendor=0x1002 device=0x67df subvendor=0x1043 subdevice=0x0525
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]'
class = display
subclass = VGA

DRM KMOD version

Problem reproduced with:
drm-515-kmod 5.15.118_4
drm-61-kmod 6.1.69_2

No problem with:
drm-510-kmod 5.10.163_9

Preliminary Investigation

Before the clear, long freezes start, it is common to observe, after some uptime, Xorg using ~5% CPU for several seconds or more. Some captured kernel stacks:

  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          pmap_page_set_memattr+0x5b lkpi_vmf_insert_pfn_prot_locked+0x268 ttm_bo_vm_fault_reserved+0x1c1 amdgpu_gem_fault+0x86 linux_cdev_pager_populate+0x128 vm_fault_allocate+0x39f vm_fault+0x3c6 vm_fault_trap+0x4c trap_pfault+0x1bd trap+0x405 calltrap+0x8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          drm_sched_entity_select_rq+0x6e drm_sched_job_init+0x1c amdgpu_job_submit+0x22 amdgpu_vm_sdma_commit+0xe2 amdgpu_vm_sdma_update+0x18c amdgpu_vm_bo_update_mapping+0x952 amdgpu_vm_clear_freed+0xd9 amdgpu_gem_va_update_vm+0x30 amdgpu_gem_va_ioctl+0x251 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          lkpi_vmf_insert_pfn_prot_locked+0x268 ttm_bo_vm_fault_reserved+0x2b6 amdgpu_gem_fault+0x86 linux_cdev_pager_populate+0x128 vm_fault_allocate+0x39f vm_fault+0x3c6 vm_fault_trap+0x4c trap_pfault+0x1bd trap+0x405 calltrap+0x8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          vm_fault_allocate+0x39f vm_fault+0x3c6 vm_fault_trap+0x4c trap_pfault+0x1bd trap+0x405 calltrap+0x8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8

The frequency of the stacks with ttm_pool_free() as the innermost frame increases as small glitches and freezes appear (not reproduced below). Other stacks that were captured much more rarely:

  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          cdev_pager_lookup+0x38 lkpi_unmap_mapping_range+0x16 ttm_bo_handle_move_mem+0x7a ttm_bo_validate+0xb4 ttm_bo_init_reserved+0x1a2 amdgpu_bo_create+0x1e1 amdgpu_bo_create_user+0x21 amdgpu_gem_create_ioctl+0x1f8 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          pmap_remove_ptes+0xdc pmap_remove1+0x55f vm_map_delete+0x19f kern_munmap+0x8a amd64_syscall+0x120 fast_syscall_common+0xf8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          vm_page_insert+0x1f lkpi_vmf_insert_pfn_prot_locked+0x293 ttm_bo_vm_fault_reserved+0x2b6 amdgpu_gem_fault+0x86 linux_cdev_pager_populate+0x128 vm_fault_allocate+0x39f vm_fault+0x3c6 vm_fault_trap+0x4c trap_pfault+0x1bd trap+0x405 calltrap+0x8
  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          cdev_pager_lookup+0x38 lkpi_unmap_mapping_range+0x16 ttm_bo_handle_move_mem+0x7a ttm_bo_validate+0xb4 ttm_bo_init_reserved+0x1a2 amdgpu_bo_create+0x1e1 amdgpu_bo_create_user+0x21 amdgpu_gem_create_ioctl+0x1f8 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
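
To put numbers on "more frequent" versus "much more rarely", the captured stacks can be tallied by their innermost frame. This is only a sketch: it assumes the samples are in the five-column layout shown above (PID, TID, COMM, TDNAME, then the stack) and that COMM and TDNAME contain no spaces. The function name count_innermost_frames is made up for illustration.

```sh
# Tally kernel stack samples by innermost frame (first entry of KSTACK).
# Reads stack listings on stdin, in the PID/TID/COMM/TDNAME/KSTACK layout.
# Assumption: COMM and TDNAME contain no whitespace, so the innermost
# frame is always field 5.
count_innermost_frames() {
    awk '
        $1 != "PID" && NF >= 5 {
            frame = $5
            sub(/\+0x[0-9a-f]+$/, "", frame)   # strip the "+0x110" offset
            seen[frame]++
        }
        END { for (frame in seen) print seen[frame], frame }
    ' | sort -rn
}
```

Feeding it a file of collected samples (for example, count_innermost_frames < samples.txt, where samples.txt is a hypothetical capture file) prints one "count frame" line per distinct innermost function, most frequent first.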

Finally, outputting kernel stack traces every 0.1s during freezes seems to indicate that the process is stuck in (or repeatedly calling):

  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          vm_page_find_contig_domain+0x8f vm_page_alloc_noobj_contig_domain+0x73 vm_page_reclaim_contig_domain_ext+0x8f0 vm_page_reclaim_contig+0x5c linux_alloc_pages+0x8d ttm_pool_alloc+0x2bb ttm_tt_populate+0xc3 ttm_bo_handle_move_mem+0xc3 ttm_bo_validate+0xb4 ttm_bo_init_reserved+0x1a2 amdgpu_bo_create+0x1e1 amdgpu_bo_create_user+0x21 amdgpu_gem_create_ioctl+0x1f8 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113

This stack also appears, although less frequently (it's in fact a sub-stack of the previous):

  PID    TID COMM                TDNAME              KSTACK
 1956 101446 Xorg                MainThread          vm_page_reclaim_contig+0x5c linux_alloc_pages+0x8d ttm_pool_alloc+0x2bb ttm_tt_populate+0xc3 ttm_bo_handle_move_mem+0xc3 ttm_bo_validate+0xb4 ttm_bo_init_reserved+0x1a2 amdgpu_bo_create+0x1e1 amdgpu_bo_create_user+0x21 amdgpu_gem_create_ioctl+0x1f8 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
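
For anyone wanting to collect similar traces, the periodic sampling can be sketched as below. The report does not say which tool produced the stacks; this assumes procstat -kk, whose column header matches the listings above. The sample_kstacks name and the PROCSTAT override are hypothetical, and fractional sleep intervals are assumed to be supported (they are on FreeBSD).

```sh
# Print the kernel stacks of one process at a fixed interval.
# Usage: sample_kstacks PID [COUNT] [INTERVAL_SECONDS]
# PROCSTAT is a hypothetical override, handy for dry runs without procstat.
sample_kstacks() {
    pid=$1
    count=${2:-50}        # number of samples to take
    interval=${3:-0.1}    # seconds between samples
    i=0
    while [ "$i" -lt "$count" ]; do
        ${PROCSTAT:-procstat} -kk "$pid"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Something like sample_kstacks 1956 100 0.1 > samples.txt, started from another terminal or over SSH while the display is frozen, would give about ten seconds of samples for the Xorg process.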

(Part of the time spent on this report was sponsored by the FreeBSD Foundation.)

@khorben

khorben commented May 21, 2024

I have the same issue on my system with an RX570 card, even with my environment limited to WindowMaker and XScreenSaver. DRM version is 5.15 here:

# uname -a
FreeBSD kwarx.office.defora 14.0-STABLE FreeBSD 14.0-STABLE #0 stable/14-n265273-803f088147d3: Fri Sep 29 17:41:17 CEST 2023     [email protected]:/usr/obj/home/khorben/Projects/FreeBSD/src-14/amd64.amd64/sys/GENERIC amd64
# pkg info | grep drm
drm-515-kmod-5.15.118_4        DRM drivers modules
drm-kmod-20220907_3            Metaport of DRM modules for the linuxkpi-based KMS components
gpu-firmware-kmod-20240401,1   Firmware modules for the drm-kmod drivers
libdrm-2.4.120_1,1             Direct Rendering Manager library and headers

From dmesg:

[drm] amdgpu kernel modesetting enabled.
drmn0: <drmn> on vgapci0
vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
[drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1DA2:0xE343 0xEF).
drmn0: Trusted Memory Zone (TMZ) feature not supported
[drm] register mmio base: 0xFE8C0000
[drm] register mmio size: 262144
[drm] add ip block number 0 <vi_common>
[drm] add ip block number 1 <gmc_v8_0>
[drm] add ip block number 2 <tonga_ih>
[drm] add ip block number 3 <gfx_v8_0>
[drm] add ip block number 4 <sdma_v3_0>
[drm] add ip block number 5 <powerplay>
[drm] add ip block number 6 <dm>
[drm] add ip block number 7 <uvd_v6_0>
[drm] add ip block number 8 <vce_v3_0>
drmn0: Fetched VBIOS from ROM BAR
amdgpu: ATOM BIOS: 113-D00037-S03
[drm] UVD is enabled in VM mode
[drm] UVD ENC is enabled in VM mode
[drm] VCE enabled in VM mode
[drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
drmn0: successfully loaded firmware image 'amdgpu/polaris10_mc.bin'
drmn0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
drmn0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[drm] Detected VRAM RAM=8192M, BAR=256M
[drm] RAM width 256bits GDDR5
[drm] amdgpu: 8192M of VRAM memory ready
[drm] amdgpu: 6064M of GTT memory ready.
[drm] GART: num cpu pages 65536, num gpu pages 65536
[drm] PCIE GART of 256M enabled (table at 0x000000F400900000).
drmn0: successfully loaded firmware image 'amdgpu/polaris10_pfp_2.bin'
drmn0: successfully loaded firmware image 'amdgpu/polaris10_me_2.bin'
drmn0: successfully loaded firmware image 'amdgpu/polaris10_ce_2.bin'
[drm] Chained IB support enabled!
drmn0: successfully loaded firmware image 'amdgpu/polaris10_rlc.bin'
drmn0: successfully loaded firmware image 'amdgpu/polaris10_mec_2.bin'
drmn0: successfully loaded firmware image 'amdgpu/polaris10_mec2_2.bin'
drmn0: successfully loaded firmware image 'amdgpu/polaris10_sdma.bin'
drmn0: successfully loaded firmware image 'amdgpu/polaris10_sdma1.bin'
amdgpu: hwmgr_sw_init smu backed is polaris10_smu
drmn0: successfully loaded firmware image 'amdgpu/polaris10_uvd.bin'
[drm] Found UVD firmware Version: 1.130 Family ID: 16
drmn0: successfully loaded firmware image 'amdgpu/polaris10_vce.bin'
[drm] Found VCE firmware Version: 53.26 Binary ID: 3
drmn0: successfully loaded firmware image 'amdgpu/polaris10_k_smc.bin'
[drm] Display Core initialized with v3.2.149!
lkpi_iic0: <LinuxKPI I2C> on drmn0
iicbus0: <Philips I2C bus> on lkpi_iic0
iic0: <I2C generic I/O> on iicbus0
lkpi_iic1: <LinuxKPI I2C> on drmn0
iicbus1: <Philips I2C bus> on lkpi_iic1
iic1: <I2C generic I/O> on iicbus1
lkpi_iic2: <LinuxKPI I2C> on drmn0
iicbus2: <Philips I2C bus> on lkpi_iic2
iic2: <I2C generic I/O> on iicbus2
[drm] UVD and UVD ENC initialized successfully.
[drm] VCE initialized successfully.
drmn0: SE 4, SH per SE 1, CU per SH 9, active_cu_number 32
[drm] fb mappable at 0xD0E30000
[drm] vram apper at 0xD0000000
[drm] size 9216000
[drm] fb depth is 24
[drm]    pitch is 7680
VT: Replacing driver "vga" with new "fb".
start FB_INFO:
type=11 height=1200 width=1920 depth=32
pbase=0xd0e30000 vbase=0xfffff800d0e30000
name=drmn0 flags=0x0 stride=7680 bpp=32
end FB_INFO
vgapci0: child drmn0 requested pci_get_powerstate
drmn0: Using BACO for runtime pm
sysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)!
lkpi_iic3: <LinuxKPI I2C> on drm1
iicbus3: <Philips I2C bus> on lkpi_iic3
iic3: <I2C generic I/O> on iicbus3
[drm] Initialized amdgpu 3.42.0 20150101 for drmn0 on minor 0

From Xorg.0.log:

[3028021.065] (II) xfree86: Adding drm device (/dev/dri/card0)
[3028021.065] (II) Platform probe for /dev/dri/card0
[3028021.084] (--) PCI:*(1@0:0:0) 1002:67df:1da2:e343 rev 239, Mem @ 0xd0000000/268435456, 0xcfe00000/2097152, 0xfe8c0000/262144, I/O @ 0x0000c000/256, BIOS @ 0x????????/65536
[...]
[3028021.101] (II) Loading sub module "glamoregl"
[3028021.101] (II) LoadModule: "glamoregl"
[3028021.101] (II) Loading /usr/local/lib/xorg/modules/libglamoregl.so
[3028021.118] (II) Module glamoregl: vendor="X.Org Foundation"
[3028021.118]   compiled for 1.21.1.13, module version = 1.0.1
[3028021.118]   ABI class: X.Org ANSI C Emulation, version 0.4
[3028021.441] (II) modeset(0): glamor X acceleration enabled on AMD Radeon RX 570 Series (radeonsi, polaris10, LLVM 15.0.7, DRM 3.42, 14.0-STABLE)
[3028021.441] (II) modeset(0): glamor initialized

@evadot added the amdgpu and bug labels on May 21, 2024
@evadot
Contributor

evadot commented May 28, 2024

I've tested leaving my AMD machine (RX580 / polaris 11) running all night with some vkcube/glxgears and some WebGPU content in Firefox to see if I could trigger this, but no: everything was smooth in the morning and also later during the day.
However, I was running x11-wm/awesome; I'll add Xfce to my local package list and try again with it.

@khorben

khorben commented May 28, 2024 via email

@chaplina

chaplina commented Jul 10, 2024

Yet another person encountering freezing with DRM 5.15 on 14p6.

DRM 5.10 works flawlessly.

I have 4 monitors and 40 virtual desktops using awesome. Many instances of Chrome (different profiles), remmina RDP to various Windows systems, and other programs.

I can do some testing but since this is my daily driver at work results might take a bit to report.

EDIT: forgot to include system specs

Radeon Pro WX5100, 8GB (5820T) (Polaris 10)
Intel Core i7-9800X 3.8GHz (4.5GHz Turbo, 8C, 16.5MB Cache, HT, 165W, DDR4-2666 Non-ECC)

@ekhramtsov

I see this on -CURRENT with both Green Sardine 5600G and Polaris 20 RX 580, single DP-1 3840x2160 output and x11-wm/sway after several poudriere tmpfs runs (fragmented VM?).

The delay is significantly worse when all cores are in use (not the case on 5.10) and/or when larger surfaces are being damaged. Running cpuset -c -l 0-(ncores-2) during the poudriere run also helps (no difference on 5.10), which suggests that additional work is done $somewhere past 5.10.
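
A small sketch of how the 0-(ncores-2) CPU list might be computed, leaving the last core free. The cpu_list helper is hypothetical, and the usage comment assumes the core count comes from sysctl -n hw.ncpu.

```sh
# Build the CPU list "0-(ncores-2)", i.e. every core except the last one.
# Usage: cpu_list NCORES
cpu_list() {
    ncores=$1
    if [ "$ncores" -lt 2 ]; then
        echo "cpu_list: need at least 2 cores" >&2
        return 1
    fi
    echo "0-$((ncores - 2))"
}

# Possible use on FreeBSD (the command after the list is whatever
# poudriere invocation is being restricted):
#   cpuset -c -l "$(cpu_list "$(sysctl -n hw.ncpu)")" poudriere ...
```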

I only glanced and didn't bisect/investigate, but I speculate that past 302b3a8 ("ttm_pool.c: use_dma_alloc not implemented"), cache-coherent physically contiguous DMA allocations like on Linux are not done, which results in additional copying/VM work that makes lag more apparent at larger resolutions.

@chaplina

chaplina commented Aug 27, 2024 via email

@ekhramtsov

Reproducer (VM behavior is not deterministic, so I could not come up with an automated reproducer):

  1. Apply D40575 ("Implement the Free Memory Fragmentation Index (FMFI) metric") to observe the moment when free physical memory is fragmented enough for lag to occur.
  2. Exhaust free physical memory with tmpfs (no swap), leaving "just enough" Free (the amount to leave, RESERVED, is not deterministic) to avoid an OOM kill of the x11-wm/sway session. If one sees "[...] was killed: failed to reclaim memory", one has to either wait for (or cause) reclamation of ARC and other pages, or adjust RESERVED accordingly.
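#!/bin/sh
set -e

# Free memory as reported by vmstat (last line, "fre" column).
FREE=$(vmstat -H | tail -n1 | cut -w -f 6)
# Amount to leave free, in dd blocks of bs=1M (adjust as needed).
RESERVED="1024"
TMPDIR=$(mktemp -d)

mount -t tmpfs tmpfs "$TMPDIR"
# Fill the tmpfs with random data, leaving RESERVED blocks free.
dd if=/dev/random of="$TMPDIR/$(date +%s)" bs=1M count=$(echo "$FREE / 2^20 - $RESERVED" | bc) status=progress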
  3. Run make -j4 buildworld until free physical memory is fragmented enough (the Mem numbers are likely bogus because kernel reporting is buggy under memory pressure). FMFI values of 1000 for 16384K and >997 for 4096K mark the point where lag appears at 4K resolution with Green Sardine.
drm 6.1

sysctl vm.phys_frag_idx >REPRO && top | grep -F Free >>REPRO

vm.phys_frag_idx:
DOMAIN 0

  ORDER (SIZE) |  FMFI
--
  12 ( 16384K) |  1000
  11 (  8192K) |  999
  10 (  4096K) |  998
   9 (  2048K) |  995
   8 (  1024K) |  990
   7 (   512K) |  980
   6 (   256K) |  959
   5 (   128K) |  918
   4 (    64K) |  835
   3 (    32K) |  669
   2 (    16K) |  338
   1 (     8K) |  -324
   0 (     4K) |  -1648

Mem: 11M Active, 616M Inact, 513M Laundry, 2700M Wired, 40K Buf, 732M Free
  4. Rapidly resize/move www/firefox and x11/foot windows on the same workspace and observe the lag. This lag does not happen on 5.10-lts, even with worse free physical memory fragmentation:
drm 5.10-lts

sysctl vm.phys_frag_idx >REPRO2 && top | grep -F Free >>REPRO2

vm.phys_frag_idx:
DOMAIN 0

  ORDER (SIZE) |  FMFI
--
  12 ( 16384K) |  1000
  11 (  8192K) |  999
  10 (  4096K) |  998
   9 (  2048K) |  996
   8 (  1024K) |  992
   7 (   512K) |  984
   6 (   256K) |  967
   5 (   128K) |  933
   4 (    64K) |  865
   3 (    32K) |  730
   2 (    16K) |  459
   1 (     8K) |  -83
   0 (     4K) |  -1167

Mem: 11G Active, 3519M Inact, 13G Laundry, 2470M Wired, 40K Buf, 691M Free
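
The thresholds quoted above (1000 for 16384K and >997 for 4096K) can be checked mechanically against the sysctl output. This is a rough sketch, assuming the vm.phys_frag_idx layout printed above with a single domain; the fmfi_past_threshold name is made up. Note that per the 5.10-lts numbers, reaching these values only predicts lag on the affected DRM versions.

```sh
# Succeed if the FMFI table on stdin has reached the quoted lag point:
# 1000 for the 16384K order and more than 997 for the 4096K order.
# Assumption: a single DOMAIN section; with several domains the values
# of the last one win.
fmfi_past_threshold() {
    awk -F'|' '
        NF == 2 && $1 ~ /\(/ {
            size = $1
            sub(/^.*\( */, "", size)   # drop the order and "( "
            sub(/\).*$/, "", size)     # drop ")" and trailing blanks
            fmfi[size] = $2 + 0
        }
        END { exit !(fmfi["16384K"] >= 1000 && fmfi["4096K"] > 997) }
    '
}

# Possible use while waiting for fragmentation to build up:
#   sysctl vm.phys_frag_idx | fmfi_past_threshold && echo "try the resize test now"
```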

@OlCe2
Member Author

OlCe2 commented Sep 2, 2024

Found a duplicate FreeBSD PR: PR 277476.

As I've just added there, I'd really like to get to the bottom of this. However, I don't expect to have the time to do so before the end of September at the very least.
