Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pool hangs under heavy load #17055

Open
owlshrimp opened this issue Feb 14, 2025 · 10 comments
Open

Pool hangs under heavy load #17055

owlshrimp opened this issue Feb 14, 2025 · 10 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@owlshrimp
Copy link

owlshrimp commented Feb 14, 2025

System information

Type Version/Name
Distribution Name Ubuntu Server
Distribution Version 22.04.5
Kernel Version 5.15.0-131-generic
Architecture x86_64
OpenZFS Version 2.1.5-1ubuntu6~22.04.4

This machine is a fairly lightweight AMD fam16h quad core machine with 16GB of ram.

Describe the problem you're observing

During a large download job of videos which some transcoding going going on as well, after a few hours the pool locks up and becomes unresponsive. In dmesg are stack traces of hung kernel tasks.

The occurrences below happened about a week apart both from relatively fresh boots of the system (a few days at most), but I have no reason to think those time frames are relevant. It seems to happen pretty reliably as soon as I start archiving friends' livestreams, no more than about 12 hours in at the latest.

This could be coincidence, but it seems like yt-dlp is usually in the final stage of the download which involves fixing up the final file's video container, which I assume is heavy in both reading and writing. ([FixupM3u8] Fixing MPEG-TS in MP4 container of "<video file name>") I believe I had two instances of yt-dlp running concurrently this time and both seem to have hung during this stage.

I've turned up a few other possibly-related issues but some happen under different conditions and none seem at first glance to have similar backtraces:
#15481
#15283
#15217

Describe how to reproduce the problem

Install Ubuntu server 22.04, install ZFS from the ubuntu repo, make sure the system is up to date, and then use the yt-dlp utility to download several large (multiple GB) arbitrary VoD video archives of recent streams from Twitch. (several calls of "yt-dlp <video link>" in a batch file is appears sufficient, or one invocation with many links as arguments, say 10-15; repeatedly downloading and then rm-ing the same file in a loop would also likely work)

Include any warning/errors/backtraces from the system logs

$ uname -a
Linux ThinkPad-T430 5.15.0-131-generic #141~20.04.1-Ubuntu SMP Thu Jan 16 18:38:51 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
$ modinfo zfs | grep version
version:        2.1.5-1ubuntu6~22.04.4
srcversion:     C846B2D0C274CEADA66935D
vermagic:       5.15.0-131-generic SMP mod_unload modversions

First occurrence, relevant (post boot) portion of dmesg:

[386624.009405] perf: interrupt took too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[455258.191184] perf: interrupt took too long (3142 > 3141), lowering kernel.perf_event_max_sample_rate to 63500
[1307589.560568] perf: interrupt took too long (3929 > 3927), lowering kernel.perf_event_max_sample_rate to 50750
[1436305.170628] INFO: task txg_sync:1051 blocked for more than 120 seconds.
[1436305.170668]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1436305.170685] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1436305.170698] task:txg_sync        state:D stack:    0 pid: 1051 ppid:     2 flags:0x00004000
[1436305.170716] Call Trace:
[1436305.170724]  <TASK>
[1436305.170734]  __schedule+0x24e/0x590
[1436305.170757]  schedule+0x69/0x110
[1436305.170768]  schedule_timeout+0x87/0x140
[1436305.170785]  ? zio_issue_async+0x12/0x20 [zfs]
[1436305.171393]  ? __bpf_trace_tick_stop+0x20/0x20
[1436305.171409]  io_schedule_timeout+0x51/0x80
[1436305.171422]  __cv_timedwait_common+0x12c/0x170 [spl]
[1436305.171468]  ? wait_woken+0x70/0x70
[1436305.171484]  __cv_timedwait_io+0x19/0x20 [spl]
[1436305.171526]  zio_wait+0x116/0x220 [zfs]
[1436305.172095]  dsl_pool_sync+0xb6/0x400 [zfs]
[1436305.172593]  ? __mod_timer+0x294/0x430
[1436305.172610]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[1436305.173145]  spa_sync+0x2dc/0x5b0 [zfs]
[1436305.173680]  txg_sync_thread+0x266/0x2f0 [zfs]
[1436305.174230]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[1436305.174823]  thread_generic_wrapper+0x64/0x80 [spl]
[1436305.174875]  ? __thread_exit+0x20/0x20 [spl]
[1436305.174923]  kthread+0x12a/0x150
[1436305.174939]  ? set_kthread_struct+0x50/0x50
[1436305.174953]  ret_from_fork+0x22/0x30
[1436305.174972]  </TASK>
[1436909.319127] INFO: task txg_sync:1051 blocked for more than 120 seconds.
[1436909.319159]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1436909.319172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1436909.319182] task:txg_sync        state:D stack:    0 pid: 1051 ppid:     2 flags:0x00004000
[1436909.319197] Call Trace:
[1436909.319204]  <TASK>
[1436909.319213]  __schedule+0x24e/0x590
[1436909.319231]  schedule+0x69/0x110
[1436909.319241]  schedule_timeout+0x87/0x140
[1436909.319254]  ? zio_issue_async+0x12/0x20 [zfs]
[1436909.319746]  ? __bpf_trace_tick_stop+0x20/0x20
[1436909.319759]  io_schedule_timeout+0x51/0x80
[1436909.319770]  __cv_timedwait_common+0x12c/0x170 [spl]
[1436909.319808]  ? wait_woken+0x70/0x70
[1436909.319822]  __cv_timedwait_io+0x19/0x20 [spl]
[1436909.319855]  zio_wait+0x116/0x220 [zfs]
[1436909.320311]  dsl_pool_sync+0xb6/0x400 [zfs]
[1436909.320710]  ? __mod_timer+0x294/0x430
[1436909.320724]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[1436909.321152]  spa_sync+0x2dc/0x5b0 [zfs]
[1436909.321579]  txg_sync_thread+0x266/0x2f0 [zfs]
[1436909.322019]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[1436909.322457]  thread_generic_wrapper+0x64/0x80 [spl]
[1436909.322496]  ? __thread_exit+0x20/0x20 [spl]
[1436909.322534]  kthread+0x12a/0x150
[1436909.322546]  ? set_kthread_struct+0x50/0x50
[1436909.322557]  ret_from_fork+0x22/0x30
[1436909.322572]  </TASK>
[1437150.978622] INFO: task txg_sync:1051 blocked for more than 120 seconds.
[1437150.978662]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1437150.978680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1437150.978697] task:txg_sync        state:D stack:    0 pid: 1051 ppid:     2 flags:0x00004000
[1437150.978715] Call Trace:
[1437150.978723]  <TASK>
[1437150.978734]  __schedule+0x24e/0x590
[1437150.978757]  schedule+0x69/0x110
[1437150.978769]  schedule_timeout+0x87/0x140
[1437150.978786]  ? zio_issue_async+0x12/0x20 [zfs]
[1437150.979397]  ? __bpf_trace_tick_stop+0x20/0x20
[1437150.979413]  io_schedule_timeout+0x51/0x80
[1437150.979426]  __cv_timedwait_common+0x12c/0x170 [spl]
[1437150.979473]  ? wait_woken+0x70/0x70
[1437150.979489]  __cv_timedwait_io+0x19/0x20 [spl]
[1437150.979531]  zio_wait+0x116/0x220 [zfs]
[1437150.980100]  dsl_pool_sync+0xb6/0x400 [zfs]
[1437150.980599]  ? __mod_timer+0x294/0x430
[1437150.980615]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[1437150.981149]  spa_sync+0x2dc/0x5b0 [zfs]
[1437150.981684]  txg_sync_thread+0x266/0x2f0 [zfs]
[1437150.982234]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[1437150.982862]  thread_generic_wrapper+0x64/0x80 [spl]
[1437150.982915]  ? __thread_exit+0x20/0x20 [spl]
[1437150.982966]  kthread+0x12a/0x150
[1437150.982982]  ? set_kthread_struct+0x50/0x50
[1437150.982996]  ret_from_fork+0x22/0x30
[1437150.983015]  </TASK>
[1437513.467850] INFO: task txg_sync:1051 blocked for more than 120 seconds.
[1437513.467892]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1437513.467911] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1437513.467926] task:txg_sync        state:D stack:    0 pid: 1051 ppid:     2 flags:0x00004000
[1437513.467944] Call Trace:
[1437513.467952]  <TASK>
[1437513.467963]  __schedule+0x24e/0x590
[1437513.467986]  schedule+0x69/0x110
[1437513.467997]  schedule_timeout+0x87/0x140
[1437513.468014]  ? zio_issue_async+0x12/0x20 [zfs]
[1437513.468622]  ? __bpf_trace_tick_stop+0x20/0x20
[1437513.468637]  io_schedule_timeout+0x51/0x80
[1437513.468650]  __cv_timedwait_common+0x12c/0x170 [spl]
[1437513.468696]  ? wait_woken+0x70/0x70
[1437513.468712]  __cv_timedwait_io+0x19/0x20 [spl]
[1437513.468754]  zio_wait+0x116/0x220 [zfs]
[1437513.469323]  dsl_pool_sync+0xb6/0x400 [zfs]
[1437513.469827]  ? __mod_timer+0x294/0x430
[1437513.469844]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[1437513.470394]  spa_sync+0x2dc/0x5b0 [zfs]
[1437513.470934]  txg_sync_thread+0x266/0x2f0 [zfs]
[1437513.471485]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[1437513.472082]  thread_generic_wrapper+0x64/0x80 [spl]
[1437513.472136]  ? __thread_exit+0x20/0x20 [spl]
[1437513.472185]  kthread+0x12a/0x150
[1437513.472201]  ? set_kthread_struct+0x50/0x50
[1437513.472216]  ret_from_fork+0x22/0x30
[1437513.472235]  </TASK>
[1437755.127332] INFO: task txg_sync:1051 blocked for more than 120 seconds.
[1437755.127373]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1437755.127393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1437755.127409] task:txg_sync        state:D stack:    0 pid: 1051 ppid:     2 flags:0x00004000
[1437755.127427] Call Trace:
[1437755.127435]  <TASK>
[1437755.127445]  __schedule+0x24e/0x590
[1437755.127469]  schedule+0x69/0x110
[1437755.127481]  schedule_timeout+0x87/0x140
[1437755.127497]  ? zio_issue_async+0x12/0x20 [zfs]
[1437755.128125]  ? __bpf_trace_tick_stop+0x20/0x20
[1437755.128141]  io_schedule_timeout+0x51/0x80
[1437755.128154]  __cv_timedwait_common+0x12c/0x170 [spl]
[1437755.128201]  ? wait_woken+0x70/0x70
[1437755.128217]  __cv_timedwait_io+0x19/0x20 [spl]
[1437755.128259]  zio_wait+0x116/0x220 [zfs]
[1437755.128828]  dsl_pool_sync+0xb6/0x400 [zfs]
[1437755.129326]  ? __mod_timer+0x294/0x430
[1437755.129342]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[1437755.129917]  spa_sync+0x2dc/0x5b0 [zfs]
[1437755.130457]  txg_sync_thread+0x266/0x2f0 [zfs]
[1437755.131008]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[1437755.131604]  thread_generic_wrapper+0x64/0x80 [spl]
[1437755.131656]  ? __thread_exit+0x20/0x20 [spl]
[1437755.131704]  kthread+0x12a/0x150
[1437755.131720]  ? set_kthread_struct+0x50/0x50
[1437755.131734]  ret_from_fork+0x22/0x30
[1437755.131753]  </TASK>
[1438359.275268] INFO: task txg_sync:1051 blocked for more than 120 seconds.
[1438359.275310]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1438359.275329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1438359.275345] task:txg_sync        state:D stack:    0 pid: 1051 ppid:     2 flags:0x00004000
[1438359.275364] Call Trace:
[1438359.275372]  <TASK>
[1438359.275383]  __schedule+0x24e/0x590
[1438359.275408]  schedule+0x69/0x110
[1438359.275419]  schedule_timeout+0x87/0x140
[1438359.275436]  ? zio_issue_async+0x12/0x20 [zfs]
[1438359.276049]  ? __bpf_trace_tick_stop+0x20/0x20
[1438359.276065]  io_schedule_timeout+0x51/0x80
[1438359.276079]  __cv_timedwait_common+0x12c/0x170 [spl]
[1438359.276130]  ? wait_woken+0x70/0x70
[1438359.276147]  __cv_timedwait_io+0x19/0x20 [spl]
[1438359.276190]  zio_wait+0x116/0x220 [zfs]
[1438359.276765]  dsl_pool_sync+0xb6/0x400 [zfs]
[1438359.277263]  ? __mod_timer+0x294/0x430
[1438359.277280]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[1438359.277816]  spa_sync+0x2dc/0x5b0 [zfs]
[1438359.278352]  txg_sync_thread+0x266/0x2f0 [zfs]
[1438359.278906]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[1438359.279505]  thread_generic_wrapper+0x64/0x80 [spl]
[1438359.279559]  ? __thread_exit+0x20/0x20 [spl]
[1438359.279607]  kthread+0x12a/0x150
[1438359.279623]  ? set_kthread_struct+0x50/0x50
[1438359.279637]  ret_from_fork+0x22/0x30
[1438359.279656]  </TASK>
[1438359.279712] INFO: task ffmpeg:539854 blocked for more than 120 seconds.
[1438359.279737]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1438359.279756] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1438359.279772] task:ffmpeg          state:D stack:    0 pid:539854 ppid:501232 flags:0x00004002
[1438359.279789] Call Trace:
[1438359.279795]  <TASK>
[1438359.279801]  __schedule+0x24e/0x590
[1438359.279815]  ? kmem_cache_free+0x24f/0x290
[1438359.279830]  schedule+0x69/0x110
[1438359.279842]  cv_wait_common+0xf8/0x130 [spl]
[1438359.279885]  ? wait_woken+0x70/0x70
[1438359.279899]  __cv_wait+0x15/0x20 [spl]
[1438359.279942]  dmu_tx_wait+0x8e/0x1e0 [zfs]
[1438359.280440]  dmu_tx_assign+0x49/0x80 [zfs]
[1438359.280914]  zfs_write+0x45c/0xdb0 [zfs]
[1438359.281492]  zpl_iter_write+0xe7/0x130 [zfs]
[1438359.282058]  new_sync_write+0x114/0x1a0
[1438359.282077]  vfs_write+0x1d8/0x270
[1438359.282090]  ksys_write+0x67/0xf0
[1438359.282101]  __x64_sys_write+0x19/0x20
[1438359.282110]  x64_sys_call+0x47c/0x1fa0
[1438359.282124]  do_syscall_64+0x56/0xb0
[1438359.282136]  ? syscall_exit_to_user_mode+0x2c/0x50
[1438359.282148]  ? x64_sys_call+0x47c/0x1fa0
[1438359.282160]  ? do_syscall_64+0x63/0xb0
[1438359.282170]  ? do_syscall_64+0x63/0xb0
[1438359.282179]  ? do_syscall_64+0x63/0xb0
[1438359.282188]  ? do_syscall_64+0x63/0xb0
[1438359.282198]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[1438359.282213] RIP: 0033:0x7f2e8af5c887
[1438359.282223] RSP: 002b:00007fff4aa53478 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[1438359.282237] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2e8af5c887
[1438359.282245] RDX: 0000000000040000 RSI: 00005610cf15ab00 RDI: 0000000000000004
[1438359.282253] RBP: 00005610cf15ab00 R08: 00005610cf19aaf0 R09: 00007f2e8afc3371
[1438359.282260] R10: 7377670f78a3c054 R11: 0000000000000246 R12: 00005610cef3ad80
[1438359.282267] R13: 0000000000040000 R14: 0000000000000005 R15: 00005610cef3adb0
[1438359.282280]  </TASK>
[1438600.934282] INFO: task txg_sync:1051 blocked for more than 120 seconds.
[1438600.934323]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1438600.934342] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1438600.934358] task:txg_sync        state:D stack:    0 pid: 1051 ppid:     2 flags:0x00004000
[1438600.934376] Call Trace:
[1438600.934384]  <TASK>
[1438600.934395]  __schedule+0x24e/0x590
[1438600.934418]  schedule+0x69/0x110
[1438600.934430]  schedule_timeout+0x87/0x140
[1438600.934446]  ? zio_issue_async+0x12/0x20 [zfs]
[1438600.935053]  ? __bpf_trace_tick_stop+0x20/0x20
[1438600.935068]  io_schedule_timeout+0x51/0x80
[1438600.935081]  __cv_timedwait_common+0x12c/0x170 [spl]
[1438600.935127]  ? wait_woken+0x70/0x70
[1438600.935143]  __cv_timedwait_io+0x19/0x20 [spl]
[1438600.935185]  zio_wait+0x116/0x220 [zfs]
[1438600.935754]  dsl_pool_sync+0xb6/0x400 [zfs]
[1438600.936253]  ? __mod_timer+0x294/0x430
[1438600.936269]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[1438600.936803]  spa_sync+0x2dc/0x5b0 [zfs]
[1438600.937338]  txg_sync_thread+0x266/0x2f0 [zfs]
[1438600.937887]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[1438600.938483]  thread_generic_wrapper+0x64/0x80 [spl]
[1438600.938535]  ? __thread_exit+0x20/0x20 [spl]
[1438600.938583]  kthread+0x12a/0x150
[1438600.938599]  ? set_kthread_struct+0x50/0x50
[1438600.938613]  ret_from_fork+0x22/0x30
[1438600.938632]  </TASK>
[1438600.938689] INFO: task ffmpeg:539854 blocked for more than 120 seconds.
[1438600.938711]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1438600.938729] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1438600.938744] task:ffmpeg          state:D stack:    0 pid:539854 ppid:501232 flags:0x00004002
[1438600.938760] Call Trace:
[1438600.938765]  <TASK>
[1438600.938772]  __schedule+0x24e/0x590
[1438600.938785]  ? kmem_cache_free+0x24f/0x290
[1438600.938801]  schedule+0x69/0x110
[1438600.938812]  cv_wait_common+0xf8/0x130 [spl]
[1438600.938856]  ? wait_woken+0x70/0x70
[1438600.938871]  __cv_wait+0x15/0x20 [spl]
[1438600.938914]  dmu_tx_wait+0x8e/0x1e0 [zfs]
[1438600.939407]  dmu_tx_assign+0x49/0x80 [zfs]
[1438600.939885]  zfs_write+0x45c/0xdb0 [zfs]
[1438600.940466]  zpl_iter_write+0xe7/0x130 [zfs]
[1438600.941035]  new_sync_write+0x114/0x1a0
[1438600.941053]  vfs_write+0x1d8/0x270
[1438600.941067]  ksys_write+0x67/0xf0
[1438600.941077]  __x64_sys_write+0x19/0x20
[1438600.941087]  x64_sys_call+0x47c/0x1fa0
[1438600.941101]  do_syscall_64+0x56/0xb0
[1438600.941112]  ? do_syscall_64+0x63/0xb0
[1438600.941122]  ? syscall_exit_to_user_mode+0x2c/0x50
[1438600.941134]  ? x64_sys_call+0x47c/0x1fa0
[1438600.941147]  ? do_syscall_64+0x63/0xb0
[1438600.941157]  ? exit_to_user_mode_prepare+0x96/0xb0
[1438600.941172]  ? syscall_exit_to_user_mode+0x2c/0x50
[1438600.941183]  ? x64_sys_call+0x47c/0x1fa0
[1438600.941196]  ? do_syscall_64+0x63/0xb0
[1438600.941205]  ? exit_to_user_mode_prepare+0x37/0xb0
[1438600.941219]  ? irqentry_exit_to_user_mode+0xe/0x20
[1438600.941231]  ? irqentry_exit+0x1d/0x30
[1438600.941242]  ? sysvec_apic_timer_interrupt+0x4e/0x90
[1438600.941255]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[1438600.941270] RIP: 0033:0x7f2e8af5c887
[1438600.941281] RSP: 002b:00007fff4aa53478 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[1438600.941296] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2e8af5c887
[1438600.941304] RDX: 0000000000040000 RSI: 00005610cf15ab00 RDI: 0000000000000004
[1438600.941311] RBP: 00005610cf15ab00 R08: 31c0ff49dcc16645 R09: f74988b23900fdca
[1438600.941319] R10: 00e293aaab379add R11: 0000000000000246 R12: 00005610cef3ad80
[1438600.941326] R13: 0000000000040000 R14: 0000000000000005 R15: 00005610cef3adb0
[1438600.941339]  </TASK>
[1438842.593771] INFO: task ffmpeg:539854 blocked for more than 120 seconds.
[1438842.593811]       Tainted: P           O      5.15.0-130-generic #140-Ubuntu
[1438842.593831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1438842.593847] task:ffmpeg          state:D stack:    0 pid:539854 ppid:501232 flags:0x00000002
[1438842.593865] Call Trace:
[1438842.593873]  <TASK>
[1438842.593884]  __schedule+0x24e/0x590
[1438842.593904]  ? kmem_cache_free+0x24f/0x290
[1438842.593922]  schedule+0x69/0x110
[1438842.593936]  cv_wait_common+0xf8/0x130 [spl]
[1438842.593983]  ? wait_woken+0x70/0x70
[1438842.594000]  __cv_wait+0x15/0x20 [spl]
[1438842.594043]  dmu_tx_wait+0x8e/0x1e0 [zfs]
[1438842.594552]  dmu_tx_assign+0x49/0x80 [zfs]
[1438842.595022]  zfs_write+0x45c/0xdb0 [zfs]
[1438842.595599]  ? crfree+0xe/0x20 [spl]
[1438842.595641]  ? generic_write_checks+0x65/0xc0
[1438842.595654]  zpl_iter_write+0xe7/0x130 [zfs]
[1438842.596221]  new_sync_write+0x114/0x1a0
[1438842.596239]  vfs_write+0x1d8/0x270
[1438842.596252]  ksys_write+0x67/0xf0
[1438842.596263]  __x64_sys_write+0x19/0x20
[1438842.596272]  x64_sys_call+0x47c/0x1fa0
[1438842.596287]  do_syscall_64+0x56/0xb0
[1438842.596298]  ? do_syscall_64+0x63/0xb0
[1438842.596307]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[1438842.596322] RIP: 0033:0x7f2e8af5c887
[1438842.596332] RSP: 002b:00007fff4aa53478 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[1438842.596346] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2e8af5c887
[1438842.596354] RDX: 0000000000040000 RSI: 00005610cf15ab00 RDI: 0000000000000004
[1438842.596361] RBP: 00005610cf15ab00 R08: 00005610cf15ab00 R09: 0000000000000000
[1438842.596368] R10: a7e0460b52defca0 R11: 0000000000000246 R12: 00005610cef3ad80
[1438842.596375] R13: 0000000000040000 R14: 0000000000000005 R15: 00005610cef3adb0
[1438842.596388]  </TASK>

Seccond occurrence, relevant portion of dmesg:

[ 2544.047802] perf: interrupt took too long (2509 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[ 6868.896924] perf: interrupt took too long (3140 > 3136), lowering kernel.perf_event_max_sample_rate to 63500
[365028.145173] INFO: task txg_sync:1091 blocked for more than 120 seconds.
[365028.145211]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365028.145227] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365028.145240] task:txg_sync        state:D stack:    0 pid: 1091 ppid:     2 flags:0x00004000
[365028.145257] Call Trace:
[365028.145265]  <TASK>
[365028.145277]  __schedule+0x24e/0x590
[365028.145300]  schedule+0x69/0x110
[365028.145312]  schedule_timeout+0x87/0x140
[365028.145328]  ? zio_issue_async+0x12/0x20 [zfs]
[365028.145934]  ? __bpf_trace_tick_stop+0x20/0x20
[365028.145949]  io_schedule_timeout+0x51/0x80
[365028.145962]  __cv_timedwait_common+0x12c/0x170 [spl]
[365028.146008]  ? wait_woken+0x70/0x70
[365028.146024]  __cv_timedwait_io+0x19/0x20 [spl]
[365028.146066]  zio_wait+0x116/0x220 [zfs]
[365028.146635]  dsl_pool_sync+0xb6/0x400 [zfs]
[365028.147134]  ? __mod_timer+0x294/0x430
[365028.147151]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[365028.147688]  spa_sync+0x2dc/0x5b0 [zfs]
[365028.148222]  txg_sync_thread+0x266/0x2f0 [zfs]
[365028.148772]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[365028.149374]  thread_generic_wrapper+0x64/0x80 [spl]
[365028.149428]  ? __thread_exit+0x20/0x20 [spl]
[365028.149477]  kthread+0x12a/0x150
[365028.149492]  ? set_kthread_struct+0x50/0x50
[365028.149507]  ret_from_fork+0x22/0x30
[365028.149526]  </TASK>
[365028.149579] INFO: task yt-dlp:602438 blocked for more than 120 seconds.
[365028.149597]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365028.149612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365028.149624] task:yt-dlp          state:D stack:    0 pid:602438 ppid:172074 flags:0x00004002
[365028.149640] Call Trace:
[365028.149645]  <TASK>
[365028.149652]  __schedule+0x24e/0x590
[365028.149665]  ? kmem_cache_free+0x24f/0x290
[365028.149680]  schedule+0x69/0x110
[365028.149691]  cv_wait_common+0xf8/0x130 [spl]
[365028.149734]  ? wait_woken+0x70/0x70
[365028.149748]  __cv_wait+0x15/0x20 [spl]
[365028.149791]  dmu_tx_wait+0x8e/0x1e0 [zfs]
[365028.150282]  dmu_tx_assign+0x49/0x80 [zfs]
[365028.150751]  zfs_write+0x45c/0xdb0 [zfs]
[365028.151330]  zpl_iter_write+0xe7/0x130 [zfs]
[365028.151896]  new_sync_write+0x114/0x1a0
[365028.151914]  vfs_write+0x1d8/0x270
[365028.151927]  ksys_write+0x67/0xf0
[365028.151938]  __x64_sys_write+0x19/0x20
[365028.151947]  x64_sys_call+0x47c/0x1fa0
[365028.151961]  do_syscall_64+0x56/0xb0
[365028.151971]  ? syscall_exit_to_user_mode+0x2c/0x50
[365028.151984]  ? x64_sys_call+0x1dba/0x1fa0
[365028.151997]  ? do_syscall_64+0x63/0xb0
[365028.152006]  ? do_syscall_64+0x63/0xb0
[365028.152015]  ? do_syscall_64+0x63/0xb0
[365028.152024]  ? sysvec_apic_timer_interrupt+0x4e/0x90
[365028.152036]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[365028.152051] RIP: 0033:0x7f4469e17887
[365028.152062] RSP: 002b:00007ffdacafcef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[365028.152076] RAX: ffffffffffffffda RBX: 0000561f15a45b40 RCX: 00007f4469e17887
[365028.152084] RDX: 000000000008c4d2 RSI: 0000561f170dcbd0 RDI: 0000000000000005
[365028.152091] RBP: 0000561f1648ad50 R08: 0000000000000000 R09: 0000000000000000
[365028.152098] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000008c4d2
[365028.152105] R13: 00007f4469d01f80 R14: 0000000000000005 R15: 0000561f170dcbd0
[365028.152118]  </TASK>
[365028.152124] INFO: task mv:605544 blocked for more than 120 seconds.
[365028.152193]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365028.152209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365028.152221] task:mv              state:D stack:    0 pid:605544 ppid:169840 flags:0x00000002
[365028.152236] Call Trace:
[365028.152241]  <TASK>
[365028.152248]  __schedule+0x24e/0x590
[365028.152261]  ? kmem_cache_free+0x24f/0x290
[365028.152275]  schedule+0x69/0x110
[365028.152286]  cv_wait_common+0xf8/0x130 [spl]
[365028.152332]  ? wait_woken+0x70/0x70
[365028.152347]  __cv_wait+0x15/0x20 [spl]
[365028.152388]  dmu_tx_wait+0x8e/0x1e0 [zfs]
[365028.152866]  dmu_tx_assign+0x49/0x80 [zfs]
[365028.153375]  zfs_write+0x45c/0xdb0 [zfs]
[365028.153957]  ? generic_write_checks+0x65/0xc0
[365028.153971]  zpl_iter_write+0xe7/0x130 [zfs]
[365028.154537]  new_sync_write+0x114/0x1a0
[365028.154554]  vfs_write+0x1d8/0x270
[365028.154567]  ksys_write+0x67/0xf0
[365028.154578]  __x64_sys_write+0x19/0x20
[365028.154587]  x64_sys_call+0x47c/0x1fa0
[365028.154601]  do_syscall_64+0x56/0xb0
[365028.154611]  ? exit_to_user_mode_prepare+0x37/0xb0
[365028.154626]  ? syscall_exit_to_user_mode+0x2c/0x50
[365028.154637]  ? x64_sys_call+0x1dba/0x1fa0
[365028.154649]  ? do_syscall_64+0x63/0xb0
[365028.154660]  ? exit_to_user_mode_prepare+0x96/0xb0
[365028.154672]  ? syscall_exit_to_user_mode+0x2c/0x50
[365028.154683]  ? x64_sys_call+0x47c/0x1fa0
[365028.154695]  ? do_syscall_64+0x63/0xb0
[365028.154704]  ? syscall_exit_to_user_mode+0x2c/0x50
[365028.154715]  ? x64_sys_call+0x47c/0x1fa0
[365028.154727]  ? do_syscall_64+0x63/0xb0
[365028.154736]  ? do_syscall_64+0x63/0xb0
[365028.154746]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[365028.154760] RIP: 0033:0x7f0bd45a3887
[365028.154769] RSP: 002b:00007ffd4e821ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[365028.154782] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f0bd45a3887
[365028.154790] RDX: 0000000000020000 RSI: 00007f0bd40ec000 RDI: 0000000000000004
[365028.154797] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[365028.154804] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f0bd40ec000
[365028.154811] R13: 0000000000000004 R14: 0000000000020000 R15: 0000000000000000
[365028.154823]  </TASK>
[365269.804701] INFO: task txg_sync:1091 blocked for more than 120 seconds.
[365269.804737]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365269.804754] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365269.804771] task:txg_sync        state:D stack:    0 pid: 1091 ppid:     2 flags:0x00004000
[365269.804789] Call Trace:
[365269.804797]  <TASK>
[365269.804808]  __schedule+0x24e/0x590
[365269.804831]  schedule+0x69/0x110
[365269.804842]  schedule_timeout+0x87/0x140
[365269.804858]  ? zio_issue_async+0x12/0x20 [zfs]
[365269.805465]  ? __bpf_trace_tick_stop+0x20/0x20
[365269.805480]  io_schedule_timeout+0x51/0x80
[365269.805493]  __cv_timedwait_common+0x12c/0x170 [spl]
[365269.805539]  ? wait_woken+0x70/0x70
[365269.805555]  __cv_timedwait_io+0x19/0x20 [spl]
[365269.805597]  zio_wait+0x116/0x220 [zfs]
[365269.806172]  dsl_pool_sync+0xb6/0x400 [zfs]
[365269.806672]  ? __mod_timer+0x294/0x430
[365269.806689]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[365269.807224]  spa_sync+0x2dc/0x5b0 [zfs]
[365269.807760]  txg_sync_thread+0x266/0x2f0 [zfs]
[365269.808310]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[365269.808923]  thread_generic_wrapper+0x64/0x80 [spl]
[365269.808978]  ? __thread_exit+0x20/0x20 [spl]
[365269.809026]  kthread+0x12a/0x150
[365269.809042]  ? set_kthread_struct+0x50/0x50
[365269.809056]  ret_from_fork+0x22/0x30
[365269.809075]  </TASK>
[365269.809130] INFO: task yt-dlp:602438 blocked for more than 120 seconds.
[365269.809154]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365269.809170] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365269.809186] task:yt-dlp          state:D stack:    0 pid:602438 ppid:172074 flags:0x00004002
[365269.809203] Call Trace:
[365269.809208]  <TASK>
[365269.809215]  __schedule+0x24e/0x590
[365269.809229]  ? kmem_cache_free+0x24f/0x290
[365269.809246]  schedule+0x69/0x110
[365269.809257]  cv_wait_common+0xf8/0x130 [spl]
[365269.809301]  ? wait_woken+0x70/0x70
[365269.809316]  __cv_wait+0x15/0x20 [spl]
[365269.809358]  dmu_tx_wait+0x8e/0x1e0 [zfs]
[365269.809861]  dmu_tx_assign+0x49/0x80 [zfs]
[365269.810348]  zfs_write+0x45c/0xdb0 [zfs]
[365269.810945]  zpl_iter_write+0xe7/0x130 [zfs]
[365269.811522]  new_sync_write+0x114/0x1a0
[365269.811544]  vfs_write+0x1d8/0x270
[365269.811558]  ksys_write+0x67/0xf0
[365269.811569]  __x64_sys_write+0x19/0x20
[365269.811579]  x64_sys_call+0x47c/0x1fa0
[365269.811593]  do_syscall_64+0x56/0xb0
[365269.811605]  ? exit_to_user_mode_prepare+0x37/0xb0
[365269.811621]  ? syscall_exit_to_user_mode+0x2c/0x50
[365269.811633]  ? x64_sys_call+0x1dba/0x1fa0
[365269.811646]  ? do_syscall_64+0x63/0xb0
[365269.811655]  ? syscall_exit_to_user_mode+0x2c/0x50
[365269.811666]  ? x64_sys_call+0x1dba/0x1fa0
[365269.811679]  ? do_syscall_64+0x63/0xb0
[365269.811688]  ? sysvec_apic_timer_interrupt+0x4e/0x90
[365269.811700]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[365269.811717] RIP: 0033:0x7f4469e17887
[365269.811728] RSP: 002b:00007ffdacafcef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[365269.811743] RAX: ffffffffffffffda RBX: 0000561f15a45b40 RCX: 00007f4469e17887
[365269.811751] RDX: 0000000000080276 RSI: 0000561f17b1ae40 RDI: 0000000000000005
[365269.811759] RBP: 0000561f1648ad50 R08: 0000000000000000 R09: 0000000000000000
[365269.811766] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000080276
[365269.811772] R13: 00007f4469d01f80 R14: 0000000000000005 R15: 0000561f17b1ae40
[365269.811786]  </TASK>
[365269.811794] INFO: task mv:605544 blocked for more than 120 seconds.
[365269.811819]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365269.811835] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365269.811851] task:mv              state:D stack:    0 pid:605544 ppid:169840 flags:0x00000002
[365269.811867] Call Trace:
[365269.811873]  <TASK>
[365269.811879]  __schedule+0x24e/0x590
[365269.811892]  ? kmem_cache_free+0x24f/0x290
[365269.811908]  schedule+0x69/0x110
[365269.811919]  cv_wait_common+0xf8/0x130 [spl]
[365269.811965]  ? wait_woken+0x70/0x70
[365269.811981]  __cv_wait+0x15/0x20 [spl]
[365269.812023]  dmu_tx_wait+0x8e/0x1e0 [zfs]
[365269.812644]  dmu_tx_assign+0x49/0x80 [zfs]
[365269.813136]  zfs_write+0x45c/0xdb0 [zfs]
[365269.813727]  ? xas_load+0x17/0xd0
[365269.813753]  ? generic_write_checks+0x65/0xc0
[365269.813768]  zpl_iter_write+0xe7/0x130 [zfs]
[365269.814358]  new_sync_write+0x114/0x1a0
[365269.814378]  vfs_write+0x1d8/0x270
[365269.814391]  ksys_write+0x67/0xf0
[365269.814402]  __x64_sys_write+0x19/0x20
[365269.814412]  x64_sys_call+0x47c/0x1fa0
[365269.814428]  do_syscall_64+0x56/0xb0
[365269.814440]  ? vfs_read+0x14b/0x1a0
[365269.814452]  ? vfs_read+0x14b/0x1a0
[365269.814465]  ? ksys_read+0x67/0xf0
[365269.814475]  ? exit_to_user_mode_prepare+0x37/0xb0
[365269.814490]  ? syscall_exit_to_user_mode+0x2c/0x50
[365269.814502]  ? x64_sys_call+0x1dba/0x1fa0
[365269.814515]  ? do_syscall_64+0x63/0xb0
[365269.814525]  ? syscall_exit_to_user_mode+0x2c/0x50
[365269.814536]  ? x64_sys_call+0x47c/0x1fa0
[365269.814548]  ? do_syscall_64+0x63/0xb0
[365269.814557]  ? do_syscall_64+0x63/0xb0
[365269.814567]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[365269.814583] RIP: 0033:0x7f0bd45a3887
[365269.814594] RSP: 002b:00007ffd4e821ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[365269.814608] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f0bd45a3887
[365269.814616] RDX: 0000000000020000 RSI: 00007f0bd40ec000 RDI: 0000000000000004
[365269.814624] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[365269.814631] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f0bd40ec000
[365269.814638] R13: 0000000000000004 R14: 0000000000020000 R15: 0000000000000000
[365269.814651]  </TASK>
[365511.464146] INFO: task txg_sync:1091 blocked for more than 120 seconds.
[365511.464188]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365511.464206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365511.464221] task:txg_sync        state:D stack:    0 pid: 1091 ppid:     2 flags:0x00004000
[365511.464239] Call Trace:
[365511.464247]  <TASK>
[365511.464259]  __schedule+0x24e/0x590
[365511.464283]  schedule+0x69/0x110
[365511.464296]  schedule_timeout+0x87/0x140
[365511.464312]  ? zio_issue_async+0x12/0x20 [zfs]
[365511.464937]  ? __bpf_trace_tick_stop+0x20/0x20
[365511.464954]  io_schedule_timeout+0x51/0x80
[365511.464967]  __cv_timedwait_common+0x12c/0x170 [spl]
[365511.465016]  ? wait_woken+0x70/0x70
[365511.465032]  __cv_timedwait_io+0x19/0x20 [spl]
[365511.465074]  zio_wait+0x116/0x220 [zfs]
[365511.465644]  dsl_pool_sync+0xb6/0x400 [zfs]
[365511.466154]  ? __mod_timer+0x294/0x430
[365511.466175]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[365511.466729]  spa_sync+0x2dc/0x5b0 [zfs]
[365511.467264]  txg_sync_thread+0x266/0x2f0 [zfs]
[365511.467815]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[365511.468417]  thread_generic_wrapper+0x64/0x80 [spl]
[365511.468471]  ? __thread_exit+0x20/0x20 [spl]
[365511.468519]  kthread+0x12a/0x150
[365511.468536]  ? set_kthread_struct+0x50/0x50
[365511.468551]  ret_from_fork+0x22/0x30
[365511.468570]  </TASK>
[365511.468626] INFO: task yt-dlp:602438 blocked for more than 120 seconds.
[365511.468655]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365511.468671] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365511.468687] task:yt-dlp          state:D stack:    0 pid:602438 ppid:172074 flags:0x00004002
[365511.468705] Call Trace:
[365511.468712]  <TASK>
[365511.468719]  __schedule+0x24e/0x590
[365511.468733]  ? kmem_cache_free+0x24f/0x290
[365511.468749]  schedule+0x69/0x110
[365511.468761]  cv_wait_common+0xf8/0x130 [spl]
[365511.468804]  ? wait_woken+0x70/0x70
[365511.468820]  __cv_wait+0x15/0x20 [spl]
[365511.468862]  dmu_tx_wait+0x8e/0x1e0 [zfs]
[365511.469354]  dmu_tx_assign+0x49/0x80 [zfs]
[365511.469831]  zfs_write+0x45c/0xdb0 [zfs]
[365511.470412]  zpl_iter_write+0xe7/0x130 [zfs]
[365511.470997]  new_sync_write+0x114/0x1a0
[365511.471020]  vfs_write+0x1d8/0x270
[365511.471034]  ksys_write+0x67/0xf0
[365511.471046]  __x64_sys_write+0x19/0x20
[365511.471056]  x64_sys_call+0x47c/0x1fa0
[365511.471071]  do_syscall_64+0x56/0xb0
[365511.471084]  ? syscall_exit_to_user_mode+0x2c/0x50
[365511.471096]  ? x64_sys_call+0x1e12/0x1fa0
[365511.471109]  ? do_syscall_64+0x63/0xb0
[365511.471118]  ? sysvec_apic_timer_interrupt+0x4e/0x90
[365511.471130]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[365511.471146] RIP: 0033:0x7f4469e17887
[365511.471158] RSP: 002b:00007ffdacafd498 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[365511.471173] RAX: ffffffffffffffda RBX: 0000561f15a45b40 RCX: 00007f4469e17887
[365511.471181] RDX: 00000000007a91a4 RSI: 0000561f17bd17c0 RDI: 0000000000000003
[365511.471189] RBP: 0000561f1648ad50 R08: 0000000000000000 R09: 0000000000000000
[365511.471196] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000007a91a4
[365511.471203] R13: 00007f4469d01f80 R14: 0000000000000003 R15: 0000561f17bd17c0
[365511.471216]  </TASK>
[365511.471224] INFO: task mv:605544 blocked for more than 120 seconds.
[365511.471250]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365511.471268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365511.471283] task:mv              state:D stack:    0 pid:605544 ppid:169840 flags:0x00000002
[365511.471298] Call Trace:
[365511.471304]  <TASK>
[365511.471310]  __schedule+0x24e/0x590
[365511.471322]  ? kmem_cache_free+0x24f/0x290
[365511.471339]  schedule+0x69/0x110
[365511.471351]  cv_wait_common+0xf8/0x130 [spl]
[365511.471398]  ? wait_woken+0x70/0x70
[365511.471414]  __cv_wait+0x15/0x20 [spl]
[365511.471456]  dmu_tx_wait+0x8e/0x1e0 [zfs]
[365511.471941]  dmu_tx_assign+0x49/0x80 [zfs]
[365511.472452]  zfs_write+0x45c/0xdb0 [zfs]
[365511.473036]  ? generic_write_checks+0x65/0xc0
[365511.473049]  zpl_iter_write+0xe7/0x130 [zfs]
[365511.473616]  new_sync_write+0x114/0x1a0
[365511.473633]  vfs_write+0x1d8/0x270
[365511.473647]  ksys_write+0x67/0xf0
[365511.473657]  __x64_sys_write+0x19/0x20
[365511.473667]  x64_sys_call+0x47c/0x1fa0
[365511.473680]  do_syscall_64+0x56/0xb0
[365511.473692]  ? syscall_exit_to_user_mode+0x2c/0x50
[365511.473704]  ? x64_sys_call+0x47c/0x1fa0
[365511.473716]  ? do_syscall_64+0x63/0xb0
[365511.473726]  ? exit_to_user_mode_prepare+0x96/0xb0
[365511.473741]  ? syscall_exit_to_user_mode+0x2c/0x50
[365511.473752]  ? x64_sys_call+0x47c/0x1fa0
[365511.473765]  ? do_syscall_64+0x63/0xb0
[365511.473774]  ? exit_to_user_mode_prepare+0x37/0xb0
[365511.473787]  ? syscall_exit_to_user_mode+0x2c/0x50
[365511.473798]  ? x64_sys_call+0x1dba/0x1fa0
[365511.473810]  ? do_syscall_64+0x63/0xb0
[365511.473820]  ? do_syscall_64+0x63/0xb0
[365511.473830]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[365511.473845] RIP: 0033:0x7f0bd45a3887
[365511.473854] RSP: 002b:00007ffd4e821ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[365511.473867] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f0bd45a3887
[365511.473875] RDX: 0000000000020000 RSI: 00007f0bd40ec000 RDI: 0000000000000004
[365511.473882] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[365511.473888] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f0bd40ec000
[365511.473895] R13: 0000000000000004 R14: 0000000000020000 R15: 0000000000000000
[365511.473908]  </TASK>
[365753.123874] INFO: task txg_sync:1091 blocked for more than 120 seconds.
[365753.123915]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[365753.123932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[365753.123948] task:txg_sync        state:D stack:    0 pid: 1091 ppid:     2 flags:0x00004000
[365753.123966] Call Trace:
[365753.123973]  <TASK>
[365753.123985]  __schedule+0x24e/0x590
[365753.124008]  schedule+0x69/0x110
[365753.124020]  schedule_timeout+0x87/0x140
[365753.124036]  ? zio_issue_async+0x12/0x20 [zfs]
[365753.124642]  ? __bpf_trace_tick_stop+0x20/0x20
[365753.124657]  io_schedule_timeout+0x51/0x80
[365753.124670]  __cv_timedwait_common+0x12c/0x170 [spl]
[365753.124716]  ? wait_woken+0x70/0x70
[365753.124732]  __cv_timedwait_io+0x19/0x20 [spl]
[365753.124774]  zio_wait+0x116/0x220 [zfs]
[365753.125343]  dsl_pool_sync+0xb6/0x400 [zfs]
[365753.125842]  ? __mod_timer+0x294/0x430
[365753.125857]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[365753.126393]  spa_sync+0x2dc/0x5b0 [zfs]
[365753.126927]  txg_sync_thread+0x266/0x2f0 [zfs]
[365753.127477]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[365753.128080]  thread_generic_wrapper+0x64/0x80 [spl]
[365753.128133]  ? __thread_exit+0x20/0x20 [spl]
[365753.128181]  kthread+0x12a/0x150
[365753.128197]  ? set_kthread_struct+0x50/0x50
[365753.128212]  ret_from_fork+0x22/0x30
[365753.128230]  </TASK>
@owlshrimp owlshrimp added the Type: Defect Incorrect behavior (e.g. crash, hang) label Feb 14, 2025
@owlshrimp
Copy link
Author

@scineram why the downvote?

@YhenZN
Copy link

YhenZN commented Feb 14, 2025

@owlshrimp Ignore the guy. I honestly have no idea why he hasn't been banned from the repo yet. He follows the discussions 24/7 and gives downvotes to everything without ever providing any useful information.

@tonyhutter
Copy link
Contributor

@owlshrimp zfs-2.1.5 is fairly old. It's possible the hang may have been fixed in subsequent releases. Consider upgrading to 2.1.16 if you want to stay on the 2.1.x branch.

@owlshrimp
Copy link
Author

If that is indeed the case I guess someone is going to need to go convince ubuntu to ship a newer version :/

@amotin
Copy link
Member

amotin commented Feb 15, 2025

With 2.3.0 out 2.1.x branch is out of future updates. So update to 2.2.x would be even better. I can't speak about Ubuntu, but 2.1.5 was tagged Jun 21, 2022.

Meanwhile, looking on the logs provided, I'd say that problem may be not in ZFS, but in some very slow (dying?) storage device. Kernel regularly complains about 120 seconds delays in sync thread, but the number is not growing, which means that at some point the writes are completing until the problem happen again in next TXG.

@owlshrimp
Copy link
Author

@amotin that would be quite odd since there are almost-new WD reds, but I suppose not impossible. I've had brand new drives fail on me before. Is there a good way to narrow things down to a particular drive if that's the case?

@amotin
Copy link
Member

amotin commented Feb 15, 2025

@owlshrimp Use whatever tools you have in your distro to view active device queue depth and request latency. ZFS has own zpool iostat command, but I am not used to it.

@owlshrimp
Copy link
Author

owlshrimp commented Feb 15, 2025

Weirdly it's now switched to triggering nearly as soon as I reboot the machine. I haven't even started hammering it, before:

[   86.350605] loop6: detected capacity change from 0 to 8
[  484.640021] INFO: task txg_sync:1230 blocked for more than 120 seconds.
[  484.640060]       Tainted: P           O      5.15.0-131-generic #141-Ubuntu
[  484.640076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.640089] task:txg_sync        state:D stack:    0 pid: 1230 ppid:     2 flags:0x00004000
[  484.640107] Call Trace:
[  484.640115]  <TASK>
[  484.640125]  __schedule+0x24e/0x590
[  484.640148]  schedule+0x69/0x110
[  484.640160]  schedule_timeout+0x87/0x140
[  484.640173]  ? __bpf_trace_tick_stop+0x20/0x20
[  484.640188]  io_schedule_timeout+0x51/0x80
[  484.640204]  __cv_timedwait_common+0x12c/0x170 [spl]
[  484.640252]  ? wait_woken+0x70/0x70
[  484.640268]  __cv_timedwait_io+0x19/0x20 [spl]
[  484.640311]  zio_wait+0x116/0x220 [zfs]
[  484.640915]  ? spa_sync_aux_dev+0x1b0/0x1b0 [zfs]
[  484.641448]  spa_sync_frees+0x40/0x90 [zfs]
[  484.641980]  spa_sync_iterate_to_convergence+0x10d/0x1f0 [zfs]
[  484.642515]  spa_sync+0x2dc/0x5b0 [zfs]
[  484.643049]  txg_sync_thread+0x266/0x2f0 [zfs]
[  484.643598]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[  484.644199]  thread_generic_wrapper+0x64/0x80 [spl]
[  484.644252]  ? __thread_exit+0x20/0x20 [spl]
[  484.644300]  kthread+0x12a/0x150
[  484.644316]  ? set_kthread_struct+0x50/0x50
[  484.644331]  ret_from_fork+0x22/0x30
[  484.644350]  </TASK>

The only change in behaviour on my end is I haven't run a scrub immediately after booting and waited for it to complete. Probably can't, if it occurs in the first 484 seconds.

I don't see anything particularly weird in zpool iostat. Nothing unbalanced between the disks in the mirror, and latency writing to disks is in the low tens of ms, which seems reasonable for spinning disks that need to seek.

Even with things broken:

                              capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                        alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
--------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
big-pool                    2.40T  6.69T    122      0   491K      0   27ms      -   27ms      -    9us      -      -      -      -      -
  mirror-0                  2.40T  6.69T    122      0   491K      0   27ms      -   27ms      -    9us      -      -      -      -      -
    wwn-0x5000cca0c8c63bc9      -      -     62      0   249K      0   25ms      -   25ms      -   10us      -      -      -      -      -
    wwn-0x5000cca0d8d4de1c      -      -     60      0   241K      0   28ms      -   28ms      -    9us      -      -      -      -      -
--------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
initial-pool                3.47T   156G      0      0      0      0      -      -      -      -      -      -      -      -      -      -
  mirror-0                  3.47T   156G      0      0      0      0      -      -      -      -      -      -      -      -      -      -
    WD_NEW                      -      -      0      0      0      0      -      -      -      -      -      -      -      -      -      -
    SEAGATE_REFURB              -      -      0      0      0      0      -      -      -      -      -      -      -      -      -      -
--------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----

I suppose if this is a kernel bug the best next step would be to rip out the ubuntu package and put in a DKMS of the latest zfs in this stable series, then if that makes it go away file a bug with ubuntu.

@filip-paczynski
Copy link

there are almost-new WD reds

Some of the WD reds were/are SMR.
You can check your drive here: https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd

SMR is not a good choice for ZFS

@owlshrimp
Copy link
Author

owlshrimp commented Feb 17, 2025

there are almost-new WD reds

Some of the WD reds were/are SMR. You can check your drive here: https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd

SMR is not a good choice for ZFS

I'm aware of that and these drives are known not to be SMR. (They even proclaim their suitability for ZFS on the back of the box, as WD started doing to mark CMR drives after the bad press they received) They also test fine as far as I can tell from a performance standpoint.

I'll share my findings as soon as I can get a newer version of ZFS up and running.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

5 participants