vmtests: address RCU stalls #317

Merged
merged 3 commits into from
Aug 12, 2022


kkourt
Contributor

@kkourt kkourt commented Aug 12, 2022

see commits

kkourt added 2 commits August 12, 2022 08:46
--just-boot does not work anymore because the service is executed even
if it was not enabled. Not sure why. As a simple solution, don't add the
service if the user specifies --just-boot.

Signed-off-by: Kornilios Kourtis <[email protected]>
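
The workaround in this commit amounts to leaving the service out of the image when only a boot is requested, rather than installing it disabled. A minimal Python sketch of that decision follows; it is not the project's code, and the function name and step names are hypothetical placeholders chosen for illustration.

```python
def image_actions(just_boot: bool) -> list[str]:
    """Return a (hypothetical) list of steps applied when building the VM image."""
    # Placeholder step names; the real tool's steps are not shown in this PR page.
    actions = ["configure-network", "set-sysctls"]
    if not just_boot:
        # Only add the test-runner service when tests should run on boot.
        # With --just-boot the service is never installed, so it cannot start.
        actions.append("install-tester-service")
    return actions


if __name__ == "__main__":
    print(image_actions(just_boot=True))
    print(image_actions(just_boot=False))
```

Omitting the service entirely avoids depending on whether the init system honours its enabled/disabled state.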
This reverts commit dd396c6, which was
setting /proc/sys/kernel/panic_on_rcu_stall on the host system rather
than the VM and was thus ineffective.

Instead, add an entry to /etc/sysctl.d/local.conf that properly sets
this inside the VM.

Signed-off-by: Kornilios Kourtis <[email protected]>
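
As a rough illustration of the approach described in this commit, the Python sketch below appends the setting to etc/sysctl.d/local.conf under an image root directory, so that it applies inside the VM instead of on the host. It is not the project's implementation: the function name, the `image_root` parameter, and the value `1` are assumptions; only the sysctl name and the file path come from the commit message.

```python
import tempfile
from pathlib import Path

# Assumed value: 1 asks the kernel to panic when an RCU stall is detected.
SYSCTL_LINE = "kernel.panic_on_rcu_stall = 1\n"


def add_rcu_stall_sysctl(image_root: Path) -> Path:
    """Add the sysctl entry to <image_root>/etc/sysctl.d/local.conf if missing."""
    conf = image_root / "etc" / "sysctl.d" / "local.conf"
    conf.parent.mkdir(parents=True, exist_ok=True)
    existing = conf.read_text() if conf.exists() else ""
    if SYSCTL_LINE not in existing:
        with conf.open("a") as f:
            # Keep the new entry on its own line if the file lacks a final newline.
            if existing and not existing.endswith("\n"):
                f.write("\n")
            f.write(SYSCTL_LINE)
    return conf


if __name__ == "__main__":
    # Demonstrate against a throwaway directory standing in for the VM image root.
    root = Path(tempfile.mkdtemp())
    print(add_rcu_stall_sysctl(root).read_text(), end="")
```

Writing into the image's filesystem, rather than into /proc on the machine that builds the image, is what makes the setting take effect in the guest.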
@kkourt kkourt requested review from willfindlay and a team as code owners August 12, 2022 06:49
@kkourt kkourt marked this pull request as draft August 12, 2022 06:49
We've been seeing RCU stalls such as the following when running qemu in GH:

Running test pkg.sensors.test.TestSensorLseekLoad .[  116.892213] rcu: INFO: rcu_sched self-detected stall on CPU
[  116.892213] rcu: 	0-...!: (20987 ticks this GP) idle=d3e/1/0x4000000000000002 softirq=23120/23120 fqs=0
[  116.892213] 	(t=21004 jiffies g=49257 q=8)
[  116.892213] rcu: rcu_sched kthread starved for 21004 jiffies! g49257 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[  116.892213] rcu: RCU grace-period kthread stack dump:
[  116.892213] rcu_sched       R  running task    14920    11      2 0x90004000
[  116.892213] Call Trace:
[  116.892213]  __schedule+0x288/0x600
[  116.892213]  ? __mod_timer+0x1a6/0x3c0
[  116.892213]  schedule+0x34/0xa0
[  116.892213]  schedule_timeout+0x84/0x140
[  116.892213]  ? __next_timer_interrupt+0xc0/0xc0
[  116.892213]  rcu_gp_kthread+0x4f6/0xd40
[  116.892213]  ? kfree_call_rcu+0x10/0x10
[  116.892213]  kthread+0x107/0x120
[  116.892213]  ? __kthread_bind_mask+0x60/0x60
[  116.892213]  ret_from_fork+0x35/0x40
[  116.892213] NMI backtrace for cpu 0
[  116.892213] CPU: 0 PID: 413 Comm: pkg.sensors.tes Not tainted 5.4.209 #1
[  116.892213] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[  116.892213] Call Trace:
[  116.892213]  <IRQ>
[  116.892213]  dump_stack+0x50/0x63
[  116.892213]  nmi_cpu_backtrace.cold+0x13/0x50
[  116.892213]  ? lapic_can_unplug_cpu+0x60/0x60
[  116.892213]  nmi_trigger_cpumask_backtrace+0x7c/0x90
[  116.892213]  rcu_dump_cpu_stacks+0x7c/0xaa
[  116.892213]  rcu_sched_clock_irq.cold+0x1b3/0x39e
[  116.892213]  ? can_stop_idle_tick+0x70/0x70
[  116.892213]  update_process_times+0x56/0x90
[  116.892213]  tick_sched_handle+0x2f/0x40
[  116.892213]  tick_sched_timer+0x4b/0xb0
[  116.892213]  __hrtimer_run_queues+0x127/0x2a0
[  116.892213]  hrtimer_interrupt+0xf0/0x280
[  116.892213]  smp_apic_timer_interrupt+0x5d/0x120
[  116.892213]  apic_timer_interrupt+0xf/0x20
[  116.892213]  </IRQ>
... repeated until timeout ...

From reading https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt,
one of my theories is that writes to the console get delayed and the
kernel enters some weird livelock state. This patch buffers qemu output,
aiming to avoid RCU stalls such as the one above.

Signed-off-by: Kornilios Kourtis <[email protected]>
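
One way to read "buffers qemu output" is sketched below in Python: the parent drains the child's output pipe promptly into memory and forwards it to the final sink in larger batches, so the child is less likely to block on a slow console consumer. This is an illustrative sketch under that assumption, not the patch itself; `run_buffered`, `sink`, and `flush_threshold` are hypothetical names, and the demo command is a stand-in for a qemu invocation.

```python
import io
import subprocess
import sys


def run_buffered(cmd: list[str], sink, flush_threshold: int = 64 * 1024) -> int:
    """Run cmd and forward its stdout/stderr to sink in batches; return the exit code."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    assert proc.stdout is not None
    pending = bytearray()
    # read1 returns as soon as some data is available, and b"" at end of stream.
    for chunk in iter(lambda: proc.stdout.read1(4096), b""):
        pending += chunk
        if len(pending) >= flush_threshold:
            sink.write(bytes(pending))
            pending.clear()
    if pending:
        sink.write(bytes(pending))
    sink.flush()
    return proc.wait()


if __name__ == "__main__":
    # Stand-in child process; a real run would pass the qemu command line instead.
    out = io.BytesIO()
    rc = run_buffered([sys.executable, "-c", "print('booting')"], out)
    print(rc, out.getvalue())
```

The trade-off is latency: console lines reach the sink later, in exchange for fewer and larger writes.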
@kkourt kkourt force-pushed the pr/kkourt/vmtests-improvements branch from 90ada0f to 8efe9bf Compare August 12, 2022 06:51
@kkourt kkourt marked this pull request as ready for review August 12, 2022 11:44
@kkourt kkourt merged commit 9cfbd21 into main Aug 12, 2022
@kkourt kkourt deleted the pr/kkourt/vmtests-improvements branch August 12, 2022 14:11