Timer issue #14
I went looking to check that we were updating count with the reload value when starting the timer. By mistake, I added some code to aspeed_timer_ctrl_enable that appears to resolve the issue. Something as simple as a usleep(1) results in a cleanly booting openbmc userspace:
diff --git a/hw/timer/aspeed_timer.c b/hw/timer/aspeed_timer.c
index 54b400b94aa9..60b80c7fb349 100644
--- a/hw/timer/aspeed_timer.c
+++ b/hw/timer/aspeed_timer.c
@@ -289,6 +289,7 @@ static void aspeed_timer_ctrl_enable(AspeedTimer *t, bool enable)
     trace_aspeed_timer_ctrl_enable(t->id, enable);
     if (enable) {
         t->start = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+        usleep(1);
         aspeed_timer_mod(t);
     } else {
         timer_del(&t->timer);
https://ozlabs.org/~joel/0001-aspeed_timer-Fix-behaviour-running-Linux.patch
This fix is more comprehensive. We should cause the aspeed timer to expire and reload instead of creating a new qemu timer in the future (see the last few lines of the diff).
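To make the "expire and reload" idea concrete, here is a minimal standalone sketch. It is not the code from the patch linked above. It assumes a down-counter that starts at the reload value, decrements once per tick, and is reloaded on the tick after it reaches zero; whether the real hardware period is exactly reload + 1 ticks is an assumption. The names `model_count` and `model_ticks_to_zero` are hypothetical and do not exist in QEMU.

```c
#include <stdint.h>

/*
 * Hypothetical model of a reloading down-counter (not QEMU code).
 * Assumption: the counter starts at `reload`, decrements once per tick,
 * and is reloaded on the tick after it reaches zero, so one full period
 * is reload + 1 ticks.
 */
uint32_t model_count(uint32_t reload, uint64_t elapsed_ticks)
{
    uint64_t period = (uint64_t)reload + 1;

    /* The remainder is at most `reload`, so the subtraction cannot underflow. */
    return (uint32_t)(reload - (elapsed_ticks % period));
}

/* Ticks remaining until the modelled counter next reaches zero. */
uint64_t model_ticks_to_zero(uint32_t reload, uint64_t elapsed_ticks)
{
    return model_count(reload, elapsed_ticks);
}
```

With this model the counter value is always derived from the elapsed time and the reload value, so an expiry that is already in the past simply wraps into the next period rather than requiring a fresh deadline to be computed from scratch.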
I tested the latest. Results are good but I haven't taken a close look at the problem yet.
I'd like to see the rework of the patch along these lines
When kexec'ing (and thus not resetting the device) I'm seeing what I believe is some memory corruption, see u-root/u-bmc#114 (comment). Note the value of
EDIT: Not memory corruption, but I guess a signedness problem, with values going negative where the code assumed they never would. What I'm observing is the following:
This is my home-made dump of the action:
I patched my local qemu with this:
And now Linux boots :-)
Let's move @bluecmd's discussion to a different issue.
Back on the "go slow" issue: we committed a fix, cb98173, that resolved the issue on x86 hosts. The same code running on ppc64le still shows the "go slow" behaviour. I believe we need to write some tests for the existing code, and perhaps rework it to make the aspeed timer state clearer. In my mind we have the following:
ENABLE: timer is enabled and is counting "down" into the future
Notes:
Linux uses TIMER2 with MATCH1 = MATCH2 = ZERO, and reload is 0xFFFFFFFF as a free running timer. It uses TIMER1 with MATCH1 = MATCH2 = ZERO, and reload is set to:
(For non-aspeed variants there is an IRQ enabled (on overflow) for the periodic timer. It's not clear why the aspeed version shouldn't also have an IRQ.)
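The free-running use of TIMER2 described above (reload of 0xFFFFFFFF, counting down) can be illustrated with a small sketch of how a reading of such a counter is turned into an up-counting tick value. This is a standalone illustration, not code taken from the Linux driver or from QEMU; the function names `ticks_up` and `ticks_between` are hypothetical.

```c
#include <stdint.h>

/*
 * Illustration only: convert a reading of a 32-bit down-counter that
 * reloads from 0xFFFFFFFF into an up-counting value. Inverting the bits
 * is the same as computing 0xFFFFFFFF - down_count.
 */
uint32_t ticks_up(uint32_t down_count)
{
    return ~down_count;
}

/*
 * Ticks elapsed between two readings of the counter. Unsigned 32-bit
 * subtraction wraps modulo 2^32, so the result stays correct as long as
 * the counter reloaded at most once between the two readings.
 */
uint32_t ticks_between(uint32_t earlier_down, uint32_t later_down)
{
    return ticks_up(later_down) - ticks_up(earlier_down);
}
```

Because the subtraction is done on unsigned values, a reload between the two readings does not produce a negative interval; a model that reports a stale or non-advancing count, however, would make consecutive readings look like almost no time has passed.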
Some notes on things I've tried since this went in.
On all of our builder (x86) systems, QEMU test time increased by approximately 50%. We disabled our builder (ppc64le) system due to the issues noted above. The interesting thing is that on some systems, like our internal Red Hat system, or even my Mac laptop running Ubuntu in VirtualBox, we saw no differences in timing.
I did a variety of things to help our builder (x86) systems. Mostly it was just updating the Ubuntu level and/or the kernel on the systems. These changes helped and got things at least passing most of the time on our builder (x86) systems. There are still weird discrepancies: for instance, builder2 consistently passes and takes the shortest amount of time, whereas builder1 consistently takes the longest and fails about 20% of the time due to performance-related issues. I had builder1 at identical software levels to builder2 but still saw these discrepancies. In fact, builder1 wouldn't work at all once the new C++ mapper came in until I updated it to Ubuntu 18.04. My only guess at this point is that it has to do with the hardware (CPU model?) in the different systems.
https://github.com/openbmc/openbmc/wiki/OpenBMC-Infrastructure-Workgroup#current-infrastructure has details on each system. https://openpower.xyz/job/openbmc-test-qemu-ci/buildTimeTrend can be used to look at how long each builder is taking and on which ones we're getting the intermittent failure.
Our builder4 (ppc64le) system was on Ubuntu 14 and a really old kernel, so I went ahead and updated it to Ubuntu 18 and the 4.19.6-041906-generic kernel. With this, it can once again run QEMU, but there's still a huge delay in the kernel startup:
vs. this on builder1:
So there are still some optimizations to do. I tried grabbing the QEMU 3.1 rc branch but didn't see any improvement from using it vs. what we have out on openpower.xyz.
We are hitting a bug in the aspeed qemu model: openbmc/qemu#14 Signed-off-by: Joel Stanley <[email protected]>
This issue is resolved as of 151081f. The patches are making their way upstream thanks to @legoater
torvalds/linux@4451d3f causes the aspeed qemu machines to misbehave. The symptoms are extremely slow boot with openbmc userspace. It can be replicated with simpler userspace by typing 'dmesg' and observing the output 'stutter'.
This commit was backported to 4.18 as part of 4.18.16. Current dev-4.18 (d653b87e2a26221) contains this commit.