Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aspeed QEMU locks up #114

Closed
bluecmd opened this issue Nov 11, 2018 · 7 comments
Closed

Aspeed QEMU locks up #114

bluecmd opened this issue Nov 11, 2018 · 7 comments
Labels
bug Something isn't working

Comments

@bluecmd
Copy link
Contributor

bluecmd commented Nov 11, 2018

While debugging a possibly timer releated freeze with kexec, I rebuilt qemu with debugging enabled. This made the whole emulator freeze in the bootloader at this stage:

     1	qemu-system-arm \
     2		-m 128 \
     3		-M palmetto-bmc \
     4		-nographic \
     5		-drive file=flash.sim.img,format=raw,if=mtd \
     6		-d guest_errors,unimp
     7	
     8	DRAM Init-DDR3
     9	CBR0-1

Looking at gdb, it seems like there is somebody holding the global CPU lock, which after a bit of digging seems to be the calculate_next function in aspeed_timer. I added this trace output:

diff --git a/hw/timer/aspeed_timer.c b/hw/timer/aspeed_timer.c
index 5e3f51b..479ab35 100644
--- a/hw/timer/aspeed_timer.c
+++ b/hw/timer/aspeed_timer.c
@@ -113,6 +113,7 @@ static uint64_t calculate_next(struct AspeedTimer *t)
     uint64_t next = 0;
     uint32_t rate = calculate_rate(t);
 
+    fprintf(stderr, "!! Entered calculate_next\n");
     while (!next) {
         /* We don't know the relationship between the values in the match
          * registers, so sort using MAX/MIN/zero. We sort in that order as the
@@ -139,6 +140,7 @@ static uint64_t calculate_next(struct AspeedTimer *t)
             break;
         }
     }
+    fprintf(stderr, "!! Exited calculate_next\n");
 
     return next;
 }

And the output is:

[...]
!! Entered calculate_next
!! Exited calculate_next
!! Entered calculate_next
!! Exited calculate_next
aspeed_vic_write: Ignoring write to legacy registers at 0x38[4] <- 0x40000
aspeed_vic_write: Ignoring write to legacy registers at 0x38[4] <- 0x40000
!! Entered calculate_next
!! Exited calculate_next
!! Entered calculate_next
!! Exited calculate_next
aspeed_vic_write: Ignoring write to legacy registers at 0x38[4] <- 0x40000
aspeed_vic_write: Ignoring write to legacy registers at 0x38[4] <- 0x40000
!! Entered calculate_next

And then it hangs.

@bluecmd bluecmd added the bug Something isn't working label Nov 11, 2018
@bluecmd
Copy link
Contributor Author

bluecmd commented Nov 11, 2018

openbmc/qemu#14 did some changes to qemu and aspeed timer.
Let's see if this is solved by pulling the latest qemu.

EDIT: That particular issue is solved by the latest qemu

@bluecmd
Copy link
Contributor Author

bluecmd commented Nov 11, 2018

Next issue is the kexec one:

ifdown: socket: Function not implemented
[    5.146822] kexec_core: Starting new kernel
[    5.147671] Bye!
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.18.16-ubmc (bluecmd@bluecmd) (gcc version 6.3.1 20170620 (15:6.3.1+svn253039-1)) #3 Sun Nov 11 11:06:29 CET 2018
[    0.000000] CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00093177
[    0.000000] CPU: VIVT data cache, VIVT instruction cache
[    0.000000] OF: fdt: Machine model: Quanta F06 Leopard DDR3 BMC
[    0.000000] bootconsole [earlycon0] enabled
[    0.000000] Memory policy: Data cache writeback
[    0.000000] random: get_random_bytes called from start_kernel+0x8c/0x4e4 with crng_init=0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 28448
[    0.000000] Kernel command line: earlyprintk=1 ubi.mtd=ubi dyndbg="file net/ncsi/* +p" console=ttyS4,57600n8
[    0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
[    0.000000] Memory: 103800K/114688K available (5120K kernel code, 369K rwdata, 1284K rodata, 1024K init, 144K bss, 10888K reserved, 0K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
[    0.000000]     vmalloc : 0x87800000 - 0xff800000   (1920 MB)
[    0.000000]     lowmem  : 0x80000000 - 0x87000000   ( 112 MB)
[    0.000000]     modules : 0x7f000000 - 0x80000000   (  16 MB)
[    0.000000]       .text : 0x(ptrval) - 0x(ptrval)   (6112 kB)
[    0.000000]       .init : 0x(ptrval) - 0x(ptrval)   (1024 kB)
[    0.000000]       .data : 0x(ptrval) - 0x(ptrval)   ( 370 kB)
[    0.000000]        .bss : 0x(ptrval) - 0x(ptrval)   ( 145 kB)
[    0.000000] ftrace: allocating 18510 entries in 55 pages
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
aspeed_vic_write: Software interrupts unavailable. IRQs to be cleared: 0x00000000ffffffff
aspeed_vic_write: Software interrupts unavailable. IRQs to be cleared: 0x0007ffff00000000
[    0.000000] i2c controller registered, irq 16
aspeed_timer_write: Bad offset 0x38

Then nothing

QEMU shows:

(gdb) bt
#0  timer_del_locked (timer_list=0x55a00d3eba80, ts=0x7f7df64fe950)
    at /home/bluecmd/Downloads/qemu/util/qemu-timer.c:378
#1  0x000055a00b815a0f in timer_mod_ns (ts=0x7f7df64fe950, expire_time=-6148914676520386312)
    at /home/bluecmd/Downloads/qemu/util/qemu-timer.c:432
#2  0x000055a00b815b59 in timer_mod (ts=0x7f7df64fe950, expire_time=-6148914676520386312)
    at /home/bluecmd/Downloads/qemu/util/qemu-timer.c:467
#3  0x000055a00b60f3e8 in aspeed_timer_mod (t=0x7f7df64fe940)
    at /home/bluecmd/Downloads/qemu/hw/timer/aspeed_timer.c:141
#4  0x000055a00b60f4f4 in aspeed_timer_expire (opaque=0x7f7df64fe940)
    at /home/bluecmd/Downloads/qemu/hw/timer/aspeed_timer.c:170
#5  0x000055a00b815dca in timerlist_run_timers (timer_list=0x55a00d3eba80)
    at /home/bluecmd/Downloads/qemu/util/qemu-timer.c:574
#6  0x000055a00b815e74 in qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL)
    at /home/bluecmd/Downloads/qemu/util/qemu-timer.c:588
#7  0x000055a00b816265 in qemu_clock_run_all_timers ()
    at /home/bluecmd/Downloads/qemu/util/qemu-timer.c:708
#8  0x000055a00b816981 in main_loop_wait (nonblocking=0)
    at /home/bluecmd/Downloads/qemu/util/main-loop.c:503
#9  0x000055a00b41db1c in main_loop () at /home/bluecmd/Downloads/qemu/vl.c:1884
#10 0x000055a00b4253a3 in main (argc=10, argv=0x7ffe0a111168, envp=0x7ffe0a1111c0)
    at /home/bluecmd/Downloads/qemu/vl.c:4618
(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7f7df658e300 (LWP 83035) "qemu-system-arm" timer_del_locked (timer_list=0x55a00d3eba80, 
    ts=0x7f7df64fe950) at /home/bluecmd/Downloads/qemu/util/qemu-timer.c:378
  2    Thread 0x7f7deea0c700 (LWP 83036) "qemu-system-arm" syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  3    Thread 0x7f7dee20b700 (LWP 83037) "qemu-system-arm" 0x00007f7df3d1f701 in __GI_ppoll (
    fds=0x7f7de80008c0, nfds=1, timeout=<optimized out>, sigmask=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
  4    Thread 0x7f7deda0a700 (LWP 83038) "qemu-system-arm" __lll_lock_wait ()
    at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

@bluecmd
Copy link
Contributor Author

bluecmd commented Nov 11, 2018

The negative expire_time does not bode well.

Printing the expiry times shows this:

[... stream of ever increasing positive expires ...]
     10 aspeed_vic_write: Software interrupts unavailable. IRQs to be cleared: 0x0007ffff00000000
      9 !! expire: 15082174089                                                           
      8 !! expire: 15091799892                                                           
      7 !! expire: 15091809892                                                           
      6 !! expire: 15101431809                                                           
      5 !! expire: 15101441809                                                           
      4 !! expire: 15111084769                                                           
      3 !! expire: 15111094769                                                           
      2 !! expire: 15120726519                                                           
      1 aspeed_timer_write: Bad offset 0x38                                              
15496   !! expire: -6148914676121914022                                                  
      1 !! expire: -6148914676121914022                                                  
      2 !! expire: -6148914676121914022       

then inf loop and hang

Interesting thing is that aspeed_timer_mod's next is an uint64 and timer_mod takes int64

@bluecmd
Copy link
Contributor Author

bluecmd commented Nov 11, 2018

Applying this super-naive patch makes the kernel continue, but everything is quite broken and it hangs later on.


diff --git a/hw/timer/aspeed_timer.c b/hw/timer/aspeed_timer.c
index 9acd1de485..693e0b1949 100644
--- a/hw/timer/aspeed_timer.c
+++ b/hw/timer/aspeed_timer.c
@@ -136,8 +136,8 @@ static uint64_t calculate_next(struct AspeedTimer *t)
 
 static void aspeed_timer_mod(AspeedTimer *t)
 {
-    uint64_t next = calculate_next(t);
-    if (next) {
+    int64_t next = (int64_t)calculate_next(t);
+    if (next > 0) {
         timer_mod(&t->timer, next);
     }
 }
[    0.000000] clocksource: FTTMR010-TIMER2: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635851949 ns
[    0.000000] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 89478484971ns
[    0.000000] Switching to timer-based delay loop, resolution 41ns
[..]
[    0.000000] console [ttyS4] enabled
[    0.000000] bootconsole [earlycon0] disabled
[    0.000000] bootconsole [earlycon0] disabled
[    0.000000] 1e78e000.serial: ttyS2 at MMIO 0x1e78e000 (irq = 31, base_baud = 1500000) is a 16550A
[    0.000000] random: fast init done
[    0.000000] random: crng init done

@bluecmd
Copy link
Contributor Author

bluecmd commented Nov 11, 2018

Digging some more with:

diff --git a/hw/timer/aspeed_timer.c b/hw/timer/aspeed_timer.c
index 9acd1de485..8fc492cbda 100644
--- a/hw/timer/aspeed_timer.c
+++ b/hw/timer/aspeed_timer.c
@@ -117,26 +117,30 @@ static uint64_t calculate_next(struct AspeedTimer *t)
      * timer counts down to zero. */
 
     next = calculate_time(t, MAX(t->match[0], t->match[1]));
+    fprintf(stderr, "!!1 next = %lx\n", next);
     if (now < next)
         return next;
 
     next = calculate_time(t, MIN(t->match[0], t->match[1]));
+    fprintf(stderr, "!!2 next = %lx\n", next);
     if (now < next)
         return next;
 
     next = calculate_time(t, 0);
+    fprintf(stderr, "!!3 next = %lx\n", next);
     if (now < next)
         return next;
 
     /* We've missed all of the deadlines. Set a timer in the future to let
      * execution catch up */
     t->start = now;
+    fprintf(stderr, "!!4 next = %lx\n", next);
     return next + 10000;
 }
 
 static void aspeed_timer_mod(AspeedTimer *t)
 {
-    uint64_t next = calculate_next(t);
+    uint64_t next = (int64_t)calculate_next(t);
     if (next) {
         timer_mod(&t->timer, next);
     }
@@ -223,6 +227,7 @@ static uint64_t aspeed_timer_read(void *opaque, hwaddr offset, unsigned size)
         break;
     }
     trace_aspeed_timer_read(offset, size, value);
+    fprintf(stderr, "!> timer read 0x%lx = %lx (size: %d)\n", offset, value, size);
     return value;
 }
 
@@ -394,6 +399,8 @@ static void aspeed_timer_write(void *opaque, hwaddr offset, uint64_t value,
     const int reg = (offset & 0xf) / 4;
     AspeedTimerCtrlState *s = opaque;
 
+    fprintf(stderr, "!! timer write 0x%lx = %lx (size: %d)\n", offset, value, size);
+
     switch (offset) {
     /* Control Registers */
     case 0x30:

     9 !!2 next = 3d5ba92c6                                                             
     8 !!3 next = 3d5ba92c6                                                             
     7 !!4 next = 3d5ba92c6                                                             
     6 !!1 next = 3d64e49d9                                                             
     5 !! timer write 0x38 = 1ff (size: 4)                                              
     4 aspeed_timer_write: Bad offset 0x38                                              
     3 !! timer write 0x34 = 0 (size: 4)                                                
     2 !! timer write 0x30 = 10 (size: 4)                                               
     1 !! timer write 0x10 = 0 (size: 4)                                                
     0 !!1 next = aaaaaaae8074285b                                                      
     1 !! timer write 0x18 = 0 (size: 4)                                                
     2 !!1 next = aaaaaaae8074285b                                                      
     3 !! timer write 0x1c = 0 (size: 4)                                                
     4 !!1 next = aaaaaaae8074285b                                                      
     5 !!1 next = aaaaaaae8074285b       

@bluecmd
Copy link
Contributor Author

bluecmd commented Nov 11, 2018

diff --git a/hw/timer/aspeed_timer.c b/hw/timer/aspeed_timer.c
index 9acd1de485..1a54d85e9d 100644
--- a/hw/timer/aspeed_timer.c
+++ b/hw/timer/aspeed_timer.c
@@ -253,7 +253,7 @@ static void aspeed_timer_set_value(AspeedTimerCtrlState *s, int timer, int reg,
             int64_t delta = (int64_t) value - (int64_t) calculate_ticks(t, now);
             uint32_t rate = calculate_rate(t);
 
-            t->start += muldiv64(delta, NANOSECONDS_PER_SECOND, rate);
+            t->start = (int64_t)t->start + ((__int128_t)delta * NANOSECONDS_PER_SECOND / rate);
             aspeed_timer_mod(t);
         }
         break;

@bluecmd
Copy link
Contributor Author

bluecmd commented Feb 4, 2019

Patches have been merged upstream, openbmc/qemu@5ff119d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant