bug: debug black screen at desktop #8

Open
popey opened this issue May 13, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@popey
Contributor

popey commented May 13, 2024

Expected behavior

Booting the same Ubuntu 24.04 ISO image 10 times in a row, I expect either 10 out of 10 or 0 out of 10 to boot successfully.

Actual behavior

Sometimes Qemu boots to Grub but then fails to show the Ubuntu desktop. The desktop is running; we know this because we can hear the chimes that play after the installer has started. However, it seems impossible to get the GUI to appear. Just a black screen.

Steps to reproduce the behavior

for f in {1..100}; do QT_KEEP_SCREENSHOTS=true QT_KEEP_TESSERACT_TEXT=true QUICKEMU_OPTS=--status-quo ./quicktest test_boot_to_welcome_screen ubuntu 24.04 && sleep 10 && kill -9 $(cat machines/ubuntu-24.04/ubuntu-24.04.pid); done

It doesn't take long to fail:

🎉 20240513-223741 20240513-223907 ubuntu 24.04  test_boot_to_welcome_screen
🎉 20240513-223917 20240513-224037 ubuntu 24.04  test_boot_to_welcome_screen
🎉 20240513-224047 20240513-224208 ubuntu 24.04  test_boot_to_welcome_screen
🎉 20240513-224218 20240513-224339 ubuntu 24.04  test_boot_to_welcome_screen
🚨 20240513-224349 20240513-224906 ubuntu 24.04  test_boot_to_welcome_screen
🚨 20240513-224906 20240513-225022 ubuntu 24.04  test_boot_to_welcome_screen
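The expectation is about consistency: ten boots of the same image should all pass or all fail. A minimal sketch of a tally loop, assuming (as the run-till-it-fails script further down this thread does) that quicktest exits with status 0 on a pass. `run_n_times` is a hypothetical helper, not part of quicktest:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and report how many runs
# passed (exit status 0) and how many failed (non-zero exit status).
run_n_times() {
    local runs="$1"; shift
    local pass=0 fail=0 i
    for ((i = 1; i <= runs; i++)); do
        if "$@"; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
    done
    echo "passed ${pass}/${runs}, failed ${fail}/${runs}"
}
```

Usage: `run_n_times 10 ./quicktest test_boot_to_welcome_screen ubuntu 24.04`. Any VM left running between iterations still has to be killed, as the loop above does.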

Additional context

test_boot_to_welcome_screen.zip

popey added the bug label on May 13, 2024
@popey
Contributor Author

popey commented May 13, 2024

It looks like the VM is sometimes still left running at the end. We need a better way to brutally kill off the VM.
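A sketch of a harsher stop, assuming the pid file holds a single numeric PID (as machines/ubuntu-24.04/ubuntu-24.04.pid does in the loop above). `stop_vm` is a hypothetical name; it sends SIGTERM, waits a bounded time, falls back to SIGKILL, and removes the stale pid file so the next run does not act on an old PID:

```bash
#!/usr/bin/env bash
# Hypothetical helper: stop a QEMU process given its pid file, asking
# with SIGTERM first and escalating to SIGKILL after a timeout (seconds).
# Assumes the pid file contains a single numeric PID.
stop_vm() {
    local pidfile="$1" timeout="${2:-10}" pid waited=0
    [ -r "$pidfile" ] || return 0          # nothing to stop
    pid=$(<"$pidfile")
    if [[ "$pid" =~ ^[0-9]+$ ]] && kill -0 "$pid" 2>/dev/null; then
        kill -TERM "$pid" 2>/dev/null
        # Poll once per second until the process is gone or we time out.
        while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
            sleep 1
            waited=$((waited + 1))
        done
        # Still there after the timeout: no more politeness.
        kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" 2>/dev/null
    fi
    rm -f "$pidfile"                       # avoid reusing a stale PID
    return 0
}
```

Usage: `stop_vm machines/ubuntu-24.04/ubuntu-24.04.pid 10`.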

@ali1234
Contributor

ali1234 commented May 18, 2024

This happened to me the first time I tried to run one of the Ubuntu tests.

I noticed that the mouse cursor is the default X11 "X" that you see when mousing over the root window. It's a fallback X session and no windows are opening?

@popey
Contributor Author

popey commented May 19, 2024

While trying to debug #30 I managed to get a few black screens. I was able to use the qemu monitor to poke some keypresses and get the screen to draw.

I tried switching between various VT terminals, with

socat -,echo=0,icanon=0 unix-connect:./machines/ubuntu-24.04/ubuntu-24.04-monitor.socket

then:

QEMU 8.0.4 monitor - type 'help' for more information
(qemu) sendkey ctrl-alt-f1
(qemu) sendkey ctrl-alt-f2
(qemu) sendkey ctrl-alt-f1

However, nothing appeared, other than the default X11 "X" as you noticed @ali1234 .

I did manage to get back to the GUI, and the screen redrew partially - only the installer window, and then the title bar when the clock changed - so I expect some kind of xdamage screen refresh optimisation.

I sent an ALT+F4 to close the installer, upon which the display redrew completely.
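The same key presses can be scripted rather than typed. A minimal sketch, assuming socat is installed and that newline-terminated commands written to the monitor socket are enough for Qemu to act on them. `send_monitor` and `monitor_transport` are hypothetical helpers; the transport is split out so it can be replaced (for example with `cat`, to inspect what would be sent):

```bash
#!/usr/bin/env bash
# Hypothetical helper: deliver stdin to the monitor's UNIX socket.
monitor_transport() {
    socat - "UNIX-CONNECT:$1"
}

# Hypothetical helper: send each argument as one monitor command line.
send_monitor() {
    local socket="$1"; shift
    printf '%s\n' "$@" | monitor_transport "$socket"
}
```

Usage: `send_monitor ./machines/ubuntu-24.04/ubuntu-24.04-monitor.socket "sendkey ctrl-alt-f1" "sendkey ctrl-alt-f2"`. socat exits shortly after its input ends, so if a command appears to be dropped, a short delay before closing may be needed.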

I got the journal. I'm not sure if it's useful for debugging this, or if there are other logs I need.

journal.txt

By way of comparison, a subsequent run ran fine, and here's the journal from that.
working.txt

@popey
Contributor Author

popey commented May 20, 2024

I just ran every test I have, twice, on my decently specced ThinkPad Z13. Only the Ubuntu Daily Live (24.10) tests show failures.

🚨 FAIL  20240520-090114 20240520-090710 ubuntu daily-live  test_install_entire_disk_with_defaults_fde
🎉 PASS  20240520-090710 20240520-090741 alpine v3.11  test_boot_to_login
🎉 PASS  20240520-090741 20240520-090818 alpine v3.19  test_boot_to_login
🎉 PASS  20240520-090818 20240520-091004 ubuntu-mate daily-live  test_boot_to_live_environment
🎉 PASS  20240520-091004 20240520-091147 ubuntu-mate 24.04  test_boot_to_live_environment
🎉 PASS  20240520-091147 20240520-091328 ubuntu 24.04  test_boot_to_live_environment
🚨 FAIL  20240520-091328 20240520-091912 ubuntu daily-live  test_boot_to_live_environment
🎉 PASS  20240520-091912 20240520-092348 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240520-092348 20240520-093137 ubuntu 24.04  test_install_entire_disk_with_defaults
🎉 PASS  20240520-093137 20240520-093303 ubuntu 24.04  test_post_install_clean_first_run
🚨 FAIL  20240520-093303 20240520-093904 ubuntu daily-live  test_install_entire_disk_with_defaults
🎉 PASS  20240520-093904 20240520-094920 ubuntu-mate 24.04  test_install_entire_disk_with_defaults
🎉 PASS  20240520-094921 20240520-095910 ubuntu-mate daily-live  test_install_entire_disk_with_defaults

In all three cases, it's booting to a black screen. Just some extra data points.

@popey
Contributor Author

popey commented May 20, 2024

Minor update. I added some debugging code that detects a black screen across multiple consecutive screenshots and triggers a 'remedy' routine. In that routine I tried CTRL+ALT+F1 via the Qemu monitor to switch to the first TTY. On one occasion this caused the login screen to appear, as one would expect, but switching back via CTRL+ALT+F2 didn't reload the desktop.

Something is certainly wrong here, but I'm not sure if it's inside Qemu or in GNOME/Mutter/Wayland/kernel.
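One way such a black-screen check could work, sketched under assumptions: screenshots are binary PPM (P6) files with the plain three-line header that Qemu's screendump is assumed to write, and "black" means every pixel byte is zero. `is_black_ppm` and `trailing_black_count` are hypothetical names, not the debugging code from the local branch mentioned in this thread:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a binary PPM (P6) screenshot is pure black.
# Assumes a comment-free three-line header ("P6", "W H", "maxval");
# any non-zero byte in the pixel data counts as "not black".
is_black_ppm() {
    local file="$1" nonzero
    [ "$(head -c 2 "$file")" = "P6" ] || return 2
    # Skip the three header lines, delete NUL bytes, count what is left.
    nonzero=$(tail -n +4 "$file" | tr -d '\000' | wc -c)
    [ "$nonzero" -eq 0 ]
}

# Hypothetical helper: given screenshots in capture order, print the
# length of the run of consecutive black frames at the end.
trailing_black_count() {
    local count=0 f
    for f in "$@"; do
        if is_black_ppm "$f"; then
            count=$((count + 1))
        else
            count=0
        fi
    done
    echo "$count"
}
```

A remedy could then fire once `trailing_black_count` reaches some threshold such as 3; that threshold is arbitrary.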

@ali1234
Contributor

ali1234 commented May 20, 2024

Probably a combination of both and also the viewer in use (sdl, spice etc), similar to quickemu-project/quickemu#454

@popey
Contributor Author

popey commented May 20, 2024

We can likely eliminate the viewer. I have been running the tests recently with the display set to none, so no viewer is involved. But that still leaves all of Qemu and the hardware emulation, and all of the Ubuntu stack. I mostly see the black screens on Ubuntu 24.04 and daily-live, though. I will test the others more aggressively in a loop over the next day or so, to see if I can trigger it elsewhere.

This is my silly 'run it till it fails' script.

#!/bin/bash
while true; do
        export QT_OPEN_RESULTS=false
        export QUICKEMU_DISPLAY=none
        export QT_KEEP_SCREENSHOTS=true
        export QT_KEEP_TESSERACT_TEXT=true
        if ! ./quicktest test_install_entire_disk_with_defaults ubuntu 24.04; then
                exit 99
        fi
        rm machines/ubuntu-24.04/disk.qcow2
done

@popey
Contributor Author

popey commented May 21, 2024

Bah!

I don't understand why it works fine and then suddenly doesn't, with no changes to anything.

🎉 PASS  20240521-005151 20240521-010054 ubuntu 24.04  test_install_entire_disk_with_defaults
🎉 PASS  20240521-010054 20240521-010855 ubuntu 24.04  test_install_entire_disk_with_defaults
🎉 PASS  20240521-010855 20240521-011656 ubuntu 24.04  test_install_entire_disk_with_defaults
🎉 PASS  20240521-011656 20240521-012758 ubuntu 24.04  test_install_entire_disk_with_defaults
🎉 PASS  20240521-012759 20240521-013631 ubuntu 24.04  test_install_entire_disk_with_defaults
🎉 PASS  20240521-013632 20240521-014514 ubuntu 24.04  test_install_entire_disk_with_defaults
🚨 FAIL  20240521-014514 20240521-015048 ubuntu 24.04  test_install_entire_disk_with_defaults

Feels like some kind of race when the system boots. Next step is to try with no network connected.

@ali1234
Contributor

ali1234 commented May 21, 2024

This would be a lot easier to debug if quickemu could expose an OOB debug shell somewhere, so we can get into the system to see what is happening without having to go through the (broken) UI.

@popey
Contributor Author

popey commented May 21, 2024

I just caught it being black-screen for a while, and added a 'bodge' to jump into the monitor and press ctrl-alt-f1 then enter (which presses 'login') on the gdm3 style 'lock' screen in the live session. This did something, as it went back to the desktop and did that xdamage thing I mentioned.

screenshot_0010_test_installer_initial_load

Note in this image only portions of the screen are updated. This was all done with display set to 'none' so no spice viewer.

What causes this? Is it mutter, perhaps? This suggests it might be.

Maybe this is only confined to Ubuntu proper with GNOME 3?

Anyway, it continued through the install (and failed for another reason I need to debug).

On the plus side, this also rules out the network, because this black screen happened with no network connected.

@ali1234 I think there is a serial terminal, but not sure how much use it is.

I ran socat -,echo=0,icanon=0 unix-connect:ubuntu-24.04/ubuntu-24.04-serial.socket while it was running and got no output. I doubt the stock Ubuntu ISO has serial console enabled by default?

quickemu does enable a fair amount of connectivity, like copy/paste between host and guest, ssh, webdav and all that. But none of those things are enabled by default in the live environment. My policy of not changing the ISO holds, because I don't want to affect the tests (either way).

But you're right, we do need some better way to debug this.

One option would be to pause the entire test suite as soon as a succession of black screenshots is taken. We could then inject ctrl-alt-t and type sudo apt update && sudo apt install openssh-server to do some debugging. I think that would be sufficient?

I currently test for that in a local branch where qt_screenshot_ppm does a lot more debugging. I'll publish that to a new branch in the morning to play with.
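Typing a command line through the monitor means translating each character into a `sendkey` key name. A minimal sketch, assuming a US keyboard layout in the guest and mapping only the characters needed for a command like the one above. `text_to_sendkeys` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a string into Qemu monitor "sendkey" commands,
# ending with Enter. Assumes a US layout in the guest; only a small set of
# characters is mapped and anything else is reported as unsupported.
text_to_sendkeys() {
    local text="$1" i ch key
    for ((i = 0; i < ${#text}; i++)); do
        ch="${text:i:1}"
        case "$ch" in
            [[:lower:][:digit:]]) key="$ch" ;;
            [[:upper:]]) key="shift-${ch,,}" ;;
            ' ') key="spc" ;;
            '-') key="minus" ;;
            '.') key="dot" ;;
            '/') key="slash" ;;
            '=') key="equal" ;;
            '&') key="shift-7" ;;        # US layout
            *) echo "unsupported character: $ch" >&2; return 1 ;;
        esac
        echo "sendkey $key"
    done
    echo "sendkey ret"
}
```

Usage: `{ echo "sendkey ctrl-alt-t"; text_to_sendkeys "sudo apt update"; } | socat - UNIX-CONNECT:./machines/ubuntu-24.04/ubuntu-24.04-monitor.socket`. The terminal needs time to open after ctrl-alt-t, and long strings may need pacing, so in practice a pause between the pieces would likely be needed.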

@ali1234
Contributor

ali1234 commented May 21, 2024

You have to enable the serial console at the grub menu. Let me see if I can script that up...

@ali1234
Contributor

ali1234 commented May 21, 2024

Okay I got the console working, but only manually. Scripting it does not work for some reason: #38

In order for the kernel to output anything you also have to delete "quiet" from the command line. Ubuntu will then log stuff to the serial port and you can interact with socat.

Unfortunately Ubuntu does not start a getty on the console at the end of boot, so you can only read the kernel log. It might still be useful. Alpine does start a getty on the tty, but it only waits for 1 second at the grub menu, which is difficult to catch manually and impossible with the current scripting facilities. We need a way to hold down the shift key to stop the timeout...
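At the GRUB menu, pressing e opens the boot entry for editing, and the change described above is made on its linux line. The transformation itself can be sketched as below. It assumes the guest's first serial port is ttyS0 at 115200 baud, also drops "splash" (an assumption beyond what is described here), and keeps tty0 as a console so the graphical boot is unaffected. `serial_cmdline` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite a kernel command line so the kernel logs
# to the first serial port. Drops "quiet" and "splash", then appends
# console=tty0 and console=ttyS0,115200 (kernel messages go to both).
serial_cmdline() {
    local -a words out=()
    local word
    read -ra words <<< "$1"              # split on whitespace, no globbing
    for word in "${words[@]}"; do
        case "$word" in
            quiet|splash) ;;             # these hide kernel output
            *) out+=("$word") ;;
        esac
    done
    out+=("console=tty0" "console=ttyS0,115200")
    printf '%s\n' "${out[*]}"
}
```

systemd's getty generator normally starts a serial getty on non-virtual kernel consoles such as ttyS0; whether the live image does so with this command line is untested. For the Alpine timeout, the Qemu monitor's `sendkey` command takes an optional hold time in milliseconds (for example `sendkey shift 3000`), which might stand in for holding shift, though that is also untested.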

@ali1234
Contributor

ali1234 commented May 21, 2024

I also noticed that the Ubuntu live image is pointlessly spamming:

error: cannot find current revision for snap subiquity: readlink /snap/subiquity/current: no such file or directory

a few times per second to the console. I doubt this is related to the black screen since it does it even when the desktop does load up. Kind of funny/annoying though because it fills your scrollback very quickly.
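While reading the serial console, the spam can be filtered out on the host side. A minimal sketch; `filter_console` is a hypothetical name, and it simply drops lines containing the fixed string above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop the repeated subiquity snap error from a
# console stream, flushing each surviving line immediately.
filter_console() {
    grep --line-buffered -F -v 'cannot find current revision for snap subiquity'
}
```

Usage: `socat - unix-connect:ubuntu-24.04/ubuntu-24.04-serial.socket | filter_console`.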

@popey
Contributor Author

popey commented May 21, 2024

Meanwhile, the kubuntu calamares install test I left running in a loop last night ran many more times before failing, and didn't fail for the same reason (it wasn't a blank screen), which makes me think this is very much a GNOME stack issue.

🎉 PASS  20240521-031447 20240521-031925 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-031925 20240521-032442 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-032443 20240521-032919 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-032920 20240521-033430 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-033430 20240521-033859 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-033900 20240521-034409 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-034410 20240521-034904 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-034905 20240521-035351 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-035352 20240521-035811 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-035812 20240521-040309 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-040309 20240521-040836 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-040837 20240521-041346 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-041347 20240521-041822 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-041823 20240521-042257 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-042258 20240521-042823 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-042824 20240521-043300 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-043301 20240521-043718 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-043718 20240521-044151 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-044151 20240521-044619 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-044619 20240521-045032 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-045033 20240521-045443 kubuntu daily-live  test_install_calamares_defaults
🎉 PASS  20240521-045444 20240521-045852 kubuntu daily-live  test_install_calamares_defaults
🚨 FAIL  20240521-045853 20240521-050154 kubuntu daily-live  test_install_calamares_defaults
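To compare failure rates across runs like these, the result lines can be tallied. A sketch, assuming the `<emoji> PASS|FAIL <start> <end> <distro> <release> <test>` line format used in these later runs (the earliest runs in this thread, which lack the PASS/FAIL word, are skipped). `summarize_results` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read quicktest result lines on stdin and print
# pass/fail counts per distro, release and test, sorted by key.
summarize_results() {
    awk '$2 == "PASS" || $2 == "FAIL" {
        key = $5 " " $6 " " $7
        if ($2 == "PASS") pass[key]++; else fail[key]++
        seen[key] = 1
    }
    END {
        for (k in seen) printf "%s: %d pass, %d fail\n", k, pass[k], fail[k]
    }' | sort
}
```

Fed the kubuntu run above, this should print `kubuntu daily-live test_install_calamares_defaults: 22 pass, 1 fail`.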

Projects
None yet
Development

No branches or pull requests

2 participants