Invalid check of unity_support_test for Genio Devices (ARM Mali GPU) #1630

Open
1 task done
baconYao opened this issue Dec 2, 2024 · 10 comments
@baconYao
Contributor

baconYao commented Dec 2, 2024

Bug Description

Problem

Recently, the cold/warm boot stress tests failed on Genio and ADVANTECH RSB-3810 devices in the SRU and enablement phases.

The failing output is shown below. The failure is caused by the unity support test check.

Test: Scan kernel log for errors and warnings.                              
                                                                     :   0.0% /
  Kernel log error check.                                            :   0.0% -
  Kernel log error check.                                            :  50.0% \
                                                                               
  Kernel log error check.                                      
Test: Scan kernel log for Oopses.                                           
                                                                     :  50.0% |
  Kernel log oops check.                                             :  50.0% /
  Kernel log oops check.                                             : 100.0% -
                                                                               
  Kernel log oops check.                                  2 passed
Running 2 tests, results appended to /var/tmp/checkbox-ng/sessions/session_title-2024-11-20T08.50.24.session/session-share/cold_reboot_cycle1/fwts_klog_oops.log
No errors detected
[ ERR ] unity support test returned 1
Comparing devices in (expected) /var/tmp/checkbox-ng/sessions/session_title-2024-11-20T08.50.24.session/session-share/before_reboot against (actual) /var/tmp/checkbox-ng/sessions/session_title-2024-11-20T08.50.24.session/session-share/cold_reboot_cycle1...
[ OK ] Devices match!
[ OK ] fwts checks passed!
[ OK ] Didn't find any failed system services!
These nodes ['renderD128', 'card0-eDP-1', 'card0-DSI-1', 'card0'] exist
card0-eDP-1 is connected to display!
Checking $DISPLAY=:0

Reason

Two commits enhanced and fixed the validation for the cold/warm boot stress tests.

One of the validations is for the Unity utils. The same validation is also used by the gl_support case. The check and its output on my PC (201712-26032) look like this:

$ cd /snap/checkbox22/current/usr/lib/nux
$ DISPLAY=:0 ./unity_support_test -p

OpenGL vendor string:   Intel
OpenGL renderer string: Mesa Intel(R) UHD Graphics 620 (KBL GT2)
OpenGL version string:  4.6 (Compatibility Profile) Mesa 22.0.1

Not software rendered:    yes
Not blacklisted:          yes
GLX fbconfig:             yes
GLX texture from pixmap:  yes
GL npot or rect textures: yes
GL vertex program:        yes
GL fragment program:      yes
GL vertex buffer object:  yes
GL framebuffer object:    yes
GL version is 1.4+:       yes

Unity 3D supported:       yes

However, this check is NOT valid on some devices with an ARM Mali GPU (e.g., Genio G1200, G700, G510 with the Desktop image).

DISPLAY=:0 ./unity_support_test -p
OpenGL vendor string:   Mesa
OpenGL renderer string: llvmpipe (LLVM 15.0.7, 128 bits)
OpenGL version string:  4.5 (Compatibility Profile) Mesa 23.2.1-1ubuntu3.1~22.04.2

Not software rendered:    no
Not blacklisted:          yes
GLX fbconfig:             yes
GLX texture from pixmap:  yes
GL npot or rect textures: yes
GL vertex program:        yes
GL fragment program:      yes
GL vertex buffer object:  yes
GL framebuffer object:    yes
GL version is 1.4+:       yes

Unity 3D supported:       no

Although the output shows llvmpipe rather than ARM Mali, this is the EXPECTED result. See the details in https://bugs.launchpad.net/baoshan/+bug/2025696/comments/5

Mesa loads the swrast DRI driver when no relevant DRI driver is available. swrast is a software-based DRI driver in Mesa that provides OpenGL support. So the test failed because OpenGL is using a software rendering driver, as shown in the result above, and Unity 3D failed in a similar way.

It is also expected that the Mali driver does not support OpenGL. So any OpenGL tools are actually querying the default software rendering support from Mesa.
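To make the failing check points easy to spot, here is a minimal sketch that turns the `unity_support_test -p` report into a dict and lists every item answered with "no". The function name `parse_support_report` is hypothetical, the sample text is an excerpt of the Genio output above, and I am only assuming (not confirming from the source code) that these "no" answers are what drive the non-zero return code.

```python
def parse_support_report(text: str) -> dict:
    """Turn the 'key: value' lines of `unity_support_test -p` into a dict."""
    report = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            report[key.strip()] = value.strip()
    return report


# Excerpt of the Genio output quoted in this issue.
SAMPLE = """\
OpenGL vendor string:   Mesa
OpenGL renderer string: llvmpipe (LLVM 15.0.7, 128 bits)

Not software rendered:    no
Not blacklisted:          yes
GLX fbconfig:             yes

Unity 3D supported:       no
"""

report = parse_support_report(SAMPLE)
failed = [key for key, value in report.items() if value == "no"]
print(failed)
```

On the Genio excerpt this flags only "Not software rendered" and "Unity 3D supported", which matches the explanation above: every other capability is reported as available.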

Cert-blocker Test Case

  • cert-blocker

To Reproduce

  1. Install Desktop Image
  2. Install Checkbox via commands
    $ sudo snap install checkbox22 --beta
    $ sudo snap install checkbox --channel="22.04/beta" --classic
  3. Run Checkbox via command checkbox.checkbox-cli control <IP to DUT>
  4. Choose the com.canonical.certification::client-cert-desktop-22-04-stress test plan
  5. Run Cold-boot Stress Test and Warm-boot Stress Test

Expected Result

No [ ERR ] unity support test returned 1 should be raised during any loop-test iteration

Actual Result

Test: Scan kernel log for errors and warnings.                              
                                                                     :   0.0% /
  Kernel log error check.                                            :   0.0% -
  Kernel log error check.                                            :  50.0% \
                                                                               
  Kernel log error check.                                      
Test: Scan kernel log for Oopses.                                           
                                                                     :  50.0% |
  Kernel log oops check.                                             :  50.0% /
  Kernel log oops check.                                             : 100.0% -
                                                                               
  Kernel log oops check.                                  2 passed
Running 2 tests, results appended to /var/tmp/checkbox-ng/sessions/session_title-2024-11-20T08.50.24.session/session-share/cold_reboot_cycle1/fwts_klog_oops.log
No errors detected
[ ERR ] unity support test returned 1
Comparing devices in (expected) /var/tmp/checkbox-ng/sessions/session_title-2024-11-20T08.50.24.session/session-share/before_reboot against (actual) /var/tmp/checkbox-ng/sessions/session_title-2024-11-20T08.50.24.session/session-share/cold_reboot_cycle1...
[ OK ] Devices match!
[ OK ] fwts checks passed!
[ OK ] Didn't find any failed system services!
These nodes ['renderD128', 'card0-eDP-1', 'card0-DSI-1', 'card0'] exist
card0-eDP-1 is connected to display!
Checking $DISPLAY=:0

Environment

OS: Jammy Desktop Image
Checkbox Type: Snap
Hardware Testing: ARM Mali GPU

checkbox-baoshan-classic   1.2               191    latest/edge      ce-certification-qa  classic
checkbox-ce-oem            1.0-jammy         611    22.04/edge       ce-certification-qa  classic
checkbox22                 4.2.0-dev152      1328   latest/beta      ce-certification-qa  -

Relevant log output

No response

Additional context

No response

@baconYao baconYao added the bug Something isn't working label Dec 2, 2024

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1678.

This message was autogenerated

@tomli380576
Contributor

tomli380576 commented Dec 3, 2024

For experiments I used MESA_LOADER_DRIVER_OVERRIDE=kms_swrast in ~/.pam_environment to force software rendering on desktop images (at least the unity_support_test thinks it's using software rendering) https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/7087

I tried running the glmark2-es benchmark on one of the genio boards but it still shows llvmpipe. Are there any steps that I need to do before running the benchmark?

Edit: The snap version of glmark2-es from graphics-test-tools shows llvm, but the debian version shows MALI

Edit2: The discrepancy might have come from what the snap OpenGL connector exposes.

$ snap connections graphics-test-tools

Interface                 Plug                                 Slot                         Notes
content[graphics-core22]  graphics-test-tools:graphics-core22  mesa-core22:graphics-core22  -
opengl                    graphics-test-tools:opengl           :opengl                      -
wayland                   graphics-test-tools:wayland          :wayland                     -
x11                       graphics-test-tools:x11              :x11                         -

Looks like the snap is picking up the mesa implementation from the base core22. Since it's using mesa it's definitely going to be llvmpipe -> software rendering

Maybe we need this: https://canonical-mir.readthedocs-hosted.com/stable/how-to/how-to-enable-graphics-core22-on-a-device/ or do the steps in the readme file: https://github.com/canonical/checkbox-mir

@tomli380576 tomli380576 self-assigned this Dec 3, 2024
@tomli380576
Contributor

tomli380576 commented Dec 3, 2024

In case anyone is interested in the history of GLX and EGL https://utcc.utoronto.ca/~cks/space/blog/linux/EGLAndGLXAndOpenGL

@tomli380576
Contributor

tomli380576 commented Dec 4, 2024

So it looks like anything that uses GLX (the glue layer between OpenGL and X.org) will use the llvmpipe renderer. But if we run eglinfo it at least picks up the correct vendor

Wayland platform:
EGL API version: 1.5
EGL vendor string: ARM
EGL version string: 1.5 Valhall-"r48p0-01eac0"
EGL client APIs: OpenGL_ES

On my own computer which is running wayland with an intel iGPU, it picks up the following:

Wayland platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES 

X11 platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES 

So the ARM Mali platform has no native X support, which is expected since it only supports EGL (API binding is OpenGL_ES) and Vulkan.
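For comparing boards quickly, here is a rough sketch that maps each platform section of the eglinfo output to its vendor string. It assumes the output is shaped like the excerpts above (a "<name> platform:" header followed by "EGL vendor string:"); real eglinfo output has more sections and fields, and the function name `egl_vendors` is made up for illustration.

```python
def egl_vendors(eglinfo_output: str) -> dict:
    """Map each '<name> platform:' section to its 'EGL vendor string'."""
    vendors = {}
    platform = None
    for line in eglinfo_output.splitlines():
        line = line.strip()
        if line.endswith(" platform:"):
            # Start of a new section, e.g. "Wayland platform:".
            platform = line[: -len(" platform:")]
        elif line.startswith("EGL vendor string:") and platform is not None:
            vendors[platform] = line.split(":", 1)[1].strip()
    return vendors


# Excerpts quoted in this comment.
GENIO = """\
Wayland platform:
EGL API version: 1.5
EGL vendor string: ARM
EGL version string: 1.5 Valhall-"r48p0-01eac0"
EGL client APIs: OpenGL_ES
"""

INTEL = """\
Wayland platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL client APIs: OpenGL OpenGL_ES

X11 platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL client APIs: OpenGL OpenGL_ES
"""

genio_vendors = egl_vendors(GENIO)
intel_vendors = egl_vendors(INTEL)
print(genio_vendors)
print(intel_vendors)
```

With these excerpts the Genio result has no X11 entry at all, which is the same observation as above expressed as data.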

unity_support_test explicitly creates an X window so if the window system is wayland, it will be run through Xwayland. This is typically not an issue on PCs since most GPUs implement the full OpenGL spec, so glx* programs like glxgears and glxinfo all work normally. But ARM Mali has no implementation for that so mesa decides to use the software implementation of OpenGL to render.

Good thing is that Wayland only uses EGL so we may be able to pick glx vs egl tests depending on if the window system is X or wayland.

To check if Xwayland is running, run pgrep Xwayland. To see if a program is using Xwayland, the easiest way I found is to kill Xwayland and see if the program closes.
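A possible way to pick between the glx and egl test families is to look at the session environment. This is only a sketch: XDG_SESSION_TYPE, WAYLAND_DISPLAY and DISPLAY are standard environment variables, but the function name and the "egl"/"glx" return values are hypothetical, and it assumes the test runs inside the user's graphical session. Wayland is checked first because DISPLAY is also set when Xwayland is running.

```python
import os


def pick_gl_test_family(env=None) -> str:
    """Return 'egl' for Wayland sessions and 'glx' for plain X11 sessions."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    # Check Wayland first: Xwayland also exports DISPLAY in Wayland sessions.
    if session == "wayland" or env.get("WAYLAND_DISPLAY"):
        return "egl"
    if session == "x11" or env.get("DISPLAY"):
        return "glx"
    raise RuntimeError("no graphical session detected")


print(pick_gl_test_family({"XDG_SESSION_TYPE": "wayland", "DISPLAY": ":0"}))
print(pick_gl_test_family({"XDG_SESSION_TYPE": "x11", "DISPLAY": ":0"}))
```

Passing the environment as a parameter keeps the function testable without a display attached.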

@tomli380576
Contributor

tomli380576 commented Dec 5, 2024

Warning

This is for Wayland only

I hacked together a bunch of EGL examples to show the correct OpenGL renderer string
https://gist.github.com/tomli380576/27a2a3f8cb513e76bd642a517b396340

  • See the top of the file for the gcc command
  • This probably has a bunch of unnecessary steps so feel free to correct me if you find any

We may be able to convert the C code into Python with ctypes so we don't need to include binaries. Otherwise the binary needs to be compiled for both ARM and x86, AND it needs to be compiled on the oldest Ubuntu version that uses Wayland (18.04 iirc) to avoid libc complaining about not having the correct version.

Warning

If you use the following method it must be run on the DUT, i.e. remotely executing this code with SSH will give you incorrect results or trash values.

Dec 16 update:

I found a much simpler way to do this without Wayland headers, so it works on both X11 and Wayland: https://gist.github.com/tomli380576/abfdc4caec2ab7f3de9474a54ed615a1 Note that this basically asks EGL to find a default display and then does the OpenGL stuff, so make sure to have a display connected.

@tomli380576
Contributor

tomli380576 commented Dec 13, 2024

Ok looks like we don't need to write C since we have glmark2 in checkbox (declared here). We can just run glmark2 --validate --off-screen and parse the line that starts with GL_RENDERER

=======================================================
    glmark2 2021.02
=======================================================
    OpenGL Information
    GL_VENDOR:     Intel
    GL_RENDERER:   Mesa Intel(R) UHD Graphics (ICL GT1)
    GL_VERSION:    4.6 (Compatibility Profile) Mesa 23.2.1-1ubuntu3.1~22.04.2
=======================================================
...skip the stuff after this line
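The parsing step could look roughly like the sketch below. It works on text that has already been captured, so the sample is just the excerpt above; in a real test the text would presumably come from running glmark2 with the flags mentioned earlier. The function name `extract_gl_renderer` is hypothetical.

```python
def extract_gl_renderer(glmark2_output: str):
    """Return the value of the first GL_RENDERER line, or None if absent."""
    for line in glmark2_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("GL_RENDERER:"):
            # Split on the first colon only; renderer names may contain more.
            return stripped.split(":", 1)[1].strip()
    return None


# Excerpt of the glmark2 banner quoted in this comment.
SAMPLE = """\
=======================================================
    glmark2 2021.02
=======================================================
    OpenGL Information
    GL_VENDOR:     Intel
    GL_RENDERER:   Mesa Intel(R) UHD Graphics (ICL GT1)
    GL_VERSION:    4.6 (Compatibility Profile) Mesa 23.2.1-1ubuntu3.1~22.04.2
=======================================================
"""

renderer = extract_gl_renderer(SAMPLE)
print(renderer)
```

Returning None when the line is missing lets the caller decide whether a missing renderer string is a test failure or an environment problem.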

And we can pretty much copy how unity_support_test determines the value for "Not Software Rendered"

if (results->renderer != NULL && (
        strncmp (results->renderer, "Software Rasterizer", 19) == 0 
        || strncmp (results->renderer, "Mesa X11", 8) == 0 
        || strstr (results->renderer, "llvmpipe") 
        || strstr (results->renderer, "on softpipe")
    )
) { /** it is software rendered */ }

python:

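# renderer: str | None
if renderer is None:
    # panic: no renderer string was found
    raise ValueError("renderer string is missing")
# strncmp() in the C code is a prefix match, so use startswith()
is_software_rendered = (
    renderer.startswith(("Software Rasterizer", "Mesa X11"))
    or "llvmpipe" in renderer
    or "on softpipe" in renderer
)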

@zongminl
Collaborator

On an ARM platform, the command used to check the renderer will probably have to be replaced with glmark2-es2-wayland --validate --off-screen

@tomli380576
Contributor

tomli380576 commented Dec 17, 2024

Speaking of this I think we can use the ES2 version everywhere since ES2 is a subset of OpenGL and the renderer string API exists in both specs. @zongminl Do you know if there are any exceptions to this? (where the ES2 renderer string says the desktop is hardware rendered but it's actually not) Thanks.

@zongminl
Collaborator

I also gave it a try with glmark2-es2-wayland on my x86 laptop, and it works well. I think yes, we can use glmark2-es2 in both specs.

@tomli380576
Contributor

We might also want to move the 1_gl_support_* tests to this method too
