CPU collector broken on illumos #3238

Open
DvdGiessen opened this issue Feb 6, 2025 · 0 comments

DvdGiessen commented Feb 6, 2025

PR #2963 seems to have introduced a bug that breaks the CPU collector on illumos by trying to retrieve a named kstat statistic that does not exist, specifically cpu_nsec_wait.

See cpu.c in the illumos source code, which implements these kstats: cpu_ticks_wait exists there, but cpu_nsec_wait does not.

This results in the following error message being printed:

ts=2025-02-02T14:36:26.162Z caller=collector.go:169 level=error msg="collector failed" name=cpu duration_seconds=0.000767328 err="no such file or directory"

And in the /metrics endpoint only the node_cpu_seconds_total{cpu="0",mode="user"} value shows up (no other modes or CPUs); presumably this has to do with the iteration order, with this one value getting saved before we error out on the nonexistent kstat?
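To illustrate the suspected failure mode, here is a minimal sketch, not the actual node_exporter code. It assumes a collector loop that returns on the first failed lookup; `fakeKstat`, `collect`, and the lookup order are hypothetical and chosen only to reproduce the symptom described above.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the "no such file or directory" error from the log.
var errNotFound = errors.New("no such file or directory")

// fakeKstat is a hypothetical stand-in for the named statistics of one CPU.
type fakeKstat map[string]uint64

func (k fakeKstat) getNamed(name string) (uint64, error) {
	v, ok := k[name]
	if !ok {
		return 0, errNotFound
	}
	return v, nil
}

// stat pairs a metric mode label with the kstat name it is read from.
type stat struct{ mode, name string }

// collect records one mode per successful lookup and aborts on the first
// failure, so earlier samples survive and later ones are never produced.
func collect(ks fakeKstat, stats []stat) ([]string, error) {
	var emitted []string
	for _, s := range stats {
		if _, err := ks.getNamed(s.name); err != nil {
			return emitted, err
		}
		emitted = append(emitted, s.mode)
	}
	return emitted, nil
}

func main() {
	// Only the nsec statistics that exist on illumos; cpu_nsec_wait is absent.
	ks := fakeKstat{"cpu_nsec_idle": 1, "cpu_nsec_kernel": 2, "cpu_nsec_user": 3}
	// Assumed order, for illustration only.
	order := []stat{
		{"user", "cpu_nsec_user"},
		{"wait", "cpu_nsec_wait"},
		{"idle", "cpu_nsec_idle"},
		{"kernel", "cpu_nsec_kernel"},
	}
	emitted, err := collect(ks, order)
	fmt.Println(emitted, err)
}
```

With this assumed order, only the user mode is recorded before the lookup of cpu_nsec_wait fails, which would match the single sample seen in /metrics.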

@discordianfish in #2963 (comment):

Can you (...) provide some more insight in the differences between solaris versions? None of the maintainers have solaris systems at hand afaik, so its kinda harder to support and not break these things

illumos and Oracle Solaris are generally quite compatible given their common ancestry, but have drifted apart a bit over the years. For example, the exact kstats implemented in each differ a bit. We can simply query the kstat values from the command line to see which ones are available:

On illumos:

# uname -a && kstat -c misc -m cpu -i 0 | egrep 'cpu_(ticks|nsec)_'
SunOS wookiee 5.11 joyent_20250123T000246Z i86pc i386 i86pc
        cpu_nsec_dtrace                 0
        cpu_nsec_idle                   603566651597601
        cpu_nsec_intr                   5759459488248
        cpu_nsec_kernel                 295601044993590
        cpu_nsec_user                   44010238329188
        cpu_ticks_idle                  603566651
        cpu_ticks_kernel                295601044
        cpu_ticks_user                  44010238
        cpu_ticks_wait                  0

On Oracle Solaris:

# uname -a && kstat -c misc -m cpu -i 0 | egrep 'cpu_(ticks|nsec)_'
SunOS solaris 5.11 11.4.0.15.0 i86pc i386 i86pc
        cpu_nsec_idle                   6427553705227
        cpu_nsec_intr                   6201354153
        cpu_nsec_kernel                 19453658924
        cpu_nsec_stolen                 0
        cpu_nsec_user                   10548394436
        cpu_ticks_idle                  642755
        cpu_ticks_kernel                1945
        cpu_ticks_stolen                0
        cpu_ticks_user                  1054
        cpu_ticks_wait                  0

So it appears that the cpu_nsec_wait kstat exists in neither illumos nor Oracle Solaris.
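The same comparison can be done programmatically. The following is a rough sketch that assumes the two-column `name value` layout shown in the kstat output above; `statNames` and `missing` are hypothetical helpers, not part of any existing tool.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// statNames extracts statistic names from kstat output lines of the form
// "<name> <value>"; lines without exactly two fields are skipped.
func statNames(output string) map[string]bool {
	names := make(map[string]bool)
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 {
			names[fields[0]] = true
		}
	}
	return names
}

// missing returns, in sorted order, the wanted names that are absent from have.
func missing(have map[string]bool, wanted []string) []string {
	var out []string
	for _, w := range wanted {
		if !have[w] {
			out = append(out, w)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Abbreviated from the illumos output quoted above.
	illumos := `
        cpu_nsec_idle                   603566651597601
        cpu_nsec_kernel                 295601044993590
        cpu_nsec_user                   44010238329188
        cpu_ticks_wait                  0`
	wanted := []string{"cpu_nsec_idle", "cpu_nsec_kernel", "cpu_nsec_user", "cpu_nsec_wait"}
	fmt.Println(missing(statNames(illumos), wanted))
}
```

For the abbreviated input, the only wanted statistic reported as missing is cpu_nsec_wait.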

Seems like @rexagod assumed cpu_nsec_wait existed based on the previous issue description:

@davepacheco in #1837 (comment):

The straightforward solution would be to use the cpu_nsec_{idle,kernel,user,wait} kstats instead of the cpu_ticks_{idle,kernel,user,wait} kstats.

The question remains whether and how to implement wait if the nsec counter for it does not exist. It is zero on both systems I tested, and the illumos source code shows that it is always set to zero and has been since the fork from Solaris. So perhaps the best way forward is to just remove it or hardcode it to zero.
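One way the "hardcode it to zero" option could look is sketched below. This is not a proposed patch: `namedReader`, `fakeKstat`, and `cpuSeconds` are hypothetical names, and the real collector reads kstats through its own library rather than a map.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("no such file or directory")

// namedReader is a hypothetical abstraction over a per-CPU kstat lookup.
type namedReader interface {
	GetNamed(name string) (uint64, error)
}

// fakeKstat is an in-memory stand-in used only for this example.
type fakeKstat map[string]uint64

func (k fakeKstat) GetNamed(name string) (uint64, error) {
	v, ok := k[name]
	if !ok {
		return 0, errNotFound
	}
	return v, nil
}

// cpuSeconds reads the nsec counters that exist on both illumos and Oracle
// Solaris and converts them to seconds. The wait mode is never looked up;
// it is reported as a constant zero.
func cpuSeconds(ks namedReader) (map[string]float64, error) {
	out := map[string]float64{"wait": 0}
	for mode, name := range map[string]string{
		"idle":   "cpu_nsec_idle",
		"kernel": "cpu_nsec_kernel",
		"user":   "cpu_nsec_user",
	} {
		v, err := ks.GetNamed(name)
		if err != nil {
			return nil, fmt.Errorf("kstat %s: %w", name, err)
		}
		out[mode] = float64(v) / 1e9 // nanoseconds to seconds
	}
	return out, nil
}

func main() {
	// Values taken from the illumos output quoted earlier in this issue.
	ks := fakeKstat{
		"cpu_nsec_idle":   603566651597601,
		"cpu_nsec_kernel": 295601044993590,
		"cpu_nsec_user":   44010238329188,
	}
	secs, err := cpuSeconds(ks)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, mode := range []string{"idle", "kernel", "user", "wait"} {
		fmt.Printf("%s %.3f\n", mode, secs[mode])
	}
}
```

Keeping the wait mode as a constant zero preserves the existing label set for dashboards; dropping the mode entirely would be the alternative mentioned above.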
