PR #2963 seems to have introduced a bug that breaks the CPU collector on illumos by trying to retrieve a named kstat statistic that does not exist, specifically cpu_nsec_wait.
See cpu.c in the illumos source code, which implements these kstats: cpu_ticks_wait exists there, but cpu_nsec_wait does not.
This results in the following error message being printed:
ts=2025-02-02T14:36:26.162Z caller=collector.go:169 level=error msg="collector failed" name=cpu duration_seconds=0.000767328 err="no such file or directory"
In the /metrics endpoint, only the node_cpu_seconds_total{cpu="0",mode="user"} value shows up (no other modes or CPUs). Presumably this is down to the order in which the iteration works, so that this one value gets saved before we error out on the non-existent kstat?
@discordianfish in #2963 (comment):
"Can you (...) provide some more insight in the differences between solaris versions? None of the maintainers have solaris systems at hand afaik, so its kinda harder to support and not break these things"
illumos and Oracle Solaris are generally quite compatible given their common ancestry, but over the years they have drifted apart a bit. For example, the exact kstats implemented in each differ a bit. We can simply query the kstat values from the command line to see which ones are available:
On illumos:
On Oracle Solaris:
So it appears that the cpu_nsec_wait kstat exists in neither illumos nor Oracle Solaris.
Seems like @rexagod assumed cpu_nsec_wait existed based on the previous issue description:
@davepacheco in #1837 (comment):
"The straightforward solution would be to use the cpu_nsec_{idle,kernel,user,wait} kstats instead of the cpu_ticks_{idle,kernel,user,wait} kstats."
The question remains if/how to implement wait if the nsec counter for it does not exist. I see it's zero on both systems I tested, and indeed in the illumos source code we can see it's just always set to zero, and it's been that way since the fork from Solaris. So perhaps the best way forward is to just remove it / hardcode it to zero.
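To make the suspected failure mode and the "hardcode it to zero" idea concrete, here is a minimal, self-contained Go sketch. It is not the node_exporter code: the lookup function, the sample counter values, the mode order and the names collectStrict and collectTolerant are all hypothetical, and the real collector talks to libkstat rather than a map. It only illustrates how returning on the first missing statistic leaves a partial result, and how treating a missing wait counter as zero avoids that.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoStat stands in for the "no such file or directory" error from the
// log line above. It is a stand-in, not the error type the real lookup uses.
var errNoStat = errors.New("no such file or directory")

// fakeKstats is made-up sample data for one CPU, in nanoseconds.
// cpu_nsec_wait is deliberately absent, as reported for illumos.
var fakeKstats = map[string]uint64{
	"cpu_nsec_user":   4_000_000_000,
	"cpu_nsec_kernel": 2_000_000_000,
	"cpu_nsec_idle":   90_000_000_000,
}

// lookup is a hypothetical replacement for a named kstat read.
func lookup(name string) (uint64, error) {
	v, ok := fakeKstats[name]
	if !ok {
		return 0, errNoStat
	}
	return v, nil
}

// collectStrict returns on the first failed lookup, so only the modes
// read before the failure end up in the result.
func collectStrict(modes []string) (map[string]float64, error) {
	out := make(map[string]float64)
	for _, mode := range modes {
		v, err := lookup("cpu_nsec_" + mode)
		if err != nil {
			return out, err
		}
		out[mode] = float64(v) / 1e9 // nanoseconds to seconds
	}
	return out, nil
}

// collectTolerant reports zero for a missing wait counter and still
// fails on any other missing statistic.
func collectTolerant(modes []string) (map[string]float64, error) {
	out := make(map[string]float64)
	for _, mode := range modes {
		v, err := lookup("cpu_nsec_" + mode)
		if mode == "wait" && errors.Is(err, errNoStat) {
			v, err = 0, nil
		}
		if err != nil {
			return out, err
		}
		out[mode] = float64(v) / 1e9
	}
	return out, nil
}

func main() {
	// The mode order is an assumption, chosen so that "user" is read first.
	modes := []string{"user", "wait", "kernel", "idle"}

	strict, err := collectStrict(modes)
	fmt.Println(len(strict), err)

	tolerant, err := collectTolerant(modes)
	fmt.Println(len(tolerant), err)
}
```

With this assumed order, the strict variant returns a single mode (user) together with the error, which matches the symptom described above, while the tolerant variant returns all four modes with wait reported as zero. Dropping wait from the mode list entirely would be the other option mentioned above.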