feat(aix): host metrics - system calls, interrupts, context switches, and file descriptor limits - for OTel Compatibility#1969
Conversation
|
Missing |
1719c10 to
90884fa
Compare
90884fa to
705d3c8
Compare
- Fix binary.Read to use lwpStatFile/lwpInfoFile for thread-level structs (tid > -1) - Add bounds checks to splitProcStat to prevent panic on malformed input
- Correct AIXPSInfo struct layout and field offsets - Prioritize Fname over address space for process names - Re-enable Psargs for command line extraction
- Map 0x05 to Idle (SIDL) - Map 0x06 to Wait (SWAIT) - Map 0x07 to Running (SORPHAN) - Return UnknownState for unrecognized codes
- Handle transient socket cleanup errors gracefully - Set correct socket Type and Family fields - Remove debug output
- Trim whitespace from ps output lines and skip headers
- Map AIX states correctly: A+I -> ProcsRunning, W+T+Z -> ProcsBlocked
- Use common.Invoke{} for command execution
- Tested on AIX: 44 processes counted correctly
- Use Berkeley-style ps for environment (ps eww <PID>) - Return NotImplementedError for CPU Affinity and Context Switches - Remove unused parseCPUList helper
- Collect all metric errors using errors.Join() - Return partial data with stacked errors - Caller gets available info plus notification of failures
d03c8c6 to
ab22978
Compare
|
Sorry for all the linter push chaos. For some reason my local linting and the CI linting were disagreeing there for awhile on the proper formats. |
|
Sorry to bother you. This project has a somewhat strict linting policy. Since this PR is still in draft and I haven’t reviewed it yet, please feel free to squash your commits if that makes things easier to follow. |
Add SignalsPending() API and platform implementations; context_switches confirmed unimplementable on AIX
72b20b4 to
3cba5be
Compare
Done, and good idea. :) It will remain a draft until the first prerequisite checklist item is complete and I can handle the changes that will require. I can move as fast on all of this as needed to get it done quickly; pending your availability. |
|
Now I have merged #1967. I haven't looked at this PR yet, but it seems too big to review. Could you split into some of PRs to easy review. And please do not add a new function like |
Prerequisites
Description
This PR implements comprehensive AIX metrics collection aligned with OpenTelemetry host metrics specification, achieving 99% coverage (103/104 metrics) of the OpenTelemetry hostmetricsreceiver standard.
System Metrics Implementation
vmstat-based Metrics
vmstatsy columnvmstatic columnload.Misc().Ctxtfield fromvmstatcs columnSystemCalls(),SystemCallsWithContext(),Interrupts(),InterruptsWithContext()File Descriptor Limits
ulimit -Sandulimit -HcommandsProcess Metrics Implementation
New Process Metrics
CPUPercentWithContext()(uses ps-based CPU calculation)/proc/<pid>/psinfobinary structurepr_sigpendfield from AIX psinfoAnalysis Findings
nvcsw/vcswfield specifiersErrNotImplementedErrorwith documentationArchitecture: Injectable Invoker Pattern
testInvokervariable andgetInvoker()helper inloadandhostmodules*_aix_test.go,//go:build aix): Execute actual AIX commands*_mock_test.go, no tag): Run on any OS with mocked outputNew Public Functions
load module:
SystemCalls() (int, error)- Total syscalls since bootSystemCallsWithContext(ctx) (int, error)- Context-aware variantInterrupts() (int, error)- Total interrupts since bootInterruptsWithContext(ctx) (int, error)- Context-aware varianthost module:
FDLimits() (soft, hard uint64, error)- File descriptor limitsFDLimitsWithContext(ctx) (soft, hard uint64, error)- Context-aware variantprocess module:
SignalsPending() (SignalInfoStat, error)- Pending signal maskSignalsPendingWithContext(ctx) (SignalInfoStat, error)- Context-aware variantnfs package:
Test Coverage
AIX-specific tests (build-tagged, run on AIX 7.3):
Mock-based tests (cross-platform, no special build tag):
Test File Organization:
process_test.go: Added//go:build !aixtag to prevent generic test failures on AIX (AIX has different ps syntax requirements)Implementation Details
System Metrics Parsing:
vmstat 1 1execution yields all three metricsparseVmstatLine(),getVmstatMetrics()FD Limits Special Cases:
(1<<63 - 1)(max int64 as uint64)Process Metrics Details:
/proc/<pid>/psinfoCoverage Achievement
OpenTelemetry Metric Support:
process.disk.operations(not available at process level on any tested OS)process.handles(Windows-only metric)Files Modified/Created
Modified:
load/load_aix_nocgo.go- Add injectable invoker, system metrics functionsload/load_aix.go- Public wrapper functionshost/host_aix.go- Add injectable invoker, FD limits functionprocess/process.go- Add SignalsPending public wrapperprocess/process_aix.go- Add SignalsPendingWithContext, confirm context_switches unimplementableprocess/process_test.go- Add//go:build !aixtagprocess/process_linux.go- Add SignalsPendingWithContext implementationprocess/process_windows.go- Add SignalsPendingWithContext stubprocess/process_freebsd.go- Add SignalsPendingWithContext stubprocess/process_solaris.go- Add SignalsPendingWithContext stubprocess/process_fallback.go- Add SignalsPendingWithContext stubinternal/common/common_aix.go- ParseUptime bounds fixNew Test Files:
load/load_aix_test.go- Real AIX testsload/load_aix_test_mock.go- MockInvoker for load metricsload/load_mock_test.go- Cross-platform mock testshost/host_aix_test.go- Real AIX testshost/host_aix_test_mock.go- MockInvoker for host metricshost/host_mock_test.go- Cross-platform mock testsprocess/process_aix_test.go- Process metric tests for AIXNew Files:
nfs/nfs_aix.go- AIX NFS metrics implementationTesting Results
✅ AIX 7.3 System Tests
✅ Cross-Platform Mock Tests
Backward Compatibility
✅ All existing functions and APIs unchanged
✅ New functions are purely additive
✅ No breaking changes to public interfaces
✅ Existing load, host, and process metrics continue working
OpenTelemetry Alignment
This implementation follows the OpenTelemetry Host Metrics specification and process metrics specification for:
These metrics enable comprehensive host and process-level observability in OpenTelemetry-instrumented applications running on AIX systems.
References