A convenient command line utility to log system and process metrics.
$ multimonitor --utc_nice --gpu=min --process valley_x64 --process Xorg
# Waiting for process valley_x64
# Waiting for process valley_x64
# Waiting for process valley_x64
# For process name valley_x64 found pids: [1996445]
# For process name Xorg found pids: [2066]
# ticks_per_second: 100
# With interval 200 ms and 100 ticks/s, expect CPU% error of ± 5.0%
# Xorg
# valley_x64 |
# | |
# 1996445 2066
DATETIME-UTC TIME RELTIME GPU% VRAM SCLK CPU% RSS CPU% RSS
2020-12-31T22:28:44.688313 709668.550807 0.200063 0.0% 309.6MiB 386.7MHz 0.00% 77MiB 5.00% 407MiB
2020-12-31T22:28:44.888319 709668.750763 0.400018 0.0% 309.6MiB 386.7MHz 0.00% 77MiB 0.00% 407MiB
2020-12-31T22:28:45.088325 709668.950799 0.600054 0.0% 311.6MiB 326.5MHz 15.00% 82MiB 10.00% 407MiB
2020-12-31T22:28:45.288331 709669.150770 0.800025 0.0% 311.6MiB 326.5MHz 0.00% 82MiB 0.00% 407MiB
2020-12-31T22:28:45.488337 709669.350800 1.000056 0.0% 311.6MiB 326.5MHz 0.00% 82MiB 10.00% 407MiB
2020-12-31T22:28:45.688343 709669.550775 1.200030 0.0% 311.6MiB 326.5MHz 0.00% 82MiB 5.00% 407MiB
2020-12-31T22:28:45.888350 709669.750795 1.400050 0.0% 311.6MiB 326.5MHz 0.00% 82MiB 5.00% 407MiB
2020-12-31T22:28:46.088356 709669.950932 1.600187 0.0% 311.6MiB 326.5MHz 0.00% 82MiB 0.00% 407MiB
2020-12-31T22:28:46.288362 709670.150692 1.799947 0.0% 294.1MiB 588.2MHz 5.01% 82MiB 5.01% 407MiB
2020-12-31T22:28:46.488368 709670.350815 2.000071 0.0% 294.1MiB 588.2MHz 0.00% 82MiB 5.00% 407MiB
2020-12-31T22:28:46.688374 709670.550755 2.200010 0.0% 294.1MiB 588.2MHz 5.00% 82MiB 5.00% 407MiB
2020-12-31T22:28:46.888381 709670.750808 2.400063 0.0% 294.1MiB 588.2MHz 0.00% 82MiB 10.00% 407MiB
2020-12-31T22:28:47.088387 709670.950767 2.600023 0.0% 294.1MiB 588.2MHz 0.00% 82MiB 5.00% 407MiB
2020-12-31T22:28:47.288393 709671.150816 2.800071 0.0% 298.2MiB 724.0MHz 5.00% 82MiB 10.00% 407MiB
2020-12-31T22:28:47.488399 709671.350754 3.000009 0.0% 298.2MiB 724.0MHz 0.00% 82MiB 0.00% 407MiB
2020-12-31T22:28:47.688405 709671.550808 3.200063 0.0% 298.2MiB 724.0MHz 0.00% 82MiB 0.00% 407MiB
2020-12-31T22:28:47.888411 709671.750837 3.400093 0.0% 298.2MiB 724.0MHz 0.00% 82MiB 5.00% 407MiB
2020-12-31T22:28:48.088418 709671.950742 3.599998 0.0% 298.2MiB 724.0MHz 0.00% 82MiB 0.00% 407MiB
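The "expect CPU% error" line in the header above is consistent with a simple quantization argument: if CPU time is accounted in whole clock ticks, one tick is the smallest step a single sample can show. The sketch below works under that assumption; the function name is hypothetical and this is not taken from multimonitor's source.

```python
def cpu_percent_resolution(interval_ms: float, ticks_per_second: int) -> float:
    """Smallest CPU% step visible in one sample, assuming whole-tick accounting."""
    tick_ms = 1000.0 / ticks_per_second  # duration of one clock tick
    return 100.0 * tick_ms / interval_ms


# 200 ms interval, 100 ticks/s, as in the header above.
print(cpu_percent_resolution(200, 100))  # 5.0
```

This also explains why the CPU% columns above mostly move in steps of 5%.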
To build multimonitor from source, you will need a D programming language
compiler. GDC, LDC2 and DMD are all supported. On most Linux distributions it is
easiest to install gdc, which is part of GCC and prepackaged by most
distributions.
After obtaining the source code, just execute ./build.sh (you can adjust options
in that script), or use dub to build it.
You should then get a multimonitor binary to use.
Multimonitor - sample information about system and processes.
--sub Launch a single external command, monitor it just like
--pid and finish once all of them finish
--pids List of process pids to monitor
--process List of process names to monitor
--process_map Assign short names to processes, i.e. a=firefox,b=123
--cpu Overall CPU stats, i.e. load, average and max frequency
--loadavg System-wide load average. Avaiable: none, min (1-min avg),
med (+5 min avg), max (+runnables and tasks count,
and forks per second)
--cpu_temp CPU temperature
--sched CPU scheduler details
--vm Virtual memory subsystem
--interrupts Interrupts details
--io System-wide IO details. Available: none, min, max
--net System-wide networking metrics
--gpu Gather GPU stats. Available: none, min, max
--mangohud_fps Gather FPS information for given processes using MangoHud RPC
--exec Run external command with arbitrary output once per sample
--exec_async Run external command with arbitrary output asynchronously
--pipe Run external command and consume output lines as they come
--async_delay_msec Change how often to run --exec_async and --gpu commands.
(default: 200ms)
--wait_for_all Wait until all named processes are up
--find_new_when_dead If the named process is dead, try searching again
--exit_when_dead Stop collecting metrics and exit, when any of requested
pids exits too.
--sum_all_matching For named processes, sum all matching processes metrics
(sum CPU, smart memory sum)
--auto_output Automatically create timestamped output file in current
working directory with data, instead of using standard
output. (default: false)
--interval_msec Target interval for main metric sampling and output.
(default: 200ms)
--duration_sec How long to log. (default: forever)
--time Time mode, one of: relative, boottime, absolute, all.
(default: all)
--utc_nice Show absolute time as ISO 8601 formated date and time in
UTC timezone. Otherwise Unix time is printed.
(default: false)
-H --human_friendly Use human friendly (pretty), but still fixed units
(default: true)
--verbose Show timeing loop debug info
-h --help This help information.
Primary purpose is debugging processes, system load, memory usage, memory leaks, GPU usage, frame rate tests, etc.
A combination of ps, top, iotop, powertop, radeontop, vmstat, free, mpstat, pidstat, slabtop, cpufreq-info, mangohud and more, all in one.
In some areas the accuracy is significantly better than any of the above tools.
- Extremely accurate timestamps. Absolute (ISO 8601), monotonic from system start, and relative from tool startup.
- Very accurate repetition rate (usually <10us).
- Syscall delay compensation.
- Asynchronous sampling of expensive statistics.
- Automatic compensation of delays when calculating rates.
- Rich set of available metrics:
- System CPU
- System Memory
- System IO, total and per-device
- CPU temperature, frequency, and scaling governor
- GPU frequency, temperature, load and memory usage
- (multiple) Process CPU usage, thread count
- (multiple) Process memory usage
- (multiple) OpenGL / Vulkan FPS / frame time measurements (using MangoHud RPC system)
- (multiple) Process IO statistics
- (multiple) Process scheduling and IO priority logging
- System networking statistics
- System-wide and per-process context switches and interrupts
- (multiple) Custom asynchronous metrics (using external scripts), like:
- System power / current from PSU / SMBus
- Number of files in a directory
- ZFS statistics
- GPU power save mode
- Window resolution of a benchmarked game
- Many more easy to add on the fly
- Custom events annotation (i.e. using external scripts)
- Ability to sample some metrics at slower rate than main metrics.
- Human and machine friendly output in one format.
- Self documenting output. Clear units.
- Monitor processes by pid, or by name.
- Continue monitoring even if pid dies, or stop. Configurable.
- Ability to sum multiple pids or names. I.e. sum all processes with given name under one column.
- Autostart/prestart - wait for a process by name, and start logging as soon as one is found.
- Fast. 5Hz by default, 100Hz possible.
- Extremely low CPU and memory overhead (<0.5% CPU, <5MiB).
- Integration with Gnuplot for plotting.
- Automatic detection of various failures, like signals, interrupts, process death, slow syscalls / preemption, system sleep, process SIGSTOP, clock jumps, etc. nan or empty lines are output automatically when discontinuities are detected, so plotting can use the data correctly.
- Automatically calculate expected error / accuracy, and warn if it is low.
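One common way to get a stable repetition rate is to sleep until absolute deadlines on a fixed grid instead of sleeping for a fixed duration after each sample, so that per-sample delays do not accumulate. The sketch below illustrates that general technique only; it is an assumption and not necessarily how multimonitor implements its timing loop, and all names in it are hypothetical.

```python
import time


def next_deadline(start: float, interval: float, now: float) -> float:
    """Return the first point of the grid start + k*interval that lies after `now`."""
    elapsed_ticks = int((now - start) // interval) + 1
    return start + elapsed_ticks * interval


def sample_loop(sample, interval=0.2, count=5):
    """Call sample(reltime) `count` times, aligned to a fixed time grid."""
    start = time.monotonic()
    for _ in range(count):
        deadline = next_deadline(start, interval, time.monotonic())
        # Sleep only for the time remaining, so a slow sample shortens
        # the next sleep instead of shifting every later sample.
        time.sleep(max(0.0, deadline - time.monotonic()))
        sample(time.monotonic() - start)


if __name__ == "__main__":
    sample_loop(lambda reltime: print(f"{reltime:.6f}"))
```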
Many metrics support both rates and cumulative figures. Others are only "gauges" (e.g. memory usage, GPU load or frequency).
The output format is a simple text format with nicely aligned and annotated columns, but it is also optimized to be easily parsed by automated tools, most notably Gnuplot. Relative and absolute timestamps allow correlating with other tools and events, as well as overlaying multiple runs for comparison. Most of the data in the various columns uses fixed, non-configurable formats. This is mostly done to speed up processing and output, reduce memory allocations, and make it less likely for the user to mess things up. It also means the logs produced now will have exactly the same format and units as the ones produced years from now, no matter which options are used, which is great for comparing measurements with years-old logs.
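As an illustration of how little is needed to parse this format, here is a minimal sketch. It assumes, based only on the sample logs in this document, that annotation lines start with # and that the first other line is the column header. Whitespace splitting is a simplification: --pipe and --exec columns can contain spaces, so they would need extra handling.

```python
SAMPLE = """\
# systemd
# |
# 1
UNIX-TIME TIME RELTIME CPU% RSS
1609465795.020941 721738.570333 0.200061 0.00% 12MiB
1609465795.220947 721738.770319 0.400047 0.00% 12MiB
"""


def parse_log(text):
    """Return (header, rows); '#' lines are annotations and are skipped."""
    header, rows = None, []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if header is None:
            header = fields  # first non-comment line names the columns
        else:
            rows.append(fields)
    return header, rows


header, rows = parse_log(SAMPLE)
print(header)
print(rows[0])
```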
Some options, like --pid, --pids, --process, --sub, --exec, --exec_async and
--pipe, can be repeated multiple times to monitor multiple processes or plugins.
Some, like --pids, also accept a comma-separated list.
The relative ordering of columns in the output will in general follow the
relative order of arguments. But some system-level information will be output in
a more rigid order. E.g. --cpu, --io, --gpu will be displayed after timestamps,
in this order, and before any sub-process / per-process related ones.
Here is a general ordering:
- Timestamps (influenced by --utc_nice and --time)
- --cpu
- --loadavg
- --cpu_temp
- --mem
- --gpu
- --sched
- --vm
- --io
- --net
- --pids, --pid
- --process
- --sub
- --mangohud_fps
- --exec
- --exec_async
- --pipe
Some ordering restrictions might be relaxed in the future.
$ multimonitor --utc_nice
DATETIME-UTC TIME RELTIME
2020-12-31T22:28:44.688313 709668.550807 0.200063
2020-12-31T22:28:44.888319 709668.750763 0.400018
2020-12-31T22:28:45.088325 709668.950799 0.600054
Without this flag, Unix time is output in the first column instead.
# multimonitor ...
UNIX-TIME TIME RELTIME
1609465795.020941 721738.570333 0.200061
1609465795.220947 721738.770319 0.400047
1609465795.424953 721738.970353 0.600081
1609465795.620959 721739.170320 0.800048
Note: Unix time is roughly the number of seconds since the Unix Epoch
(1970-01-01 00:00:00 UTC), minus leap seconds. A day in Unix time always has
exactly 86400 seconds. Unix time is often (even in official standards and
manuals) colloquially referred to as "seconds since the Epoch", even though that
is not strictly true.
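When comparing a log written with Unix time against one written with --utc_nice, the first column can be converted after the fact. The following is a minimal sketch; the helper name is hypothetical, and the timestamp is split on the decimal point rather than parsed as a float so that the microsecond digits are preserved exactly.

```python
from datetime import datetime, timezone


def unix_to_iso_utc(stamp: str) -> str:
    """Convert 'seconds.microseconds' Unix time to an ISO 8601 UTC string."""
    seconds, _, fraction = stamp.partition(".")
    micros = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return moment.replace(microsecond=micros).strftime("%Y-%m-%dT%H:%M:%S.%f")


print(unix_to_iso_utc("1609655314.023888"))  # 2021-01-03T06:28:34.023888
```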
To select just one of the three timestamps to be present in the output, use
--time with one of absolute, boottime or relative.
$ multimonitor --time=absolute
UNIX-TIME
1609655314.023888
1609655314.223895
1609655314.423901
1609655314.623907
$ multimonitor --time=absolute --utc_nice
DATETIME-UTC
2021-01-03T06:28:31.495809
2021-01-03T06:28:31.695815
2021-01-03T06:28:31.895821
2021-01-03T06:28:32.095828
$ multimonitor --time=boottime
TIME
911283.419439
911283.619396
911283.819425
911284.019410
911284.219418
$ multimonitor --time=relative
RELTIME
0.200084
0.400050
0.600066
0.800058
1.000044
Monitor CPU% and RSS of processes by PID. Multiple --pids and --pid can be
specified.
CPU% of 100% means one fully utilized core (or one logical thread of a core),
counting both user-space time and system time spent in the kernel on behalf of
the process. So, for example, an 8-thread CPU-bound process on an otherwise idle
8-core system will show close to 800%. You can think of it as CPU seconds per
second. Utilization of 100% doesn't actually mean the core's resources are fully
used; it just means that 100% of the time the core was assigned to that
particular process, and not to other processes, interrupt handling or the
idle/sleep process. A core at 100% CPU can still have a lot of resources
available that are not utilized (this is more complicated with SMT), because of
memory stalls, page faults, long dependency chains, etc. If a process is
migrated between cores, CPU% tracks the total time it was assigned to and
running on any core. For multi-threaded processes, CPU% is the sum over all of
its threads.
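The "CPU seconds per second" view can be made concrete with the per-process counters Linux exposes in /proc/<pid>/stat, where fields 14 and 15 hold utime and stime in clock ticks. The sketch below is an illustration under that assumption; the helper names are hypothetical and it is not a copy of multimonitor's implementation.

```python
import os


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (fields 14 and 15 of /proc/<pid>/stat), in ticks."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')', so cut at the last ')' before splitting on whitespace.
    rest = stat_line.rpartition(")")[2].split()
    return int(rest[11]) + int(rest[12])  # rest[0] is field 3 (state)


def cpu_percent(ticks_before, ticks_after, elapsed_s, ticks_per_second=None):
    """CPU seconds consumed per wall-clock second, as a percentage."""
    if ticks_per_second is None:
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    return 100.0 * (ticks_after - ticks_before) / ticks_per_second / elapsed_s


# 8 busy threads for 0.2 s at 100 ticks/s add 160 ticks: about 800%.
print(round(cpu_percent(0, 160, 0.2, ticks_per_second=100)))
```

Dividing by the measured elapsed time between the two reads, rather than the nominal interval, keeps the figure meaningful when a sample is delayed.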
RSS stands for "Resident Set Size". It is basically the total amount of physical
memory used by the process, so, for example, it doesn't include swapped-out
memory. Nor does it count mapped memory that hasn't yet been backed by physical
memory. Be careful when interpreting these numbers for complex processes and
multi-process setups, as shared memory (mostly libraries, but also buffers for
communication) can be counted multiple times, once for each process. (For
example, you can't just add up all RSS figures for Firefox or Chrome to get
their total memory usage.)
$ multimonitor --pids 1
# systemd
# |
# 1
UNIX-TIME TIME RELTIME CPU% RSS
1609465795.020941 721738.570333 0.200061 0.00% 12MiB
1609465795.220947 721738.770319 0.400047 0.00% 12MiB
1609465795.424953 721738.970353 0.600081 0.00% 12MiB
1609465795.620959 721739.170320 0.800048 0.00% 12MiB
1609465795.824966 721739.370335 1.000063 0.00% 12MiB
1609465796.020972 721739.570303 1.200031 0.00% 12MiB
1609465796.224978 721739.770336 1.400064 0.00% 12MiB
Multiple PIDs can be specified as a comma-separated list (--pids 1,2,3), as
repeated arguments (--pids 1 --pids 2), or even as --pid 1,1 --pids 1,1.
Columns will be ordered in the same order as the requested pids in the list(s).
Above each CPU% figure, the name of the process (comm) will be displayed,
together with its pid.
Note: On Linux, you can track individual threads using --pids by passing the
thread's task id. These are not pthread_t IDs (POSIX thread IDs). One way of
finding thread ids is to check the entries listed by ls -1 /proc/<pid>/task.
Other tools, like ps and top, can also display all threads on the system and do
filtering. A thread id can also be obtained from inside the process using the
gettid call. There is no easy way to convert a pthread_t to a tid, even on
Linux, without hacks. There is also no easy way to convert a tid back to a
pthread_t, a pthread name and such. This is because technically the mapping
between pthreads and kernel tasks doesn't need to be 1:1.
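The /proc/<pid>/task lookup mentioned above is easy to script. The sketch below is Linux-only and lists the task ids of a process; the function name is hypothetical. For comparison it also prints the calling thread's own id via threading.get_native_id() (Python 3.8+), which on Linux reports the kernel thread id.

```python
import os
import threading


def list_thread_ids(pid: int):
    """Return the kernel task ids of all threads of `pid` (Linux only)."""
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/task"))


print(list_thread_ids(os.getpid()))
print(threading.get_native_id())  # id of the calling thread
```

Any of the printed ids could then be passed to --pids to monitor that single thread.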
Monitor CPU% and RSS of a process, similar to --pid, but search for the process
by name instead. Additionally, by default, logging (and RELTIME) will not start
counting until all requested processes are found. Multiple --process can be
specified.
$ multimonitor --process steam
# Waiting for process steam
# Waiting for process steam
# Waiting for process steam
# For process name steam found pids: [2066]
# steam
# |
# 2066
UNIX-TIME TIME RELTIME CPU% RSS
1609465961.498080 721905.041175 0.200079 14.99% 405MiB
1609465961.694086 721905.241138 0.400043 10.00% 405MiB
1609465961.894092 721905.441157 0.600062 15.00% 405MiB
Executes an external command in the shell, and monitors it just like --pids.
Multiple --sub can be specified. They will be started in left-to-right order,
and also displayed from left to right.
Because a shell is used, and the topmost process is monitored, the shell's exec
usually needs to be used to switch execution to the desired process.
Otherwise one would be monitoring the CPU and RSS of the shell itself, which is
usually not what one wants.
The shell is the intermediary, to allow using features like shell file globbing, redirection, piping, pre-start configuration and environment variable overrides, or starting some auxiliary background processes.
Example:
$ multimonitor --io=min --sub "exec sha256sum /dev/random >/dev/null"
UNIX-TIME TIME RELTIME READ WRITE CPU% RSS
1609566507.918133 822470.120617 0.200078 64MiB/s 0MiB/s 100.0% 1.8MiB
1609566508.122140 822470.320565 0.400026 65MiB/s 0MiB/s 100.0% 1.9MiB
1609566508.318146 822470.520595 0.600056 1001MiB/s 0MiB/s 100.0% 1.8MiB
1609566508.518152 822470.720580 0.800041 1010MiB/s 0MiB/s 100.0% 1.8MiB
1609566508.718158 822470.920588 1.000049 1031MiB/s 0MiB/s 100.0% 1.8MiB
1609566508.918164 822471.120587 1.200048 1022MiB/s 0MiB/s 100.0% 1.8MiB
1609566509.118170 822471.320584 1.400045 1031MiB/s 0MiB/s 100.0% 1.8MiB
1609566509.318177 822471.520587 1.600048 1022MiB/s 0MiB/s 100.0% 1.8MiB
...
The standard output of the launched programs is unchanged (it remains the same
as the standard output of multimonitor). If the launched programs produce
substantial output of their own, it might be wise to use shell redirection in
each --sub invocation, or to use --auto_output.
If --duration_sec is specified, then after the duration passes, all
sub-processes will immediately receive SIGTERM. If they are still not terminated
after a short delay, then after an additional short delay all non-terminated
ones will receive SIGKILL and be waited for.
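The same SIGTERM-then-SIGKILL escalation is useful in wrapper scripts that launch their own helpers. The sketch below shows the general pattern with Python's subprocess module; the function name and the grace period are hypothetical, since the actual delays used by multimonitor are not stated here.

```python
import signal
import subprocess
import sys


def stop_children(children, grace_s=1.0):
    """Ask every child to exit with SIGTERM, then SIGKILL the stragglers."""
    for child in children:
        if child.poll() is None:  # still running
            child.send_signal(signal.SIGTERM)
    for child in children:
        try:
            child.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            child.kill()  # SIGKILL cannot be caught or ignored
            child.wait()  # reap it, so no zombie is left behind


if __name__ == "__main__":
    sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    stop_children([sleeper])
    print(sleeper.returncode)
```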
If the child process dies, SIGCHLD will be ignored, but the zombie process will
still be sampled; it will read as 0% CPU and 0MiB RSS. In the future, it might
be possible to handle this signal and show nan instead, while also reaping the
dead process to clean it from the process table.
If ^C is hit, or multimonitor dies in some other way, SIGHUP will be delivered
by the kernel to the child processes, which are then reparented under some other
system-specific process (often pid 1). In the future, there might be an option
to call something like prctl(PR_SET_PDEATHSIG, SIGTERM); before doing the actual
exec after the fork.
Or does it maybe send SIGINT already? To be checked.
The processes for --pipe and --exec will most likely receive SIGPIPE, in
addition to the other signals mentioned above.
$ multimonitor --utc_nice --gpu=min
DATETIME-UTC TIME RELTIME GPU% VRAM SCLK
2020-12-31T22:28:44.688313 709668.550807 0.200063 0.0% 309.6MiB 386.7MHz
2020-12-31T22:28:44.888319 709668.750763 0.400018 0.0% 309.6MiB 386.7MHz
2020-12-31T22:28:45.088325 709668.950799 0.600054 0.0% 311.6MiB 326.5MHz
--gpu=max could provide more information, including various GPU sub-system
loads, memory clocks, temperature, and such.
GPU stats will always be displayed before any of the monitored processes
(specified via --pids, --pid, --process or --sub), as well as plugins (--pipe,
--exec*).
Shows system-wide block device IO.
$ multimonitor --utc_nice --io=min
DATETIME-UTC TIME RELTIME RDkB/s WRkB/s
2021-02-02T05:43:48.568163 875927.542727 0.200118 164384KB/s 0KB/s
2021-02-02T05:43:48.768168 875927.742620 0.400011 165211KB/s 0KB/s
2021-02-02T05:43:48.968174 875927.942660 0.600052 163808KB/s 0KB/s
2021-02-02T05:43:49.168179 875928.142829 0.800221 164340KB/s 0KB/s
--io=max additionally provides information about swap bandwidth.
$ multimonitor --utc_nice --io=max
DATETIME-UTC TIME RELTIME RDkB/s WRkB/s SWAPRDkB/s SWAPWRkB/s
2021-02-02T05:43:45.804091 875924.776632 0.200065 164425KB/s 0KB/s 0KB/s 0KB/s
2021-02-02T05:43:46.004096 875924.976592 0.400025 163875KB/s 0KB/s 0KB/s 0KB/s
2021-02-02T05:43:46.204101 875925.176618 0.600051 165097KB/s 0KB/s 0KB/s 0KB/s
2021-02-02T05:43:46.404107 875925.376602 0.800034 164493KB/s 0KB/s 0KB/s 0KB/s
IO stats will always be displayed before any of the monitored processes
(specified via --pids, --pid, --process or --sub), as well as plugins (--pipe,
--exec*).
Launches an asynchronous process for each --pipe and reads back its output lines as they arrive. Each process should produce fixed-width, consistently formatted output on every line.
$ multimonitor --pipe "while true; do date '+%s.%N'; sleep 1; done" \
--pipe "while true; do cat /proc/loadavg; sleep 1; done" \
--pipe "while true; do cat /proc/uptime; sleep 1; done"
UNIX-TIME TIME RELTIME PIPE PIPE PIPE
1609465621.259576 721564.812290 0.200065 1609465621.112124784 3.44 3.73 3.86 4/1628 2125166 721583.31 22579423.26
1609465621.459583 721565.012258 0.400033 1609465621.112124784 3.44 3.73 3.86 4/1628 2125166 721583.31 22579423.26
1609465621.659589 721565.212438 0.600213 1609465621.112124784 3.44 3.73 3.86 4/1628 2125166 721583.31 22579423.26
You can think of these processes as multimonitor plugins.
Synchronously executes an external command on each sample. Newline characters in the output are converted to spaces.
$ multimonitor --exec "awk '/^(nr_free_pages|nr_zone_inactive_anon)/ {print \$2;}' /proc/vmstat"
UNIX-TIME TIME RELTIME EXEC
1609478558.602970 734501.821340 0.200089 13816115 8242963
1609478558.802977 734502.021271 0.400020 13816141 8243038
1609478559.002983 734502.221306 0.600056 13816141 8243046
1609478559.202989 734502.421289 0.800038 13816141 8243050
1609478559.402995 734502.621292 1.000041 13816177 8243169
You can think of these processes as multimonitor plugins, to augment with extra capabilities quickly.
Asynchronously executes an external command. Newlines in the output are converted to spaces.
$ multimonitor --exec_async 'echo 42 $(date +%s.%N)'
UNIX-TIME TIME RELTIME EXEC
1609476390.052024 732333.326417 0.200084 42 1609476390.058566475
1609476390.252030 732333.526350 0.400016 42 1609476390.258601844
1609476390.452037 732333.726391 0.600057 42 1609476390.458978396
1609476390.652043 732333.926383 0.800050 42 1609476390.658840135
Caching behaviour can be changed with --async_delay_msec (default: 200ms).
The difference between the two can be illustrated here:
$ multimonitor --exec "date +%s.%N" \
--exec_async "date +%s.%N" \
--async_delay_msec=1000
UNIX-TIME TIME RELTIME EXEC EXEC
1609478808.394682 734751.606517 0.200071 1609478808.401597625 1609478808.200443789
1609478808.594688 734751.806484 0.400039 1609478808.601534096 1609478808.200443789
1609478808.794694 734752.006507 0.600062 1609478808.801675976 1609478808.200443789
1609478808.994700 734752.206494 0.800049 1609478809.001560833 1609478808.200443789
1609478809.194707 734752.406500 1.000055 1609478809.201414051 1609478808.200443789
1609478809.394713 734752.606497 1.200052 1609478809.401672778 1609478809.202125682
1609478809.594719 734752.806509 1.400064 1609478809.601640469 1609478809.202125682
1609478809.794725 734753.006488 1.600043 1609478809.801603550 1609478809.202125682
1609478809.994731 734753.206505 1.800060 1609478810.001602391 1609478809.202125682
1609478810.194737 734753.406499 2.000053 1609478810.201564425 1609478809.202125682
1609478810.394744 734753.606650 2.200204 1609478810.401723765 1609478810.203523733
1609478810.594750 734753.806403 2.399958 1609478810.601920961 1609478810.203523733
1609478810.794756 734754.006529 2.600083 1609478810.801647833 1609478810.203523733
1609478810.994762 734754.206483 2.800037 1609478811.001605257 1609478810.203523733
1609478811.194768 734754.406504 3.000059 1609478811.201543125 1609478810.203523733
1609478811.394774 734754.606501 3.200056 1609478811.401612507 1609478811.204954820
1609478811.594781 734754.806497 3.400051 1609478811.601693833 1609478811.204954820
Notice how the --exec command is executed on every sample, but the --exec_async
one is on average executed only every 5 samples.
--exec_async is excellent for more expensive computations, like calculating the
hash of a file, traversing a big filesystem tree, reading from the network, or
reading sysfs files that could be very slow. While the command is executing, the
previously saved state will be displayed, allowing one to continue logging other
things uninterrupted, as well as to asynchronously execute other commands that
use --exec_async.
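The "show the previously saved state while the command runs" behaviour can be pictured as a background worker that refreshes a cached value while the sampling loop only ever reads the cache. The sketch below illustrates that general idea with a thread; the class and its names are hypothetical and it is not a description of multimonitor's internals.

```python
import threading


class AsyncCached:
    """Re-run `func` in the background; readers get the last finished result."""

    def __init__(self, func, delay_s=0.2):
        self._func = func
        self._delay_s = delay_s
        self._value = None  # nothing cached until the first run finishes
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self.first_result = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stop.is_set():
            value = self._func()  # may be slow; readers are not blocked
            with self._lock:
                self._value = value
            self.first_result.set()
            self._stop.wait(self._delay_s)  # pause between runs, wakes on stop

    def latest(self):
        with self._lock:
            return self._value

    def stop(self):
        self._stop.set()
        self._thread.join()
```

A sampling loop would call latest() once per output line; with a 1000 ms delay and a 200 ms sampling interval, about five consecutive lines would show the same cached value, as in the log above.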
multimonitor can act as a multiplexer for multiple sources in parallel. It can,
for example, prefix log files or the output of a command with timestamps, and do
so even when using multiple files and processes in parallel, while monitoring
other system metrics in parallel if desired. This is not the main use of
multimonitor, but it is sometimes handy. A somewhat more convenient tool for
this is ts from moreutils, which is worth looking at (there are other handy
tools there that could be used together with multimonitor, for example ifdata,
pee and sponge).
When using --interval_msec=0, instead of sleeping, multimonitor will poll all
pipes to consume their output.
$ sudo multimonitor --pipe "exec tail -f /var/log/syslog" \
--pipe "exec tail -f /var/log/kern.log" \
--interval_msec=0
...
...
This is not a super useful feature at the moment, because by default the text in each column is right-aligned, which makes it somewhat silly to use.
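For readers who want to see what polling several pipes looks like, here is a minimal Python sketch of the general technique: wait on all pipes at once and timestamp each line as it arrives. It is an illustration only, with hypothetical names, and says nothing about how multimonitor implements this mode. A trailing line without a final newline is dropped for simplicity.

```python
import os
import selectors
import subprocess
import sys
import time


def multiplex(commands):
    """Yield (timestamp, source_index, line) as lines arrive from any command."""
    sel = selectors.DefaultSelector()
    procs = []
    for index, argv in enumerate(commands):
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
        procs.append(proc)
        # data = [source index, bytes received that do not yet form a line]
        sel.register(proc.stdout.fileno(), selectors.EVENT_READ, data=[index, b""])
    open_pipes = len(procs)
    while open_pipes:
        for key, _ in sel.select():
            chunk = os.read(key.fd, 4096)
            if not chunk:  # EOF: the writer closed its end
                sel.unregister(key.fd)
                open_pipes -= 1
                continue
            key.data[1] += chunk
            *lines, key.data[1] = key.data[1].split(b"\n")
            for line in lines:
                yield time.time(), key.data[0], line.decode()
    for proc in procs:
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    demo = [[sys.executable, "-c", "print('a'); print('b')"],
            [sys.executable, "-c", "print('c')"]]
    for stamp, source, line in multiplex(demo):
        print(f"{stamp:.6f} [{source}] {line}")
```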
Here we monitor GPU, process CPU and RSS, and additionally the number of memory
mappings and VmSize (in KiB) using external scripts that run continuously. The
--pipe scripts have a bit of extra boilerplate to find the process
automatically, and to preserve the format (number of columns) if the process is
gone (so it is easier to parse later in other tools).
$ multimonitor \
--gpu=med \
--process cat \
--pipe 'while true; do P=$(pidof -s cat); if [ "$P" != "" ]; then while wc -l "/proc/$P/maps" 2>/dev/null; do sleep 0.22; done; else echo "nan" "-"; fi; done' \
--pipe 'while true; do P=$(pidof -s cat); if [ "$P" != "" ]; then while awk "/^VmSize/ {print \$2;}" "/proc/$P/status" 2>/dev/null; do sleep 0.22; done; else echo "nan"; fi; done'
# ProcStat Initializing ticks_per_second = 100
# ProcStat Initializing page_size_kb = 4
# Arguments: ["multimonitor", "--gpu=med", "--process", "cat", "--pipe", "while true; do P=$(pidof -s cat); if [ \"$P\" != \"\" ]; then while wc -l \"/proc/$P/maps\" 2>/dev/null; do sleep 0.22; done; else echo \"nan\" \"-\"; fi; done", "--pipe", "while true; do P=$(pidof -s cat); if [ \"$P\" != \"\" ]; then while awk \"/^VmSize/ {print \\$2;}\" \"/proc/$P/status\" 2>/dev/null; do sleep 0.22; done; else echo \"nan\"; fi; done"]
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# Waiting for process cat
# For process name cat found pids: [451314]
# Spawned 451315 for --pipe: while true; do P=$(pidof -s cat); if [ "$P" != "" ]; then while wc -l "/proc/$P/maps" 2>/dev/null; do sleep 0.22; done; else echo "nan" "-"; fi; done
# Spawned 451320 for --pipe: while true; do P=$(pidof -s cat); if [ "$P" != "" ]; then while awk "/^VmSize/ {print \$2;}" "/proc/$P/status" 2>/dev/null; do sleep 0.22; done; else echo "nan"; fi; done
UNIX-TIME TIME RELTIME GPU% VRAM SCLK GPUT GPUP CPU% RSS PIPE PIPE
1620578670.951151 30070.936277 0.200060 0% 209.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB
1620578671.151156 30071.136242 0.400025 0% 209.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578671.351161 30071.336259 0.600041 0% 201.9MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578671.551167 30071.536250 0.800033 0% 201.9MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578671.751172 30071.736255 1.000037 0% 201.9MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578671.951177 30071.936253 1.200035 0% 201.9MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578672.151183 30072.136258 1.400041 0% 202.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578672.351188 30072.336251 1.600034 0% 202.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578672.551194 30072.536254 1.800036 0% 202.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578672.751199 30072.736253 2.000035 0% 202.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578672.951204 30072.936253 2.200036 0% 202.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578673.151210 30073.136254 2.400036 0% 202.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578673.351215 30073.336257 2.600040 0% 202.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578673.551220 30073.536251 2.800034 0% 202.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578673.751226 30073.736254 3.000036 0% 202.4MiB 1050MHz 35°C 112.13W 0.00% 0MiB 24 /proc/451314/maps 5440
1620578673.951231 30073.936253 3.200036 0% 202.4MiB 1050MHz 35°C 112.13W nan% 0MiB 24 /proc/451314/maps 5440
1620578674.151236 30074.136263 3.400046 0% 202.4MiB 1050MHz 35°C 112.13W nan% 0MiB nan - nan
1620578674.351242 30074.336259 3.600042 0% 202.4MiB 1050MHz 35°C 112.15W nan% 0MiB nan - nan
1620578674.551247 30074.536256 3.800039 0% 202.4MiB 1050MHz 35°C 112.15W nan% 0MiB nan - nan
1620578674.751252 30074.736259 4.000041 0% 202.4MiB 1050MHz 35°C 112.15W nan% 0MiB nan - nan
1620578674.951258 30074.936257 4.200040 0% 203.7MiB 1050MHz 35°C 112.13W nan% 0MiB nan - nan
1620578675.151263 30075.136259 4.400041 0% 203.7MiB 1050MHz 35°C 112.13W nan% 0MiB nan - nan
1620578675.351269 30075.336263 4.600045 0% 203.7MiB 1050MHz 35°C 112.13W nan% 0MiB nan - nan
Note how we use single quotes to pass the --pipe script; this allows easier use
of $ inside the script.
Often it might be easier to put such --pipe scripts into their own script files,
for easier reuse.
Also note that we use a slightly bigger sleep (220ms) than the interval of
multimonitor (200ms), so that we are not flooded by output of the --pipe script
which, by the time it is displayed, is stale (by many lines). Instead, we allow
the reader to block asynchronously and display the previous value for one cycle;
the updated value will then come on the very next cycle / line, not 10 or 100
lines later, depending on pipe/fifo buffering. (PS. Note that in some locales
sleep 0.22 will not work; use a comma, like sleep 0,22, or switch to a saner
locale.)
Alternatively, a simpler approach is to use --exec_async, which is easier to
use, but has a bit of extra overhead, and might be delayed a bit more due to
internal caching of commands executed using --exec_async:
$ multimonitor \
--gpu=med \
--process cat \
--exec_async 'wc -l "/proc/$(pidof -s cat)/maps" 2>/dev/null || echo nan -' \
--exec_async 'awk "/^VmSize/ {print \$2;}" "/proc/$(pidof -s cat)/status" || echo nan'
Of course, if you know the pid beforehand and the target process is already
running, you can compute it once and pass it before launching multimonitor, to
make the script even simpler and slightly faster too.
$ P=$(pidof -s cat)
$ multimonitor \
--gpu=med \
--pid $P \
--exec_async "wc -l '/proc/$P/maps' 2>/dev/null || echo nan -" \
--exec_async "awk '/^VmSize/ {print \$2;}' '/proc/$P/status' || echo nan"
Notice the inversion of single quotes and double quotes, to make it work correctly.
When --exit_when_dead=true is used, multimonitor will finish (and terminate
other processes) as soon as any of the monitored processes is a zombie, dead or
gone, even if that is before the end of the --duration_sec setting.
$ multimonitor --exit_when_dead --sub="exec sleep 2" --duration_sec=3600
# Spawned 2708230 for --sub: exec sleep 2
UNIX-TIME TIME RELTIME CPU% RSS
1614694484.970314 768700.437572 0.200172 0.00% 0MiB
1614694485.170316 768700.637394 0.399994 0.00% 0MiB
1614694485.370317 768700.837564 0.600165 0.00% 0MiB
1614694485.570318 768701.037498 0.800098 0.00% 0MiB
1614694485.770320 768701.237419 1.000019 0.00% 0MiB
1614694485.970321 768701.437574 1.200174 0.00% 0MiB
1614694486.170322 768701.637485 1.400085 0.00% 0MiB
1614694486.370324 768701.837537 1.600137 0.00% 0MiB
1614694486.570325 768702.037491 1.800092 0.00% 0MiB
1614694486.770326 768702.237535 2.000135 0.00% 0MiB
$
$ multimonitor_ldc --exit_when_dead --sub="exec sleep 2" --sub="exec sleep 100" --duration_sec=3600
# Spawned 2727867 for --sub: exec sleep 2
# Spawned 2727868 for --sub: exec sleep 100
UNIX-TIME TIME RELTIME CPU% RSS CPU% RSS
1614694874.680933 769090.149657 0.200049 0.00% 0MiB 0.00% 0MiB
1614694874.880935 769090.349626 0.400018 0.00% 0MiB 0.00% 0MiB
1614694875.080936 769090.549686 0.600078 0.00% 0MiB 0.00% 0MiB
1614694875.280938 769090.749626 0.800018 0.00% 0MiB 0.00% 0MiB
1614694875.480940 769090.949682 1.000074 0.00% 0MiB 0.00% 0MiB
1614694875.680942 769091.149658 1.200050 0.00% 0MiB 0.00% 0MiB
1614694875.880943 769091.349651 1.400043 0.00% 0MiB 0.00% 0MiB
1614694876.080945 769091.549656 1.600048 0.00% 0MiB 0.00% 0MiB
1614694876.280947 769091.749663 1.800055 0.00% 0MiB 0.00% 0MiB
1614694876.480949 769091.949641 2.000033 0.00% 0MiB 0.00% 0MiB
# Sending SIGTERM to not yet terminated pid 2727868
Without this argument, multimonitor will keep reporting (possibly including other metrics) until the duration has elapsed, even if any or all processes are done:
$ ./multimonitor_ldc --exit_when_dead=false --sub="exec sleep 2" --duration_sec=3600
# Spawned 2708485 for --sub: exec sleep 2
UNIX-TIME TIME RELTIME CPU% RSS
1614694563.114823 768778.583917 0.200051 0.00% 0MiB
1614694563.314824 768778.783901 0.400035 0.00% 0MiB
1614694563.514825 768778.983913 0.600046 0.00% 0MiB
1614694563.714827 768779.183920 0.800053 0.00% 0MiB
1614694563.914828 768779.383904 1.000038 0.00% 0MiB
1614694564.114829 768779.583923 1.200057 0.00% 0MiB
1614694564.314831 768779.783897 1.400031 0.00% 0MiB
1614694564.514832 768779.983923 1.600056 0.00% 0MiB
1614694564.718833 768780.184043 1.800176 0.00% 0MiB
1614694564.914835 768780.383834 1.999968 0.00% 0MiB
1614694565.114836 768780.583934 2.200068 0.00% 0MiB
1614694565.314837 768780.783898 2.400032 0.00% 0MiB
1614694565.514838 768780.983894 2.600028 0.00% 0MiB
1614694565.714840 768781.183896 2.800029 0.00% 0MiB
1614694565.914841 768781.383904 3.000037 0.00% 0MiB
1614694566.114842 768781.583919 3.200052 0.00% 0MiB
1614694566.314844 768781.783887 3.400021 0.00% 0MiB
1614694566.514845 768781.983901 3.600034 0.00% 0MiB
1614694566.714846 768782.183894 3.800028 0.00% 0MiB
1614694566.914848 768782.383899 4.000033 0.00% 0MiB
1614694567.114849 768782.583880 4.200013 0.00% 0MiB
1614694567.314850 768782.783921 4.400055 0.00% 0MiB
...
...
...
During that period, dead processes might have their CPU% figure reported as
NaN%, nan% or 0.00%. For external processes, the RSS might also be reported as
NaNMiB, nanMiB or 0MiB after they are gone, for example:
$ ( sleep 3 & ); # Process we will be monitoring.
$ sleep 1; # Give it some time to start.
$ multimonitor --exit_when_dead=false --process="sleep" --duration_sec=4
# For process name sleep found pids: [2717459]
UNIX-TIME TIME RELTIME CPU% RSS
1614694744.711994 768960.183894 0.200052 0.00% 0MiB
1614694744.911996 768960.383852 0.400010 0.00% 0MiB
1614694745.111997 768960.583890 0.600047 0.00% 0MiB
1614694745.311998 768960.783867 0.800024 0.00% 0MiB
1614694745.512000 768960.983882 1.000039 0.00% 0MiB
1614694745.712001 768961.183871 1.200029 0.00% 0MiB
1614694745.912002 768961.383893 1.400050 0.00% 0MiB
1614694746.112003 768961.583865 1.600023 0.00% 0MiB
1614694746.312005 768961.783884 1.800041 nan% 0MiB
1614694746.512006 768961.983868 2.000026 nan% 0MiB
1614694746.712007 768962.183880 2.200038 nan% 0MiB
1614694746.912009 768962.383890 2.400047 nan% 0MiB
1614694747.112010 768962.583872 2.600029 nan% 0MiB
1614694747.312011 768962.783895 2.800052 nan% 0MiB
1614694747.512012 768962.983883 3.000040 nan% 0MiB
1614694747.712014 768963.183895 3.200053 nan% 0MiB
1614694747.912015 768963.383888 3.400046 nan% 0MiB
1614694748.112016 768963.583888 3.600046 nan% 0MiB
1614694748.312018 768963.783887 3.800045 nan% 0MiB
1614694748.512019 768963.983889 4.000047 nan% 0MiB
$
TODO(baryluk): Implement --exit_when_dead=any and --exit_when_dead=all.
TODO(baryluk): Implement --exit_when_dead=anysub and --exit_when_dead=allsub,
which would only take into account the --sub processes in the early termination
logic, not the other ones.
TODO(baryluk): If all (at least one) monitored processes are of --sub type,
automatically use --exit_when_dead=allsub.
TODO(baryluk): Implement / fix this for --pipe too?
By default multimonitor unconditionally flushes output after each line. This
makes it nicer to use with other tools like tee, grep, awk, kolumny or tail -f,
for post-processing output in real time, as well as saving to a file and
displaying in the terminal at the same time. Without this, it would often appear
that there is no output, despite a small interval being used. For reasonable
intervals it makes sense to disable full output buffering, to make multimonitor
more convenient to use.
However, for very small intervals, it might come at somewhat high overhead. If
you use --interval_msec=50 or less, it might be better to use buffered output.
To enable it use --buffered. The standard C library FILE buffering will be used.
The buffer is usually 4KiB, and the behavior can be manipulated using the stdbuf
command, for example (stdbuf --output=1M multimonitor --buffered --interval_msec=10
can be used to get a large, 1MiB, output buffer). Note however that because
multimonitor makes clock_nanosleep and clock_gettime syscalls at least once per
output line, the overhead savings aren't going to be that significant.
You are free to do whatever you like with the output. Very often it will be just scrolling in the terminal for human consumption. However, often it will be saved to a file or processed in real time by other tools. Here are some suggestions.
- Usually column 3 will be used as x axis, as it is easiest to use, but there are cases where other columns will be more appropriate.
Gnuplot is an advanced command-line driven graphing utility. https://www.gnuplot.info/
This just scratches the surface of what is possible with gnuplot, but should provide some ideas. The author has been using Gnuplot for 20 years and is still discovering new useful features in it.
- As an input to gnuplot. For example:
  gnuplot -e "plot 'mm.txt' using 3:4; pause -1"
  A good idea for long runs is to send multimonitor stdout to a file (redirecting in the shell, using --auto_output or using tee). Then, as it grows, see results in gnuplot by doing replot, or by re-executing your own gnuplot script. This makes the workflow very interactive and insightful, even while the output is still being created by multimonitor.
- Gnuplot can process multiple columns, either as a single plot or as multiple plots. For example: plot "mm.txt" using 3:($4+$6) will sum up columns 4 and 6 and plot them as one line. And plot "mm.txt" u 3:4, "" u 3:6 will plot them as two separate lines on the same plot ("" means to use the same input file as the previous one).
- For very short runs, especially when using --duration and/or --sub, it might be possible to use multimonitor directly in gnuplot using its popen functionality, like this: plot "<multimonitor --gpu --duration_sec=5 --sub 'exec glxgears'" using 3:4, but this has limited usability (it is hard to plot more than one column from a single multimonitor run), and usually it is better to save output to a file instead (this could be done using gnuplot system of course, or some external script, or by hand). In general I don't recommend it.
- When using gnuplot and trying to plot multiple separate runs, it is handy to use the for construct. Like plot for [filename in system("ls -1 mm*.txt")] filename u 3:4 w step title filename noenhanced or similar. This will automatically create multiple plots for multiple files using the same column specifiers. Of course multiple other plots can be plotted on the same plot, some possibly using the right axis, or using the multiplot functionality. noenhanced is there so that an underscore is not treated as a hint to do subscripts. Without it mm_1.txt would be rendered as mm₁.txt instead, which is ugly, and not what you usually want.
- When processing multiple files in gnuplot using for, it might be handy to modify the title based on a part of the filename, for example: … t sprintf("CPU %s", strstrt(filename, "zink") > 0 ? "Z" : "R")) to extract some details from a filename, instead of showing the full filename.
- A similar trick can be applied to offsets of time or normalization of values: plot for [...] filename u (offset(filename, $3)):4 w step where offset could be defined similar to this: offset(filename, t) = strstrt(filename, "zink") > 0 ? 123.1 : 125.0.
- When using the multiplot feature, it is often a good idea to make the left and right margins a fixed size, so the time axis aligns properly and perfectly. Combining this with a grid, and disabling xtics for all but the last plot, is another neat trick to increase the information density.
- Using set term sixelgd (or GNUTERM=sixelgd) in a terminal emulator supporting the sixel protocol allows you to easily view graphs directly in your terminal (even over SSH). mlterm, xterm and terminology for X11 do support it, as does yarf for the Linux framebuffer (without X11). libvte (Gnome/Mate Terminal, Terminator) support is coming. Otherwise, using pngcairo, pdfcairo or svg for saving to files, or using wxt for interactive display, is recommended.
See the example *.gnuplot files for some inspiration.
http://gnuplot.sourceforge.net/demo_5.5/ is also helpful for new users.
- As input to awk (or sometimes grep or sed), for example to do simple calculations between columns, or to detect specific patterns. Example: multimonitor --gpu | awk '{ if ($4+0 > 80) print $0; }' will only display lines with high GPU usage (the +0 forces a numeric comparison, because the field carries a % suffix and awk would otherwise compare it as a string). Doing sums, moving averages, or ratios between different columns, or computing your own rates is another option.
- awk (or other tools) could be processing that in real time using pipes, or as a post-processing step later, either in a script or e.g. in the popen construct of gnuplot. Example: plot "<awk '{print $3, $5+$7;}' mm.txt" using 1:2
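The moving average mentioned above can also be computed outside awk. The following is a minimal Python sketch, not part of multimonitor: the helper names column and moving_average, the window size and the sample rows are illustrative assumptions. It assumes the only unit suffix on the chosen column is %.

```python
from collections import deque


def column(lines, index, suffix="%"):
    """Yield 1-based column `index` as floats.

    Lines starting with '#' are skipped, and so is any row whose cell is not
    numeric after removing `suffix` (this drops the header row).
    Note that 'nan%' parses as float('nan') and is passed through.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < index or fields[0].startswith("#"):
            continue
        try:
            yield float(fields[index - 1].rstrip(suffix))
        except ValueError:
            continue


def moving_average(values, window=5):
    """Yield the mean of the last `window` values seen so far."""
    buf = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        yield total / len(buf)


# Hypothetical sample in the same layout as the logs shown earlier.
sample = [
    "# For process name Xorg found pids: [2066]",
    "UNIX-TIME TIME RELTIME CPU% RSS",
    "1614694874.680933 769090.149657 0.200049 5.00% 77MiB",
    "1614694874.880935 769090.349626 0.400018 15.00% 77MiB",
    "1614694875.080936 769090.549686 0.600078 10.00% 77MiB",
]
for avg in moving_average(column(sample, 4), window=2):
    print(f"{avg:.2f}")
```

To use it on a live run, the lines can be read from sys.stdin instead of a list, since multimonitor flushes after each line by default.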
kolumny (https://github.com/baryluk/kolumny) is a type of streaming command
line spreadsheet, primarily used for processing multiple input files in
parallel. It is a good match for processing multimonitor output, as well as for
many more uses.
- It is extremely handy for doing comparisons between separate runs of multimonitor, because of its ability to do mathematical post-processing between multiple files. Example: kolumny mm1.txt u t1:=3,~a:=4 mm2.txt u ~t2:=3,~b:=4 :~check(isclose(t1,t2)) :a/b
  - Here, t1:=3 means to use column 3 from file mm1.txt, assign it to variable t1, and print it.
  - ~a:=4 means to read column 4 from file mm1.txt and assign it to variable a, without printing it.
  - Reading multiple columns at once is possible using arrays too, for example ~pids:=4...12.
  - After the input file definitions and main variables, various statements and expressions are used. Each expression starts with : (or with #, which means to turn it off).
  - :~check(isclose(t1,t2)) is to ensure t1 and t2 are close (both use the 3rd column, so that RELTIME from both files is aligned on each row). ~ suppresses printing of True to the output.
  - :a/b computes the value of a/b (the ratio between the 4th columns) and outputs it.
  - It is also possible to declare new variables, e.g. :~ratio:=a/b, and use them in other expressions (as long as there are no cycles, they will be evaluated in proper order, while maintaining the same print order as given on the command line; forward references are supported). For example, :(1.0-ratio)**2 outputs a new column that uses another column, columns or expressions defined on the command line.
  - Any Python expression can be used, and extra functions can be imported using --import.
  - There are also tricks to carry variables and state across rows, do dynamic column lookups, use arrays (multiple columns) and more.
  - u is shorthand for using (just like Gnuplot).
  - Similarly s is shorthand for skip (just like Gnuplot).
  - kolumny also supports reading directly from standard input, or from subprocesses with <program (just like Gnuplot), so multimonitor --pids 1,2,3 | kolumny - u t:=3,~cm:=4...9 ':sum(cm[0:2:])' is an option, for example (to sum CPU usage of 3 processes). Or equivalently: kolumny "<multimonitor --pids 1,2,3" u t:=3,~cm:=4...9 ':sum(cm[0:2:])' The second form supports multiple concurrent input processes (and other input files), if required.
  - kolumny can also read a variable number of columns into its arrays, by indexing from the right: kolumny "<multimonitor --process firefox" u t:=3,~cpu_mem:=4...-1 ':~cpu:=cpu_mem[0:2:]' ':sum(cpu)' will sum CPU usage of all firefox processes, no matter how many there are.
- Combining kolumny and gnuplot in one, example: plot "<kolumny mm1.txt using t1:=3,~a:=4 mm2.txt using ~t2:=3,~b:=4 :~check(isclose(t1,t2)) :a/b" using 1:2 with lines to compute the ratio between column 4 from files mm1.txt and mm2.txt, without needing to code this in a separate file or script. The CLI syntax of kolumny was optimized to not require extensive use of quoting and escaping when used in tandem with gnuplot (this is why usage of spaces is limited, and variables don't use $ like in awk, for example).
- Of course it is also possible to combine gnuplot, multimonitor and kolumny all in one, even on a single command line, but this can get out of hand quickly if you are not very familiar with these tools.
- For extremely simple cases, e.g. two files being compared, a primitive substitute for kolumny is to use paste, then combine with other tools (like awk or gnuplot). Example: paste mm1.txt mm2.txt | awk '{print $3, $4/$9; }' could be roughly equivalent to the example above, assuming 5 columns in the mm1.txt file. kolumny has the advantage of easier processing of a bigger number of files, and of not needing to manually count columns in paste output, which is very tedious and error-prone, especially when changing the output format.
- Consuming data in various programming languages like Python or R should be equally easy. Often as easy as doing line.split().
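As an illustration of the line.split() approach, here is a small Python sketch that reads two runs and computes the ratio of column 4, similar in spirit to the kolumny example above. It is not part of multimonitor: the function names parse and column_ratio, the tolerance and the sample rows are assumptions made for this example. It assumes the compared column is either a plain number or carries only a % suffix.

```python
import math


def parse(lines):
    """Split multimonitor output into rows of string fields.

    Lines starting with '#' and the header row are skipped; a data row is
    recognised by a numeric third column (RELTIME).
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        try:
            float(fields[2])
        except ValueError:
            continue  # header row
        rows.append(fields)
    return rows


def column_ratio(rows_a, rows_b, col=4, time_col=3, tol=0.01):
    """Pair rows by position and return (reltime, a/b) for 1-based column `col`."""
    out = []
    for a, b in zip(rows_a, rows_b):
        ta, tb = float(a[time_col - 1]), float(b[time_col - 1])
        if not math.isclose(ta, tb, abs_tol=tol):
            raise ValueError(f"rows not aligned in time: {ta} vs {tb}")
        va = float(a[col - 1].rstrip("%"))
        vb = float(b[col - 1].rstrip("%"))
        out.append((ta, va / vb if vb else math.nan))
    return out


# Hypothetical rows standing in for the contents of mm1.txt and mm2.txt.
run1 = parse([
    "# Spawned 1 for --sub: exec a",
    "UNIX-TIME TIME RELTIME CPU% RSS",
    "1614694874.680933 769090.149657 0.200049 10.00% 77MiB",
])
run2 = parse([
    "UNIX-TIME TIME RELTIME CPU% RSS",
    "1614694563.114823 768778.583917 0.200051 5.00% 82MiB",
])
print(column_ratio(run1, run2))
```

With real files, parse(open("mm1.txt")) works the same way, because iterating over a file yields its lines.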
Importing or just pasting into your favorite spreadsheet (LibreOffice Calc, Google Sheets, Microsoft Excel, Gnumeric, Calligra Sheets, etc) is obviously also a reasonable option, if you are into that.
- Be aware of date handling and the autoformatting / "autocorrecting" that many of these programs do. Usually incorrectly!
  - For example nan will usually not be recognized.
  - Similarly Unix time might not be preserved with full accuracy, and usually microseconds will be silently dropped.
  - Also some software might convert Unix time back to date and time incorrectly, so double check your software (Example: 1609631793.656 should translate to 2021-01-02T23:56:33.656+00:00).
- These tools do not much like units attached directly to numbers, so using --human_friendly=false is probably a good idea. If you already captured some files, something like sed -E -e 's/[KMGT]i?(B|Hz|)\b//gi' -e 's/%//g' could often do a decent job of removing these units.
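The two pitfalls above (units glued to numbers, and Unix time conversion) can be checked with a few lines of Python before importing into a spreadsheet. This is only a sketch: strip_unit and unix_to_iso are hypothetical helper names, and the unit pattern is a deliberately conservative alternative to the sed expression above, not an exact equivalent. It only touches tokens that are entirely a number followed by a unit.

```python
import re
from datetime import datetime, timedelta, timezone

# A number (or nan/inf) followed by an optional unit such as MiB, MHz, B or %.
_UNIT = re.compile(
    r"^([-+]?(?:\d+(?:\.\d*)?|nan|inf))(?:[KMGT]i?(?:B|Hz)|B|Hz|%)?$",
    re.IGNORECASE,
)


def strip_unit(token):
    """Return the numeric part of `token`, or `token` unchanged if it is not number+unit."""
    m = _UNIT.match(token)
    return m.group(1) if m else token


def unix_to_iso(text, timespec="microseconds"):
    """Convert a non-negative Unix timestamp given as text to ISO 8601 in UTC.

    The fraction is handled as an integer, so no float rounding is involved.
    """
    seconds, _, frac = text.partition(".")
    micros = int((frac + "000000")[:6])
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    dt = epoch + timedelta(seconds=int(seconds), microseconds=micros)
    return dt.isoformat(timespec=timespec)


line = "2020-12-31T22:28:44.688313 709668.550807 0.200063 0.0% 309.6MiB 386.7MHz nan% 77MiB"
print(" ".join(strip_unit(tok) for tok in line.split()))
print(unix_to_iso("1609631793.656", timespec="milliseconds"))
```

The second print reproduces the reference conversion quoted in the list above, which makes it a convenient cross-check for whatever a spreadsheet reports.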
Right now multimonitor doesn't support output in other formats. The author
thinks that the simple column and white space design is super easy to use with
many tools (as can be seen above in the various examples with gnuplot, awk and
kolumny). If there is sufficient demand from users, it might be possible to
easily add csv, tsv or ods output formats, for example.
For now, using the output as is, or with --human_friendly=false, should work
well.
In --pipe, --exec and --exec_async be aware of shell escaping rules. This is
quite important, for example, when using awk. See the examples above for how it
could be made to work.
If the text returned by --pipe, --exec and --exec_async is going to have a
variable number of elements, or is text with an unknown number of words, AND it
is not the last column of multimonitor, it is recommended to put such output
into double quotes, so Gnuplot (and CSV) can consider it a single column anyway.
Example:
$ multimonitor --exec 'echo "\"$(date)\""'
SECONDS-FROM-EPOCH TIME RELTIME EXEC
1609563691.299229 819653.500171 0.200085 "Sat 02 Jan 2021 05:01:31 AM UTC"
1609563691.499235 819653.700140 0.400054 "Sat 02 Jan 2021 05:01:31 AM UTC"
1609563691.699241 819653.900147 0.600061 "Sat 02 Jan 2021 05:01:31 AM UTC"
1609563691.899248 819654.100148 0.800062 "Sat 02 Jan 2021 05:01:31 AM UTC"
$ LC_ALL=pl_PL.UTF-8 TZ=Europe/Zurich multimonitor --exec 'echo "\"$(date)\""'
SECONDS-FROM-EPOCH TIME RELTIME EXEC
1609563713.831924 819676.035970 0.200090 "sob, 2 sty 2021, 06:01:53 CET"
1609563714.031930 819676.235925 0.400045 "sob, 2 sty 2021, 06:01:54 CET"
1609563714.231936 819676.435947 0.600067 "sob, 2 sty 2021, 06:01:54 CET"
1609563714.431942 819676.635932 0.800052 "sob, 2 sty 2021, 06:01:54 CET"
Similarly, it might be useful when using ls -l, which might format modification
times in a multitude of ways (affected by relative time, time zone, locale
installed, locale set, user preferences set in shell aliases or environment
variables, etc.).
Usually these problems can be completely avoided by properly constructing the environment or command line options. In other cases (e.g. reading the actual content of some dynamic file, or output from the network), it might still be good to do quoting.
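When such double-quoted columns are consumed from Python, a plain line.split() would break the quoted text into several fields. A minimal sketch using the standard library shlex module is shown below; split_columns is a hypothetical helper name, and the sample line is taken from the example output above. Note that shlex follows shell-like quoting rules, so a stray apostrophe or unbalanced quote in the text raises ValueError.

```python
import shlex


def split_columns(line):
    """Split a line on whitespace while keeping "double quoted" text as one field."""
    return shlex.split(line)


row = split_columns(
    '1609563691.299229 819653.500171 0.200085 "Sat 02 Jan 2021 05:01:31 AM UTC"'
)
print(len(row), row[-1])
```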
TODO(baryluk): Add --qexec (quoted exec), to do it automatically (as well as
turn any other quotes into escaped quotes in the output).
Sometimes when using shell pipes in --pipe, it might be wise to unbuffer the
output. This can be done using unbuffer or stdbuf -oL, example:
$ multimonitor --pipe 'sudo stdbuf -oL tail -f /var/log/syslog | stdbuf -oL sed -E -e "s/^(.* kernel: )//"'
SECONDS-FROM-EPOCH TIME RELTIME PIPE
1609565358.558647 821320.761188 0.200214 [821272.965974] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
1609565358.758653 821320.960940 0.399966 [821277.038003] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
1609565358.958659 821321.161257 0.600282 [821281.117976] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
1609565359.158665 821321.360902 0.799928 [821285.201977] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
1609565359.358672 821321.561090 1.000115 [821289.273981] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
...
TODO(baryluk): Maybe add an option to suppress new line output, or compress it
to empty, if the output of --exec, --pipe, etc., is the same as before. Could be
interesting to mark event transitions, while maintaining readability.
TODO(baryluk): It would be nice to interpret numbers from the output as
cumulative, and let multimonitor calculate rates. There are many sources that
would work really well for this, for example: /proc/<pid>/io, /proc/vmstat,
ip -s l. Maybe --exec_rate?
TODO(baryluk): Conversely, how about also --exec_cumul, so we SUM the numbers
from the pipe, or integrate in time, interpreting samples as an average value,
but also taking into account the delay between calls. We can also do a version
that just does a sum (i.e. if the command returns the current cumulative number,
and then resets it to zero).
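Until options like these exist, a cumulative counter captured with --exec or --pipe can be turned into a rate in post-processing. The following Python sketch shows the idea only; the function name rates and the sample numbers are assumptions, and this is not how multimonitor itself would necessarily implement --exec_rate.

```python
def rates(samples):
    """Turn (time_seconds, cumulative_value) pairs into (time_seconds, rate) pairs.

    The rate is the difference between consecutive values divided by the
    elapsed time, so it accounts for uneven delays between samples.
    """
    prev = None
    for t, v in samples:
        if prev is not None:
            dt = t - prev[0]
            if dt > 0:  # skip duplicate or out-of-order timestamps
                yield t, (v - prev[1]) / dt
        prev = (t, v)


# Hypothetical samples: RELTIME and a cumulative byte counter.
samples = [(0.0, 100), (0.5, 150), (1.0, 150)]
print(list(rates(samples)))
```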
TODO(baryluk): A more modular approach would be better, for example
--exec:async:in_cumul:out_rate:cache=5s "exec something". This would allow
passing extra options and flags easily to each exec, and also allow changing
the order of options, without needing to remember all things. Other options:
sync, text, format, etc. This might be handy when reading multiple values from
a single command, where some values are rates, some others are not, some should
be ignored, etc. Also, :off could be used to keep it on the command line, parse
it, but ultimately ignore it (similar to the '#' prefix in kolumny, which could
probably be supported too).
multimonitor is written in the D programming language, and can be compiled
using the GDC, LDC2 or DMD compilers. multimonitor uses the D standard library,
Phobos, included with these compilers. There are no other dependencies. Some
parts of the code are written with @nogc to ensure smooth and predictable
performance. Other parts do use the GC, but very few allocations are actually
performed. It is normal to see about 1 GC collection cycle (each ≈1ms) per hour
in steady state. The code is optimized for correctness and speed, with extensive
usage of the metaprogramming facilities of the D programming language.
Only Linux is supported. Linux kernel 2.6.32+ is required. There are no plans to
support other operating systems, as there is a lot of Linux specific code. A
FreeBSD version is a reasonable option though, but would require some porting
and testing.
multimonitor makes extensive use of the pread syscall; the readfile syscall
proposed for Linux 5.11 is also a possibility.
multimonitor is not a replacement for generic monitoring frameworks and tools
like Prometheus (including node-exporter), collectd, Nagios, etc. These tools
are extensible, and support very long collections (even years) from multiple
sources and machines, query languages, dashboards, alerting, application
specific instrumentation, dynamic configuration changes, resampling, time
realignment of multiple metrics, interpolation, etc. Also, most of these tools
sample about once a minute, sometimes once every 5 minutes.
multimonitor instead is used for ad-hoc high-precision high-rate logging of
specific metrics, especially for monitoring a few apps, without extra
instrumentation. multimonitor output is also designed to be easily consumed by
other tools like gnuplot, awk, kolumny, etc.
multimonitor is not intended for extremely high rate sampling. It is not a
general data acquisition system, nor is it suitable for isochronous data
sampling with extremely low jitter. 100Hz would be the practical limit of
multimonitor. Higher is possible, but not recommended, due to better tools being
available (with lower disk usage, lower jitter, and lower CPU overheads). But,
for example, sampling external sensors once per second, like temperature or
power usage, is a reasonable use case, as long as the somewhat verbose output
format is acceptable. For very constrained systems (microcontrollers or small
SBC systems, with limited storage, small write throughput or slow network
connectivity) other solutions should be considered.
multimonitor is also not well suited for a very big number of collected metrics.
40-60 is probably a reasonable maximum, but is already hard to keep track of.
Sure, more can be done, but there are better tools available for this.
multimonitor is also not well suited for gathering variable-length statistics
of a generic type. E.g. want to see the temperature of all SCSI drives in the
system, or the core frequency of all 32 cores of the CPU? Not the best idea, as
the number of columns or their order in the output will vary between systems,
making it very hard to process such files.
The author uses it mostly to monitor GPU benchmarks during a run, and log CPU load, GPU load, GPU frequency, GPU temperature, GPU VRAM, benchmark CPU load, benchmark RSS usage, benchmark thread count, benchmark active thread count, and benchmark min-avg-max frame rates. These can then be fed to Gnuplot to do plots, possibly from different runs, for example with a different GPU driver version, different compiler options, etc.
The author was just tired of running 3 or 4 different tools concurrently (from
the command line or ad-hoc scripts, which always got lost and forgotten, so
needed to be reinvented each time, usually with a slightly different format),
all with their own latencies, timestamp formats (or lack of them), significant
measurement inaccuracies, different units, and too many columns to easily count
in gnuplot; then needing to do time offsets between multiple input files to
align them in time, and between different runs to correlate changes; and then
setting up weird awk or kolumny scripts to bring the formats to sanity, do
computations between them, or plot with titles and labels, etc. multimonitor
makes things more uniform, faster, more accurate and easier to use.
Other tools were too limiting, had too high overhead or took too much time to set up. E.g. Prometheus can't really do sub-second scraping, was too inaccurate at high repetition rates, and took hours to set up or fully automate.
With multimonitor, graphs like this, comparing many metrics between different
setups, are easy to do:
- Linux perf integration, i.e. IPC, context switches, CPU migrations, cache hits / misses, TLB misses, branch mispredictions, etc.
- Use pidfd or dirfd/openat for processing processes in /proc and /sys. Similarly for searching hwmon entries.
- Disk usage.
- Output to file with file rotation / compression.
- Trigger external tools periodically (i.e. generate webpage or image with plot) with new data.
- More testing on Intel and Nvidia GPUs
- More testing on SBCs, like Raspberry Pi, Orange Pi, Banana Pi, Odroid, etc.
- Multi-arch QA (i386, amd64, arm64, ppc64el, riscv64, alpha, m68k, s390x).
- More generic plugin framework, where each "Reader" class can dynamically register, providing own formatting of header and columns, adding own command line option parsing, own preferences of sync vs async, caching policy for async, preferred rates, etc.
- Support for BSD systems (FreeBSD, NetBSD, OpenBSD, DragonFly).
- More networking statistics (i.e. routing, iptables, etc.)
- NTP and general time and realtime offset / jitter monitoring.
- Sampling metrics from Prometheus
- Netlink TASKSTAT metrics
- More Linux VM statistics for memory
- Per-core CPU metrics
- Better multi GPU measurements, i.e. iGPU + dGPU, 2x dGPU
- Per-file system statistics, i.e. IO/s, kB/s.
- Support more hwmon stuff, i.e. NVMe temperature. smartmontools integration, i.e. HDD temperature.
- Linux namespaces support.
- Linux cgroups support.
- KVM and Xen guest monitoring.
- PCIe power status / link status / speed
- Battery level readback. Charging power. Input / output voltage for USB PD, etc.
- USB power / current output from host.
- IPMI / OpenBMC integration for power
- PMBus integration
- A framework for I2C and One-Wire sensors maybe.
- Ability to compute number of OTHER processes / threads active in the system, to estimate any background noise when doing benchmarks.
- Lower overhead output formats, with nesting and per-object labels:
- Protocol buffer output
- JSON output
- These could help with some stuff, like supporting multiple CPU cores, multiple network interfaces, multiple processes, multiple hard drives, while supporting order independence by including proper names, or other stable ids, etc.
- GPIB / IEEE 488 and LXI support, i.e. reading voltage, current, resistance from multimeters, frequency counters, power supplies, electronic loads, etc.
- Logic analyzer support.
- MIDI input, i.e. reading rotary encoders, pressure sensitive keyboards, etc.
- SNMP readback sampling. It is an atrocious format, but could be useful to read metrics for example from network switches, or printers.
- Octoprint / 3D printer status sampling, i.e. progress of 3D printing, amount of used filament, temperature, etc.
- Execute read queries in databases, i.e. arbitrary SELECT in SQLite, MySQL or PostgreSQL to track something, or performance of these queries (i.e. latency).
- OBD-II (automotive) support.
- Configurable system and per-user daemon to capture some log with fixed format, with output file rotation.
- Generic logging.
- Event logging.
- Streaming to databases, or such. You can use other tools to do that with multimonitor output.
- Query language integration. Instead you can feed data from multimonitor to, let's say, kolumny, Python's pandas or numpy, or an SQL database of your choice.
- Built-in data processing. This can be done easily using other tools, like gnuplot, kolumny, awk, or a small Python program.
- GUI integrated into it. It should be possible to create a separate GUI program ("driver") that launches multimonitor and shows results on graphs, possibly in real time, and allows easy configuration of the command line, if needed.
- Cross-machine integration. You can just call multimonitor on different machines using ssh and merge files if needed, either directly, using kolumny, or feed them directly to tools that don't require time alignment, like Gnuplot.
- Windows or MacOS support. Too much work and complication to the code, and the author doesn't use them.
- Config files. All options should be passed explicitly on the command line for full reproducibility. (However: it might be wise to allow setting --utc_nice or --interval_msec by default using an environment variable, without losing column numbering).