A minimalistic tool designed to run qemu-based virtual machines from definitions stored in configuration files, with support for vCPU pinning.
Qemu allows specifying all necessary configuration flags via command line arguments, for example:
qemu -M q35 -m 8G -cpu host -smp 4 ...
but this is very inconvenient. Qemu also supports its own configuration file format, however it is incomplete and does not allow setting all the options that are supported via command line arguments.
The initial approach taken to resolve this was a collection of shell scripts that listed all the needed options, but after some time, as the number of virtual machines grew, this approach became a little cumbersome, so the decision was made to write a tiny C++ application that accepts a configuration file name and reads a list of command line options to pass to the qemu binary. This served the purpose well for a while.
Lately, however, a need to run high performance and low latency virtual machines arose, and CPU pinning was identified as a requirement to achieve this. Libvirt was used for a while at the beginning, but it is extremely heavyweight, very opinionated about how things are arranged, pulls in a significant number of dependencies and is simply overkill for the use case at hand.
Hence, this project was born - an evolution of the previously mentioned qemu-wrapper, rewritten in Rust and extended to support CPU pinning functionality. And as a pleasant bonus, instead of the homegrown configuration file format it uses YAML.
qemu-launcher foo
will attempt to load the foo.yml file from the configuration directory (/usr/local/etc/qemu-launcher by default) and launch the virtual machine defined in that file. The path to the directory where the virtual machine definition files are located can be changed by setting the QEMU_LAUNCHER_CONFIG_DIR environment variable, e.g.:
QEMU_LAUNCHER_CONFIG_DIR="/etc/my-vms" qemu-launcher bar
which will attempt to load the /etc/my-vms/bar.yml configuration file. Additionally, when vCPU pinning is used in the configuration file, qemu-launcher will attempt to mount the cpuset cgroup tree under the /sys/fs/cgroup/cpuset path, creating the directory first if it does not exist. This path can be controlled by setting another environment variable - QEMU_LAUNCHER_CPUSET_MOUNT_PATH. By default, the qemu launcher will create the qemu prefix subdirectory under the mount path. This can be controlled by the QEMU_LAUNCHER_CPUSET_PREFIX environment variable. It will then create a pool subdirectory inside the prefix, which will use only the non-pinned cores, and additional directories will be created for each pinned core as needed. All running tasks are migrated to the pool cpuset and only the qemu virtual machine vCPU threads are pinned to the core-specific sets.
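With the defaults left untouched, the resulting layout would look roughly like this (the per-core directory names are only illustrative - one cpuset is created per pinned host core):
/sys/fs/cgroup/cpuset/qemu/        # prefix, controlled by QEMU_LAUNCHER_CPUSET_PREFIX
/sys/fs/cgroup/cpuset/qemu/pool/   # non-pinned cores, where all other tasks are migrated
/sys/fs/cgroup/cpuset/qemu/2/      # dedicated cpuset for pinned host core 2
/sys/fs/cgroup/cpuset/qemu/6/      # dedicated cpuset for pinned host core 6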
All virtual machine configuration files should be stored in a single directory and must use the .yml file extension. There are two top level keys supported:
launcher - controls the settings of the launcher itself;
qemu - specifies the command line options to be passed to the qemu binary.
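Putting the two together, a definition file has the following overall shape (the values below are merely placeholders):
launcher:
  binary: qemu-system-x86_64
qemu:
  - cpu: host
  - m: 2G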
All keys but binary in the launcher section are optional. The following keys are supported:
binary - string, mandatory. Used to specify the name of the qemu emulator binary, or its full path if the binary is not in the default $PATH.
clear_env - boolean, optional, defaults to false. If set to true, the environment variables of the qemu-launcher process will not be forwarded to the qemu child process.
env - hash, optional. Allows providing additional environment variables for the child qemu process. Example:
env:
  # Set the audio driver for qemu to use PulseAudio
  QEMU_AUDIO_DRV: pa
debug - boolean, optional, defaults to false. Controls whether additional debugging information, such as the vCPU pinning mapping, should be printed by qemu-launcher.
user - integer, optional. Sets the effective user ID that will be used to launch the qemu child process. This can be useful when qemu-launcher is executed with elevated privileges, e.g. when using the vCPU pinning feature.
group - integer, optional. Same as user, but sets the effective group ID for the child process.
priority - integer, optional. Has no effect if scheduler is not specified. Sets the priority to be applied with chrt to each of the vCPU threads (requires elevated privileges).
scheduler - string, optional. Must be one of batch, deadline, fifo, idle, other or rr. Has no effect if priority is not specified. Sets the scheduling policy to be applied with chrt to each of the vCPU threads (requires elevated privileges).
vcpu_pinning - hash, optional. Configures how to pin the threads responsible for each vCPU core to a logical processor of the hypervisor machine. The first dimension matches the socket of the virtual machine processor, the second matches the core and the third matches the thread, for example:
vcpu_pinning:
  # vCPU socket 0
  0:
    # vCPU core 0
    0:
      # vCPU thread 0 : host logical CPU 2
      0: 2
      1: 6
    1:
      0: 3
      1: 7
With this configuration, thread 0 of core 0 on socket 0 will be pinned to the host logical processor 2, thread 1 of core 0 on socket 0 to 6, thread 0 of core 1 on socket 0 to 3 and thread 1 of core 1 on socket 0 to 7.
rlimit_memlock - boolean, optional, defaults to false. When set to true, qemu-launcher will change the amount of memory that can be locked by the qemu process to unlimited, using the setrlimit(2) system call. Both the soft and hard limits are unset. This is necessary for systems that have a low default limit on the amount of memory a single process can lock.
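For illustration, a launcher section combining several of the keys described above could look like this (all values are placeholders):
launcher:
  binary: qemu-system-x86_64
  clear_env: true
  env:
    QEMU_AUDIO_DRV: pa
  debug: true
  user: 1000
  group: 1000
  scheduler: fifo
  priority: 10
  rlimit_memlock: true
  vcpu_pinning:
    0:
      0:
        0: 2
        1: 6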
In order to achieve the best possible virtual machine performance, it is necessary to match the number of threads per core of the virtual machine to the threads per core of the hypervisor, and for hyper-threaded processors, vCPU threads of the same virtual core should be allocated to logical processors of the same host core. For example, looking at the /proc/cpuinfo of a random host machine:
$ cat /proc/cpuinfo
processor : 0
...
core id : 0
...
processor : 1
...
core id : 1
...
processor : 2
...
core id : 0
...
processor : 3
...
core id : 1
...
note how the core id is the same for logical processors 0 and 2, meaning that they are threads of the same core. So if we were to allocate a single multi-threaded vCPU core, it is best to map it like:
vcpu_pinning:
  0:
    0:
      0: 0
      1: 2
or
vcpu_pinning:
  0:
    0:
      0: 1
      1: 3
and the latter is actually preferred, because the logical processor 0 on Linux usually has some tasks associated with it, such as handling IRQs, and should be avoided to achieve maximum performance.
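On most Linux systems the same topology information can also be obtained in a more compact form with lscpu (output abbreviated, the exact columns depend on the util-linux version):
$ lscpu --extended=CPU,CORE,SOCKET
CPU CORE SOCKET
  0    0      0
  1    1      0
  2    0      0
  3    1      0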
This section is largely driven by the supported qemu options and can easily be translated back and forth. The best way to explain this is with the following example:
qemu -name "my virtual machine" -machine pc,accel=kvm -nographic -monitor unix:/run/my-vm.sock -cpu host \
-smp 2,sockets=1,cores=1,threads=2 -m 2G -device ide-hd,bus=ide.0,drive=drive0 \
-drive file=/var/storage/my-vm.qcow,if=none,id=drive0,format=qcow,readonly
would translate into the following YAML definition:
qemu:
  - name: my virtual machine
  - machine: [ pc, accel: kvm ]
  - nographic
  - monitor: [ unix:/run/my-vm.sock ]
  - cpu: host
  - smp: [ 2, sockets: 1, cores: 1, threads: 2 ]
  - m: 2G
  - device: [ ide-hd, bus: ide.0, drive: drive0 ]
  - drive: [ file: /var/storage/my-vm.qcow, if: none, id: drive0, format: qcow, readonly ]
These options can equally be written in the following way (all argument values are just strings):
qemu:
  - name: my virtual machine
  - machine: pc,accel=kvm
  - nographic
  - monitor: unix:/run/my-vm.sock
  - cpu: host
  - smp: 2,sockets=1,cores=1,threads=2
  - m: 2G
  - device: ide-hd,bus=ide.0,drive=drive0
  - drive: file=/var/storage/my-vm.qcow,if=none,id=drive0,format=qcow,readonly
or even in the mixed way:
qemu:
  - name: [ my virtual machine ] # Array with one element
  - machine:
      # Array of strings written in the more verbose way
      - pc
      - accel=kvm
  # Simple flag parameter
  - nographic
  - monitor:
      - unix:/run/my-vm.sock
  # This is similar to the name key, but using a different YAML syntax
  - cpu:
      - host
  - smp:
      # Simple integer
      - 2
      # Explicit hash
      - { sockets: 1 }
      # Implicit hash
      - cores: 1
      - { threads: 2 }
  - m: 2G
  - device: ide-hd,bus=ide.0,drive=drive0
  - drive: file=/var/storage/my-vm.qcow,if=none,id=drive0,format=qcow,readonly
The first form, though, is easier to read and understand compared to the others.
To achieve the best performance possible, it is necessary to use vCPU pinning together with a custom scheduler and higher thread priorities. In order to be able to perform these operations, qemu-launcher has to be executed with elevated privileges.
The simplest way to do so is to execute the qemu-launcher binary directly as the root user, or via tools like sudo or pkexec. Yet another option is to set the SETUID bit on the binary using the chmod u+s qemu-launcher command.
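For example, assuming the binary is installed under /usr/local/bin and the virtual machine is called my-vm:
# run directly with elevated privileges
sudo qemu-launcher my-vm
# or set the SETUID bit once, so that regular users can launch pinned virtual machines
sudo chown root /usr/local/bin/qemu-launcher
sudo chmod u+s /usr/local/bin/qemu-launcher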
For the most security-conscious users, it is possible to use a pre-mounted cpuset tree and grant the users who need to execute qemu-launcher access to modify the cpuset entries, as well as allow changing process priorities if necessary.
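A minimal sketch of such a setup, assuming the default mount path and prefix and a dedicated kvm group for the users in question, could look like this:
# mount the cpuset controller (cgroup v1) and prepare the prefix directory
mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset
mkdir -p /sys/fs/cgroup/cpuset/qemu
# let members of the kvm group manage the cpusets under the prefix
chgrp -R kvm /sys/fs/cgroup/cpuset/qemu
chmod -R g+w /sys/fs/cgroup/cpuset/qemu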
For those who are interested in a high level overview of how this tool works, here is a short summary. The qemu-launcher binary first loads the YAML definition file for the specified virtual machine, compiles the list of command line options for qemu, appends the -qmp stdio option and launches the child qemu process, setting the effective user and group IDs for it and manipulating the environment variables as desired.
If vCPU pinning is specified, qemu-launcher will negotiate capabilities using the QMP protocol (a JSON control protocol provided by qemu) over the stdio of the child process and execute the query-cpus-fast command to obtain the list of qemu process threads responsible for each vCPU socket/core/thread triplet. It then proceeds to check whether the cpuset cgroup tree is mounted under the specified mount point, mounting it if it is not. Next, a prefix directory (qemu by default), where all cpusets will be hosted, is created and all cores available on the hypervisor are allocated to this prefix. The pool directory is created, representing a pool of available (non-pinned) host CPU threads, and all currently running tasks are moved into this new pool cpuset. Then a separate cpuset (subdirectory) is created for each logical core to which threads will be pinned; the respective logical core ID is removed from the pool cpuset and written into the cpuset dedicated to that core, and finally the vCPU thread is moved over to the newly created cpuset to shield it from other tasks.
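For reference, the QMP exchange looks roughly like this (the thread IDs and the exact set of returned fields are illustrative and depend on the qemu version and target architecture):
-> { "execute": "qmp_capabilities" }
<- { "return": {} }
-> { "execute": "query-cpus-fast" }
<- { "return": [
       { "cpu-index": 0, "thread-id": 24601,
         "props": { "socket-id": 0, "core-id": 0, "thread-id": 0 } },
       { "cpu-index": 1, "thread-id": 24602,
         "props": { "socket-id": 0, "core-id": 0, "thread-id": 1 } } ] }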
If the priority and scheduler options of the launcher configuration section are provided, the application will execute the chrt command (see man chrt(1)) for each vCPU thread ID obtained via the QMP protocol as described above, passing it the preferred scheduler and priority parameters.
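For a fifo scheduler with priority 10, for instance, this is roughly equivalent to running the following for each vCPU thread ID (24601 is just a placeholder):
chrt --fifo --pid 10 24601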
The application then sits calmly, waiting for the child qemu process to finish, and unwinds the changes made to the cpusets.
Just open an issue or a pull request, describe the problem you are facing or the feature you would like to see, and we will work together to see what can be done.