
Evaluate Salt impact on workstation deployment #973

Closed
deeplow opened this issue Apr 8, 2024 · 3 comments

deeplow commented Apr 8, 2024

A project goal has been to remove Salt from where it's not strictly needed, in order to speed up deployment and make it more transparent. Some work is already under way in #936 and #837.

However, I haven't yet seen exact measurements of what this performance impact is. This ticket addresses that.

I ran some profiling on the current Salt states, essentially the result of adding --out=profile to all the qubesctl commands in the files/provision-all script. The results below cover the first run and a subsequent sdw-admin --apply.

| salt module | first run time (s) | upon migration (s) |
| --- | ---: | ---: |
| file | 0.508 | 0.370 |
| pkg | 385.832 | 10.683 |
| pkgrepo | 2.335 | 0.325 |
| qvm | 82.828 | 12.565 |
| cmd | 68.398 | 19.654 |
| **total** | **537.674** | **43.435** |

The above results were obtained with the following (replacing `<MODULE>` with the appropriate Salt module):

```sh
grep -E "<MODULE>\.*" sdw-4.2-profiling.txt | grep -o -E "[0-9]+\.[0-9]+" | jq -s 'add'
```
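
For convenience, the same extraction can be wrapped in a small loop over all the modules in the table (a sketch; it assumes the combined `--out=profile` output was saved to `sdw-4.2-profiling.txt`, as above):

```sh
# Sketch: sum the profiled time of each Salt module in one pass over the
# captured --out=profile output (sdw-4.2-profiling.txt, as above).
for module in file pkg pkgrepo qvm cmd; do
    total=$(grep -E "${module}\.*" sdw-4.2-profiling.txt \
        | grep -o -E "[0-9]+\.[0-9]+" \
        | jq -s 'add')
    printf '%s\t%ss\n' "${module}" "${total}"
done
```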

Discussion

From a pure Salt execution time perspective, what we are planning to remove from templates (the file and pkgrepo modules) has essentially no impact at all. Furthermore, the pkg module's work would move into the templates as package dependencies, arguably taking a similar amount of time there.

Solely from these values, SaltStack appears to introduce no additional overhead. However, this isn't the whole story. After full installation, sdw-admin --apply took me around 6m45s, whereas Salt only accounts for ~45s. Where did the other 6 minutes come from?! I haven't dug enough to understand this fully, but my guess is that it comes from the VM startup overhead of the Qubes Salt implementation. In particular, the following mechanism (original discussion):

1. For every VM managed by Salt (from dom0):
   1. Start the target VM.
   2. Have dom0 create a DispVM.
   3. Send all the Salt configuration there.
   4. Grant it `qubes.VMShell` access to that selected VM only.
   5. Run salt-ssh (over qrexec) from the DispVM, targeting that single VM. Do not filter the return channel there, so that, for example, all the grains will be available to salt-ssh during state rendering.
   6. Collect output back to dom0 (success/failure flag, optionally logging the full output to some file).
   7. Destroy the DispVM.
   8. Shut down the target VM (opt-out? only when it wasn't running at the start?).
2. Repeat the above for the next VM. Multiple instances can be running at the same time, if there is enough memory.

In other words, in order to target, say, sd-app, Salt creates a disp-mgmt-sd-app, which then does the provisioning. So provisioning each single target qube launches two qubes: the target and its disposable management qube.
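
As a rough way to eyeball that startup overhead from dom0 (a sketch, not part of the measurements above; `sd-app` is just an example target):

```sh
# dom0 sketch: observe the per-qube startup cost that the DispVM-based
# management adds, independent of Salt's own execution time.
qvm-shutdown --wait sd-app

# Time a plain start of the target qube on its own...
time qvm-start sd-app

# ...and, while `sdw-admin --apply` is running, watch for the matching
# disposable management qube (disp-mgmt-sd-app) being started alongside it.
qvm-ls --running | grep disp-mgmt
```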

How can we reduce the impact of Salt in provisioning?

Based on the above assumption, my conclusion is that our best chance to reduce SaltStack performance impact is to drop salt completely from VM management. To accomplish this, shared templates should be able to bootstrap into any app qube on boot (conditional on-boot provisioning).
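
For illustration only, a minimal sketch of what conditional on-boot provisioning inside a shared template could look like (the script path, the hook that runs it, and the per-qube steps are all hypothetical; `qubesdb-read /name` is how a qube can discover its own name):

```sh
#!/bin/sh
# Hypothetical /usr/local/sbin/sdw-onboot-provision, run early at boot from
# the shared template (e.g. via rc.local or a systemd unit). The per-qube
# provisioning steps below are placeholders.
set -eu

# Ask QubesDB which qube this boot belongs to.
qube_name="$(qubesdb-read /name)"

case "${qube_name}" in
    sd-app)
        echo "provisioning ${qube_name}"   # placeholder: app qube setup
        ;;
    sd-whonix)
        echo "provisioning ${qube_name}"   # placeholder: Tor gateway setup
        ;;
    *)
        # Not a SecureDrop Workstation qube: nothing to do.
        ;;
esac
```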


deeplow commented Apr 8, 2024

Closing this unless there are other profiling-related questions that need to be explored.

deeplow closed this as completed Apr 8, 2024

cfm commented Apr 8, 2024

Thank you for this terrific analysis, @deeplow.


legoktm commented Apr 9, 2024

+1, this is great.

> my conclusion is that our best chance to reduce SaltStack performance impact is to drop salt completely from VM management

Reminds me of https://dom.as/2007/11/15/optimization-operator/ :)
