Allow up to 254 vCPUs to a VM #9385
Merged
This follows on from turning the crank on max vCPUs in Helios and Propolis; if the hardware has that many vCPUs available, what's to stop someone from allocating them all to a single VM?
Similar to creating a VM that requires more memory than is available, one can create (or resize) a VM to a size much larger than any hardware provides, or than is available at runtime. Attempting to run such an instance fails because the instance can't be placed.
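A minimal sketch of the two separate checks described above, under stated assumptions: `validate_vcpus`, `find_sled`, `CreateError`, and `StartError` are hypothetical names for illustration, not Omicron's API, and each sled is modeled only as a count of free CPUs. Creation (or resize) enforces only the 254-vCPU cap; whether any sled can actually host the instance is discovered when it is started.

```rust
/// Hypothetical constant mirroring the 254-vCPU cap discussed in this PR.
const MAX_VCPUS_PER_INSTANCE: u16 = 254;

#[derive(Debug, PartialEq)]
enum CreateError {
    ZeroVcpus,
    TooManyVcpus { requested: u16, max: u16 },
}

#[derive(Debug, PartialEq)]
enum StartError {
    InsufficientCapacity { requested: u16, largest_free: u16 },
}

/// Create/resize-time check (hypothetical): only the API limit is enforced,
/// not whether any sled could ever host an instance this large.
fn validate_vcpus(requested: u16) -> Result<u16, CreateError> {
    match requested {
        0 => Err(CreateError::ZeroVcpus),
        n if n > MAX_VCPUS_PER_INSTANCE => Err(CreateError::TooManyVcpus {
            requested: n,
            max: MAX_VCPUS_PER_INSTANCE,
        }),
        n => Ok(n),
    }
}

/// Start-time check (hypothetical): returns the index of the first sled with
/// enough free CPUs, or an error if no sled can fit the instance.
fn find_sled(requested: u16, free_cpus_per_sled: &[u16]) -> Result<usize, StartError> {
    free_cpus_per_sled
        .iter()
        .position(|&free| free >= requested)
        .ok_or_else(|| StartError::InsufficientCapacity {
            requested,
            largest_free: free_cpus_per_sled.iter().copied().max().unwrap_or(0),
        })
}
```

For example, `validate_vcpus(254)` succeeds even if every sled has only 32 or 64 CPUs free, and the failure only surfaces later when `find_sled(254, &[32, 64])` returns `InsufficientCapacity`.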
One could imagine a future operator control to limit max VM sizes for a silo; larger VMs get more difficult to migrate and can be more difficult to place. Without something like "anti-fragmentation" to group smaller VMs together, it's quite possible for a sled to have 255 CPUs, with 2 vCPUs given to one small VM and 253 CPUs not spoken for, and still be unable to fit a 254-vCPU VM.
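The fragmentation scenario can be sketched with two toy placement policies. This assumes each sled is modeled only as a count of free CPUs, and it uses best-fit packing as a simple stand-in for "anti-fragmentation"; `place_with_policy` and `Policy` are hypothetical names, and neither policy is the control plane's actual placement algorithm.

```rust
use std::cmp::Reverse;

/// Hypothetical placement policies for illustration only.
#[derive(Clone, Copy)]
enum Policy {
    /// Put each VM on the sled with the most free CPUs (spreads small VMs out).
    Spread,
    /// Put each VM on the sled with the fewest free CPUs that still fits it
    /// (best-fit packing, standing in for "anti-fragmentation").
    Pack,
}

/// Picks a sled under `policy`, deducts `vcpus` from its free count, and
/// returns its index; returns None if no sled has enough free CPUs.
/// Ties are broken toward the lowest sled index.
fn place_with_policy(free: &mut [u32], vcpus: u32, policy: Policy) -> Option<usize> {
    let candidates = free.iter().enumerate().filter(|&(_, &f)| f >= vcpus);
    let chosen = match policy {
        Policy::Spread => candidates.max_by_key(|&(i, &f)| (f, Reverse(i))),
        Policy::Pack => candidates.min_by_key(|&(i, &f)| (f, i)),
    }
    .map(|(i, _)| i)?;
    free[chosen] -= vcpus;
    Some(chosen)
}
```

With a single 255-CPU sled, placing a 2-vCPU VM leaves 253 free and a 254-vCPU request then fails, as described above. With two such sleds and two small VMs, `Spread` leaves 253 free on each, so the 254-vCPU request still fails; `Pack` puts both small VMs on the first sled, leaving the second at 255 free, and the request succeeds.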
Further, 254 busy vCPUs leaves zero to one CPUs available for Propolis, driving emulated hardware, processing I/O, co-located Crucible, sled-agent, other services, etc. There is no mechanism to earmark CPUs for control plane and I/O purposes, so this isn't any worse than the status quo. But when such a mechanism comes to exist, we'll need to gracefully tolerate prior existence of sled-or-larger-size VMs.
Note that Helios is fine with being asked to oversubscribe hardware threads to vCPUs, and that's how I tested that a 254-vCPU VM works reasonably (on a 32-thread CPU).
`test_cannot_provision_instance_beyond_cpu_capacity` demonstrates that the control plane isn't willing to oversubscribe hardware in practice.

(Dan pointed out to me a bit ago that we could allow 255 vCPUs; my choice of 254 on the Helios side was really a fencepost error on my part. But I'd like to disallow odd vCPU counts in the first place, related to Propolis#940, so 254 is fine.)