Conversation

@jmpesp (Contributor) commented Oct 23, 2025

In the same way that a higher level type was added to SiloUser and SiloGroup, add a type to Disks. Currently this is only Crucible, but that will change imminently to include local storage.

The main driver for this change was for different types of disks to occupy the same PCI BDF space. Having a higher level "Disk" that includes the runtime state for a disk (with information like what instance it's attached to, what slot it occupies) be separate from where the disk's blocks are coming from achieves this without any complicated database or diesel magic.

Call sites that used the Disk db model now take either a Disk datastore type, or directly take a CrucibleDisk (for functions that should only be performed on Crucible disks). Several sagas have changed to only accept CrucibleDisk in the parameters as well.

In this commit there's also a small change to how the instance spec is computed with respect to the storage that provides a new option for instances to have file backed storage. This is currently unused but the framework for creating different components depending on this new Disk type is there.

Other stuff in this commit:

  • fix a bug where `project_create_disk` would insert a disk into the database, and _then_ perform some checks. If this failed in the context of a saga it would leave behind the inserted disk.

  • delete `disk_refetch`, fulfilling a `TODO-cleanup`. Similarly, `disk_set_runtime` is deleted. Both of these will need to be revisited for hotplug, but the whole redesign of how sled agent and Nexus work with VMMs obviates the old code.

@morlandi7 requested a review from hawkw on October 27, 2025 at 18:45.
```rust
    device_path,
}

pub fn slot(&self) -> Option<u8> {
    self.slot.map(|x| x.into())
```
Member:

style nit, considered unimportant: could be

Suggested change:

```rust
-self.slot.map(|x| x.into())
+self.slot.map(Into::into)
```
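As a standalone illustration (using `u16`/`u32` stand-ins, not the PR's actual types), the two forms are equivalent; `map(Into::into)` is simply the point-free spelling:

```rust
// Standalone illustration (u16 -> u32 stand-ins, not the PR's types):
// `map(Into::into)` passes the trait method directly instead of
// wrapping it in a closure.
fn closure_form(slot: Option<u16>) -> Option<u32> {
    slot.map(|x| x.into())
}

fn path_form(slot: Option<u16>) -> Option<u32> {
    slot.map(Into::into)
}

fn main() {
    assert_eq!(closure_form(Some(3)), path_form(Some(3)));
    assert_eq!(closure_form(None), path_form(None));
    println!("both forms agree");
}
```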

Contributor Author:

done in f4425c6

```rust
    Crucible => b"crucible"
);

/// A Disk, where how the blocks are stored depends on the disk_type.
```
Member:

Maybe it would be worthwhile saying something here (or on the disk_type field?) that data specific to the particular disk type is stored in another table, depending on the value of disk_type. Maybe we also ought to explicitly list those tables here?

Contributor Author:

Good idea, done in 227268d

Comment on lines +68 to +70
```rust
pub fn pantry_address(&self) -> Option<SocketAddrV6> {
    self.pantry_address.as_ref().map(|x| x.parse().unwrap())
}
```
Member:

I realize that this was here before, and we haven't changed this code, but, is the rationale for storing this as a string rather than using Postgres' inet type plus a port is that we want to enforce that this is only ever an IPv6 address?

Contributor Author:

I believe so yeah, it's been a while.
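For context, a standalone sketch (with a made-up address) of the parse the accessor performs: `SocketAddrV6`'s `FromStr` only accepts `[v6-addr]:port` strings, so the text representation itself enforces "IPv6 only", and the `unwrap()` is safe as long as writers only ever store IPv6 text.

```rust
// Sketch of what `pantry_address()` does with the stored text column.
// The address below is made up for illustration.
use std::net::SocketAddrV6;

fn main() {
    let stored = "[fd00:1122:3344:101::4]:17000";
    let addr: SocketAddrV6 = stored.parse().unwrap();
    assert_eq!(addr.port(), 17000);

    // An IPv4 socket address fails to parse as SocketAddrV6, so a
    // stray v4 value would panic at the unwrap() rather than slip by.
    assert!("127.0.0.1:17000".parse::<SocketAddrV6>().is_err());
    println!("parsed {addr}");
}
```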

Comment on lines +61 to +160
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Disk {
    Crucible(CrucibleDisk),
}

impl Disk {
    pub fn model(&self) -> &model::Disk {
        match &self {
            Disk::Crucible(disk) => disk.model(),
        }
    }

    pub fn id(&self) -> Uuid {
        self.model().id()
    }

    pub fn name(&self) -> &api::external::Name {
        self.model().name()
    }

    pub fn time_deleted(&self) -> Option<DateTime<Utc>> {
        self.model().time_deleted()
    }

    pub fn project_id(&self) -> Uuid {
        self.model().project_id
    }

    pub fn runtime(&self) -> DiskRuntimeState {
        self.model().runtime()
    }

    pub fn state(&self) -> DiskState {
        self.model().state()
    }

    pub fn size(&self) -> model::ByteCount {
        self.model().size
    }

    pub fn slot(&self) -> Option<u8> {
        self.model().slot()
    }

    pub fn block_size(&self) -> model::BlockSize {
        self.model().block_size
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CrucibleDisk {
    pub disk: model::Disk,
    pub disk_type_crucible: DiskTypeCrucible,
}

impl CrucibleDisk {
    pub fn model(&self) -> &model::Disk {
        &self.disk
    }

    pub fn id(&self) -> Uuid {
        self.model().id()
    }

    pub fn name(&self) -> &api::external::Name {
        self.model().name()
    }

    pub fn time_deleted(&self) -> Option<DateTime<Utc>> {
        self.model().time_deleted()
    }

    pub fn project_id(&self) -> Uuid {
        self.model().project_id
    }

    pub fn runtime(&self) -> DiskRuntimeState {
        self.model().runtime()
    }

    pub fn state(&self) -> DiskState {
        self.model().state()
    }

    pub fn size(&self) -> model::ByteCount {
        self.model().size
    }

    pub fn slot(&self) -> Option<u8> {
        self.model().slot()
    }

    pub fn volume_id(&self) -> VolumeUuid {
        self.disk_type_crucible.volume_id()
    }

    pub fn pantry_address(&self) -> Option<SocketAddrV6> {
        self.disk_type_crucible.pantry_address()
    }
}
```
Member:

Hmm, this representation is interesting. It seems like storing the model within the types that represent different disk types and providing an accessor for the shared model results in a certain amount of boilerplate that has to be duplicated into every new disk type.

Is there a reason we've arrived at this representation rather than one where the shared disk model is present on the type that represents every kind of disk, like:

```rust
pub struct Disk {
    model: model::Disk,
    type_specific: DiskType,
}

enum DiskType {
    Crucible(model::DiskTypeCrucible),
}
```

Or, is this intended to allow us to always have the cross-model state accessible regardless of the disk type?

If we need that, might also consider a representation that's like:

```rust
pub struct Disk<T> {
    pub model: model::Disk,
    pub type_specific: T,
}
```

so that the shared code need not be repeated every time?

For what it's worth, I'm certainly not trying to argue that the way you've done it here is wrong, just curious about what led us to settle on this way of representing it.
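For illustration, a compilable sketch of the generic representation described above, using stand-in model types (the fields here are illustrative, not the real nexus db-model schema):

```rust
// Stand-in model types for illustration only.
struct DiskModel {
    id: u64,
    size: u64,
}

struct DiskTypeCrucible {
    volume_id: u64,
}

struct Disk<T> {
    model: DiskModel,
    type_specific: T,
}

// Accessors over the shared model are written once, for every T.
impl<T> Disk<T> {
    fn id(&self) -> u64 {
        self.model.id
    }
    fn size(&self) -> u64 {
        self.model.size
    }
}

// Type-specific accessors exist only when T is the matching type.
impl Disk<DiskTypeCrucible> {
    fn volume_id(&self) -> u64 {
        self.type_specific.volume_id
    }
}

fn main() {
    let d = Disk {
        model: DiskModel { id: 1, size: 4096 },
        type_specific: DiskTypeCrucible { volume_id: 7 },
    };
    assert_eq!((d.id(), d.size(), d.volume_id()), (1, 4096, 7));
    println!("shared accessors defined once; volume_id only on Crucible");
}
```

This still lets functions accept only `Disk<DiskTypeCrucible>` where Crucible is required, at the cost of a generic parameter on every signature that handles any disk.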

Contributor Author:

It shook out this way because I wanted not to have to deal with any Options (beyond what the actual db model fields have), have things like volume_id and pantry_address only be defined for a CrucibleDisk, and have the ability for functions to accept the specific type of CrucibleDisk instead of Disk. There is a regrettable amount of copy and paste of that boilerplate though.

```rust
    Ok(disk)
}

/// Return all the Crucible Disks matching a list of volume ids. Currently
```
Member:

nit:

Suggested change:

```rust
-/// Return all the Crucible Disks matching a list of volume ids. Currently
+/// Return all the Crucible Disks matching a list of volume IDs. Currently
```

though I presume this was already there?

```rust
if self.slot_usage.contains(&slot) {
    return Err(Error::internal_error(&format!(
        "disk PCI slot {slot} used more than once",
```
Member:

nit: should this mention which disk we were trying to attach in that slot?
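A minimal sketch of what the nit suggests; `disk_name` and the helper are hypothetical, the real message is built inline in the slot-allocation code:

```rust
// Hypothetical helper: include the disk's name in the duplicate-slot
// error so the failure is actionable. Not the PR's actual code.
fn duplicate_slot_error(disk_name: &str, slot: u8) -> String {
    format!("disk {disk_name}: PCI slot {slot} used more than once")
}

fn main() {
    let msg = duplicate_slot_error("db-data", 3);
    assert!(msg.contains("db-data"));
    println!("{msg}");
}
```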

Comment on lines +505 to +506
```rust
// XXX here is where we would restrict the type of disk used for
// a boot disk. should we?
```
Member:

using a file-backed disk as the boot device would mean that the instance can never be restarted anywhere else, and is perma-wasted if the physical disk that boot device file was on goes away...

Contributor Author:

Yeah, using any local storage restricts the VMM to only ever appearing on one sled. I'm inclined to let users select local storage as a boot volume, but for it to work they would need to:

  • boot first from a Crucible volume made from an image or snapshot
  • prepare the local storage disk they want to use by installing an OS and/or formatting it however they want
  • then stop the instance, detach the Crucible volume, and start the instance again, selecting the local disk as the boot disk

It's not impossible to do, just a bit annoying. Maybe possible to automate with Packer? Coming back to this after a few days, I can't come up with a reason for restricting boot disk type.

Comment on lines +1477 to +1479
```sql
CREATE TYPE IF NOT EXISTS omicron.public.disk_type AS ENUM (
    'crucible'
);
```
Member:

it would be nice to have a comment here saying that this indicates which other table to query in order to look up disk-type-specific information...

Contributor Author:

Would the new comment for disk_type in nexus/db-model/src/disk.rs suffice? :)

Member:

It occurs to me that we might be able to reduce some of the boilerplate in code that queries the database for specific disk types if we made a SQL view for each disk type that's defined like

```sql
CREATE VIEW IF NOT EXISTS
    omicron.public.crucible_disk
AS SELECT
    -- ...
FROM
    omicron.public.disk
INNER JOIN
    omicron.public.disk_type_crucible
ON
    disk.id = disk_type_crucible.disk_id
WHERE
    disk.type = 'crucible'
```

or whatever?

Contributor Author:


hah, I did this exact thing first! When asking about it, we decided it was better that Nexus be aware of the different underlying tables rather than trying to mask that here. I also couldn't insert into a view so the disk slot allocation query didn't work.

```sql
 * table?
 */
/* Runtime state */
-- disk_state omicron.public.DiskState NOT NULL, /* TODO see above */
```
Member:

nit: this line now references "TODO see above" and there is no longer an "above" comment. should we just delete this?

Contributor Author:

I think the above comment this is referencing starts on line 1451?
