[reconfigurator] RoT planner support #8421

karencfv · 2025-06-23T07:12:43Z

This commit implements planner support for RoT updates, along with reconfigurator-cli support for setting RoT versions.

TODO:

- sign verification to know how to choose artifacts: [tuf] Store rkth/sign hashes in TUF repo description #8729
- more tests

dev-tools/reconfigurator-cli/tests/output/target-release-stdout

karencfv · 2025-06-26T08:38:04Z

@davepacheco just as an FYI this is what I have for the RoT planner support so far. It's not finished yet, but the main bulk of it is done. I just need to refine some bits and add more tests. A big chunk of the lines of code are generated test files btw.

karencfv · 2025-06-27T04:40:02Z

I'm not sure how soon we're planning to demo this as part of the update bring-up. As you all know, I'll be out next week. So, if you'd like to use this as part of the demo soon, please feel free to modify/merge/close/supersede this PR as you all see fit, if this is something you'd like to do to speed things up 😄

karencfv · 2025-07-07T08:05:58Z

dev-tools/reconfigurator-cli/tests/input/target-release.txt

#8478 may change the way this test works if implemented before I merge this PR

karencfv · 2025-07-07T08:18:17Z

Heya! 👋 Just a tiny ping that this is ready to review :)

davepacheco

Nice! This is looking good but there are a few details to nail down.

dev-tools/reconfigurator-cli/src/lib.rs

dev-tools/reconfigurator-cli/tests/output/target-release-stdout

nexus/reconfigurator/planning/src/planner.rs

davepacheco · 2025-07-08T03:56:43Z

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

+            let found_active_version =
+                ArtifactVersion::new(active_caboose.caboose.version.clone())
+                    .map_err(|e| {
+                        MgsUpdateStatusError::FailedArtifactVersionParse(e)
+                    })?;


This isn't very important but it looks like for the SP case we just operated on strings without assuming we could parse them. Do we need to parse the RoT versions?

Just to populate ExpectedActiveRotSlot. I thought it'd be cleaner to use the same type as expected_active_slot since it is a struct containing two fields rather than just a version or a slot.

Don't have a super strong opinion about this though. Happy to change if you think what I did doesn't make sense.

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

karencfv

Alright! Finally got all the moving pieces in place 🎉 I think I've addressed all the comments and this should be ready for a new round of reviews. Thanks everyone!

karencfv · 2025-08-07T06:41:31Z

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

            ),
+            make_artifact(
+                "oxide-rot-1",
+                ArtifactKind::GIMLET_ROT_IMAGE_A,


I thought so too! Had to adapt like this unfortunately 🤷‍♀️

karencfv · 2025-08-07T06:48:31Z

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

+    };
+
+    let board = &active_caboose.caboose.board;
+    let Some(rkth) = &active_caboose.caboose.sign else {


We spoke about comparing against the CMPA page's rkth field instead. Unfortunately, all I get from inventory is a base64-encoded string, and I did not find a way in omicron or hubtools to decode that safely into a known structure. Comparing the artifact's sign to the CMPA's rkth field is a worthy goal, but in the interest of time I'd like to compare against the inventory caboose's sign field for now.

I can write up a follow-up issue to tackle this at a later time: #8799
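For readers following along, here is a minimal sketch of the selection described above: keep only the artifacts whose board and sign match the inventory caboose, pick one, and report how many matched so the caller can log when there is more than one. `RotArtifact` and `choose_rot_artifact` are hypothetical, simplified stand-ins and not the planner's actual types or function names.

```rust
/// Hypothetical, simplified view of an RoT artifact in the TUF repo.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RotArtifact {
    pub name: String,
    pub board: String,
    /// Value compared against the inventory caboose's `sign` field.
    pub sign: String,
}

/// Returns the first artifact matching the caboose's board and sign,
/// together with the total number of matches. Returns `None` when the
/// caboose has no sign or nothing matches.
pub fn choose_rot_artifact<'a>(
    artifacts: &'a [RotArtifact],
    caboose_board: &str,
    caboose_sign: Option<&str>,
) -> Option<(&'a RotArtifact, usize)> {
    // Without a sign in the caboose there is nothing to compare against.
    let sign = caboose_sign?;
    let mut matching = artifacts
        .iter()
        .filter(|a| a.board == caboose_board && a.sign == sign);
    let first = matching.next()?;
    // More than one match is unexpected but not fatal: the caller can
    // log it and proceed with the artifact returned here.
    let total = 1 + matching.count();
    Some((first, total))
}
```

Returning the match count instead of logging inside the function keeps the sketch free of any logging dependency.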

jgallagher · 2025-08-07T15:32:31Z

dev-tools/reconfigurator-cli/tests/output/cmds-example-stdout

-RoT bootloader stage 0 next version: None
 SP active version:   Some("0.0.1")
 SP inactive version: None
+RoT bootloader stage 0 version:   Some("0.0.1")


Tiny nit - this has extra spaces but not enough to line it up with the None on the next line. Should it?

jgallagher · 2025-08-07T17:05:57Z

dev-tools/reconfigurator-cli/tests/output/cmds-target-release-stdout

-> # for another sled.
-> sled-update-sp 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 --active 1.0.0
-set sled 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 SP versions: active -> 1.0.0
+> # for an SP on the same sled.


Is the expectation that we'll update the bootloader -> RoT -> SP on one sled before we move to the next sled? I naively thought it would be "update all sleds' bootloaders" -> "update all sleds' RoTs" -> "update all sleds' SPs".

nexus/reconfigurator/planning/src/system.rs

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

jgallagher · 2025-08-07T17:20:16Z

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

            ),
+            make_artifact(
+                "oxide-rot-1",
+                ArtifactKind::GIMLET_ROT_IMAGE_A,


I think there's a lot to be desired here and it's strongly related to the "do something more reasonable with TUF" stuff that came up recently with the measurements work, maybe?

karencfv

Thanks for the review! I think I've addressed all of the comments

I've rolled back the changes for the cmds-target-release test. I need to make some changes to the fake artifacts to include a caboose so the planner can choose an artifact using the sign within them. I have an idea on how to do this based on https://github.com/oxidecomputer/hubtools/blob/main/hubtools/src/archive_builder.rs#L33-L77 (thanks Laura for the tip), but it will take some digging into.

I'd like to work on that test in a follow-up PR so that it no longer holds this one up, if people are OK with that. @jgallagher will be working on the planner changes for host OS soon, and this PR makes several changes to the structure of the MGS-driven updates in the planner.

Opened: #8798

karencfv · 2025-08-07T23:36:56Z

dev-tools/reconfigurator-cli/tests/output/cmds-example-stdout

-RoT bootloader stage 0 next version: None
 SP active version:   Some("0.0.1")
 SP inactive version: None
+RoT bootloader stage 0 version:   Some("0.0.1")


Urgh, yeah. That was a bad copy-paste. I made them all line up, but it looks awful: "RoT pending persistent boot preference" is too long 😄, and it's not really a table. I think it's better if they don't line up at all.

karencfv · 2025-08-07T23:53:49Z

dev-tools/reconfigurator-cli/tests/output/cmds-target-release-stdout

-> # for another sled.
-> sled-update-sp 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 --active 1.0.0
-set sled 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 SP versions: active -> 1.0.0
+> # for an SP on the same sled.


Yep! That is my understanding. Or rather host OS -> bootloader -> RoT -> SP as specified in RFD 565. I'm not sure how this will specifically affect the work you're doing with host OS though. Maybe @davepacheco has more input on this?
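To make the per-sled ordering concrete, here is a rough sketch under the assumption stated in the reply above: finish every outstanding component on one sled, in a fixed order, before moving to the next sled. The order below simply copies the one quoted in the reply (RFD 565 remains the authority), and `Component`, `Sled`, and `next_update` are hypothetical names, not the planner's.

```rust
/// Hypothetical component kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Component {
    HostOs,
    RotBootloader,
    Rot,
    Sp,
}

/// Per-sled order as quoted in the reply above (an assumption here).
pub const UPDATE_ORDER: [Component; 4] = [
    Component::HostOs,
    Component::RotBootloader,
    Component::Rot,
    Component::Sp,
];

/// Hypothetical sled record: an id plus the components still out of date.
pub struct Sled {
    pub id: u32,
    pub out_of_date: Vec<Component>,
}

/// Picks the next single update: the first sled with anything left to
/// update, and on that sled the earliest component in `UPDATE_ORDER`.
pub fn next_update(sleds: &[Sled]) -> Option<(u32, Component)> {
    for sled in sleds {
        for component in UPDATE_ORDER.iter().copied() {
            if sled.out_of_date.contains(&component) {
                return Some((sled.id, component));
            }
        }
    }
    None
}
```

The alternative raised in the question (all bootloaders, then all RoTs, then all SPs) would swap the two loops.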

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

nexus/reconfigurator/planning/src/system.rs

davepacheco

Looking good!

nexus/reconfigurator/planning/src/system.rs

nexus/reconfigurator/planning/src/mgs_updates/rot.rs

davepacheco · 2025-08-08T22:29:05Z

nexus/reconfigurator/planning/src/mgs_updates/rot.rs

+        return MgsUpdateStatus::Impossible;
+    }
+
+    // If found pending persistent boot preference is not empty, then an update


I'm having trouble figuring out when we would hit this case. In a normal update, I assume the expected pending persistent boot preference would be None. During the update, we might find it to be Some, but in that case, we would have bailed out above with Impossible. (Is that correct? I guess it's fine, and potentially necessary if this is part of the precondition check.) In that case, we might write a new PendingMgsUpdate where the expected pending persistent boot preference was Some. Next time around, we might wind up here. But is it right to return NotDone? Don't we need to check whether mgs_update_status_inactive_versions() returns Impossible?

Put differently: I'd expect us to check all the Impossible cases, and if none of them applies, then it's NotDone (rather than having any explicit conditions under which we return NotDone). If that's true, I think we could just strike these two if blocks altogether (this one and the next one).

nexus/reconfigurator/planning/src/mgs_updates/rot.rs

davepacheco · 2025-08-08T22:45:22Z

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

+            "gimlet_rot_image_a" => ARTIFACT_HASH_ROT_GIMLET_A,
+            "gimlet_rot_image_b" => ARTIFACT_HASH_ROT_GIMLET_B,
+            "psc_rot_image_a" => ARTIFACT_HASH_ROT_PSC_A,
+            "psc_rot_image_b" => ARTIFACT_HASH_ROT_PSC_B,
+            "switch_rot_image_a" => ARTIFACT_HASH_ROT_SWITCH_A,
+            "switch_rot_image_b" => ARTIFACT_HASH_ROT_SWITCH_B,


Is there some way we could avoid hardcoding these strings?

It looks like there are some constants for these in tufaceous_artifact (e.g., ArtifactKind::GIMLET_ROT_IMAGE_A). I'm not sure if you can use these on the left side of the match. If not, maybe a bunch of ifs? It seems worth it to avoid hardcoding these strings.

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

davepacheco · 2025-08-08T22:55:42Z

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

+        let PendingMgsUpdateDetails::Rot {
+            expected_active_slot: old_expected_active_slot,


If I'm understanding right, this changed because the surrounding code was just trying to do an update. Previously that was always an SP update, but the way you changed things at L1293 - L1295 meant that two updates were possible, and the RoT one would be first.

I would suggest doing one of these:

- Have this do the SP update just like it used to. (Change L1294 to be an empty map, I think?)
- Make it clearer that this will do an RoT update by making that the only one possible. (Change L1291 to be an empty map?)
- Duplicate the test, once for each device kind.

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

karencfv

Thanks for the thorough review! I think I've addressed all of your comments. Let me know if I'm missing anything!

karencfv · 2025-08-11T01:44:39Z

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

+            "gimlet_rot_image_a" => ARTIFACT_HASH_ROT_GIMLET_A,
+            "gimlet_rot_image_b" => ARTIFACT_HASH_ROT_GIMLET_B,
+            "psc_rot_image_a" => ARTIFACT_HASH_ROT_PSC_A,
+            "psc_rot_image_b" => ARTIFACT_HASH_ROT_PSC_B,
+            "switch_rot_image_a" => ARTIFACT_HASH_ROT_SWITCH_A,
+            "switch_rot_image_b" => ARTIFACT_HASH_ROT_SWITCH_B,


I can't really use the constants as they are. It would require StructuralPartialEq. Since this is just a test, it didn't seem worth it to go make changes in tufaceous with a nightly-only experimental API. But sure! I can use a bunch of ifs.
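A small sketch of the "bunch of ifs" alternative, for anyone wondering why `match` does not work here. It assumes the kind type is a newtype over `Cow<'static, str>`; as I understand it, a constant can only be used as a pattern when equality on its type is derived all the way down, and `Cow` implements `PartialEq` by hand. `Kind`, the hash constants, and `test_hash_for` are hypothetical stand-ins, not the tufaceous or omicron definitions.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an artifact-kind newtype.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Kind(Cow<'static, str>);

impl Kind {
    pub const GIMLET_ROT_IMAGE_A: Kind =
        Kind(Cow::Borrowed("gimlet_rot_image_a"));
    pub const GIMLET_ROT_IMAGE_B: Kind =
        Kind(Cow::Borrowed("gimlet_rot_image_b"));

    /// Builds a kind from a runtime string.
    pub fn new(name: String) -> Kind {
        Kind(Cow::Owned(name))
    }
}

// Placeholder values standing in for the test's artifact hashes.
pub const HASH_GIMLET_A: &str = "hash-gimlet-a";
pub const HASH_GIMLET_B: &str = "hash-gimlet-b";

/// Maps a kind to its test hash by comparing against the constants
/// with `==` instead of matching on string literals.
pub fn test_hash_for(kind: &Kind) -> Option<&'static str> {
    // A `match` arm such as `Kind::GIMLET_ROT_IMAGE_A => ...` would be
    // rejected here, so fall back to an if/else chain.
    if *kind == Kind::GIMLET_ROT_IMAGE_A {
        Some(HASH_GIMLET_A)
    } else if *kind == Kind::GIMLET_ROT_IMAGE_B {
        Some(HASH_GIMLET_B)
    } else {
        None
    }
}
```

The chain is a little noisier than a `match`, but the kind names live in one place.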

karencfv · 2025-08-11T03:49:56Z

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

+        let PendingMgsUpdateDetails::Rot {
+            expected_active_slot: old_expected_active_slot,


Yep! In the end I decided to completely extract everything RoT related and create another test for RoT only. test_basic was getting pretty long and complex, and it was only going to get worse when bootloader and host OS are implemented 😅

karencfv · 2025-08-11T03:54:17Z

nexus/reconfigurator/planning/src/mgs_updates/rot.rs

+// License, v. 2.0. If a copy of the MPL was not distributed with this
+// file, You can obtain one at https://mozilla.org/MPL/2.0/.
+
+//! Facilities for making choices about RoT updates


I can tackle that in a follow up PR :)

karencfv · 2025-08-11T04:23:37Z

nexus/reconfigurator/planning/src/mgs_updates/rot.rs

+        return MgsUpdateStatus::Impossible;
+    }
+
+    // If found pending persistent boot preference is not empty, then an update


I think it's possible to end up in a state where found_pending_persistent_boot_preference and expected_pending_persistent_boot_preference are Some(), which would not be caught in the impossible cases above. But, it's true that then this would be caught as a NotDone in mgs_update_status_inactive_versions(). I liked being explicit as to why we may find ourselves in a NotDone state, but I guess it's not really necessary and introduces complexity.
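The shape being discussed, sketched with simplified stand-in types: return Done when the update has landed, check every Impossible precondition, and let everything else fall through to NotDone without an explicit NotDone branch. `Slot`, `Status`, `RotState`, and `rot_update_status` are hypothetical, and the real checks in rot.rs cover more fields than this.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Slot {
    A,
    B,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Done,
    NotDone,
    Impossible,
}

/// Hypothetical, trimmed-down RoT state (either expected or found).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RotState {
    pub active_slot: Slot,
    pub active_version: String,
    pub pending_persistent_boot_preference: Option<Slot>,
}

pub fn rot_update_status(
    desired_version: &str,
    expected: &RotState,
    found: &RotState,
) -> Status {
    // The update has landed (simplified to a version comparison).
    if found.active_version == desired_version {
        return Status::Done;
    }
    // Impossible cases: what we found contradicts the preconditions
    // recorded in the pending update.
    if found.active_slot != expected.active_slot
        || found.active_version != expected.active_version
    {
        return Status::Impossible;
    }
    if found.pending_persistent_boot_preference
        != expected.pending_persistent_boot_preference
    {
        return Status::Impossible;
    }
    // Nothing contradicts the preconditions and the update has not
    // landed, so it is still in progress.
    Status::NotDone
}
```

With this shape, the case where both the found and the expected pending persistent boot preference are `Some` of the same slot reaches the final NotDone without a dedicated branch.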

karencfv · 2025-08-11T04:35:42Z

nexus/reconfigurator/planning/src/mgs_updates/rot.rs

+        // than 1 artifact for the same board and root key table hash (RKTH)
+        // that can be verified against the RoT's CMPA/CFPA. But it doesn't
+        // prevent us from picking one and proceeding. Make a note and proceed.
+        error!(log, "found more than one matching artifact for RoT update");


huh 🤔 I'm not sure why I changed that

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

karencfv · 2025-08-11T20:38:02Z

Looks like there is a merge freeze for R16. I'll merge once that's open again

karencfv added 3 commits June 23, 2025 19:11

[reconfigurator] CLI support for setting RoT versions

f2e8de3

expectorate

fd098df

planner support for configuring RoT update

799b169

karencfv changed the title from "[reconfigurator] CLI support for setting RoT versions" to "[reconfigurator] RoT planner support" Jun 25, 2025

set update hierarchy and clean up

914d4d5

karencfv commented Jun 25, 2025

dev-tools/reconfigurator-cli/tests/output/target-release-stdout (outdated, resolved)

karencfv added 4 commits June 25, 2025 21:25

Remove unnecessary todo

3c7cc8b

Merge main

13c7934

Add additional checks

d9ca092

clean up

7f8904e

karencfv added 4 commits June 27, 2025 15:13

Remove additional arguments

6bee79f

Add information to sled-show command

6e0f444

clean up

4827e16

clean up

e045766

karencfv marked this pull request as ready for review June 27, 2025 04:34

karencfv requested review from davepacheco, jgallagher and plotnick June 27, 2025 04:34

davepacheco assigned karencfv Jul 1, 2025

karencfv commented Jul 7, 2025

davepacheco reviewed Jul 8, 2025

davepacheco modified the milestones: 15, 16 Jul 8, 2025

karencfv added 3 commits July 9, 2025 13:50

merge main

3c4fe9f

fixes after merge

f515fb7

address style comments

c44d9bf

karencfv added 4 commits August 7, 2025 13:31

Merge branch 'main' into rot-planner

6b635ea

refactor tests

374f613

Improve tests

a63233b

Improve testing

77c2409

karencfv commented Aug 7, 2025

karencfv requested a review from davepacheco August 7, 2025 06:50

jgallagher reviewed Aug 7, 2025

karencfv added 4 commits August 8, 2025 10:09

expectorate

5ecf6a5

Address comments

2456fda

Extract rot code into its own submodule

46526ee

roll back cmds-target-release rot testing

d4f31f9

karencfv commented Aug 8, 2025

karencfv requested a review from jgallagher August 8, 2025 01:28

This was referenced Aug 8, 2025

Test RoT and bootloader updates with reconfigurator CLI #8798

Closed

Extract RKTH from CMPA RotPage in inventory to use in planner #8799

Open

davepacheco reviewed Aug 8, 2025

Address comments

5da11b0

karencfv commented Aug 11, 2025

davepacheco approved these changes Aug 11, 2025

nexus/reconfigurator/planning/src/mgs_updates/mod.rs (outdated, resolved)

typo

82b45e0

karencfv enabled auto-merge (squash) August 11, 2025 20:32

karencfv disabled auto-merge August 11, 2025 20:36

karencfv enabled auto-merge (squash) August 11, 2025 23:35

karencfv merged commit 1591e21 into oxidecomputer:main Aug 12, 2025
17 checks passed

karencfv deleted the rot-planner branch August 12, 2025 00:39

karencfv mentioned this pull request Aug 12, 2025

[reconfigurator] Extract SP related planning code into its own submodule #8826

Merged

karencfv added a commit that referenced this pull request Aug 12, 2025

[reconfigurator] Extract SP related planning code into its own submodule (#8826)

2a9af3b

This is just extracting code and moving it to another file. No changes have been made. Follow up to #8421 (comment)

karencfv added a commit that referenced this pull request Aug 13, 2025

[reconfigurator] RoT bootloader planner support (#8664)

ba17a4f

Analogous to #8421, for the RoT bootloader. Reconfigurator CLI support has already been implemented in #8620. Closes: #8668

karencfv mentioned this pull request Aug 27, 2025

[reconfigurator-cli] Fixes and improvements for RoT update testing #8904

Merged
