-
Notifications
You must be signed in to change notification settings - Fork 59
[reconfigurator] RoT planner support #8421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dev-tools/reconfigurator-cli/tests/output/target-release-stdout
Outdated
Show resolved
Hide resolved
@davepacheco just as an FYI this is what I have for the RoT planner support so far. It's not finished yet, but the main bulk of it is done. I just need to refine some bits and add more tests. A big chunk of the lines of code are generated test files btw. |
I'm not sure how soon we're planning to demo this as part of the update bring-up. As you all know, I'll be out next week. So, if you'd like to use this as part of the demo soon, please feel free to modify/merge/close/superseed this PR as you all see fit, if this is something you'd like to do to speed up things 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#8478 may change the way this test works if implemented before I merge this PR
Heya! 👋 Just a tiny ping that this is ready to review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is looking good but there are a few details to nail down.
dev-tools/reconfigurator-cli/tests/output/target-release-stdout
Outdated
Show resolved
Hide resolved
let found_active_version = | ||
ArtifactVersion::new(active_caboose.caboose.version.clone()) | ||
.map_err(|e| { | ||
MgsUpdateStatusError::FailedArtifactVersionParse(e) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't very important but it looks like for the SP case we just operated on strings without assuming we could parse them. Do we need to parse the RoT versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to populate ExpectedActiveRotSlot
. I thought it'd be cleaner to use the same type as
expected_active_slot
since it is a struct containing two fields rather than just a version or a slot.
Don't have a super strong opinion about this though. Happy to change if you think what I did doesn't make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright! Finally got all the moving pieces in place 🎉 I think I've addressed all the comments and this should be ready for a new round of reviews. Thanks everyone!
), | ||
make_artifact( | ||
"oxide-rot-1", | ||
ArtifactKind::GIMLET_ROT_IMAGE_A, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so too! Had to adapt like this unfortunately 🤷♀️
}; | ||
|
||
let board = &active_caboose.caboose.board; | ||
let Some(rkth) = &active_caboose.caboose.sign else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We spoke about comparing against the CMPA page's rkth field instead. But unfortunately, all I get from inventory is a base64 encoded string. I did not find a way in omicron or hubtools to decode that safely into a known structure. While a worthy goal to compare the artifact's sign to the CMPA's rkth field, in the interest of time, I think for now I'd like to compare against the inventory caboose's sign field.
I can write up a follow up issue to tackle this at a later time. #8799
RoT bootloader stage 0 next version: None | ||
SP active version: Some("0.0.1") | ||
SP inactive version: None | ||
RoT bootloader stage 0 version: Some("0.0.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit - this has extra spaces but not enough to line it up with the None
on the next line. Should it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Urgh, yeah. That was a bad copy-paste. I made them all line up, but it looks awful RoT pending persistent boot preference
is too long 😄, and it's not really a table. I think it's better if they don't line up at all.
> # for another sled. | ||
> sled-update-sp 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 --active 1.0.0 | ||
set sled 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 SP versions: active -> 1.0.0 | ||
> # for an SP on the same sled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the expectation that we'll update the bootloader -> RoT -> SP on one sled before we move to the next sled? I naively thought it would be "update all sleds' bootloaders" -> "update all sleds' RoTs" -> "update all sleds' SPs".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! That is my understanding. Or rather host OS -> bootloader -> RoT -> SP as specified in RFD 565. I'm not sure how this will specifically affect the work you're doing with host OS though. Maybe @davepacheco has more input on this?
), | ||
make_artifact( | ||
"oxide-rot-1", | ||
ArtifactKind::GIMLET_ROT_IMAGE_A, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a lot to be desired here and it's strongly related to the "do something more reasonable with TUF" stuff that came up recently with the measurements work, maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! I think I've addressed all of the comments
I've rolled back the changes for the cmds-target-release
test. I need to make some changes to the fake artifacts to include a caboose so the planner can choose an artifact using the sign
within them. I have an idea on how to do this based on https://github.com/oxidecomputer/hubtools/blob/main/hubtools/src/archive_builder.rs#L33-L77 (thanks Laura for the tip), but it will take some digging into.
I'd like to work on that test on a follow up PR, to no longer keep holding this one up if people are OK with that. @jgallagher will be working on the planner changes for host OS soon, and this PR makes several changes to the structure of the MGS driven updates in the planner.
Opened: #8798
RoT bootloader stage 0 next version: None | ||
SP active version: Some("0.0.1") | ||
SP inactive version: None | ||
RoT bootloader stage 0 version: Some("0.0.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Urgh, yeah. That was a bad copy-paste. I made them all line up, but it looks awful RoT pending persistent boot preference
is too long 😄, and it's not really a table. I think it's better if they don't line up at all.
> # for another sled. | ||
> sled-update-sp 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 --active 1.0.0 | ||
set sled 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 SP versions: active -> 1.0.0 | ||
> # for an SP on the same sled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! That is my understanding. Or rather host OS -> bootloader -> RoT -> SP as specified in RFD 565. I'm not sure how this will specifically affect the work you're doing with host OS though. Maybe @davepacheco has more input on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
return MgsUpdateStatus::Impossible; | ||
} | ||
|
||
// If found pending persistent boot preference is not empty, then an update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble figuring out when we would hit this case. In a normal update, I assume the expected pending persistent boot preference would be None
. During the update, we might find it to be Some
, but in that case, we would have bailed out above with Impossible
. (Is that correct? I guess it's fine, and potentially necessary if this is part of the precondition check.) In that case, we might write a new PendingMgsUpdate
where the expected pending persistent boot preference was Some
. Next time around, we might wind up here. But is it right to return NotDone
? Don't we need to check whether mgs_update_status_inactive_versions()
returns Impossible
?
Put differently: I'd expect us to check all the Impossible
cases, and if none of this is true, then it's NotDone
(rather than have any explicit conditions under which we return NotDone
). If that's true, I think we could just strike these two if
blocks altogether (this one and the next one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's possible to end up in a state where found_pending_persistent_boot_preference
and expected_pending_persistent_boot_preference
are Some()
, which would not be caught in the impossible cases above. But, it's true that then this would be caught as a NotDone
in mgs_update_status_inactive_versions()
. I liked being explicit as to why we may find ourselves in a NotDone
state, but I guess it's not really necessary and introduces complexity.
"gimlet_rot_image_a" => ARTIFACT_HASH_ROT_GIMLET_A, | ||
"gimlet_rot_image_b" => ARTIFACT_HASH_ROT_GIMLET_B, | ||
"psc_rot_image_a" => ARTIFACT_HASH_ROT_PSC_A, | ||
"psc_rot_image_b" => ARTIFACT_HASH_ROT_PSC_B, | ||
"switch_rot_image_a" => ARTIFACT_HASH_ROT_SWITCH_A, | ||
"switch_rot_image_b" => ARTIFACT_HASH_ROT_SWITCH_B, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some way we could avoid hardcoding these strings?
It looks like there are some constants for these in tufaceous_artifact (e.g., ArtifactKind::GIMLET_ROT_IMAGE_A
). I'm not sure if you can use these on the left side of the match
. If not, maybe a bunch of if
s? It seems worth it to avoid hardcoding these strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't really use the constants as they are. It would require StructuralPartialEq. Since this is just a test, it didn't seem worth it to go make changes in tufaceous with a nightly-only experimental API. But sure! I can use a bunch of if
s.
let PendingMgsUpdateDetails::Rot { | ||
expected_active_slot: old_expected_active_slot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding right, this changed because the surrounding code was just trying to do an update, previously that was always an SP update, but the way you changed things at L1293 - L1295 meant that two updates were possible, and the RoT one would be first.
I would suggest doing one of these:
- have this do the SP update just like it used to. (Change L1294 to be an empty map, I think?)
- make it clearer that this will do an RoT update by making that the only one possible (change L1291 to be an empty map?)
- duplicate the test, once for each device kind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! In the end I decided to completely extract everything RoT related and create another test for RoT only. test_basic
was getting pretty long and complex, and it was only going to get worse when bootloader and host OS are implemented 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough review! I think I've addressed all of your comments. Let me know if I'm missing anything!
"gimlet_rot_image_a" => ARTIFACT_HASH_ROT_GIMLET_A, | ||
"gimlet_rot_image_b" => ARTIFACT_HASH_ROT_GIMLET_B, | ||
"psc_rot_image_a" => ARTIFACT_HASH_ROT_PSC_A, | ||
"psc_rot_image_b" => ARTIFACT_HASH_ROT_PSC_B, | ||
"switch_rot_image_a" => ARTIFACT_HASH_ROT_SWITCH_A, | ||
"switch_rot_image_b" => ARTIFACT_HASH_ROT_SWITCH_B, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't really use the constants as they are. It would require StructuralPartialEq. Since this is just a test, it didn't seem worth it to go make changes in tufaceous with a nightly-only experimental API. But sure! I can use a bunch of if
s.
let PendingMgsUpdateDetails::Rot { | ||
expected_active_slot: old_expected_active_slot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! In the end I decided to completely extract everything RoT related and create another test for RoT only. test_basic
was getting pretty long and complex, and it was only going to get worse when bootloader and host OS are implemented 😅
// License, v. 2.0. If a copy of the MPL was not distributed with this | ||
// file, You can obtain one at https://mozilla.org/MPL/2.0/. | ||
|
||
//! Facilities for making choices about RoT updates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can tackle that in a follow up PR :)
return MgsUpdateStatus::Impossible; | ||
} | ||
|
||
// If found pending persistent boot preference is not empty, then an update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's possible to end up in a state where found_pending_persistent_boot_preference
and expected_pending_persistent_boot_preference
are Some()
, which would not be caught in the impossible cases above. But, it's true that then this would be caught as a NotDone
in mgs_update_status_inactive_versions()
. I liked being explicit as to why we may find ourselves in a NotDone
state, but I guess it's not really necessary and introduces complexity.
// than 1 artifact for the same board and root key table hash (RKTH) | ||
// that can be verified afgainst the RoT's CMPA/CFPA. But it doesn't | ||
// prevent us from picking one and proceeding. Make a note and proceed. | ||
error!(log, "found more than one matching artifact for RoT update"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh 🤔 I'm not sure why I changed that
Looks like there is a merge freeze for R16. I'll merge once that's open again |
…ule (#8826) This is just extracting code and moving it to another file. No changes have been made. Follow up to #8421 (comment)
This commit implements support for RoT updates in the planner and support for reconfigurator-cli updates as well.
TODO: