[reconfigurator] Pre-checks and post_update actions for RoT bootloader update #8325

karencfv · 2025-06-12T07:37:53Z

This commit implements the pre-checks that must pass before an RoT bootloader update can start, along with the actions that must run after the update.
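As a rough illustration of the pre-check, the sketch below compares the versions found on the device with the versions the caller expects, which is the behaviour suggested by the manual-testing log further down. It is not the code from this PR. `UpdateComplete` is the only name taken from the log; `ReadyForUpdate`, `PrecheckError`, its variants, and the `precheck` signature are assumptions made for this example.

```rust
/// Outcome of a successful pre-check.
/// `UpdateComplete` appears in the log; `ReadyForUpdate` is an assumed name.
#[derive(Debug, PartialEq, Eq)]
pub enum PrecheckStatus {
    /// The active stage0 slot already runs the artifact version.
    UpdateComplete,
    /// Both slots match the expected versions, so the update may start.
    ReadyForUpdate,
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum PrecheckError {
    WrongActiveVersion { expected: String, found: String },
    WrongInactiveVersion { expected: Option<String>, found: Option<String> },
}

/// Decide whether an update should proceed, based only on version strings.
/// `found_*` would come from the Stage0 / Stage0Next cabooses.
pub fn precheck(
    artifact_version: &str,
    expected_active: &str,
    expected_inactive: Option<&str>,
    found_active: &str,
    found_inactive: Option<&str>,
) -> Result<PrecheckStatus, PrecheckError> {
    // Nothing to do if the device already runs the target version.
    if found_active == artifact_version {
        return Ok(PrecheckStatus::UpdateComplete);
    }
    // The active slot must be what the caller believed it to be.
    if found_active != expected_active {
        return Err(PrecheckError::WrongActiveVersion {
            expected: expected_active.to_string(),
            found: found_active.to_string(),
        });
    }
    // The inactive slot (the one to be overwritten) must match as well.
    if found_inactive != expected_inactive {
        return Err(PrecheckError::WrongInactiveVersion {
            expected: expected_inactive.map(str::to_string),
            found: found_inactive.map(str::to_string),
        });
    }
    Ok(PrecheckStatus::ReadyForUpdate)
}
```

With the values from the manual test (artifact `1.0.0`, both slots expected and found at `0.0.200`), this sketch returns `ReadyForUpdate`; once the active slot reports `1.0.0`, it returns `UpdateComplete`.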

Manual testing on a simulated Omicron:

Previous state

$ target/debug/omdb --dns-server [::1]:64764 db inventory collections show latest sp
<...>
Switch SimSidecar1
    part number: FAKE_SIM_SIDECAR
    power:    A2
    revision: 0
    MGS slot: Switch 1
    found at: 2025-06-17 01:33:40.905223 UTC from http://[::1]:58369
    cabooses:
        SLOT       BOARD        NAME          VERSION GIT_COMMIT SIGN                                                             
        SpSlot0    SimSidecarSp SimSidecar    0.0.2   ffffffff   n/a                                                              
        SpSlot1    SimSidecarSp SimSidecar    0.0.1   fefefefe   n/a                                                              
        RotSlotA   SimRot       SimSidecarRot 0.0.4   eeeeeeee   11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        RotSlotB   SimRot       SimSidecarRot 0.0.3   edededed   11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        Stage0     SimRotStage0 SimSidecarRot 0.0.200 ddddddddd  11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        Stage0Next SimRotStage0 SimSidecarRot 0.0.200 dadadadad  11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
    RoT pages:
        SLOT         DATA_BASE64                         
        Cmpa         c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAA... 
        CfpaActive   c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAA... 
        CfpaInactive c2lkZWNhci1jZnBhLWluYWN0aXZlAAAA... 
        CfpaScratch  c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAA... 
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    RoT: slot B SHA3-256: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Updating via reconfigurator-sp-updater:

$ ./target/debug/reconfigurator-sp-updater --dns-server [::1]:64764 [::]:55066 --log-level trace
<...>
〉set SimSidecar1 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236 1.0.0 rot-bootloader -a 0.0.200 -i 0.0.200
updated configuration for SimSidecar1
Jun 17 04:34:22.403 INFO begin update attempt for baseboard, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.459 DEBG client request, body: None, uri: http://[::]:64890/artifact/sha256/005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, method: GET, repo_depot_url: http://[::]:64890
Jun 17 04:34:22.463 DEBG client response, result: Ok(Response { url: "http://[::]:64890/artifact/sha256/005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236", status: 200, headers: {"content-type": "application/octet-stream", "x-request-id": "a3636c28-8ce4-4454-aba1-bc129fad4eed", "content-length": "750", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), repo_depot_url: http://[::]:64890
Jun 17 04:34:22.463 DEBG loaded artifact contents, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.464 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.465 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "146467ea-2797-43ec-9c25-af9df804f1b8", "content-length": "734", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.465 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc", stage0next_error: None, stage0next_fwid: "dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.465 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.466 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "847a44c2-05d9-4779-9f57-879bbc912d7c", "content-length": "179", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.466 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "ddddddddd", name: "SimSidecarRot", sign: Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf"), version: "0.0.200" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.467 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=1, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.468 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=1", status: 200, headers: {"content-type": "application/json", "x-request-id": "c8efd6e2-953f-4952-a63d-cdf5aa78ea30", "content-length": "179", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.468 DEBG ready to start update, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.468 DEBG client request, body: Some(Body), uri: http://[::1]:57702/sp/switch/1/component/stage0/update?firmware_slot=1&id=f388fbca-7e52-449a-8593-ed99750b430b, method: POST, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.469 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/update?firmware_slot=1&id=f388fbca-7e52-449a-8593-ed99750b430b", status: 204, headers: {"x-request-id": "112bca21-dff9-4a75-b021-6abedf125f5b", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.469 INFO update started, mgs_addr: http://[::1]:57702, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.469 DEBG started update, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.470 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/update-status, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.470 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/update-status", status: 200, headers: {"content-type": "application/json", "x-request-id": "0e7f3f42-5a12-4038-8137-aa9852dee198", "content-length": "107", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.470 DEBG got update status, status: InProgress { bytes_received: 978, id: f388fbca-7e52-449a-8593-ed99750b430b, total_bytes: 1024 }, mgs_addr: http://[::1]:57702, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.471 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/update-status, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.473 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/update-status", status: 200, headers: {"content-type": "application/json", "x-request-id": "28cb0e67-1265-4314-bad4-28c8df6b3872", "content-length": "64", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.474 DEBG got update status, status: Complete { id: f388fbca-7e52-449a-8593-ed99750b430b }, mgs_addr: http://[::1]:57702, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.474 DEBG delivered artifact, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.475 DEBG attempting to reset device to do bootloader signature check, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.475 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/rot/reset, method: POST, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.475 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/rot/reset", status: 204, headers: {"x-request-id": "c4b0b9af-dec6-4c2d-abb4-8a78a033c80d", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.475 DEBG attempting to retrieve boot info to verify image validity, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.476 DEBG client request, body: Some(Body), uri: http://[::1]:57702/sp/switch/1/component/rot/rot-boot-info, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.477 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/rot/rot-boot-info", status: 200, headers: {"content-type": "application/json", "x-request-id": "cdd975eb-38dc-4f3e-b21c-820f59c76ae1", "content-length": "565", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.477 DEBG attempting to set RoT bootloader active slot, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.478 DEBG client request, body: Some(Body), uri: http://[::1]:57702/sp/switch/1/component/stage0/active-slot?persist=true, method: POST, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.478 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/active-slot?persist=true", status: 204, headers: {"x-request-id": "c7b39382-78de-4101-8d10-1bc979d6dab1", "date": "Tue, 17 Jun 2025 04:34:24 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.478 DEBG attempting to reset device to set to new RoT bootloader version, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.479 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/rot/reset, method: POST, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.479 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/rot/reset", status: 204, headers: {"x-request-id": "ae776edb-6474-457b-98f0-7fcacf01d202", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.479 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.479 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "6e75b4c5-5a51-4d8f-84be-44c84eebdc36", "content-length": "734", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.480 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", stage0next_error: None, stage0next_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.480 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.481 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "56df2fc1-3c9d-4b07-9131-8818802028d9", "content-length": "132", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.481 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "this-is-fake-data", name: "SimRotStage0", sign: Some("SimRotStage0"), version: "1.0.0" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.481 DEBG precheck result, precheck: Ok(UpdateComplete), update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.481 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.482 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "e7ef45f4-2a62-43c0-8b66-840acf0eaa42", "content-length": "734", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.482 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", stage0next_error: None, stage0next_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.482 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.482 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "cb53c8c7-8756-405e-9f76-5aad5577c600", "content-length": "132", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.483 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "this-is-fake-data", name: "SimRotStage0", sign: Some("SimRotStage0"), version: "1.0.0" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.484 INFO update attempt done, result: CompletedUpdate, elapsed_millis: 3080, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0

State after the update

$ target/debug/omdb --dns-server [::1]:64764 db inventory collections show latest sp
<...>
Switch SimSidecar1
    part number: FAKE_SIM_SIDECAR
    power:    A2
    revision: 0
    MGS slot: Switch 1
    found at: 2025-06-17 02:23:37.654819 UTC from http://[::1]:58369
    cabooses:
        SLOT       BOARD        NAME          VERSION GIT_COMMIT        SIGN                                                             
        SpSlot0    SimSidecarSp SimSidecar    0.0.2   ffffffff          n/a                                                              
        SpSlot1    SimSidecarSp SimSidecar    0.0.1   fefefefe          n/a                                                              
        RotSlotA   SimRot       SimSidecarRot 0.0.4   eeeeeeee          11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        RotSlotB   SimRot       SimSidecarRot 0.0.3   edededed          11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        Stage0     SimRotStage0 SimRotStage0  1.0.0   this-is-fake-data SimRotStage0                                                     
        Stage0Next SimRotStage0 SimRotStage0  1.0.0   this-is-fake-data SimRotStage0                                                     
    RoT pages:
        SLOT         DATA_BASE64                         
        Cmpa         c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAA... 
        CfpaActive   c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAA... 
        CfpaInactive c2lkZWNhci1jZnBhLWluYWN0aXZlAAAA... 
        CfpaScratch  c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAA... 
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    RoT: slot B SHA3-256: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Related: #7988

…r update

karencfv · 2025-06-17T05:47:16Z

nexus/mgs-updates/src/driver_update.rs

+            // We'll loop for 3 minutes to wait for any ongoing RoT bootloader update.
+            // We need to wait for 2 resets which have a timeout of 60 seconds each,
+            // and an attempt to retrieve boot info, which has a time out of 30 seconds.
+            // We give an additional 30 seconds to as a buffer for the other actions.
+            Ok(PrecheckStatus::WaitingForOngoingRotBootloaderUpdate) => {
+                if before.elapsed()
+                    >= WAIT_FOR_ONGOING_ROT_BOOTLOADER_UPDATE_TIMEOUT
+                {
+                    return Err(UpdateWaitError::Timeout(
+                        WAIT_FOR_ONGOING_ROT_BOOTLOADER_UPDATE_TIMEOUT,
+                    ));
+                }
+
+                tokio::time::sleep(ROT_BOOLOADER_UPDATE_PROGRESS_INTERVAL)
+                    .await;
+                continue;
+            }
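
For reference, the 3-minute figure in the comment above is just the sum of the individual timeouts it lists. A minimal sketch of that arithmetic follows; the constant and function names are hypothetical, and only the durations come from the comment.

```rust
use std::time::Duration;

// Hypothetical names; only the durations are taken from the code comment.
const RESET_TIMEOUT: Duration = Duration::from_secs(60);
const BOOT_INFO_TIMEOUT: Duration = Duration::from_secs(30);
const BUFFER: Duration = Duration::from_secs(30);
const EXPECTED_RESETS: u32 = 2;

/// Total time to wait for an ongoing RoT bootloader update:
/// two resets, one boot-info fetch, plus a buffer for other actions.
fn wait_budget() -> Duration {
    RESET_TIMEOUT * EXPECTED_RESETS + BOOT_INFO_TIMEOUT + BUFFER
}
```

With these values `wait_budget()` comes to 180 seconds, which matches the "3 minutes" in the comment.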


@davepacheco is this implementation consistent with #7988 (comment)? Or is there something I missed?

I think not. More on this in my comment above.

karencfv · 2025-06-17T05:47:42Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            // TODO-K: In the RoT bootloader update code in wicket, there is a set of
+            // known bootloader FWIDs that don't have cabooses. Is this something we
+            // should care about here?
+            // https://github.com/oxidecomputer/omicron/blob/89ce370f0a96165c777e90a008257a6085897f2a/wicketd/src/update_tracker.rs#L1817-L1841
+
+            // TODO-K: There are also older versions of the SP have a bug that prevents
+            // setting the active slot for the RoT bootloader. Is this something we should
+            // care about here?
+            // https://github.com/oxidecomputer/omicron/blob/89ce370f0a96165c777e90a008257a6085897f2a/wicketd/src/update_tracker.rs#L1705-L1710


Would be good to get some input from @davepacheco or @lzrd here

I spoke with @lzrd IRL about this. We do want to keep these checks in place for development experience. But they're not urgent. These can be added later.

When you say "these checks", do you mean both of the above (the two comments in L64-L72)? I would have thought we could leave these out because we'd never expect to find these versions on systems where we'd be running automated update, especially if the failure mode is just that we'll not do the update.

If we want to add these because we think somehow we might see these in dev systems, can we strike these comments and file issues instead?

Done in e06418e

karencfv · 2025-06-17T05:49:18Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            // TODO-K: In post_update we'll be restarting the RoT twice to do signature
+            // checks, and to set stage0 to the new version. What happens if the RoT
+            // itself is being updated (during the reset stage)? Should we check for that
+            // here before setting the RoT bootloader as ready to update?


Is this even possible? I know the planner will be doing the SP, RoT, bootloader, host OS updates sequentially. But could it be possible that a rogue nexus may attempt to do an RoT update while a bootloader one is happening or vice versa?

I don't know about a "rogue" Nexus, but I think we should assume it's always possible any given Nexus could be executing an older blueprint concurrently with a different Nexus executing a newer blueprint.

Thanks! Yeah, that makes sense. I agree.

I guess my question now is what happens if a Nexus is resetting an RoT as part of an RoT update, and another Nexus is resetting an RoT as part of an RoT bootloader update?

I think all of the prechecks should prevent this from happening, even if a Nexus is operating on a blueprint. Assuming all of this is working as intended:

The planner ensures there is at most one PendingMgsUpdate in a given blueprint

The planner only removes a PendingMgsUpdate if it's completed or become impossible

The prechecks of any given update prevent a Nexus from starting an update if the target isn't in the same state it was when the planner decided to perform the update

I don't think it's possible for two different Nexuses to attempt to reset two different components simultaneously:

"reset" happens at the end of the update

... which means all the prechecks passed

... which means the update couldn't have been completed yet

... which means the planner couldn't have created a new blueprint with a different PendingMgsUpdate (unless the update has become impossible, which should have caused any in-flight update to fail before it got to "reset")

Maybe there's some path through here where this is possible, but if there is it seems like something we have to fix?
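
To make the third invariant above concrete (a precheck refuses to start when the device no longer matches what the planner saw), here is a rough sketch. The type and function names are hypothetical and this is not the PR's actual precheck; it only assumes the update records an expected stage0 and stage0next version, as the `expected_stage0_version` and `expected_stage0_next_version` log fields suggest.

```rust
/// Versions found in (or expected for) the two bootloader slots.
/// Hypothetical shape, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
struct BootloaderVersions {
    stage0: String,
    stage0_next: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum Precheck {
    /// The active slot already runs the target version; nothing to do.
    UpdateComplete,
    /// The device matches the planner's preconditions; safe to start.
    ReadyForUpdate,
    /// The device drifted from the preconditions; refuse to start.
    PreconditionMismatch,
}

fn precheck(
    target_version: &str,
    expected: &BootloaderVersions,
    found: &BootloaderVersions,
) -> Precheck {
    if found.stage0 == target_version {
        return Precheck::UpdateComplete;
    }
    if found != expected {
        return Precheck::PreconditionMismatch;
    }
    Precheck::ReadyForUpdate
}
```

Under this sketch, a Nexus holding a stale blueprint gets `PreconditionMismatch` as soon as another update has changed either slot, so it never reaches the reset step.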

Maybe I'm misunderstanding, but that's assuming we're talking about the same blueprint version, yes? Just a comment above you mention:

... I think we should assume it's always possible any given Nexus could be executing an older blueprint concurrently with a different Nexus executing a newer blueprint.

So, we could assume it's possible a Nexus is resetting an RoT as part of an RoT update of an older blueprint, and another Nexus is resetting an RoT as part of an RoT bootloader update of a newer blueprint.

Something like:

Nexus#1 with a blueprint with a new RoT version starts an RoT update.

Nexus#2 with a different blueprint with a new RoT bootloader version (and no changes to the RoT) starts an update.

Both Nexus#1 and Nexus#2 enter the post-update stage at similar times, and clash resetting the RoT

Is this possible?

If so, it would probably make sense for the RoT bootloader to have pre-checks that validate the expected state of the RoT, and the RoT to have pre-checks that validate the expected state of the RoT bootloader.

Is it overkill to add those additional checks even if we're almost certain that this scenario is near impossible?

Ah! gotcha. OK, that's reassuring, thanks.

I guess there's a question here of: in step 5

curious to know what happens in this case as well!

That sequence does seem possible. Further, it's really hard (at best) for the control plane to avoid this. It's the old problem: you can imagine implementing a lock, but then you have to ask: what if Nexus#1 actually dies permanently with the lock held? That's why people use leases instead of locks, but a lease has the same problem as what we have here: something could validate its lease, then go out to lunch for a long time right before performing the action that's supposed to be protected by the lease. I guess when we revoke the lease we could use Ignition to power-cycle the sled hosting the Nexus, but that raises more questions.

Instead, we've generally opted to allow these sequences but make sure that the end result is acceptable. I think that's largely the case here, though I'm not positive. I think we have to assume that:

if any of these devices is externally reset (or if the rack loses power) at any point in the process, the device will come up again

whatever working state the device is in, there is a PendingMgsUpdate that can get it into the desired state

In that case, if two updates are stomping on each other, they might cause each others' updates to fail. But as long as they're also both trying to sync up with the latest PendingMgsUpdate, and the planner is updating the latest PendingMgsUpdate's preconditions to match reality, this should converge to the desired end state, right?

I'm a little nervous that a bunch of resets setting versions of two different components will leave one of the two in a state where the device is no longer capable of updating. Is this possible?

In all cases, if the software we actually try to deploy is itself broken (like, has logic bugs in the Hubris software), then all bets are off. Similarly, if the bits rot in the flash, all bets are off. Let's ignore those cases for now.

No matter what reset is generated or when, as long as there's good software in the currently-active slot, I'd expect the device should be able to boot up and report what is in each slot (if anything). The planner can then generate a PendingMgsUpdate whose preconditions reflect what was found and we should be able to perform that update.

For the SP: this is a straightforward two-slot approach where we don't switch to a new slot until we know it contains a copy of the software (which we assumed above to be working). So I don't see how the active slot could ever not have working software. I assume here that switching slots is atomic -- we're not copying data from one place to another, which could fail partway through and leave the destination corrupted.

For the RoT: we have the extra requirement that the signature matches. But if we start with working software in slot A, we will not update slot A again until we have working, signed software in slot B (and vice versa). So I don't know how we could ever not have working, correctly signed software in one of these slots. Similarly, I assume here that switching slots is atomic.

For the RoT bootloader: as I understood it, the device will not allow us to replace stage0 unless it's validated the signature on stage0next. The switch of slots here is not atomic and it's conceivable that we lose power while that copy is happening and brick the device. But again, as I understand it, that's not possible as a result of a reset from us because the code that processes that reset request is busy doing the copy and won't handle it until the copy is complete.

So I don't see how we could reset the device into a state where we couldn't update it. Maybe @lzrd or @labbott could say more?

What you wrote sounds correct. RoT hubris will not respond to further messages from the SP until it completes the image swap for both Hubris and bootloader so those should be atomic operations (assuming no external power loss).

karencfv · 2025-06-23T01:33:06Z

This PR will need to be updated if #8398 lands before this PR is merged

nexus/mgs-updates/src/rot_updater.rs

davepacheco · 2025-06-25T17:07:40Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            // TODO-K: In the RoT bootloader update code in wicket, there is a set of
+            // known bootloader FWIDs that don't have cabooses. Is this something we
+            // should care about here?
+            // https://github.com/oxidecomputer/omicron/blob/89ce370f0a96165c777e90a008257a6085897f2a/wicketd/src/update_tracker.rs#L1817-L1841
+
+            // TODO-K: There are also older versions of the SP have a bug that prevents
+            // setting the active slot for the RoT bootloader. Is this something we should
+            // care about here?
+            // https://github.com/oxidecomputer/omicron/blob/89ce370f0a96165c777e90a008257a6085897f2a/wicketd/src/update_tracker.rs#L1705-L1710


When you say "these checks", do you mean both of the above (the two comments in L64-L72)? I would have thought we could leave these out because we'd never expect to find these versions on systems where we'd be running automated update, especially if the failure mode is just that we'll not do the update.

If we want to add these because we think somehow we might see these in dev systems, can we strike these comments and file issues instead?

davepacheco · 2025-06-25T17:11:28Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+                            // The name for the SP component here is STAGE0
+                            // it's a little confusing because we're really
+                            // trying to reach STAGE0NEXT, and there is no
+                            // ROT_BOOTLOADER variant. We specify that we
+                            // want STAGE0NEXT by setting the firmware slot
+                            // to 1, which is where it will always be.


Yeah, I've also been confused here, but I'm not sure why. Isn't this just like the SP, where you have one component ("SP"/"Stage0") and an active slot (0) and an inactive slot (1)? Is "stage0_next" just the name for "the inactive slot for stage0"? (I'm not positive about this -- I'm really asking!)

Is the confusion just that stage0next sometimes seems to be its own component, whereas the inactive slot for the SP isn't?

I think what confused me, is that in the case of the SP you have the overall component SpComponent::SP_ITSELF which has two slots 0 and 1. And for the bootloader you have the component SpComponent::STAGE0 which has two slots stage0 (0) and stage0_next (1). So the name of the component is basically just the name of the active slot. This makes it really weird when you want to fetch information about stage0_next! I have to do

    mgs_client.sp_component_caboose_get(
        update.sp_type,
        update.slot_id,
        &SpComponent::STAGE0.to_string(), // This is the name of the active slot!
        1,
    )

Instead of something like

    mgs_client.sp_component_caboose_get(
        update.sp_type,
        update.slot_id,
        &SpComponent::ROT_BOOTLOADER.to_string(),
        1,
    )

Yeah, I think that's right. "stage0" is the name of the RoT bootloader, and we also use it synonymously with "the active slot for the RoT bootloader". I think I see now what you meant in the comment but I didn't understand it when I read it. Maybe something like: "The naming here is a bit confusing because "stage0" sometimes refers to the component (RoT bootloader) and sometimes refers to the active slot for that component. Here, we're accessing the inactive slot for it. The component is still "stage0"."
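
One way to keep the naming straight is to treat "stage0" as the component and give the two slots their own names. The sketch below is purely illustrative: `BootloaderSlot` and `caboose_path` are hypothetical helpers, not part of the MGS client. The path shape is copied from the request URI in the logs above (`/sp/switch/1/component/stage0/caboose?firmware_slot=0`).

```rust
/// The two slots of the RoT bootloader component (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BootloaderSlot {
    /// Active image, firmware slot 0.
    Stage0,
    /// Inactive (staged) image, firmware slot 1.
    Stage0Next,
}

impl BootloaderSlot {
    /// The component name is "stage0" for both slots.
    fn component(self) -> &'static str {
        "stage0"
    }

    fn firmware_slot(self) -> u16 {
        match self {
            BootloaderSlot::Stage0 => 0,
            BootloaderSlot::Stage0Next => 1,
        }
    }
}

/// Builds the caboose request path for a given SP and bootloader slot.
fn caboose_path(sp_type: &str, sp_slot: u16, slot: BootloaderSlot) -> String {
    format!(
        "/sp/{}/{}/component/{}/caboose?firmware_slot={}",
        sp_type,
        sp_slot,
        slot.component(),
        slot.firmware_slot()
    )
}
```

Fetching the stage0next caboose then still names the component "stage0" and differs only in `firmware_slot=1`.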

davepacheco · 2025-06-25T17:32:57Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            // TODO-K: In post_update we'll be restarting the RoT twice to do signature
+            // checks, and to set stage0 to the new version. What happens if the RoT
+            // itself is being updated (during the reset stage)? Should we check for that
+            // here before setting the RoT bootloader as ready to update?


That sequence does seem possible. Further, it's really hard (at best) for the control plane to avoid this. It's the old problem: you can imagine implementing a lock, but then you have to ask: what if Nexus#1 actually dies permanently with the lock held? That's why people use leases instead of locks, but a lease has the same problem as what we have here: something could validate its lease, then go out to lunch for a long time right before performing the action that's supposed to be protected by the lease. I guess when we revoke the lease we could use Ignition to power-cycle the sled hosting the Nexus, but that raises more questions.

Instead, we've generally opted to allow these sequences but make sure that the end result is acceptable. I think that's largely the case here, though I'm not positive. I think we have to assume that:

if any of these devices is externally reset (or if the rack loses power) at any point in the process, the device will come up again

whatever working state the device is in, there is a PendingMgsUpdate that can get it into the desired state

In that case, if two updates are stomping on each other, they might cause each others' updates to fail. But as long as they're also both trying to sync up with the latest PendingMgsUpdate, and the planner is updating the latest PendingMgsUpdate's preconditions to match reality, this should converge to the desired end state, right?

davepacheco · 2025-06-25T17:35:21Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+        // TODO-K: Again, we're resetting the ROT twice here, what happens
+        // if an RoT update is happening at the same time?
+


As I mentioned above:

I think if we reset the device while the RoT update is going on (which should be really unlikely), that update may fail, but the device should come back up one way or another in some state that can be updated again, right?

If the RoT update changes the contents of the RoT slots (A or B) or changes which slot is active, that doesn't affect this update.

If the RoT update resets the device while we're doing this, one of these will happen:

it's fine (e.g., if a reset happened while we were stuck at L224, it would just be an extra reset and wouldn't affect us)

it causes this update to fail (e.g., because we're unable to do the reset)

we hit the window mentioned in "Do not update more than one RoT stage0 at a time in a rack to minimize risk" (#7819). This seems possible but very unlikely. It would brick the device. That's bad: let's say the sled would be out of commission. But it's not worse than that (rack service is unaffected, we've just eroded some of our fault tolerance margin). I don't think this is meaningfully more likely than losing power in the same window, and I don't think we can do anything to meaningfully reduce that likelihood any further. I think we can't hit the window I was worried about because I believe @lzrd mentioned the device cannot process an externally-requested reset during this window.

I think we can't hit the window I was worried about because I believe @lzrd mentioned the device cannot process an externally-requested reset during this window.

I'm a little worried about this case, I'll prod him again when he's back from leave and double check this. I'll document what he tells me :)

davepacheco · 2025-06-25T18:22:01Z

nexus/mgs-updates/src/driver_update.rs

+        // This is the first time a Nexus instance is attempting to
+        // update the RoT bootloader, we don't need to wait for an
+        // ongoing update.
+        Ok(PrecheckStatus::WaitingForOngoingRotBootloaderUpdate) => (),


I'm not sure this is wrong, but it isn't what I had in mind in #7988 (comment). What I was proposing there was that:

precheck() would return:

ReadyForUpdate if it looks like no update is in progress (probably: stage0next is valid and matches stage0)

WaitForOngoingUpdate (nit: I wouldn't have this be specific to "RoT bootloader") if it looks like an update might be going on (probably: stage0next is invalid or it's valid but doesn't match stage0)

If we got WaitForOngoingUpdate here, we'd wait for up to PROGRESS_TIMEOUT for it to instead return ReadyForUpdate. If the timeout elapsed, we'd proceed as though we got ReadyForUpdate (but consider it like the "takeover" case -- log it as a takeover and report it accordingly).

The problem with what's here is that we don't know that there's no update ongoing and we might wind up trying to write to stage0next when some other update is trying to validate it and/or persist it. I think that would actually be fine if it happened once, but I don't see anything to prevent it from continuing to happen -- each Nexus constantly interrupting update attempts by other Nexus instances.
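
The proposal above could look roughly like the following. This is a sketch under stated assumptions, not the driver's code: `wait_until_ready` and `StartMode` are hypothetical names, the precheck is passed in as a closure, and a blocking `std::thread::sleep` stands in for the `tokio::time::sleep` used in the real async driver so that the example needs only the standard library.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrecheckStatus {
    /// No update appears to be in progress.
    ReadyForUpdate,
    /// Another update might be in progress.
    WaitForOngoingUpdate,
}

/// How the update attempt should start (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StartMode {
    /// The precheck reported ReadyForUpdate.
    Fresh,
    /// The wait timed out; proceed anyway and report it as a takeover.
    Takeover,
}

/// Polls `precheck` until it reports ReadyForUpdate or `progress_timeout` elapses.
fn wait_until_ready(
    mut precheck: impl FnMut() -> PrecheckStatus,
    progress_timeout: Duration,
    poll_interval: Duration,
) -> StartMode {
    let start = Instant::now();
    loop {
        match precheck() {
            PrecheckStatus::ReadyForUpdate => return StartMode::Fresh,
            PrecheckStatus::WaitForOngoingUpdate => {
                if start.elapsed() >= progress_timeout {
                    return StartMode::Takeover;
                }
                std::thread::sleep(poll_interval);
            }
        }
    }
}
```

The key difference from the diff under review is that a timeout yields a takeover rather than an error, and that an update never starts while the precheck still reports an ongoing update and the timeout has not elapsed.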

davepacheco · 2025-06-25T18:28:17Z

nexus/mgs-updates/src/driver_update.rs

+            // We give an additional 30 seconds to as a buffer for the other actions.
+            Ok(PrecheckStatus::WaitingForOngoingRotBootloaderUpdate) => {
+                if before.elapsed()
+                    >= WAIT_FOR_ONGOING_ROT_BOOTLOADER_UPDATE_TIMEOUT


Could we use the caller-provided timeout here and bump that one if necessary to match the value you're using here? That seems a lot simpler to me than having multiple timeouts, some caller-provided and some hardcoded, plus special knowledge of which timeouts to use for which devices.

davepacheco · 2025-06-25T18:29:47Z

nexus/mgs-updates/src/driver_update.rs

+            // We'll loop for 3 minutes to wait for any ongoing RoT bootloader update.
+            // We need to wait for 2 resets which have a timeout of 60 seconds each,
+            // and an attempt to retrieve boot info, which has a time out of 30 seconds.
+            // We give an additional 30 seconds to as a buffer for the other actions.
+            Ok(PrecheckStatus::WaitingForOngoingRotBootloaderUpdate) => {
+                if before.elapsed()
+                    >= WAIT_FOR_ONGOING_ROT_BOOTLOADER_UPDATE_TIMEOUT
+                {
+                    return Err(UpdateWaitError::Timeout(
+                        WAIT_FOR_ONGOING_ROT_BOOTLOADER_UPDATE_TIMEOUT,
+                    ));
+                }
+
+                tokio::time::sleep(ROT_BOOLOADER_UPDATE_PROGRESS_INTERVAL)
+                    .await;
+                continue;
+            }


I think not. More on this in my comment above.

davepacheco · 2025-06-25T18:37:13Z

nexus/mgs-updates/src/common_sp_update.rs

+    #[error("invalid RoT bootloader image: {error:?}")]
+    RotBootloaderImageError { error: RotImageError },


I'm curious for @jgallagher's take on this but it would seem nice to me if the generic parts of this package (this file, the driver, and apply_update) didn't know so much about specific devices. This would preclude this type from including more specific typed errors like RotImageError, but I believe the only thing consumers of this error type care about is that the error is fatal to the update attempt.

So I'd consider renaming RotCommunicationFailed to TransientError and RotBootloaderImageError to FatalError. Both would just contain message: String.
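
A sketch of what that could look like, assuming a hypothetical enum name `PrecheckError` and a stand-in `RotImageError`. The real code uses `#[error(...)]` attributes; `Display` is written by hand here only to keep the example dependency-free. The device-specific error is flattened into a message at the boundary, so the generic code only sees "transient" or "fatal".

```rust
use std::fmt;

/// Generic error type: callers only care whether the failure is fatal.
#[derive(Debug, PartialEq, Eq)]
enum PrecheckError {
    TransientError { message: String },
    FatalError { message: String },
}

impl fmt::Display for PrecheckError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PrecheckError::TransientError { message } => {
                write!(f, "transient error: {}", message)
            }
            PrecheckError::FatalError { message } => {
                write!(f, "fatal error: {}", message)
            }
        }
    }
}

impl std::error::Error for PrecheckError {}

/// Stand-in for the device-specific image error type.
#[derive(Debug)]
struct RotImageError(&'static str);

/// Lives in the device-specific updater: converts its typed error into
/// the generic, message-only form before handing it to the driver.
fn image_error_to_fatal(error: RotImageError) -> PrecheckError {
    PrecheckError::FatalError {
        message: format!("invalid RoT bootloader image: {:?}", error),
    }
}
```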

Yeah, that makes sense. Specifically, RotBootloaderImageError doesn't really mean anything without context. I'll make these more generic

Done in e06418e

davepacheco · 2025-06-25T18:44:09Z

nexus/mgs-updates/src/driver_update.rs

-                error!(log, "post_update failed"; &error);
-                return Err(ApplyUpdateError::SpResetFailed(error.to_string()));
+            match error {
+                PostUpdateError::GatewayClientError(error) => {


suggestion: add an is_transient() (or is_fatal()) to PostUpdateError. Then replace this whole match block with:

    if !error.is_transient() {
        let error = InlineErrorChain::new(&error);
        error!(log, "post_update failed"; &error);
        return Err(ApplyUpdateError::SpResetFailed(
            error.to_string(),
        ));
    }
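
For illustration, here is a self-contained sketch of the suggested helper. The variant set and the classification are assumptions: only `GatewayClientError` and `ApplyUpdateError::SpResetFailed` appear in the diff, the client error is replaced by a `String` stand-in, and treating gateway client errors as non-transient is a guess rather than what the PR decided. Logging is omitted.

```rust
use std::fmt;

#[derive(Debug)]
enum PostUpdateError {
    /// Stand-in for the real gateway client error type.
    GatewayClientError(String),
    /// Hypothetical variants carrying only a message.
    FatalError { message: String },
    TransientError { message: String },
}

impl PostUpdateError {
    /// Assumption: only explicitly transient errors are worth retrying.
    fn is_transient(&self) -> bool {
        match self {
            PostUpdateError::TransientError { .. } => true,
            PostUpdateError::GatewayClientError(_)
            | PostUpdateError::FatalError { .. } => false,
        }
    }
}

impl fmt::Display for PostUpdateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PostUpdateError::GatewayClientError(m) => {
                write!(f, "gateway client error: {}", m)
            }
            PostUpdateError::FatalError { message } => {
                write!(f, "fatal error: {}", message)
            }
            PostUpdateError::TransientError { message } => {
                write!(f, "transient error: {}", message)
            }
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum ApplyUpdateError {
    SpResetFailed(String),
}

/// Fails the attempt only when the post-update error is not transient.
fn check_post_update(
    result: Result<(), PostUpdateError>,
) -> Result<(), ApplyUpdateError> {
    match result {
        Err(error) if !error.is_transient() => {
            Err(ApplyUpdateError::SpResetFailed(error.to_string()))
        }
        _ => Ok(()),
    }
}
```

Keeping the classification inside `is_transient()` means the driver does not need a match arm per variant, which is the point of the suggestion.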

karencfv

Thanks for taking a look @davepacheco! I've made some changes and I'll finish the rest tomorrow hopefully.

karencfv · 2025-06-26T07:42:56Z

nexus/mgs-updates/src/common_sp_update.rs

+    #[error("invalid RoT bootloader image: {error:?}")]
+    RotBootloaderImageError { error: RotImageError },


Yeah, that makes sense. Specifically, RotBootloaderImageError doesn't really mean anything without context. I'll make these more generic

nexus/mgs-updates/src/driver_update.rs

karencfv · 2025-06-26T08:12:02Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+                            // The name for the SP component here is STAGE0
+                            // it's a little confusing because we're really
+                            // trying to reach STAGE0NEXT, and there is no
+                            // ROT_BOOTLOADER variant. We specify that we
+                            // want STAGE0NEXT by setting the firmware slot
+                            // to 1, which is where it will always be.


I think what confused me, is that in the case of the SP you have the overall component SpComponent::SP_ITSELF which has two slots 0 and 1. And for the bootloader you have the component SpComponent::STAGE0 which has two slots stage0 (0) and stage0_next (1). So the name of the component is basically just the name of the active slot. This makes it really weird when you want to fetch information about stage0_next! I have to do

    mgs_client.sp_component_caboose_get(
        update.sp_type,
        update.slot_id,
        &SpComponent::STAGE0.to_string(), // This is the name of the active slot!
        1,
    )

Instead of something like

    mgs_client.sp_component_caboose_get(
        update.sp_type,
        update.slot_id,
        &SpComponent::ROT_BOOTLOADER.to_string(),
        1,
    )

karencfv · 2025-06-26T08:17:34Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            // TODO-K: In post_update we'll be restarting the RoT twice to do signature
+            // checks, and to set stage0 to the new version. What happens if the RoT
+            // itself is being updated (during the reset stage)? Should we check for that
+            // here before setting the RoT bootloader as ready to update?


I'm a little nervous that a bunch of resets setting versions of two different components will leave one of the two in a state where the device is no longer capable of updating. Is this possible?

karencfv · 2025-06-26T08:20:29Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+        // TODO-K: Again, we're resetting the ROT twice here, what happens
+        // if an RoT update is happening at the same time?
+


I think we can't hit the window I was worried about because I believe @lzrd mentioned the device cannot process an externally-requested reset during this window.

I'm a little worried about this case, I'll prod him again when he's back from leave and double check this. I'll document what he tells me :)

nexus/mgs-updates/src/rot_updater.rs

karencfv

Thanks for taking the time to review @davepacheco! I think I've addressed all of your comments.

Manual testing:

Before update:

coatlicue@centzon:~/src/omicron$ target/debug/omdb --dns-server [::1]:64561 db inventory collections show latest sp
<...>
Switch SimSidecar1
    part number: FAKE_SIM_SIDECAR
    power:    A2
    revision: 0
    MGS slot: Switch 1
    found at: 2025-06-27 00:22:32.100950 UTC from http://[::1]:33476
    cabooses:
        SLOT       BOARD        NAME          VERSION GIT_COMMIT SIGN                                                             
        SpSlot0    SimSidecarSp SimSidecar    0.0.2   ffffffff   n/a                                                              
        SpSlot1    SimSidecarSp SimSidecar    0.0.1   fefefefe   n/a                                                              
        RotSlotA   SimRot       SimSidecarRot 0.0.4   eeeeeeee   11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        RotSlotB   SimRot       SimSidecarRot 0.0.3   edededed   11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        Stage0     SimRotStage0 SimSidecarRot 0.0.200 ddddddddd  11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        Stage0Next SimRotStage0 SimSidecarRot 0.0.200 dadadadad  11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
    RoT pages:
        SLOT         DATA_BASE64                         
        Cmpa         c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAA... 
        CfpaActive   c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAA... 
        CfpaInactive c2lkZWNhci1jZnBhLWluYWN0aXZlAAAA... 
        CfpaScratch  c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAA... 
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    RoT: slot B SHA3-256: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Letting reconfigurator-sp-updater run an update attempt twice. One where it completes the update, and another where it finds no changes needed.

〉set SimSidecar1 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236 1.0.0 rot-bootloader -a 0.0.200 -i 0.0.200
updated configuration for SimSidecar1
Jun 27 01:37:07.971 INFO begin update attempt for baseboard, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.028 DEBG client request, body: None, uri: http://[::]:49985/artifact/sha256/005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, method: GET, repo_depot_url: http://[::]:49985
Jun 27 01:37:08.029 DEBG client response, result: Ok(Response { url: "http://[::]:49985/artifact/sha256/005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236", status: 200, headers: {"content-type": "application/octet-stream", "x-request-id": "a505f129-b737-4b2e-b460-9f4467de6ffd", "content-length": "750", "date": "Fri, 27 Jun 2025 01:37:08 GMT"} }), repo_depot_url: http://[::]:49985
Jun 27 01:37:08.030 DEBG loaded artifact contents, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.030 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.031 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "b83006af-812a-4c39-9981-87e9aea0ab7f", "content-length": "734", "date": "Fri, 27 Jun 2025 01:37:07 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.032 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc", stage0next_error: None, stage0next_fwid: "dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.032 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.033 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "c1a24c95-84f5-4796-97ed-5f6efc434050", "content-length": "179", "date": "Fri, 27 Jun 2025 01:37:07 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.033 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "ddddddddd", name: "SimSidecarRot", sign: Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf"), version: "0.0.200" }, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.034 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=1, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.034 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=1", status: 200, headers: {"content-type": "application/json", "x-request-id": "7e3f2725-6dd9-46b4-a1a6-e829183f2ec6", "content-length": "179", "date": "Fri, 27 Jun 2025 01:37:08 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.035 DEBG ready to start update, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.035 DEBG client request, body: Some(Body), uri: http://[::1]:60958/sp/switch/1/component/stage0/update?firmware_slot=1&id=f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, method: POST, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.036 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/stage0/update?firmware_slot=1&id=f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd", status: 204, headers: {"x-request-id": "6fc35ed5-766e-4cfb-a6fa-3fdb0e339980", "date": "Fri, 27 Jun 2025 01:37:08 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.036 INFO update started, mgs_addr: http://[::1]:60958, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.036 DEBG started update, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.036 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1/component/stage0/update-status, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.037 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/stage0/update-status", status: 200, headers: {"content-type": "application/json", "x-request-id": "7d43f685-1d75-456c-8240-183c8bab00b5", "content-length": "107", "date": "Fri, 27 Jun 2025 01:37:07 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:08.037 DEBG got update status, status: InProgress { bytes_received: 978, id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, total_bytes: 1024 }, mgs_addr: http://[::1]:60958, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.038 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1/component/stage0/update-status, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.039 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/stage0/update-status", status: 200, headers: {"content-type": "application/json", "x-request-id": "37b595ae-7538-4739-84a1-748b90c12f31", "content-length": "64", "date": "Fri, 27 Jun 2025 01:37:11 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.040 DEBG got update status, status: Complete { id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd }, mgs_addr: http://[::1]:60958, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.040 DEBG delivered artifact, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.041 DEBG attempting to reset device to do bootloader signature check, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.041 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1/component/rot/reset, method: POST, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.042 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/rot/reset", status: 204, headers: {"x-request-id": "0fdf52b5-430f-47c8-9b13-0689b6fd9840", "date": "Fri, 27 Jun 2025 01:37:11 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.042 DEBG attempting to retrieve boot info to verify image validity, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.043 DEBG client request, body: Some(Body), uri: http://[::1]:60958/sp/switch/1/component/rot/rot-boot-info, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.043 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/rot/rot-boot-info", status: 200, headers: {"content-type": "application/json", "x-request-id": "e3cbdcea-302b-4482-819b-c247ab1aa8bb", "content-length": "565", "date": "Fri, 27 Jun 2025 01:37:10 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.043 DEBG attempting to set RoT bootloader active slot, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.044 DEBG client request, body: Some(Body), uri: http://[::1]:60958/sp/switch/1/component/stage0/active-slot?persist=true, method: POST, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.044 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/stage0/active-slot?persist=true", status: 204, headers: {"x-request-id": "546d31b5-f5f8-46c7-a7e0-30d96d66b3d0", "date": "Fri, 27 Jun 2025 01:37:11 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.044 DEBG attempting to reset device to set to new RoT bootloader version, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.044 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1/component/rot/reset, method: POST, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.045 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/rot/reset", status: 204, headers: {"x-request-id": "50ed22e7-01e0-4681-aec8-79556e64641e", "date": "Fri, 27 Jun 2025 01:37:11 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.045 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.045 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "734236b8-71ff-48aa-88e1-f9c82fc21e8f", "content-length": "734", "date": "Fri, 27 Jun 2025 01:37:11 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.046 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", stage0next_error: None, stage0next_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.046 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.046 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "6b26ea4f-c653-44d0-ac73-d353618f32d6", "content-length": "132", "date": "Fri, 27 Jun 2025 01:37:11 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.047 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "this-is-fake-data", name: "SimRotStage0", sign: Some("SimRotStage0"), version: "1.0.0" }, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.047 DEBG precheck result, precheck: Ok(UpdateComplete), update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.047 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.048 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "d7605e95-518f-45a4-a295-e136a6eab863", "content-length": "734", "date": "Fri, 27 Jun 2025 01:37:11 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.048 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", stage0next_error: None, stage0next_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.048 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.048 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "9f7413ce-d377-4cda-b279-702f6bb01076", "content-length": "132", "date": "Fri, 27 Jun 2025 01:37:11 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.049 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "this-is-fake-data", name: "SimRotStage0", sign: Some("SimRotStage0"), version: "1.0.0" }, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:11.049 INFO update attempt done, result: CompletedUpdate, elapsed_millis: 3076, update_id: f5fcc5d6-29cf-4f02-8d8d-acd7db44fefd, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:31.048 INFO dispatching new attempt (retry timer expired), part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1
Jun 27 01:37:31.048 INFO begin update attempt for baseboard, update_id: d19527d1-0c0b-4707-8333-1df802ba6440, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:31.098 DEBG client request, body: None, uri: http://[::]:49985/artifact/sha256/005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, method: GET, repo_depot_url: http://[::]:49985
Jun 27 01:37:31.099 DEBG client response, result: Ok(Response { url: "http://[::]:49985/artifact/sha256/005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236", status: 200, headers: {"content-type": "application/octet-stream", "x-request-id": "f5da3756-a4ee-48f2-befa-fd26cf4808e3", "content-length": "750", "date": "Fri, 27 Jun 2025 01:37:31 GMT"} }), repo_depot_url: http://[::]:49985
Jun 27 01:37:31.101 DEBG loaded artifact contents, update_id: d19527d1-0c0b-4707-8333-1df802ba6440, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:31.101 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: d19527d1-0c0b-4707-8333-1df802ba6440, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:31.103 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "4d6237ea-1f04-4348-a40f-f44f4520ae9e", "content-length": "734", "date": "Fri, 27 Jun 2025 01:37:31 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: d19527d1-0c0b-4707-8333-1df802ba6440, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:31.103 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", stage0next_error: None, stage0next_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: d19527d1-0c0b-4707-8333-1df802ba6440, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:31.104 DEBG client request, body: None, uri: http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: d19527d1-0c0b-4707-8333-1df802ba6440, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:31.104 DEBG client response, result: Ok(Response { url: "http://[::1]:60958/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "899c1dde-366b-4ce6-b4de-530e9128aeb2", "content-length": "132", "date": "Fri, 27 Jun 2025 01:37:31 GMT"} }), mgs_backend_addr: [::1]:60958, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: d19527d1-0c0b-4707-8333-1df802ba6440, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:31.105 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "this-is-fake-data", name: "SimRotStage0", sign: Some("SimRotStage0"), version: "1.0.0" }, update_id: d19527d1-0c0b-4707-8333-1df802ba6440, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 27 01:37:31.106 INFO update attempt done, result: FoundNoChangesNeeded, elapsed_millis: 57, update_id: d19527d1-0c0b-4707-8333-1df802ba6440, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0

After the update:

$ ./target/debug/faux-mgs --sp-sim-addr [::1]:56988 read-component-caboose --component stage0 -s 0 VERS
Jun 27 01:38:23.194 INFO creating SP handle on to talk to SP simulator at [::1]:56988, component: faux-mgs
Jun 27 01:38:23.195 INFO initial discovery complete, addr: [::1]:56988, component: faux-mgs
1.0.0

If there isn't anything further to change and you approve this PR, would you mind hitting the merge button as well? (I try my best not to log into work stuff while on vacation 😅 )

karencfv · 2025-06-27T01:41:50Z

nexus/mgs-updates/src/driver_update.rs

+        // This is the first time a Nexus instance is attempting to
+        // update the RoT bootloader, we don't need to wait for an
+        // ongoing update.
+        Ok(PrecheckStatus::WaitingForOngoingRotBootloaderUpdate) => (),


karencfv · 2025-06-27T01:42:42Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+    sp_slot: u32,
+    timeout: Duration,
+) -> Result<Option<RotImageError>, PostUpdateError> {
+    let mut ticker = tokio::time::interval(Duration::from_secs(1));


karencfv · 2025-06-27T01:45:43Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+                    }
+                }
+            },
+            Err(error) => {


It could be a communication error because the SP itself is rebooting. Doesn't hurt to wait a bit? 🤷‍♀️

davepacheco · 2025-07-02T20:42:51Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+                        return Err(PostUpdateError::TransientError {
+                            message,
+                        });


I feel like maybe we should wait longer here (a longer WAIT_FOR_BOOT_INFO_TIMEOUT, maybe 2m) because we really don't want to hit this if the device was going to come back. If we hit this, we wind up returning TransientError, which causes the caller to call post_update() again, which resets the device again before polling again. Or maybe we should return a PermanentError here?

It looks like for the SP, post_update() only does the reset and then apply_update() will retry precheck() in a loop, which is more like what we want. The simplest way to mimic that here would be to have post_update() for the RoT bootloader retry a lot longer and return a PermanentError when it gives up.
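To make the suggestion concrete, here is a minimal sketch of "retry for a long time, then give up permanently". It is not the PR's code: `wait_for_boot_info`, the `fetch` closure, and the simplified `PostUpdateError` are hypothetical stand-ins, and it uses blocking `std::thread::sleep` where the real code would use async timers.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Simplified stand-in for the PR's `PostUpdateError`; the variant names
/// follow the comment above, not necessarily the final code.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum PostUpdateError {
    /// The caller may run `post_update()` again (which resets the device).
    TransientError { message: String },
    /// The caller should abandon this update attempt.
    PermanentError { message: String },
}

/// Polls `fetch` (hypothetical: "ask the device for boot info") until it
/// succeeds or `timeout` has elapsed. Giving up yields a permanent error so
/// the caller does not reset the device again and restart the wait.
fn wait_for_boot_info<T>(
    mut fetch: impl FnMut() -> Result<T, String>,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<T, PostUpdateError> {
    let start = Instant::now();
    loop {
        match fetch() {
            Ok(info) => return Ok(info),
            // Out of time: report the last failure as permanent.
            Err(message) if start.elapsed() >= timeout => {
                return Err(PostUpdateError::PermanentError { message });
            }
            // Possibly just a rebooting device: wait and try again.
            Err(_) => sleep(poll_interval),
        }
    }
}
```

With a generous `timeout`, a device that is merely slow to reboot still gets picked up, and only a device that never returns produces the permanent error.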

davepacheco · 2025-07-02T20:44:42Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+                });
+            }
+
+            // This operation is very delicate.  Here, we're overwriting the device


I love this comment 😆 but it needs to be rewrapped (it runs to over 80 columns).

davepacheco · 2025-07-02T20:46:50Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            match (&expected_stage0_next_version, &found_stage0_next_version) {
+                // expected garbage, found garbage
+                (
+                    ExpectedVersion::NoValidVersion,
+                    FoundVersion::MissingVersion,
+                ) => (),
+                // expected a specific version and found it
+                (
+                    ExpectedVersion::Version(artifact_version),
+                    FoundVersion::Version(found_stage0_next_version),
+                ) if artifact_version.to_string()
+                    == *found_stage0_next_version =>
+                {
+                    ()
+                }
+                // anything else is a mismatch
+                (ExpectedVersion::NoValidVersion, FoundVersion::Version(_))
+                | (ExpectedVersion::Version(_), FoundVersion::MissingVersion)
+                | (ExpectedVersion::Version(_), FoundVersion::Version(_)) => {
+                    return Err(PrecheckError::WrongInactiveVersion {
+                        expected: expected_stage0_next_version.clone(),
+                        found: found_stage0_next_version,
+                    });
+                }
+            };


This isn't critical, but it feels like this (and maybe even going back to L139) could be commonized between SP, RoT, and RoT bootloader. Maybe a method like ExpectedVersion::matches(&self, found: FoundVersion)?
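A rough sketch of what such a shared helper could look like follows. The types are simplified stand-ins (the real `ExpectedVersion` wraps an `ArtifactVersion`, modeled here as a `String`), and taking `found` by reference is an assumption of this sketch, not part of the suggestion.

```rust
/// Simplified stand-in: the real type wraps an `ArtifactVersion`.
#[derive(Debug, Clone, PartialEq)]
enum ExpectedVersion {
    NoValidVersion,
    Version(String),
}

/// Simplified stand-in for the version read back from the device.
#[derive(Debug, Clone, PartialEq)]
enum FoundVersion {
    MissingVersion,
    Version(String),
}

/// Simplified stand-in carrying only the variant used in the snippet above.
#[derive(Debug, PartialEq)]
enum PrecheckError {
    WrongInactiveVersion { expected: ExpectedVersion, found: FoundVersion },
}

impl ExpectedVersion {
    /// Returns `Ok(())` when the found version satisfies the expectation,
    /// and `WrongInactiveVersion` otherwise.
    fn matches(&self, found: &FoundVersion) -> Result<(), PrecheckError> {
        match (self, found) {
            // expected garbage, found garbage
            (ExpectedVersion::NoValidVersion, FoundVersion::MissingVersion) => Ok(()),
            // expected a specific version and found it
            (ExpectedVersion::Version(e), FoundVersion::Version(f)) if e == f => Ok(()),
            // anything else is a mismatch; listing the cases explicitly keeps
            // the match exhaustive if a variant is added later
            (ExpectedVersion::NoValidVersion, FoundVersion::Version(_))
            | (ExpectedVersion::Version(_), FoundVersion::MissingVersion)
            | (ExpectedVersion::Version(_), FoundVersion::Version(_)) => {
                Err(PrecheckError::WrongInactiveVersion {
                    expected: self.clone(),
                    found: found.clone(),
                })
            }
        }
    }
}
```

Each updater's precheck could then reduce to a single call such as `expected_stage0_next_version.matches(&found_stage0_next_version)?`, with clones happening only on the mismatch path.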

davepacheco · 2025-07-02T20:48:36Z

nexus/mgs-updates/src/driver_update.rs

+/// With the RoT bootloader need to wait for 2 resets which have a timeout
+/// of 60 seconds each, and an attempt to retrieve boot info, which has a
+/// time out of 30 seconds. We then give ourselves a few more minutes to act
+/// as a buffer for other pending actions.


Suggested change

-/// With the RoT bootloader need to wait for 2 resets which have a timeout
-/// of 60 seconds each, and an attempt to retrieve boot info, which has a
-/// time out of 30 seconds. We then give ourselves a few more minutes to act
-/// as a buffer for other pending actions.
+/// With the RoT bootloader, we need to wait for 2 resets, which have a timeout
+/// of 60 seconds each; plus an attempt to retrieve boot info, which has a
+/// timeout of 30 seconds. We then give ourselves a few more minutes to act
+/// as a buffer for other pending actions.

Alternatively, I'm wondering if we should be a lot more specific. Something like:

// Generally, this value covers two different things:
//
// 1. While we're uploading an image to the SP or it's being prepared, how
//    long can the status stay the same before we give up altogether and try
//    again? In practice, this would rarely pause for more than a few
//    seconds.
// 2. The period where we might wait for an update to complete -- either our
//    own update (in which case this is the period after the final device
//    reset until the device comes up reporting the new version) or another
//    instance's update (in which case this could cover almost the _entire_
//    update process).
//
// In both cases, if the timeout is reached, the whole update attempt will
// fail. This behavior is only intended to deal with pathological cases, like
// an MGS crash (which could cause an upload to hang indefinitely) or a Nexus
// crash (which could cause any update to hang indefinitely at any point). So
// we can afford to be generous here. Further, we really don't want to trip
// this erroneously in a working system because we're likely to get stuck
// continuing to retry and give up before each attempt finishes.
//
// In terms of sizing this timeout:
// - For all updates, the upload phase generally takes 10-20 seconds.
// - For SP updates, the post-reset phase can take about 30s (with Sidecar
//   SPs being the longest).
// - For RoT and RoT bootloader updates, two resets and an intervening "set
//   active slot" operation are required. Together, these could take just a
//   few seconds.
//
// Adding all the above together, and giving ourselves plenty of margin, we
// choose 10 minutes.

davepacheco · 2025-07-02T21:01:13Z

nexus/mgs-updates/src/driver_update.rs

    // Check the live state first to see if:
-    // - this update has already been completed, or
+    // - this update has already been completed,
+    // - we are waiting for an ongoing update, or


Suggested change

-    // - we are waiting for an ongoing update, or
+    // - we should wait a bit because an update may be in-progress, or

davepacheco · 2025-07-02T21:01:57Z

nexus/mgs-updates/src/driver_update.rs

+            Ok(PrecheckStatus::ReadyForUpdate) => break,
+            Ok(PrecheckStatus::WaitingForOngoingUpdate) => {
+                if before.elapsed() >= progress_timeout {
+                    warn!(


It'd be nice to have it so that the returned how on success in this case reflects a takeover.

Sorry if this question is nonsensical, but: if WaitingForOngoingUpdate wasn't a variant of PrecheckStatus at all, and was instead included as a variant of PrecheckError, could we remove all of this? If we're not ready to start the update yet because there's another update running, that seems consistent with other kinds of PrecheckErrors that fire while another update is running (e.g., when we see WrongInactiveVersion because someone else has already started writing the inactive slot).

@karencfv: @jgallagher and I discussed this offline. I think I've led us down a bit of an unnecessary path here. Going back to my comment here:
#7988 (comment)

All of this:

- the new PrecheckStatus::WaitingForOngoingUpdate variant
- the looping here when we see that variant
- the logic in the RoT bootloader impl that returns it

was just for dealing with case (3) in that comment, which is:

if another Nexus enters apply_update() at this point, how does it know not to try to upload another image?

I now believe that this is sufficiently unlikely even without any of the above code to handle it.

First, a reminder that it's okay if this does happen sometimes. The device will not allow us to brick it. If we kick off a new upload while someone else is about to activate stage0next or if someone resets it while we're uploading it or about to activate stage0next, the worst that happens is that an update attempt fails. What we need is for these things to be sufficiently unlikely that eventually (and quickly) an operation will succeed.

There are two ways that I can see this case happening:

1. Two Nexus instances doing concurrent updates, both passing precheck before the other has started an upload. Let's say Nexus 1 starts the upload first.
   a. It's very likely that Nexus 2 will try to start an upload before Nexus 1 resets the device. That will fail and Nexus 2 will bug out and wait for Nexus 1 to finish -- great.
   b. If Nexus 1 resets the device first, then Nexus 2 starts the upload, that could blow Nexus 1's update out of the water because Nexus 2 is changing stage0next. When Nexus 1 goes to activate it, that will fail because it no longer matches a signed image, since the contents have changed. But in this case, Nexus 2's update attempt should complete successfully, unless we somehow hit this again. We could hit it up to three times (as many as there are Nexus instances), but no more than that. That's because once any Nexus has started an upload, stage0next's contents will change, and any other Nexus instance will fail preconditions and abandon any attempt until the planner changes the expected preconditions.

2. Alternatively: Nexus 1 starts an update and gets as far as the first reset. No other Nexus passed preconditions so there will be no concurrent updates ... except that the planner sees the updated stage0next and generates a new blueprint with new preconditions, allowing Nexus 2 to immediately come in and start another update. This plays out like 1a and 1b above, and again, this is fine as long as it doesn't happen that often.

Case 2 here is very similar to the problem described in #8483 and the solution described there (have the planner wait a few minutes before changing preconditions of any MGS update) should make this very unlikely.

All of this is a bit unsatisfying and could benefit from some more formal modeling. But from what @jgallagher and I could think through, it seems pretty unlikely that we'd get stuck here.

The net result of all of this is that I think we can simplify this PR quite a bit by just ignoring this problem altogether:

- rip out the new PrecheckStatus variant
- rip out the code that returned it from the RoT bootloader precheck
- rip out most of the changes to this function (all the stuff around this comment)

and I'm sorry for leading us down the wrong path!

Thanks for the thorough explanation! This all makes sense to me. Always great to remove complexity so I'm all in. Done.

davepacheco · 2025-07-02T21:04:05Z

nexus/mgs-updates/src/driver_update.rs

+        // error or an RoT bootloader image error.  There is intentionally no
+        // timeout here.  If we've staged an update but not managed to reset
+        // the device, there's no point where we'd want to stop trying to do so.


Suggested change

-        // error or an RoT bootloader image error. There is intentionally no
-        // timeout here. If we've staged an update but not managed to reset
-        // the device, there's no point where we'd want to stop trying to do so.
+        // error or some other transient error. There is intentionally no
+        // timeout here. If we've staged an update but not managed to reset
+        // the device, there's no point where we'd want to stop trying to do so.

davepacheco · 2025-07-02T21:08:05Z

nexus/mgs-updates/src/driver_update.rs

+                if before.elapsed() >= progress_timeout {
+                    warn!(
+                        log,
+                        "update takeover: timed out while waiting for ongoing update"


Do we have a test for this case?

Not necessary anymore since we removed this logic

davepacheco · 2025-07-02T21:09:13Z

Thanks! The behavior here looks a lot better. I've got some suggestions for cleanup. Given how tricky this stuff is it'd be nice to get @jgallagher's eyes on it too, if you've got the time.

jgallagher · 2025-07-03T14:51:08Z

nexus/mgs-updates/src/driver_update.rs

+            Ok(PrecheckStatus::ReadyForUpdate) => break,
+            Ok(PrecheckStatus::WaitingForOngoingUpdate) => {
+                if before.elapsed() >= progress_timeout {
+                    warn!(


Sorry if this question is nonsensical, but: if WaitingForOngoingUpdate wasn't a variant of PrecheckStatus at all, and was instead included as a variant of PrecheckError, could we remove all of this? If we're not ready to start the update yet because there's another update running, that seems consistent with other kinds of PrecheckErrors that fire while another update is running (e.g., when we see WrongInactiveVersion because someone else has already started writing the inactive slot).

jgallagher · 2025-07-03T15:00:57Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            } = &update.details
+            else {
+                unreachable!(
+                    "pending MGS update details within ReconfiguratorSpUpdater \


Suggested change

-                    "pending MGS update details within ReconfiguratorSpUpdater \
+                    "pending MGS update details within ReconfiguratorRotBootloaderUpdater \

jgallagher · 2025-07-03T15:03:45Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+                    if v == found_stage0_version {
+                        Ok(PrecheckStatus::ReadyForUpdate)
+                    } else {
+                        Ok(PrecheckStatus::WaitingForOngoingUpdate)


I think I feel more strongly that this should be a PrecheckError (following up on my comment above). This feels basically the same as a version mismatch that we expect to see because another update is in progress.

(See the other thread on this)

jgallagher · 2025-07-03T15:06:04Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            // Before setting stage0 to the new version we want to ensure
+            // the image is good and we're not going to brick the device.


It doesn't seem like we should be able to brick an RoT in any way by sending normal API commands in some kind of bad order. (This is ignoring cases like "lost power at just the wrong time" as described below, since that's a pretty extenuating circumstance.)

If we didn't do this check and the image wasn't good, would it actually brick it?

Yeah if my understanding is right, I think it's more like:

"To protect against bricking itself, the device will only activate a new image after it's been verified. Images are only verified at device boot time. Thus, we'll reset the device once to cause the signature to be verified. Then we can activate the new image and reset the device again."

Yeah, I meant what @davepacheco said. Clearly my English needs improving 😄
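The sequence described in this thread (reset so the device verifies the staged image, check the result, activate, reset again) can be sketched against a toy device. Everything here is hypothetical: the `Device` trait, its method names, and `SimDevice` are illustrations only and do not correspond to the MGS API.

```rust
/// Hypothetical minimal device interface for illustration.
trait Device {
    fn reset(&mut self);
    /// Reports an error unless the staged image passed verification at the
    /// last boot.
    fn stage0next_error(&self) -> Option<String>;
    fn activate_stage0next(&mut self) -> Result<(), String>;
}

/// Reset, check verification, activate, reset again.
fn post_update(dev: &mut impl Device) -> Result<(), String> {
    // First reset: images are only verified at boot time.
    dev.reset();
    if let Some(e) = dev.stage0next_error() {
        return Err(format!("stage0next failed verification: {e}"));
    }
    // The device itself refuses to activate an unverified image.
    dev.activate_stage0next()?;
    // Second reset: boot into the newly activated stage0.
    dev.reset();
    Ok(())
}

/// Toy device used to exercise the sequence.
struct SimDevice {
    staged_image_valid: bool,
    verified: bool,
    activated: bool,
    resets: u32,
}

impl SimDevice {
    fn new(staged_image_valid: bool) -> Self {
        SimDevice { staged_image_valid, verified: false, activated: false, resets: 0 }
    }
}

impl Device for SimDevice {
    fn reset(&mut self) {
        self.resets += 1;
        // Verification happens here, at boot, and nowhere else.
        self.verified = self.staged_image_valid;
    }

    fn stage0next_error(&self) -> Option<String> {
        if self.verified { None } else { Some("signature check failed".to_string()) }
    }

    fn activate_stage0next(&mut self) -> Result<(), String> {
        if !self.verified {
            return Err("refusing to activate an unverified image".to_string());
        }
        self.activated = true;
        Ok(())
    }
}
```

In this model a bad image costs one reset and a failed attempt, and activation without a prior verifying boot is rejected by the device rather than by the caller.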

jgallagher · 2025-07-03T15:08:07Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            // If the image is not valid we bail
+            if let Some(e) = stage0next_error {
+                return Err(PostUpdateError::FatalError {
+                    error: e.to_string(),


Should this use InlineErrorChain? Or even better, could the type of error be whatever the real type of e is to avoid having to stringify it here?

I think we want InlineErrorChain here.

I suggested in the last round of review that PostUpdateError::FatalError just contain a string so that we didn't need PostUpdateError to contain the union of all different errors that each impl might return.
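For context on why `e.to_string()` alone can lose information: `Display` on an error usually prints only the outermost message, while the causes sit behind `source()`. The sketch below walks that chain with the standard library only. `inline_chain` is a hypothetical stand-in, assumed to approximate what `InlineErrorChain` renders, and the two error types are invented for the example.

```rust
use std::error::Error;
use std::fmt;

/// Joins an error and all of its sources into one line, separated by ": ".
/// Hypothetical stand-in for an error-chain formatter.
fn inline_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut source = err.source();
    while let Some(cause) = source {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        source = cause.source();
    }
    out
}

/// Invented leaf error for the example.
#[derive(Debug)]
struct SignatureError;

impl fmt::Display for SignatureError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "signature check failed")
    }
}

impl Error for SignatureError {}

/// Invented wrapper error whose cause is only reachable via `source()`.
#[derive(Debug)]
struct ImageError {
    cause: SignatureError,
}

impl fmt::Display for ImageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid stage0next image")
    }
}

impl Error for ImageError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}
```

Here `ImageError { cause: SignatureError }.to_string()` gives only the outer message, while `inline_chain` also includes the cause, which is the detail worth keeping if `FatalError` stores a plain string.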

jgallagher · 2025-07-03T15:09:05Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+                WAIT_FOR_BOOT_INFO_TIMEOUT,
+            )
+            .await?;
+            // If the image is not valid we bail


What would it mean for the update if the image is not valid here? Or maybe: how could this happen?

I've updated the comment for more clarity

jgallagher · 2025-07-03T15:11:06Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+                // The minimum we will ever return is 3.
+                // Additionally, V2 does not report image errors, so we cannot
+                // know with certainty if a signature check came back with errors
+                RotState::V2 { .. } => unreachable!(),


I think we should return a permanent error here instead of panicking. As far as we know we'll only ever see V3 or later, but this is entirely under the control of an external entity; we should not panic if we get unexpected messages from it.
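One way to follow this advice is sketched below with simplified stand-in types. The real `RotState` has more variants and fields, and the error message text is invented; `FatalError { error }` mirrors the shape used elsewhere in this PR.

```rust
/// Simplified stand-in for the PR's `PostUpdateError`.
#[derive(Debug, PartialEq)]
enum PostUpdateError {
    FatalError { error: String },
}

/// Simplified stand-in for the RoT state reported by the device.
#[derive(Debug)]
enum RotState {
    V2,
    V3 { stage0next_error: Option<String> },
}

/// Extracts the stage0next image error, if any. A V2 state is treated as a
/// fatal error instead of `unreachable!()`, because the state comes from an
/// external device and must not be able to panic the caller.
fn stage0next_error(state: &RotState) -> Result<Option<String>, PostUpdateError> {
    match state {
        RotState::V3 { stage0next_error } => Ok(stage0next_error.clone()),
        RotState::V2 => Err(PostUpdateError::FatalError {
            error: "unexpected RoT state V2: it does not report image errors"
                .to_string(),
        }),
    }
}
```

The caller then handles an unexpected V2 response like any other failed update attempt.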

karencfv · 2025-07-08T05:37:23Z

Thanks @jgallagher and @davepacheco for taking the time to review! I think I've addressed all of your comments. Please let me know if there's anything missing!

jgallagher

Thanks - this is really intricate stuff. Just a few minor nit / wording suggestions.

jgallagher · 2025-07-09T19:36:38Z

nexus/mgs-updates/src/driver_update.rs

    // Check the live state first to see if:
-    // - this update has already been completed, or
+    // - this update has already been completed,
+    // - we should wait a bit because an update may be in-progress, or


Is this comment change still applicable now that we've removed the waiting loop?

Oops! Thanks for catching that!

jgallagher · 2025-07-09T19:39:48Z

nexus/mgs-updates/src/common_sp_update.rs


+impl FoundVersion {
+    pub fn matches(
+        self,


Nit - this should take &self. We can .clone() ourselves in the (rare) error case; better than forcing callers to always clone to call us.

jgallagher · 2025-07-09T19:41:45Z

nexus/mgs-updates/src/rot_bootloader_updater.rs

+            // If boot info contains any error with the image loaded onto
+            // stage0_next, we run the risk of bricking the device if this image
+            // is loaded onto stage0. We return a fatal error.


If there's an error, the device won't let us load it onto stage0, right? (In particular: nothing we can do using the normal API can brick the device?) If that's right, I'd maybe reword this to something like

Suggested change

-            // If boot info contains any error with the image loaded onto
-            // stage0_next, we run the risk of bricking the device if this image
-            // is loaded onto stage0. We return a fatal error.
+            // If boot info contains any error with the image loaded onto
+            // stage0_next, the device won't let us load this image onto
+            // stage0. We return a fatal error.

karencfv added 3 commits June 12, 2025 18:08

[reconfigurator] Pre-checks and post_update actions for RoT bootloade…

2baa799

…r update

building blocks in case I can get PostUpdateError to work

8072c7d

Introduce PostUpdateError

2de9298

This was referenced Jun 11, 2025

MgsUpdateDriver support for RoT bootloader update #7988

Closed

[reconfigurator] rot-bootloader subcommand for reconfigurator-sp-updater #8330

Merged

karencfv added 6 commits June 17, 2025 11:01

implement WaitingForOngoingUpdate variant

f578e35

Merge branch 'main' into rot-bootloader-pre-and-post-actions

fa1ba5e

Implement a waiting time for an ongoing RoT bootloader update

3ced091

fmt

c15ca08

Improved error handling

4fb917a

Clean up

dff0254

karencfv commented Jun 17, 2025

View reviewed changes

karencfv marked this pull request as ready for review June 17, 2025 05:49

karencfv requested review from davepacheco and lzrd June 17, 2025 05:49

karencfv added 2 commits June 18, 2025 17:56

Merge branch 'main' into rot-bootloader-pre-and-post-actions

c6be847

add some tests

25b0f40

davepacheco reviewed Jun 25, 2025

View reviewed changes

karencfv mentioned this pull request Jun 26, 2025

Reject MGS driven update for bootloader when using known un-updateable device #8457

Open

karencfv commented Jun 26, 2025

View reviewed changes

karencfv added 6 commits June 26, 2025 20:31

address comments

e06418e

clean up

b93303f

change conditions for determining whether we wait for an ongoing update

0fbdcf1

Update wait for ongoing update logic

609122c

Remove ticker

0d40a73

clean up

92e97e3

karencfv commented Jun 27, 2025

View reviewed changes

Missed one

69a8a4f

davepacheco assigned karencfv Jul 1, 2025

davepacheco reviewed Jul 2, 2025

View reviewed changes

jgallagher reviewed Jul 3, 2025

View reviewed changes

davepacheco mentioned this pull request Jul 3, 2025

extra, noisy blueprints during SP update #8483

Closed

karencfv added 5 commits July 8, 2025 13:25

merge main

2fd4e71

fix after merge

99bd5da

address comments

dc9e787

simplify checks

0e76cce

address comments

486b54b

karencfv requested review from davepacheco and jgallagher July 8, 2025 05:36

davepacheco added this to the 16 milestone Jul 8, 2025

jgallagher approved these changes Jul 9, 2025

View reviewed changes

address comments

fec4545

karencfv enabled auto-merge (squash) July 9, 2025 22:12

karencfv merged commit 3b3931b into oxidecomputer:main Jul 9, 2025
17 checks passed

karencfv deleted the rot-bootloader-pre-and-post-actions branch July 10, 2025 02:25
