Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for Optiga throttling delay in PIN countdown #4000

Merged
merged 6 commits into from
Jul 9, 2024

Conversation

andrewkozlik
Copy link
Contributor

@andrewkozlik andrewkozlik commented Jul 4, 2024

If Optiga's security event counter is set higher than 128, the chip introduces a throttling delay which slows down PIN operations. This makes the progress bar freeze at certain points. In case of PIN change the overall delay can add as much as 65 seconds to the process.

Since we are now measuring the actual elapsed time instead of making estimates of the elapsed time, this PR could mess with the UI tests. The progress bar could be captured at different moments by the UI tests. I am not familiar with how the UI tests make snapshots so not sure whether this would actually be an issue. I tried to avoid this problem by returning 1 ms in ui_estimate_time() for the emulator build.

Further work:

  1. Optiga PIN operations are also executed at startup on wiped devices and on devices that do not have a PIN set up. (An empty PIN is still a PIN. It just unlocks automatically.) However, the PIN progress dialog does not seem to show up in these instances, even though I think it used to in the past. As a result the device appears frozen and it does not connect to Suite until these PIN operations complete. We should ensure that the PIN dialog comes up in these cases, otherwise it's pretty frustrating for the user. Done in 95c362c.
  2. If the exponential backoff is activated due to incorrect PIN attempts, it will actually cause the SEC to drop, so we overestimate the total time. It probably doesn't make a significant difference in relative terms.
  3. If the exponential backoff is activated, then we could reduce the exponential backoff time by Optiga's throttling delay, but only if we take care of point 2.

@andrewkozlik andrewkozlik self-assigned this Jul 4, 2024
@andrewkozlik andrewkozlik requested a review from prusnak as a code owner July 4, 2024 16:48
@andrewkozlik andrewkozlik requested review from TychoVrahe and removed request for prusnak July 4, 2024 16:48
Copy link

github-actions bot commented Jul 4, 2024

core UI changes device test click test persistence test
T2T1 Model T test(screens) main(screens) test(screens) main(screens) test(screens) main(screens)
T2B1 Safe 3 test(screens) main(screens) test(screens) main(screens) 2724
T3T1 test(screens) main(screens) test(screens) main(screens) test(screens) main(screens)
All main(screens)

@TychoVrahe
Copy link
Contributor

The progress bar snapshots in tests are taken when the text message changes, i.e. when changing from "2 seconds remaining" to "1 second remaining" regardless of the actuall progress.

As for point 1), progress bars when booting the device are explicitly disabled (per product guys request to avoid visual disturbance). We can re-enable it if SEC is elevated, but we will need to expose reading SEC to python.

@andrewkozlik
Copy link
Contributor Author

We can re-enable it if SEC is elevated, but we will need to expose reading SEC to python.

That's not a problem. I already exposed it, so that it can be shown in Features.

@andrewkozlik
Copy link
Contributor Author

I observed something very strange when making measurements. It seems that hal_delay(100) which we call in the exponential backoff actually takes about 116 ms on Trezor T. I didn't expect it to be 100 ms exactly, but a 16% increase seems a lot. This implies that we are underestimating the total progress time, so I will probably need to compensate for that somehow. What is even more strange is that on TS5 the actual time of hal_delay(100) grows progressively from 106 to 111 ms over a 30 second interval. No clue what that is about. Might be some error in my measurements, e.g. due to debuglink being active.

image

@andrewkozlik
Copy link
Contributor Author

In d5a2e78 I fixed the extra long hal_delay() by replacing

for (uint32_t i = 0; i < 10 * wait; i++)

with

while (hal_ticks_ms() - begin < wait_ms)

@andrewkozlik andrewkozlik requested a review from matejcik as a code owner July 6, 2024 17:18
@andrewkozlik andrewkozlik added the translations Put this label on a PR to run tests in all languages label Jul 6, 2024
@andrewkozlik andrewkozlik removed the request for review from matejcik July 6, 2024 17:19
@andrewkozlik
Copy link
Contributor Author

Point 1 done in 95c362c.

I also added a new message DebugLinkOptigaSetSecMax, which lets us set Optiga's SEC counter to maximum value by attempting an ECDH operation with an invalid curve point. Pretty handy for testing. Use trezorctl debug optiga-set-sec-max.

@TychoVrahe
Copy link
Contributor

Seems to me that you are measuring the delay time including the rendering time, that is with the ui_progress call. Rendering the progress bar isn't trivial so it the additional 16ms on TT is about what i would expect. On TS5 the algorithm (along with how the whole rendering works) is changed, the time is no longer constant but actually depends on how much of the progress is done (~ how large the circle sector is rendered). Hence the gradual increase.

ui_progress limits the progress rendering at most each 100ms, which is also the delay so it will render every time.

The progress calculation is done using real tick values that include the rendering time, so this shouldn't be an issue

core/embed/trezorhal/optiga/optiga.c Show resolved Hide resolved
core/src/boot.py Outdated Show resolved Hide resolved
storage/storage.c Outdated Show resolved Hide resolved
static void ui_total_init(uint32_t total_ms) {
ui_total = total_ms;
ui_rem = total_ms;
static uint32_t ui_estimate_time(storage_pin_op_t op) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like introducing the SystemCoreClock dependency here, and not happy about the EMULATOR ifdef either.

We could perhaps organize it this way:

  • use unix implementation of optiga_estimate_time too and put there the 500ms as mock value (or even take the operation into account)
  • introduce something like time_estimations.c/h in each trezorhal implementation, that would contain function estimate_pbkfd2(iterations) returning estimated time to complete on given HW. Probably use some mock value for emulator/unix there too.

Then we could have single implementation here in storage, with possibility to cleanly improve emulation over time.

(just an example, there might be better ways to do this, perhaps we don't need to restructure it now and do it when we have clearer idea)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use unix implementation of optiga_estimate_time too and put there the 500ms as mock value

That wouldn't work for T1 and TT, which do not use the unix Optiga implementation.

introduce something like time_estimations.c/h [...]

That would work and I agree it's more generic. It just seemed easier to introduce the SystemCoreClock dependency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, the optiga_estimate_time would still be behind ifdef USE_OPTIGA, for TT and T1 the time would be just the pbkfd2 estimation.

Sure its easier like this, feel free to keep current implementation, we can refactor this later if it bothers us at some point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The introduction of time_estimate.c will come in handy if we start using the hardware accelerated SHA-256 for the STM32U5.

Copy link

github-actions bot commented Jul 8, 2024

legacy UI changes device test(screens) main(screens)

@andrewkozlik
Copy link
Contributor Author

I added one more change to not lock freshly initialized storage. That way we don't have to unnecessarily unlock it, which otherwise doubles the initialization time of wiped storage, taking a great deal of time when Optiga's SEC is high.
40cf96a

@andrewkozlik andrewkozlik force-pushed the andrewkozlik/optiga-progress branch from 1b8d87f to 37caef2 Compare July 9, 2024 12:42
@andrewkozlik andrewkozlik force-pushed the andrewkozlik/optiga-progress branch from 37caef2 to b403aa3 Compare July 9, 2024 13:20
@andrewkozlik andrewkozlik merged commit 1016b0c into main Jul 9, 2024
114 of 119 checks passed
@andrewkozlik andrewkozlik deleted the andrewkozlik/optiga-progress branch July 9, 2024 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
translations Put this label on a PR to run tests in all languages
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants