Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long boot-up time #302

Closed
szszszsz opened this issue Jun 26, 2023 · 13 comments · Fixed by #369
Closed

Long boot-up time #302

szszszsz opened this issue Jun 26, 2023 · 13 comments · Fixed by #369

Comments

@szszszsz
Copy link
Member

szszszsz commented Jun 26, 2023

Nitrokey 3 C NFC v1.5 takes long time to boot up. Example system log:

čen 24 18:31:38 spock kernel: usb 6-3.2: new full-speed USB device number 30 using xhci_hcd
čen 24 18:31:44 spock kernel: usb 6-3.2: New USB device found, idVendor=20a0, idProduct=42b2, bcdDevice= 1.05
čen 24 18:31:44 spock kernel: usb 6-3.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
čen 24 18:31:44 spock kernel: usb 6-3.2: Product: Nitrokey 3
čen 24 18:31:44 spock kernel: usb 6-3.2: Manufacturer: Nitrokey
čen 24 18:31:44 spock kernel: hid-generic 0003:20A0:42B2.0024: hiddev97,hidraw3: USB HID v1.11 Device [Nitrokey Nitrokey 3] on usb-0000:0f:00.3-3.2/input1
čen 24 18:31:44 spock kernel: cdc_acm 6-3.2:1.2: ttyACM0: USB ACM device

Here around 6 seconds is taken before detection in port, and enumeration finish.

@daringer
Copy link
Collaborator

Afaik, this is a usb-power-management issue.
I can reproduce this switching on power-management for the usb-hub for which the nk3 is connected to. If I switch off power-management the issue is gone. Using powertop to switch power-management on/off...

This explicitly seems to be an lpc55s issue as this does not reproduce for the nRF52 variant for me.

@hw0lff
Copy link

hw0lff commented Aug 13, 2023

Swtiching off power management for my usb hub does not fix it for me. Is there perhaps another workaround?

@daringer
Copy link
Collaborator

daringer commented Sep 3, 2023

Could you try to to switch off any usb-related power-saving mechanisms using powertop ? In fact, this is what I have done and then (wrongly?) assumed this has to be the usb-hub ...

@hw0lff
Copy link

hw0lff commented Sep 18, 2023

I just disabled any and all power saving options. (Assuming Bad means that it's disabled and there aren't more knobs to turn off power saving.)
image

@hw0lff
Copy link

hw0lff commented Oct 27, 2023

Is there a way I can help with this issue?

@daringer
Copy link
Collaborator

For me it also looks like the powertop approach might be missleading - or at least partly. On my desktop machine I tried various ports (e.g., different usb hubs) with different results:

  • usb2 connector -> no delay
  • usb3.2 connector (red) -> 4-5secs delay
  • usb3.2 connector (red) -> no delay
  • usb3.1(2?) connector (blue) -> no delay
  • usb3.1(2?) connector (blue) + usb-hub -> no delay
  • usb3.2 connector (red) + usb-hub -> 4-5secs delay (and one situation where I got 2 times new full-speed USB device number 10 using xhci_hcd before it actually worked, so ~10secs)

I've no idea (simply not done something like that yet) how to get more information out of the usb-subsystem to debug this, but my guess would be to first have a trace from a host we could compare against.

@hw0lff did you try different usb ports (internal usb hubs) and you get the same results?

@hw0lff
Copy link

hw0lff commented Oct 28, 2023

I just now noticed I didn't mention that I have a NK-3A not a 3C. I'm sorry I didn't include this info earlier.

I mostly have USB A 3.1 Ports and they behave the same: always a 5 second delay

I just tested the USB-C Thunderbolt 3 port on my machine with the NK-3A plugged into a USB-A -> USB-C adapter. There it's just a 2-3 second delay until I see output in the kernel log.
The other (, kinda broken) USB-C non-thunderbolt port has the same 5 second delay as the USB-A 3.1 ports.

@hw0lff
Copy link

hw0lff commented Oct 28, 2023

@daringer Just now I looked through sysfs and found this parameter:
/sys/module/usbcore/parameters/initial_descriptor_timeout

The default on my machine is 5000 (milliseconds).
I manually turned it down to 500 and up to 8000.
This definitely changes timespan between plugging in and having the NK recognised.
If I set it too 100 I get this error: usb 1-1: device descriptor read/64, error -71 in between, but the NK will be available too.

I don't think changing this manually is a fix, but for me it is a workaround until the underlying problem is properly solved.
The name of the parameter hints to place where we should investigate: usb device descriptor timeouts.

edit: formatting

@daringer
Copy link
Collaborator

I just now noticed I didn't mention that I have a NK-3A not a 3C. I'm sorry I didn't include this info earlier.

That's no problem, we see this happening on all lpc55s devices (means any NFC device) and we don't observe this on nRF52 based devices i.e., the NK3 Mini.

@daringer Just now I looked through sysfs and found this parameter:
/sys/module/usbcore/parameters/initial_descriptor_timeout

That's indeed interesting, my value is also 5000 - reducing this (and therefore the read-error) might imply that if Linux is asking more aggressively for the device descriptor it will succeed faster. This could mean that maybe the first sent descriptor is "lost" somehow, maybe even send too fast by the Nitrokey 3. That's for sure something to investigate more closely...
Thanks for the inputs!

@robin-nitrokey
Copy link
Member

robin-nitrokey commented Oct 30, 2023

This could mean that maybe the first sent descriptor is "lost" somehow, maybe even send too fast by the Nitrokey 3.

Or the other way round: We somehow lose the first descriptor request, so we don’t send a response while the kernel is still waiting for it during this timeout. AFAIR when I tried to debug this, the device was idle during the delay so that would make sense. Edit: Nevermind, if it sent out the reply it would also be idle.

@robin-nitrokey
Copy link
Member

I had another log at the logs from the init process: We actually do not receive the initial GET_DESCRIPTOR request. In fact, we do not receive any data at all until the timeout, i. e. the USB bus never returns PollResult::Data in UsbDevice::poll.

Assuming that this is a timing issue, there are two naive interpretations:

  1. We initialize the USB interface too late and miss the request. I tried to reorder the init operations, namely move the SPI and I2C configuration after the USB setup, and to add polls after every init step, but that did not help.
  2. We initialize the USB interface too early, as the USB poll task is only started after all init steps are complete. When I moved the initialization of the USB interface to the end, I got stuck in ctimer0 interrupts. The only operation with an explicit timeout that happens after the USB init (in the regular execution order) is EFS init, so I tried to poll the USB device during the timeout but that did not help either.

@robin-nitrokey
Copy link
Member

When I moved the initialization of the USB interface to the end, I got stuck in ctimer0 interrupts

Nevermind, I moved the USB init after the point where the pending timers were cancelled, so the timer was apparently activated again and spamming these interrupts. With this fixed, moving the USB init to the end actually solves the problem for me. Quick-and-dirty implementation: https://github.com/Nitrokey/nitrokey-3-firmware/tree/usb-init

@daringer
Copy link
Collaborator

I can confirm, for me this also solves the issue on an nk3cn.

robin-nitrokey added a commit that referenced this issue Oct 31, 2023
This patch splits the NFC and USB initialization and moves the USB
initialization to the end of the initialization process on the lpc55.
This makes sure that we can respond to all USB requests as the poll task
is started directly after the init process.

Previously, we initialized the USB interface at the beginning of the
process so that we could not respond to requests until the end of the
init process.  This could potentially mean that the device would lose
the initial GET DESCRIPTOR request sent by the Linux kernel, causing a
delay of typically 5 s after connecting the device.

Fixes: #302
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants