Set NVMe IO timeouts according to AWS recommendations #2820

markusboehme · 2023-02-22T13:09:02Z

Issue number: Closes #2758

Description of changes: Amazon EC2 instances based on the Nitro hypervisor expose EBS volumes via NVMe. This does not mean that EBS volumes behave like a directly attached disk in every respect: responses to IO requests may be delayed for reasons outside the controller's influence. Hence, the AWS documentation recommends disabling IO request timeouts for EBS volumes attached via NVMe.

On Linux, the IO request timeout cannot be disabled completely. Setting it to the maximum value of an unsigned 32-bit integer (4294967295) gets us close enough: since the timeout is given in milliseconds, requests will only time out after a little more than 49 days.
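The 49-day figure follows directly from the unit conversion. A quick shell arithmetic check (a sketch that only restates the numbers above) converts the maximum unsigned 32-bit value from milliseconds into days and hours:

```shell
#!/usr/bin/env bash
# Largest value of an unsigned 32-bit integer: the timeout being written.
timeout_ms=$(( (1 << 32) - 1 ))          # 4294967295

# The block layer interprets queue/io_timeout in milliseconds.
ms_per_day=$(( 24 * 60 * 60 * 1000 ))    # 86400000
ms_per_hour=$(( 60 * 60 * 1000 ))        # 3600000

days=$(( timeout_ms / ms_per_day ))
hours=$(( (timeout_ms % ms_per_day) / ms_per_hour ))

printf '%d ms is about %d days and %d hours\n' "$timeout_ms" "$days" "$hours"
# Prints: 4294967295 ms is about 49 days and 17 hours
```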

Note that this change configures the timeout via the block layer. At the time of writing, the documentation instead recommends setting it as a module parameter of the NVMe driver (where the unit is seconds, not milliseconds). However, since the switch to blk-mq, that module parameter goes unused for regular IO requests during normal operation.
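To see the two knobs side by side, here is a minimal sketch that prints the NVMe module parameter (in seconds) next to each NVMe disk's block-layer queue/io_timeout (in milliseconds). It assumes the module parameter is exposed at /sys/module/nvme_core/parameters/io_timeout, which is where the nvme_core.io_timeout setting from the AWS documentation should surface. The helper name show_nvme_timeouts is hypothetical, and the sysfs root is a parameter so the function can be pointed at a copy of the tree.

```shell
#!/usr/bin/env bash
# Print the NVMe driver's module parameter (seconds) next to each NVMe
# disk's block-layer timeout (milliseconds).
# ASSUMPTION: the module parameter lives at
#   <sysfs>/module/nvme_core/parameters/io_timeout
# show_nvme_timeouts is a hypothetical helper; pass a sysfs root (default /sys).
show_nvme_timeouts() {
  local sysfs="${1:-/sys}" dev ms
  local param="$sysfs/module/nvme_core/parameters/io_timeout"

  if [ -r "$param" ]; then
    printf 'nvme_core io_timeout: %s s\n' "$(cat "$param")"
  fi

  for dev in "$sysfs"/block/nvme*n*; do
    # Skip the unexpanded glob pattern when no NVMe disks exist.
    [ -r "$dev/queue/io_timeout" ] || continue
    ms=$(cat "$dev/queue/io_timeout")
    printf '%s queue/io_timeout: %s ms (%s s)\n' "${dev##*/}" "$ms" "$(( ms / 1000 ))"
  done
}
```

On a host, running show_nvme_timeouts as root with no argument reads /sys directly.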

Testing done: I built the aws-k8s-1.25 variant for x86_64 and launched a c5d.large EC2 instance using it. Alongside its two EBS volumes, this instance type also offers a directly attached NVMe disk (ephemeral storage). I verified that the increased IO timeout only applies to EBS volumes:

[ec2-user@admin]$ sudo -i
[root@admin]# yum install -y nvme-cli
[...]
[root@admin]# for blkdev in /sys/block/nvme*; do grep -H . "${blkdev}/queue/io_timeout"; nvme id-ctrl "/.bottlerocket/rootfs/dev/${blkdev##*/}" | grep -E '^mn +'; echo; done
/sys/block/nvme0n1/queue/io_timeout:4294967276
mn        : Amazon Elastic Block Store              

/sys/block/nvme1n1/queue/io_timeout:4294967276
mn        : Amazon Elastic Block Store              

/sys/block/nvme2n1/queue/io_timeout:30000
mn        : Amazon EC2 NVMe Instance Storage

Eagle-eyed reviewers will notice that the timeout reported for EBS volumes (4294967276) differs slightly from the value specified in the udev rules (4294967295). This is because the value is clamped to the range of a signed 32-bit integer before being converted into a unit based on timer ticks, and is then treated as the longest possible timeout in that new unit.
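The exact reading can be reproduced with a small simulation of the write/read round trip. This is a sketch under assumptions the PR does not state: a 64-bit kernel built with HZ=100 (the tick rate that reproduces the reading above), the longest jiffy offset being (LONG_MAX >> 1) - 1, the request queue storing the timeout as an unsigned 32-bit integer, and the ticks-to-milliseconds conversion returning an unsigned 32-bit result. The function name simulate_io_timeout_readback is hypothetical.

```shell
#!/usr/bin/env bash
# Model of writing WRITTEN_MS to queue/io_timeout and reading it back.
# ASSUMPTIONS (not stated in the PR): 64-bit kernel, HZ divides 1000
# evenly (default 100 here), and the kernel's conversion helpers behave
# as modeled below.
simulate_io_timeout_readback() {
  local written="$1" hz="${2:-100}"
  local u32_mask=$(( 0xFFFFFFFF ))
  local long_max=$(( 0x7FFFFFFFFFFFFFFF ))
  # Longest representable timeout in jiffies: (LONG_MAX >> 1) - 1.
  local max_jiffy_offset=$(( (long_max >> 1) - 1 ))
  local ms_per_tick=$(( 1000 / hz ))
  local jiffies rq_timeout

  if [ "$written" -gt 2147483647 ]; then
    # Negative when read as a signed 32-bit integer: treated as
    # "infinite" and mapped to the longest jiffy offset.
    jiffies=$max_jiffy_offset
  else
    # Otherwise round up to whole timer ticks.
    jiffies=$(( (written + ms_per_tick - 1) / ms_per_tick ))
  fi

  # The request queue keeps the timeout in an unsigned 32-bit field.
  rq_timeout=$(( jiffies & u32_mask ))

  # Converting back to milliseconds yields an unsigned 32-bit value,
  # so the multiplication wraps around.
  echo $(( (ms_per_tick * rq_timeout) & u32_mask ))
}

simulate_io_timeout_readback 4294967295   # prints 4294967276, the EBS reading
simulate_io_timeout_readback 30000        # prints 30000, the instance storage reading
```

Under these assumptions, the 4294967295 ms written by the rule becomes the longest jiffy offset, is cut to 4294967294 ticks by the 32-bit field, and reads back as 4294967276 ms once the multiplication by 10 ms per tick wraps around. The 30000 ms default for instance storage survives the round trip unchanged.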

Reviewer notes:

The udev rules file is included in all variant builds. Since the rule matches on the disk model name, it is a no-op in environments other than EC2.
The udev rules file is packaged as part of the ghostdog RPM. Strictly speaking, this is not a perfect fit, since ghostdog is only concerned with ephemeral storage. I didn't find a better-matching package, and creating a dedicated one seemed overblown to me.
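For readers less familiar with udev, the rule's effect can be approximated by hand. The following is a hypothetical shell sketch, not the rules file shipped in the ghostdog RPM: it walks the NVMe disks under a sysfs root, compares each controller's model string (read from device/model, an assumed path) against "Amazon Elastic Block Store" after stripping trailing padding, and writes the maximum unsigned 32-bit value to queue/io_timeout only on a match. Instance storage, such as nvme2n1 in the test above, keeps its default.

```shell
#!/usr/bin/env bash
# Hypothetical manual equivalent of the udev rule: raise queue/io_timeout
# only for NVMe disks whose controller model is the EBS model string.
# ASSUMPTION: the model is readable at <sysfs>/block/<dev>/device/model and
# may be padded with trailing spaces, as in the nvme id-ctrl output above.
EBS_MODEL="Amazon Elastic Block Store"
MAX_U32=4294967295

set_ebs_io_timeouts() {
  local sysfs="${1:-/sys}" dev model
  for dev in "$sysfs"/block/nvme*n*; do
    [ -r "$dev/device/model" ] || continue
    model=$(cat "$dev/device/model")
    # Strip trailing whitespace, as udev does when matching attribute values.
    model="${model%"${model##*[![:space:]]}"}"
    if [ "$model" = "$EBS_MODEL" ]; then
      echo "$MAX_U32" > "$dev/queue/io_timeout"
    fi
  done
}
```

Matching on the model string is what keeps this a no-op outside EC2. A udev rule is evaluated as devices appear, so it would presumably also cover EBS volumes attached after boot, which a one-off script run at startup would miss.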

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Commit message: repeats the description above, citing this reference for the AWS recommendation:

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#timeout-nvme-ebs-volumes

Signed-off-by: Markus Boehme <[email protected]>

stmcginnis

Testing coverage looks good, and the change makes sense. Thanks for the extensive PR description!

zmrow

Nice! 🍔

markusboehme requested review from bcressey, foersleo and etungsten February 22, 2023 13:09

stmcginnis approved these changes Feb 22, 2023

etungsten approved these changes Feb 22, 2023

zmrow approved these changes Feb 22, 2023

foersleo approved these changes Feb 22, 2023

markusboehme merged commit 09d7cc1 into bottlerocket-os:develop Feb 23, 2023
