Set NVMe IO timeouts according to AWS recommendations #2820

markusboehme

Issue number: Closes #2758

Description of changes: Amazon EC2 instances based on the Nitro hypervisor expose EBS volumes via NVMe. That does not mean that the EBS volumes behave like a directly attached disk in every way: responses to IO requests can be delayed for reasons outside of the controller's influence. Hence, the AWS documentation recommends disabling IO request timeouts for EBS volumes attached via NVMe.

On Linux, the IO request timeout cannot be disabled completely. Setting it to the maximum value of an unsigned 32-bit integer (4294967295) gets us close enough: since the timeout is expected to be given in milliseconds, requests will time out only after a little more than 49 days have passed.
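
As a quick check of the "a little more than 49 days" figure, the arithmetic can be sketched in shell. The function name `timeout_days` is only illustrative and is not part of this change.

```shell
# Maximum unsigned 32-bit value, interpreted as milliseconds.
max_u32_ms=4294967295
ms_per_day=$((24 * 60 * 60 * 1000))
ms_per_hour=$((60 * 60 * 1000))

# Illustrative helper (not part of the PR): express the timeout
# as whole days plus the remaining whole hours.
timeout_days() {
    days=$((max_u32_ms / ms_per_day))
    hours=$(((max_u32_ms % ms_per_day) / ms_per_hour))
    echo "${days} days ${hours} hours"
}

timeout_days
```

Integer division drops the remaining minutes, so the printed value is a lower bound on the actual timeout.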

Note that this configures timeouts via the block layer. At the time of writing, the documentation recommends setting the timeout as a module parameter for the NVMe driver (where the expected unit is seconds, not milliseconds). However, since the switch to blk-mq, that module parameter value goes unused for regular IO requests during normal operation.
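
To illustrate what "via the block layer" means in practice, here is a minimal dry-run sketch. It is not the mechanism used in this PR (which ships a udev rule); it only prints the per-device `queue/io_timeout` value one would pick. The helper name `plan_io_timeouts`, the optional sysfs root argument, and reading the model from `device/model` under each block device are assumptions made for illustration.

```shell
# Hypothetical helper, not part of the PR: for every NVMe block device below
# a sysfs root, print the io_timeout (in milliseconds) that would be chosen.
EBS_MODEL="Amazon Elastic Block Store"
MAX_TIMEOUT_MS=4294967295   # maximum unsigned 32-bit value

plan_io_timeouts() {
    sysfs_root="${1:-/sys}"
    for blkdev in "${sysfs_root}"/block/nvme*; do
        # Skip devices without a readable model (also covers an unmatched glob).
        [ -r "${blkdev}/device/model" ] || continue
        # The model string is padded with trailing spaces; strip them.
        model=$(sed 's/[[:space:]]*$//' "${blkdev}/device/model")
        if [ "${model}" = "${EBS_MODEL}" ]; then
            echo "${blkdev##*/} ${MAX_TIMEOUT_MS}"
        else
            echo "${blkdev##*/} unchanged"
        fi
    done
}
```

Applying a planned value by hand would mean writing it to `/sys/block/<device>/queue/io_timeout` as root; the sketch deliberately stops short of doing so.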

Testing done: I built the aws-k8s-1.25 variant for x86_64 and launched a c5d.large EC2 instance using it. Alongside its two EBS volumes, this instance type also offers a directly attached NVMe disk (ephemeral storage). I verified that the increased IO timeout only applies to EBS volumes:

[ec2-user@admin]$ sudo -i
[root@admin]# yum install -y nvme-cli
[...]
[root@admin]# for blkdev in /sys/block/nvme*; do grep -H . "${blkdev}/queue/io_timeout"; nvme id-ctrl "/.bottlerocket/rootfs/dev/${blkdev##*/}" | grep -E '^mn +'; echo; done
/sys/block/nvme0n1/queue/io_timeout:4294967276
mn        : Amazon Elastic Block Store              

/sys/block/nvme1n1/queue/io_timeout:4294967276
mn        : Amazon Elastic Block Store              

/sys/block/nvme2n1/queue/io_timeout:30000
mn        : Amazon EC2 NVMe Instance Storage        

Eagle-eyed reviewers will note that the timeout reported for EBS volumes (4294967276) differs slightly from the one specified in the udev rules (4294967295). This is due to the timeout value being clamped to that of a signed 32-bit integer before being converted into a unit based on timer ticks, and then being treated as the longest possible timeout in that new unit.

Reviewer notes:

  • The udev rules file is included in all variant builds. Since the rule matches on the disk model name, it is a no-op in environments other than EC2.
  • The udev rules file is packaged as part of the ghostdog RPM. Strictly speaking, this is not a perfect match, given that ghostdog concerns itself with ephemeral storage only. I didn't find a better-matching package, and creating a dedicated one seemed overblown to me.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Amazon EC2 instances based on the Nitro hypervisor expose EBS volumes
via NVMe. That does not mean that the EBS volumes behave like a directly
attached disk in every way: Delayed responses to IO requests may happen
due to reasons outside of the controller's influence. Hence, the
documentation recommends disabling IO request timeouts for EBS volumes
attached via NVMe. [1]

On Linux, the IO request timeout cannot be disabled completely. Setting
it to the maximum value of an unsigned 32 bit integer gets us close
enough: Since the timeout is expected to be given in milliseconds,
requests will time out after a little more than 49 days have passed.

Note that this configures timeouts via the block layer. At the time of
writing, the documentation recommends setting it as a module parameter
for the NVMe driver (where the expected unit is seconds, not
milliseconds). However, since the switch to blk-mq this timeout value
goes unused for regular IO requests during normal operation.

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#timeout-nvme-ebs-volumes

Signed-off-by: Markus Boehme <[email protected]>

@stmcginnis stmcginnis left a comment

Testing coverage looks good, and the change makes sense. Thanks for the extensive PR description!


@zmrow zmrow left a comment

Nice! 🍔

@markusboehme markusboehme merged commit 09d7cc1 into bottlerocket-os:develop Feb 23, 2023