rpm2img: stripe root filesystem #2863

Merged: 1 commit, Mar 9, 2023

@cbgbt cbgbt commented Mar 7, 2023

Description of changes:

Aligning the root filesystem along 4MiB boundaries provides a subtle but
measurable performance improvement in Bottlerocket's boot process.

I had this idea while reviewing our filesystem settings and reading the ext4 manpages, then found this article on the web, which suggests others have used the same approach to improve SSD performance. I used the article as guidance for selecting the stride and stripe_width values.

The other change here is to disable lazy_itable_init. Enabled by default, this option defers inode table initialization to a background task at boot, which speeds up calls to mkfs.ext4. Instead, we're opting to slow down filesystem creation in exchange for added safety and speed at boot.

Some useful manpage snippets:

stride=stride-size
Configure the filesystem for a RAID array with stride-size filesystem blocks. This is the number of blocks read or written to disk before moving to the next disk, which is sometimes referred to as the chunk size. This mostly affects placement of filesystem metadata like bitmaps at mke2fs time to avoid placing them on a single disk, which can hurt performance. It may also be used by the block allocator.

stripe_width=stripe-width
Configure the filesystem for a RAID array with stripe-width filesystem blocks per stripe. This is typically stride-size * N, where N is the number of data-bearing disks in the RAID (e.g. for RAID 5 there is one parity disk, so N will be the number of disks in the array minus 1). This allows the block allocator to prevent read-modify-write of the parity in a RAID stripe if possible when the data is written.
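
As a rough illustration of the arithmetic behind these two options, the sketch below converts a chunk size in bytes into `stride` and `stripe_width` values. It assumes a 4 KiB filesystem block size and a single data-bearing device (N = 1); neither value is stated in this PR, and the function name `raid_geometry` is hypothetical.

```python
from __future__ import annotations

KIB = 1024
MIB = 1024 * KIB


def raid_geometry(chunk_bytes: int, block_bytes: int = 4 * KIB,
                  data_disks: int = 1) -> tuple[int, int]:
    """Return (stride, stripe_width), both in filesystem blocks.

    block_bytes and data_disks are assumptions for illustration,
    not values taken from the PR.
    """
    if chunk_bytes % block_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_bytes // block_bytes   # blocks per chunk
    stripe_width = stride * data_disks    # stride-size * N from the manpage
    return stride, stripe_width


# 4 MiB chunk on 4 KiB blocks with one data-bearing device.
print(raid_geometry(4 * MIB))  # (1024, 1024)
```

With a single device, the chunk and the full stripe are the same size, so both options take the same value.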

lazy_itable_init[= <0 to disable, 1 to enable>]
If enabled and the uninit_bg feature is enabled, the inode table will not be fully initialized by mke2fs. This speeds up filesystem initialization noticeably, but it requires the kernel to finish initializing the filesystem in the background when the filesystem is first mounted. If the option value is omitted, it defaults to 1 to enable lazy inode table zeroing.
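
To show how the three options quoted above might be combined, here is a minimal sketch that assembles an extended-options string for `mkfs.ext4 -E`. The helper names, the image path, and the value 1024 (a 4 MiB stripe on an assumed 4 KiB block size) are illustrative assumptions; this is not the actual change made to rpm2img.

```python
from __future__ import annotations


def ext4_extended_options(stride: int, stripe_width: int,
                          lazy_itable_init: bool = False) -> str:
    """Build the comma-separated value passed to `mkfs.ext4 -E`."""
    opts = {
        "lazy_itable_init": int(lazy_itable_init),  # 0 disables lazy init
        "stride": stride,
        "stripe_width": stripe_width,
    }
    return ",".join(f"{key}={value}" for key, value in opts.items())


def mkfs_command(image_path: str, stride: int, stripe_width: int) -> list[str]:
    """Return an argv list; the caller decides whether to run it."""
    return ["mkfs.ext4", "-E",
            ext4_extended_options(stride, stripe_width), image_path]


# Hypothetical image path; 1024 blocks assumes 4 KiB blocks and a 4 MiB stripe.
print(" ".join(mkfs_command("root.ext4", 1024, 1024)))
```

Returning an argv list rather than running the command keeps the sketch safe to execute on any machine.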

Testing done:
Experimentally, this results in a boot performance improvement for Bottlerocket (Welch's t-test, p-value < 0.00000).
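
For readers unfamiliar with the test, the following sketch computes the Welch's t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The sample values are hypothetical placeholders, not the measurements behind this PR, and the function name `welch_t` is an assumption. Turning the statistic into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variances (n - 1 denominator), each scaled by its sample size.
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# HYPOTHETICAL boot times in seconds, for illustration only.
baseline = [10.2, 10.4, 10.1, 10.5, 10.3]
striped = [9.9, 10.0, 9.8, 10.1, 9.9]

t, df = welch_t(baseline, striped)
print(f"t = {t:.3f}, df = {df:.1f}")
```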

I also tested a 512 KiB stripe size, based on a suggestion in the EBS direct documentation, but experimentation did not conclusively indicate an improvement.

[Figure: speedtest-diskmagic]

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

cbgbt commented Mar 7, 2023

Leaving this in draft while I make the code changes more self-documenting and investigate mechanisms to apply them only to AWS variants.

@stmcginnis

> Leaving this in draft while I make the code changes more self-documenting and investigate mechanisms to apply them only to AWS variants.

I wonder if this would be useful for metal variants as well. In most cases, this should result in a performance improvement since a typical block size would align on that boundary.

cbgbt commented Mar 7, 2023

> In most cases, this should result in a performance improvement since a typical block size would align on that boundary.

It's a good point; this is probably a net benefit on most random-access devices.

The optimal alignment depends a lot on the underlying storage medium, but it would be nice to have data. Then again, gathering data on metal would be a bit like boiling the ocean...

Aligning the root filesystem along 4MiB boundaries provides a subtle but
measurable performance improvement in Bottlerocket's boot process.
@cbgbt cbgbt marked this pull request as ready for review March 8, 2023 18:20
@etungsten etungsten left a comment

🪄 awesome work!

@arnaldo2792 arnaldo2792 left a comment

💨

@bcressey bcressey left a comment

🐎

@cbgbt cbgbt merged commit 5412ab8 into bottlerocket-os:develop Mar 9, 2023
zmrow commented Mar 9, 2023

Nice work!!

@cbgbt cbgbt deleted the diskmagic branch August 15, 2023 23:55
6 participants