Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd: Use assume_storage_prezeroed when formatting #4996

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

black-dragon74
Copy link
Member

@black-dragon74 black-dragon74 commented Dec 3, 2024

Describe what this PR does

Instead of passing lazy_itable_init=1 and lazy_journal_init=1 to
mkfs.ext4, pass assume_storage_prezeroed=1 which is
stronger and allows the filesystem to skip inode table zeroing
completely instead of simply doing it lazily.

The support for this flag is checked by trying to format a fake
temporary image with mkfs.ext4 and checking its STDERR.

Closes: #4948

@black-dragon74 black-dragon74 added the DNM DO NOT MERGE label Dec 3, 2024
@mergify mergify bot added the component/rbd Issues related to RBD label Dec 3, 2024
@@ -101,7 +101,7 @@ var (
xfsHasReflink = xfsReflinkUnset

mkfsDefaultArgs = map[string][]string{
"ext4": {"-m0", "-Enodiscard,lazy_itable_init=1,lazy_journal_init=1"},
"ext4": {"-m0", "-Enodiscard,assume_storage_prezeroed=1"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure that you only do this when mkfs.ext4 supports the option, you'll need to add some form of detection, ideally with something like sync.Once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assume_storage_prezeroed option became available in e2fsprogs 1.47.0 last year, so it should probably be "discovered" similar to xfsSupportsReflink().

My suggestion to do it similar to xfsSupportsReflink() isn't viable because mkfs.ext4 doesn't include extended options in its help output, but trying with assume_storage_prezeroed=1 and falling back to lazy_itable_init=1,lazy_journal_init=1 upon parsing out Bad option(s) specified: assume_storage_prezeroed error should work.

Copy link
Member Author

@black-dragon74 black-dragon74 Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I and Madhu had a discussion on this. Since this flag is part of e2fsprogs >= 1.47 and we have control over which version of it we bundle in the container, do we really need this check?

Sidenote: Detection is possible by parsing the semver of e2fsprogs. It is standard flag so it must be there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for the version isn't always reliable in the face of downstream backports.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as @idryomov mentioned we cannot depend on the version check, as the base image includes the required version, we could say what is supported by this version but that could be a problem if someone wants to build/test custom images. let's depend on the error check and fall back that works for all the cases.

@black-dragon74 black-dragon74 marked this pull request as ready for review January 6, 2025 11:58
@black-dragon74 black-dragon74 removed the DNM DO NOT MERGE label Jan 6, 2025
@black-dragon74 black-dragon74 force-pushed the stor-prezeroed branch 2 times, most recently from 2699bc7 to 6b52f64 Compare January 6, 2025 12:25
@@ -67,6 +67,10 @@ const (
xfsReflinkNoSupport
xfsReflinkSupport

ext4PrezeroedUnset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restart counting again, use ext4PrezeroedUnset = itoa

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go doesn't consider iota twice in the same const block :(

@@ -748,6 +756,7 @@ func (ns *NodeServer) NodePublishVolume(
return &csi.NodePublishVolumeResponse{}, nil
}

//nolint:cyclop,gocyclo // function is required to have multiple conditional checks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rather ugly. Maybe it is not required if you add a function like

args := ns.GetMKFSDefaultArgs(fsType)

a function like that can add the assume_storage_prezeroed=1 option if support is detected.

if existingFormat == "" && !staticVol && !readOnly && !isBlock {
args := mkfsDefaultArgs[fsType]
if fsType == "ext4" && ns.ext4SupportsPrezeroed() {
args = []string{"-m0", "-Enodiscard,assume_storage_prezeroed=1"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this drops the default options from line 112, is that intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, lazy_itable_init=1,lazy_journal_init=1 are not required when using assume_storage_prezeroed=1. Rest of the options (-m0, -Enodiscard) from line 112 are included.

return ext4HasPrezeroedSupport == ext4PrezeroedSupport
}

tempImage := "/tmp/prezeroed.img"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a temporary filename, please

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


tempImage := "/tmp/prezeroed.img"
ctx := context.TODO()
_, _, err := util.ExecCommand(ctx, "dd", "if=/dev/zero", "of="+tempImage, "bs=1M", "count=1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of executing a dd command, open the file, seek to 1mb and write a singled zero byte. This makes a sparse file, which should be much quicker and can easily be done with plain Go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this insight :)

@black-dragon74 black-dragon74 force-pushed the stor-prezeroed branch 2 times, most recently from 3d4e46b to ed5b89a Compare January 8, 2025 09:33
Copy link
Member

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just a few nits and potential improvements

}

// createSpareFile makes `file` a sparse file of size `sizeMB`.
func createSparseFile(file *os.File, sizeMB int64) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not specific to the nodeserver, move it to the internal/util package please, and add a unit-test for it

out, err := diskMounter.Exec.Command(
"mkfs.ext4",
"-n",
"-Enodiscard,assume_storage_prezeroed=1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is nice to have the options mentioned as a const somewhere, they are here, but also in getMkfsArgs().

Maybe keep the default map, and use ext4.prezeroed as fstype to get the options?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is nice to have the options mentioned as a const somewhere, they are here

I don't have an opinion about it being expressed as a const, but I think it should be a single option. This function is supposed to be checking whether ext4 supports "prezeroed", so it should be just that -- pass only assume_storage_prezeroed=1, nodiscard isn't relevant here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the nodiscard flag.

}
// ext4HasPrezeroed is set by ext4SupportsPrezeroed(), use the function when
// checking the support for assume_storage_prezeroed.
ext4HasPrezeroedSupport = ext4PrezeroedUnset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand you copied the xfsHasReflink idea. But I think it is cleaner if the NodeServer struct has a field for these kind of detectable parameters. Have you thought about doing it that way?

Copy link
Member Author

@black-dragon74 black-dragon74 Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored both the XFS and ext4 detections to use struct fields. PTAL.

// "assume_storage_prezeroed" option. It does this by creating a temporary
// image file, attempting to format it with the ext4 filesystem using the
// "assume_storage_prezeroed" option, and checking the output for errors.
func (ns *NodeServer) ext4SupportsPrezeroed() bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about if we check if this is supported or not in the driver.go(if we are stating it as nodeplugin) before starting the node plugin and store it in the Nodeserver struct like its suggested here https://github.com/ceph/ceph-csi/pull/4996/files#r1910308642?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the result for this is cached, let's check it when we need it instead of doing it preemptively? This way also has the added benefit of keeping driver.go simple. WDYT?

Instead of passing lazy_itable_init=1 and lazy_journal_init=1 to
mkfs.ext4, pass assume_storage_prezeroed=1 which is
stronger and allows the filesystem to skip inode table zeroing
completely instead of simply doing it lazily.

The support for this flag is checked by trying to format a fake
temporary image with mkfs.ext4 and checking its STDERR.

Closes: ceph#4948

Signed-off-by: Niraj Yadav <[email protected]>
}

ctx := context.TODO()
tempImgFile, err := os.CreateTemp(os.TempDir(), "prezeroed.img")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although unlikely, it is possible that this runs more than once at the same time. This is eventually called while attaching+mounting the RBD-image.

It is possible that the file is created and then:

  • more than one thread passes here
  • one thread finishes the function and deletes the file
  • other threads continue to execute the mkfs.ext4 command, with the file missing
  • the other threads will fail with an unexpected error

The prezeroed.img filename should be unique, use os.CreateTemp(). Or you can use an approach that builds on sync.Once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a suffix. Each filename will be different.

t.Run(tt.name, func(t *testing.T) {
t.Parallel()

file, err := os.CreateTemp("", "test-sparse-")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

place this in the t.TempDir() directory for automatic cleanup, no need to have to remove the file in a defer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/rbd Issues related to RBD
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pass assume_storage_prezeroed option when formatting ext4 if possible
4 participants