
[Block] Add option to configure block device flush #2447

Merged (9 commits) on Feb 12, 2021

Conversation

@georgepisaltu (Contributor) commented on Feb 8, 2021:

Reason for This PR

#2172

Description of Changes

This commit introduces the notion of cache type to the virtio block device implementation. This allows users to choose from two cache types:

  • unsafe: this option will advertise the flush virtio capability to the guest block driver, but will acknowledge and ignore incoming flush requests;
  • writethrough: this option will advertise the flush virtio capability to the guest block driver and will perform an fsync syscall on incoming flush requests.

This functionality is exposed to users through an optional ignore_flush field in the block configuration API request. When absent, this field defaults to true, ignoring incoming flush requests from the guest driver and keeping the behavior of the block device the same as before this change. Setting it to false makes the device call fsync on the backing file when processing flush requests (a rough sketch of this mapping follows at the end of this section).

  • This functionality can be added in rust-vmm.
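
For illustration only, here is a minimal sketch (not the exact code from this PR) of how the optional `ignore_flush` API field could map onto an internal cache type. The `cache_type_from_request` helper is hypothetical, and the variant names follow the snippets quoted later in this conversation:

```rust
/// Cache handling strategies for the virtio block device, as described above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CacheType {
    /// Advertise the flush capability but ignore incoming flush requests.
    Unsafe,
    /// Advertise the flush capability and fsync the backing file on flush.
    WriteBack,
}

/// Hypothetical mapping from the API field to the internal cache type.
/// A missing `ignore_flush` defaults to `true`, i.e. `CacheType::Unsafe`,
/// which keeps the pre-existing block device behavior.
pub fn cache_type_from_request(ignore_flush: Option<bool>) -> CacheType {
    if ignore_flush.unwrap_or(true) {
        CacheType::Unsafe
    } else {
        CacheType::WriteBack
    }
}
```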

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not.]

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any newly added unsafe code is properly documented.
  • Any API changes are reflected in firecracker/swagger.yaml.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

Review threads (outdated, resolved) on: src/devices/src/virtio/block/device.rs, src/devices/src/virtio/block/persist.rs, src/api_server/swagger/firecracker.yaml
@alindima (Contributor) commented:

This PR also partially fixes #2207.
To fully fix it, we should also tackle the related issue #2188 in this PR.

The block device caching strategy could also influence the decision to call fsync on DiskProperties::drop, so that we make sure all data has been flushed before closing the backing file. WDYT @acatangiu @georgepisaltu?

@acatangiu (Contributor) replied:

> This PR also partially fixes #2207.
> To fully fix it, we should also tackle the related issue #2188 in this PR.
>
> The block device caching strategy could also influence the decision to call fsync on DiskProperties::drop, so that we make sure all data has been flushed before closing the backing file. WDYT @acatangiu @georgepisaltu?

#2207 is fully fixed by this PR in my view. We have seen no customer use case for O_SYNC/O_DSYNC, so we can update the issue and close it as fixed by this PR.

Indeed, for completeness, we should add the fsync on DiskProperties::drop for disks where the configured caching is writeback.

@alindima (Contributor) replied:

> > This PR also partially fixes #2207.
> > To fully fix it, we should also tackle the related issue #2188 in this PR.
> > The block device caching strategy could also influence the decision to call fsync on DiskProperties::drop, so that we make sure all data has been flushed before closing the backing file. WDYT @acatangiu @georgepisaltu?
>
> #2207 is fully fixed by this PR in my view. We have seen no customer use case for O_SYNC/O_DSYNC, so we can update the issue and close it as fixed by this PR.
>
> Indeed, for completeness, we should add the fsync on DiskProperties::drop for disks where the configured caching is writeback.

Yes, I agree. I meant that we can close #2207 if we also add the drop implementation, not necessarily O_SYNC support.

@georgepisaltu (Contributor, Author) replied:

Added an fsync call on DiskProperties::drop.
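
As a rough illustration of that change (not the PR's actual diff), syncing the backing file on drop for the write-back cache type could look roughly like this; `DiskProperties` only sketches the fields relevant here and the variant names follow the thread:

```rust
use std::fs::File;

// Minimal stand-ins for the types discussed in this thread.
enum CacheType {
    Unsafe,
    WriteBack,
}

struct DiskProperties {
    file: File,
    cache_type: CacheType,
}

impl Drop for DiskProperties {
    fn drop(&mut self) {
        if let CacheType::WriteBack = self.cache_type {
            // Best-effort fsync so that cached data reaches the disk before
            // the backing file is closed; in Drop we can only log failures.
            if let Err(err) = self.file.sync_all() {
                eprintln!("Failed to sync block data on drop: {}", err);
            }
        }
    }
}
```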

@acatangiu (Contributor) left a comment:
Code looks good, only nits.

The PR is missing documentation. The newly available caching strategies need to be included in the block device documentation.

Review threads (resolved) on: src/vmm/src/version_map.rs, src/devices/src/virtio/block/persist.rs, tests/integration_tests/build/test_coverage.py
@georgepisaltu (Contributor, Author) left a comment:

Also added the documentation in docs/api_requests/ as the block device doesn't have its own doc. I will create the block device documentation and move the cache documentation there in a subsequent PR, so as not to block this one, since the two changes are not related.

Review threads (resolved) on: src/devices/src/virtio/block/persist.rs, src/vmm/src/version_map.rs, tests/integration_tests/build/test_coverage.py
METRICS.block.flush_count.inc();
}
CacheType::Unsafe => {
// This is a noop.
Contributor commented on this excerpt:

I think we should also do a file.flush() here. This is part of the Write trait implementation, and although it is currently a noop, this unsafe option should still ensure that any data buffered on the writer gets out to the OS.

Otherwise, if the Rust File implementation decides to add some extra buffering, this data might not even reach the OS via write()s if we don't flush it.

@georgepisaltu (Contributor, Author) replied:

There is no need to include file.flush() because std::fs::File is not buffered, and that is by design. It works the same way an FD would work in C when using open, read and write.

For buffered IO in C, one would use fopen, fread and fwrite. In Rust, this is achieved by using BufReader and BufWriter.

This means that all data written with a file.write() will reach the page cache as long as the file is not opened with O_DIRECT or O_DSYNC (which is not the case in Firecracker), and this is exactly why the Rust file.flush() is a noop. You can check this, and the non-buffered behavior of file.flush(), using strace.

Moreover, I don't think the argument that the File implementation might change supports adding file.flush(). If anything, it prohibits it. Right now, file.flush() is a noop, which means the behavior is the same as not having it. If it does anything in the future and we add it now in the implementation, it would create another bug.
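
As a small standalone illustration of this distinction (paths are placeholders, not from the PR): File::flush is a no-op because writes go straight to the kernel, while a BufWriter holds data in a userspace buffer until it is flushed:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    // Plain File: write_all() issues write(2) directly, so the data is
    // already in the host page cache; flush() does nothing here.
    let mut plain = File::create("/tmp/plain.bin")?;
    plain.write_all(b"goes to the kernel immediately")?;
    plain.flush()?; // no-op, no extra syscall (observable with strace)

    // BufWriter: data sits in a userspace buffer and may not have been
    // handed to the kernel yet; flush() is what triggers the write(2).
    let mut buffered = BufWriter::new(File::create("/tmp/buffered.bin")?);
    buffered.write_all(b"stays in the BufWriter until flushed")?;
    buffered.flush()?;

    Ok(())
}
```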

Contributor replied:

My arguments were also rooted in the fact that for CacheType::WriteBack, we do a file.flush() before calling fsync(). Why is that?

Even if this is a noop, it just follows the semantics of the flush method of the Write trait.

@georgepisaltu (Contributor, Author) replied:

It is just future-proofing. I thought about the semantics of File::flush possibly changing in the future and I wanted to be on the safe side. The rationale was that it has no performance cost now, as it is a noop, and could save us some trouble in the unlikely event that flush semantics change.

I would agree to removing it; I have no strong preference here. Let me know what you think.
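
Putting this thread together, here is a hedged sketch of the flush path being discussed. The enum and the commented-out METRICS call mirror the excerpts above, but this is not the exact Firecracker implementation:

```rust
use std::fs::File;
use std::io::Write;

// Stand-in for the cache type discussed in this thread.
enum CacheType {
    Unsafe,
    WriteBack,
}

/// Handle a virtio block flush request according to the configured cache type.
fn handle_flush(file: &mut File, cache_type: &CacheType) -> std::io::Result<()> {
    match cache_type {
        CacheType::WriteBack => {
            // flush() is currently a no-op for File; it is kept for Write
            // trait semantics / future-proofing, as argued above.
            file.flush()?;
            // sync_all() maps to fsync(2) and pushes cached data to disk.
            file.sync_all()?;
            // METRICS.block.flush_count.inc();  // metric bump from the excerpt above
        }
        CacheType::Unsafe => {
            // Acknowledge the request without syncing; the data stays in the
            // host page cache until the host writes it back.
        }
    }
    Ok(())
}
```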

Review threads (resolved) on: src/devices/src/virtio/block/device.rs, src/vmm/src/builder.rs, src/vmm/src/vmm_config/drive.rs, src/vmm/src/rpc_interface.rs, src/api_server/src/request/drive.rs, tests/integration_tests/functional/test_drives.py
activating the device, the guest driver will be able to send flush requests
to the device, but the device will just acknowledge the request without
actually performing any flushing on the host side. The data which was not
yet committed to disk will continue to reside in the host page cache.
Contributor commented on this excerpt:

Maybe note here that the data is only committed to the physical disk when the host driver decides to.

@georgepisaltu (Contributor, Author) replied on Feb 12, 2021:

I think an explanation of how caching works on Linux systems is not appropriate here, especially since whatever we write in this section depends heavily on how the host OS and FS are set up. The fact that IO is cached is the default behavior on Linux systems and I don't think we need to explain that here.

We should, however, touch on this in the general block device documentation and provide our recommendations regarding host OS and FS setup. In that section we should also mention what host configuration we are running our tests on, and that should be enough.

Note: the block device doesn't have its own documentation; I will add it and move the cache documentation to that file in a subsequent PR.

This commit introduces the notion of cache type to the virtio block
device implementation. This allows users to choose from two cache
types:

- `unsafe`: this option will advertise the flush virtio capability
to the guest block driver, but will acknowledge and ignore incoming
flush requests;
- `writethrough`: this option will advertise the flush virtio
capability to the guest block driver and will perform an `fsync`
syscall on incoming flush requests.

Signed-off-by: George Pisaltu <[email protected]>
@@ -26,6 +28,7 @@ lazy_static! {
let mut mapping = HashMap::new();
mapping.insert(String::from("0.23.0"), 1);
mapping.insert(String::from("0.24.0"), 2);
mapping.insert(String::from("0.25.0"), 3);
@sandreim (Contributor) commented on Feb 12, 2021:

Please also publish some "temporary" binary artifacts for running snapshot integration tests across versions.

@georgepisaltu (Contributor, Author) replied:

Once I get all approvals, I will build and publish the artifacts so they are built from the same source as what is going to get merged.
