
[Block] Add option to configure block device flush #2447

Merged 9 commits on Feb 12, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@

- Added devtool build `--ssh-keys` flag to support fetching from private
git repositories.
- Added option to configure block device flush.

### Changed

70 changes: 70 additions & 0 deletions docs/api_requests/block-caching.md
@@ -0,0 +1,70 @@
# Block device caching strategies

Firecracker offers the possibility of choosing the block device caching
strategy. The caching strategy affects the path that data written from inside
the microVM takes to the host persistent storage.

## How it works

When installing a block device through a PUT /drives API call, users can choose
the caching strategy by inserting a `cache_type` field in the JSON body of the
request. The available cache types are:

- `Unsafe`
- `Writeback`

### Unsafe mode (default)

When configuring the block caching strategy to `Unsafe`, the device will
advertise the VirtIO `flush` feature to the guest driver. If negotiated when
activating the device, the guest driver will be able to send flush requests
to the device, but the device will just acknowledge the request without
actually performing any flushing on the host side. The data which was not
yet committed to disk will continue to reside in the host page cache.
Contributor: Maybe note here that the data is only committed to the physical disk when the host driver decides to.

Contributor Author (@georgepisaltu), Feb 12, 2021:

I think an explanation of how caching works on Linux systems is not appropriate here, especially since whatever we write in this section highly depends on how the host OS and FS are set up. The fact that IO is cached is the default behavior in Linux systems and I don't think we need to explain this here.

We should, however, touch on this in the general block device documentation and provide our recommendations regarding host OS and FS setup. In that section we should also mention what host configuration we are running our tests on, and that should be enough.

Note: the block device doesn't have its own documentation; I will add it and move the cache documentation to that file in a subsequent PR.


### Writeback mode

When configuring the block caching strategy to `Writeback`, the device will
advertise the VirtIO `flush` feature to the guest driver. If negotiated when
activating the device, the guest driver will be able to send flush requests
to the device. When the device executes a flush request, it will perform an
`fsync` syscall on the backing block file, committing all data in the host
page cache to disk.

## Supported use cases

Choosing a caching strategy is a trade-off:

- `Unsafe`
- enhances performance as fewer syscalls and IO operations are performed when
running workloads
- sacrifices data integrity in situations where the host simply loses the
contents of the page cache without committing them to the backing storage
(such as a power outage)
- recommended for use cases with ephemeral storage, such as serverless
environments
- `Writeback`
- ensures that once a flush request has been acknowledged by the host, the
data is committed to the backing storage
- sacrifices performance, from increased boot times to greater
emulation-related latencies when running workloads
- recommended for low-power use cases, such as embedded environments

## How to configure it

Example sequence that configures a block device with a caching strategy:

```bash
curl --unix-socket ${socket} -i \
-X PUT "http://localhost/drives/dummy" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d "{
\"drive_id\": \"dummy\",
\"path_on_host\": \"${drive_path}\",
\"is_root_device\": false,
\"is_read_only\": false,
\"cache_type\": \"Writeback\"
}"
```
1 change: 1 addition & 0 deletions src/api_server/src/parsed_request.rs
@@ -643,6 +643,7 @@ pub(crate) mod tests {
\"is_root_device\": true, \
\"partuuid\": \"string\", \
\"is_read_only\": true, \
\"cache_type\": \"Unsafe\", \
\"rate_limiter\": { \
\"bandwidth\": { \
\"size\": 0, \
19 changes: 15 additions & 4 deletions src/api_server/src/request/drive.rs
@@ -216,20 +216,33 @@ mod tests {
assert!(parse_put_drive(&Body::new("invalid_payload"), None).is_err());
assert!(parse_put_drive(&Body::new("invalid_payload"), Some(&"id")).is_err());

// PATCH with invalid fields.
// PUT with invalid fields.
let body = r#"{
"drive_id": "bar",
"is_read_only": false
}"#;
assert!(parse_put_drive(&Body::new(body), Some(&"2")).is_err());

// PATCH with invalid types on fields. Adding a drive_id as number instead of string.
// PUT with all optional fields missing.
let body = r#"{
"drive_id": "1000",
"path_on_host": "dummy",
"is_root_device": true,
"is_read_only": true
}"#;
assert!(parse_put_drive(&Body::new(body), Some(&"1000")).is_ok());

// PUT where the drive_id in the body does not match the id in the path.
assert!(parse_put_drive(&Body::new(body), Some(&"foo")).is_err());

// PUT with the complete configuration.
let body = r#"{
"drive_id": "1000",
"path_on_host": "dummy",
"is_root_device": true,
"partuuid": "string",
"is_read_only": true,
"cache_type": "Unsafe",
"rate_limiter": {
"bandwidth": {
"size": 0,
@@ -244,7 +257,5 @@
}
}"#;
assert!(parse_put_drive(&Body::new(body), Some(&"1000")).is_ok());

assert!(parse_put_drive(&Body::new(body), Some(&"foo")).is_err());
}
}
5 changes: 5 additions & 0 deletions src/api_server/swagger/firecracker.yaml
@@ -753,6 +753,11 @@ definitions:
properties:
drive_id:
type: string
cache_type:
type: string
description:
Represents the caching strategy for the block device.
default: "Unsafe"
is_read_only:
type: boolean
is_root_device:
75 changes: 69 additions & 6 deletions src/devices/src/virtio/block/device.rs
@@ -30,16 +30,41 @@ use super::{
use crate::virtio::VIRTIO_MMIO_INT_CONFIG;
use crate::Error as DeviceError;

use serde::{Deserialize, Serialize};

/// Configuration options for disk caching.
#[derive(Clone, Copy, Debug, Deserialize, PartialEq, Serialize)]
pub enum CacheType {
/// The flushing mechanism will be advertised to the guest driver,
/// but the operation will be a noop.
Unsafe,
/// The flushing mechanism will be advertised to the guest driver and
/// flush requests coming from the guest will be performed using
/// `fsync`.
Writeback,
}

impl Default for CacheType {
fn default() -> CacheType {
CacheType::Unsafe
}
}

/// Helper object for setting up all `Block` fields derived from its backing file.
pub(crate) struct DiskProperties {
cache_type: CacheType,
file_path: String,
file: File,
nsectors: u64,
image_id: Vec<u8>,
}

impl DiskProperties {
pub fn new(disk_image_path: String, is_disk_read_only: bool) -> io::Result<Self> {
pub fn new(
disk_image_path: String,
is_disk_read_only: bool,
cache_type: CacheType,
) -> io::Result<Self> {
let mut disk_image = OpenOptions::new()
.read(true)
.write(!is_disk_read_only)
Expand All @@ -57,6 +82,7 @@ impl DiskProperties {
}

Ok(Self {
cache_type,
nsectors: disk_size >> SECTOR_SHIFT,
image_id: Self::build_disk_image_id(&disk_image),
file_path: disk_image_path,
@@ -121,6 +147,31 @@ impl DiskProperties {
}
config
}

pub fn cache_type(&self) -> CacheType {
self.cache_type
}
}

impl Drop for DiskProperties {
fn drop(&mut self) {
match self.cache_type {
CacheType::Writeback => {
// flush() first to force any cached data out.
if self.file.flush().is_err() {
error!("Failed to flush block data on drop.");
}
// Sync data out to physical media on host.
if self.file.sync_all().is_err() {
error!("Failed to sync block data on drop.");
}
METRICS.block.flush_count.inc();
}
CacheType::Unsafe => {
// This is a noop.
Contributor: I think we should also do a file.flush() here. This is part of the Write trait implementation and although it is currently a noop, this unsafe option should still ensure that any data buffered on the writer gets out to the OS. Otherwise, if the Rust File implementation decides to add some extra buffering, that data might never reach the OS via write()s if we don't flush it.

Contributor Author (@georgepisaltu): There is no need to include file.flush() because std::fs::File is not buffered, and that is by design. It works the same way an FD would work in C when using open, read and write. For buffered IO in C, one would use fopen, fread and fwrite; in Rust, this is achieved by using BufReader and BufWriter. This means that all data written with a file.write() will reach the cache as long as the file is not opened with O_DIRECT or O_DSYNC (which is not the case in Firecracker), and this is exactly why the Rust file.flush() is a noop. You can check this, and the non-buffered behavior of file.flush(), using strace. Moreover, I don't think the argument that the File implementation might change supports adding file.flush(). If anything, it prohibits it. Right now, file.flush() is a noop, so the behaviour is the same as not having it. If it does anything in the future and we add it now, it would create another bug.

Contributor: My arguments were also rooted in the fact that for CacheType::Writeback we do a file.flush() before calling fsync(). Why is that? Even if it is a noop, it just follows the semantics of the flush method of the Write trait.

Contributor Author (@georgepisaltu): It is just future proofing. I thought about the semantics of File::flush possibly changing in the future and I wanted to be on the safe side. The rationale was that it has no performance cost now, as it is a noop, and could save us some trouble in the future in the unlikely event that flush semantics change. I would agree to removing it; I have no strong preference here. Let me know what you think.

}
};
}
}

/// Virtio device for exposing block level read/write operations on a host file.
@@ -155,12 +206,13 @@
pub fn new(
id: String,
partuuid: Option<String>,
cache_type: CacheType,
disk_image_path: String,
is_disk_read_only: bool,
is_disk_root: bool,
rate_limiter: RateLimiter,
) -> io::Result<Block> {
let disk_properties = DiskProperties::new(disk_image_path, is_disk_read_only)?;
let disk_properties = DiskProperties::new(disk_image_path, is_disk_read_only, cache_type)?;

let mut avail_features = (1u64 << VIRTIO_F_VERSION_1) | (1u64 << VIRTIO_BLK_F_FLUSH);

@@ -343,7 +395,8 @@ impl Block {

/// Update the backing file and the config space of the block device.
pub fn update_disk_image(&mut self, disk_image_path: String) -> io::Result<()> {
let disk_properties = DiskProperties::new(disk_image_path, self.is_read_only())?;
let disk_properties =
DiskProperties::new(disk_image_path, self.is_read_only(), self.cache_type())?;
self.disk = disk_properties;
self.config_space = self.disk.virtio_block_config_space();

@@ -380,6 +433,10 @@ impl Block {
pub fn is_root_device(&self) -> bool {
self.root_device
}

pub fn cache_type(&self) -> CacheType {
self.disk.cache_type()
}
}

impl VirtioDevice for Block {
@@ -491,8 +548,12 @@ pub(crate) mod tests {
let size = SECTOR_SIZE * num_sectors;
f.as_file().set_len(size).unwrap();

let disk_properties =
DiskProperties::new(String::from(f.as_path().to_str().unwrap()), true).unwrap();
let disk_properties = DiskProperties::new(
String::from(f.as_path().to_str().unwrap()),
true,
CacheType::Unsafe,
)
.unwrap();

assert_eq!(size, SECTOR_SIZE * num_sectors);
assert_eq!(disk_properties.nsectors, num_sectors);
@@ -504,7 +565,9 @@
// Testing `backing_file.virtio_block_disk_image_id()` implies
// duplicating that logic in tests, so skipping it.

assert!(DiskProperties::new("invalid-disk-path".to_string(), true).is_err());
assert!(
DiskProperties::new("invalid-disk-path".to_string(), true, CacheType::Unsafe).is_err()
);
}

#[test]
2 changes: 1 addition & 1 deletion src/devices/src/virtio/block/mod.rs
@@ -7,7 +7,7 @@ pub mod persist;
pub mod request;
pub mod test_utils;

pub use self::device::Block;
pub use self::device::{Block, CacheType};
pub use self::event_handler::*;
pub use self::request::*;
