Skip to content

Conversation

@germag
Copy link
Collaborator

@germag germag commented Nov 15, 2023

Summary of the PR

The vhost user protocol supports live migration, and during live migration, the vhost-user frontend needs to track the modifications the vhost-user backend makes to the memory mapped regions, marking the dirty pages in a log (i.e., a bitmap).

If the backend has the VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature, it will receive the VHOST_USER_SET_LOG_BASE message with a file descriptor of the dirty-pages log memory region. This log covers all known guest addresses, and must be manipulated atomically.

For further info please see
https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#migration

This PR adds support for creating an atomic-memory-mapped bitmap, and being able to replace it in runtime. The vhost user protocol does not specify whether the previous bitmap is still active after replying to the VHOST_USER_SET_LOG_BASE message, so we use an RwLock to be sure that the in-flight requests are using the new bitmap after the message reply.

For further info please see
https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#migration

Note to reviewers

This code is part of the effort to support live migration in virtiofsd and is currently being used in a version of virtiofsd that supports migration: https://gitlab.com/virtiofsd-live-migration

@germag germag marked this pull request as draft November 15, 2023 11:28
@germag
Copy link
Collaborator Author

germag commented Nov 15, 2023

/cc @XanClic

@germag germag force-pushed the vhost-add-bitmap-support branch from 05c4efa to c85634b Compare February 8, 2024 19:03
@germag germag marked this pull request as ready for review February 8, 2024 19:05
@germag germag requested a review from Ablu as a code owner February 8, 2024 19:05
@germag germag changed the title Add VHOST_USER_SET_LOG_BASE message support Add replaceable-mmapped bitmap support Feb 8, 2024
@germag
Copy link
Collaborator Author

germag commented Feb 8, 2024

This new version adds the bitmap support that was originally intended to be part of for the vm-memory crate: rust-vmm/vm-memory#264

@germag
Copy link
Collaborator Author

germag commented Feb 9, 2024

@stefano-garzarella @Ablu

@XanClic
Copy link
Contributor

XanClic commented Feb 9, 2024

How is this supposed to be used? Compared to the previous proposal (in vm-memory), if I replace vm_memory::bitmap::BitmapRegion by vhost_user_backend::BitmapMmapRegion, it works – but that type isn’t public (and neither is the bitmap module), so it doesn’t work as-is with this PR.

What does compile and apparently also kind of works is to continue to use () as the bitmap, but I don’t think that’s really intentional: The BitmapReplace implementation for that is a no-op, so the AtomicBitmapMmap objects created by set_log_base() end up just being dropped. And that seems a bit dangerous, that nothing complains about having no bitmap during migration.

It’s questionable to me whether () can really implement BitmapReplace, when actually it won’t do anything. I also don’t quite understand why set_log_base() creates AtomicBitmapMmap objects specifically, and why BitmapReplace requires being able to replace any bitmap that implements it by specifically an AtomicBitmapMmap.

It feels like ideally we’d have a trait (maybe LogBitmap) with some inner type that allows for set_log_base() support, i.e. the trait provides one method for replacing (with the inner type, not a concrete one), and one for creating the inner type from the information we get through SET_LOG_BASE. AtomicBitmapMmap would then implement that trait, but () would not. set_log_base() is only fully implemented if the bitmap type implements LogBitmap (returns an error otherwise), and get_protocol_features() should probably automatically set LOG_SHMFD if that trait is implemented.

The problem with “is trait implemented or not” is that it’s not so easy in Rust and would require a whole new line of VhostUserBackendReqHandler traits, I believe. However, what can work would be to do the test at runtime, e.g. with an associated const:

pub trait LogBitmap: Sized {
    fn new<R: GuestMemoryRegion>(region: &R, logmem: MmapLogReg) -> io::Result<Self>;
}

pub trait LogBitmapRegion: Bitmap {
    const CAN_LOG_DIRTY_WRITES: bool; // Can you really use this bitmap type for migration?
    type Inner: LogBitmap;

    fn replace(&self, bitmap: Self::Inner);
}

And then this can be implemented for (), setting the const to false:

impl LogBitmap for () {                                                                                                                                                         
    fn new<R: GuestMemoryRegion>(_region: &R, _logmem: MmapLogReg) -> io::Result<Self> {
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "Back-end does not support logging memory modification, cannot migrate",
        ))
    }
}

impl LogBitmapRegion for () {
    const CAN_LOG_DIRTY_WRITES: bool = false;
    type Inner = ();

    fn replace(&self, _bitmap: ()) {}
}

(And you just impl LogBitmap for AtomicBitmapMmap and impl LogBitmapRegion for BitmapMmapRegion.)

set_log_base() may want to double-check <T::Bitmap as LogBitmapRegion>::CAN_LOG_DIRTY_WRITES before doing anything, but may also just stay as-is, relying on <T::Bitmap as LogBitmapRegion>::Inner::new(region, logmem) to return an error if the bitmap doesn’t really work.

get_features() may want to auto-set VhostUserVirtioFeatures::LOG_ALL.bits() if CAN_LOG_DIRTY_WRITES is true, same for get_protocol_features() and VhostUserProtocolFeatures::LOG_SHMFD. (Though set_features() will need to take care to filter out LOG_ALL from the bits sent to the back-end implementation if it was auto-set.)


Alternatively, the quick fix would be to just make BitmapMmapRegion public and have ()::replace() return a runtime error that prevents migration, because having no real bitmap seems to forbid migration.

@germag germag force-pushed the vhost-add-bitmap-support branch from c85634b to 1d0c045 Compare February 14, 2024 15:23
@germag
Copy link
Collaborator Author

germag commented Feb 14, 2024

v2:

  • Fix module access
  • Add assoc. type to prevent () to be used as a replaceable bitmap

Copy link
Member

@stefano-garzarella stefano-garzarella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about splitting the last commit in two since we are touching both vhost-user-backend and vhost crates?

Since it is a new feature, can you also add an entry on both vhost and vhost-user-backend changelog file?

@germag germag force-pushed the vhost-add-bitmap-support branch from 1d0c045 to 120a04d Compare February 20, 2024 11:35
@germag
Copy link
Collaborator Author

germag commented Feb 20, 2024

v3:

  • Address Stefano feedback
    • Add TODO comments
    • Add copy right notice

@stefano-garzarella
Copy link
Member

LGTM. Please, can you add an entry on both vhost and vhost-user-backend changelog for this new feature.

Nit: maybe we can remove vhost-user-backend: prefix from the last commit where we touch both crates, and add it to the first commit.

@germag
Copy link
Collaborator Author

germag commented Feb 20, 2024

LGTM. Please, can you add an entry on both vhost and vhost-user-backend changelog for this new feature.

I knew I was forgetting something, sorry, yes I'll do it

Nit: maybe we can remove vhost-user-backend: prefix from the last commit where we touch both crates, and add it to the first commit.

Ok

@germag germag force-pushed the vhost-add-bitmap-support branch from 120a04d to c76b191 Compare February 21, 2024 11:37
@germag
Copy link
Collaborator Author

germag commented Feb 21, 2024

v4:

  • Changelog entries
  • Fix commit message
  • Check for 0-sized mem regions
  • Include inner error on io::Error

@germag germag force-pushed the vhost-add-bitmap-support branch 2 times, most recently from a555ff0 to ce64985 Compare February 21, 2024 15:16
@germag
Copy link
Collaborator Author

germag commented Feb 21, 2024

v5:

  • wrap MmapLogReg on Arc to avoid multiple mmaps

@germag
Copy link
Collaborator Author

germag commented Mar 1, 2024

A related question is: What happens on a cancelled migration (i.e. SET_FEATURES called with F_LOG cleared)?

taking a quick look at qemu's code, I don't see any place where F_LOG_ALL is cleared in any case (there is something in vdpa but is unrelated)

@XanClic
Copy link
Contributor

XanClic commented Mar 1, 2024

A related question is: What happens on a cancelled migration (i.e. SET_FEATURES called with F_LOG cleared)? I think as per spec we can’t actually remove the bitmaps, because the front-end might set F_LOG again without calling SET_LOG_BASE a second time. So it sounds like we’d have to have a way to disable bitmaps, actually, I think.

We will need to disable the InnerBitmap, while keeping the old one just in case we don't receive a new SET_LOG_BASE, because the vm-memory types will continue calling mark_dirty(), right now it seems unnecessary complexity to me since as you said there is no harm in setting dirty bits in the log.

Yes, there isn’t any real harm, but it will cost a bit of performance after a cancelled migration, so would be nice to look into at some point at least.

I think if desired we can introduce bitmap disabling locally here in the vhost-user-backend crate without modifications to vm-memory, just by adding a bool/AtomicBool to either AtomicBitmapMmap or MmapLogReg. (bool might give better run-time performance, but would require a new replace() cycle.)

@XanClic
Copy link
Contributor

XanClic commented Mar 1, 2024

A related question is: What happens on a cancelled migration (i.e. SET_FEATURES called with F_LOG cleared)?

taking a quick look at qemu's code, I don't see any place where F_LOG_ALL is cleared in any case (there is something in vdpa but is unrelated)

vhost_log_global_stop() calls vhost_migration_log(..., false), which will have vhost_dev_set_features() clear VHOST_F_LOG_ALL. vhost_log_global_stop() is invoked from memory_global_dirty_log_do_stop() as a log_global_stop memory listener.

The current draft for virtiofsd does use clearing of F_LOG_ALL to recognize a cancelled migration. Note that qemu will call memory_global_dirty_log_do_stop() only when the VM is running (see its caller memory_global_dirty_log_stop()), i.e. definitely only when the migration has been cancelled, never when it’s just been completed.

@germag
Copy link
Collaborator Author

germag commented Mar 1, 2024

I think if desired we can introduce bitmap disabling locally here in the vhost-user-backend crate without modifications to vm-memory, just by adding a bool/AtomicBool to either AtomicBitmapMmap or MmapLogReg. (bool might give better run-time performance, but would require a new replace() cycle.)

I still think is unnecessary right now, and I'm not sure about the performance gain either, we will need to check an AtomicBool even if we don't have a InnerBitmap.

@Ablu
Copy link
Collaborator

Ablu commented Mar 1, 2024

I'm not sure if I understood this, but in SET_LOG_BASE we are just constructing the log, as I said before, we are not starting it

Ah... My brain assumed that that's what the replace would trigger :). But it certainly makes a lot more sense now!

Do you already have actual starting of the logging implemented somewhere? I tried to look around in https://gitlab.com/virtiofsd-live-migration, but could not immediately spot it there. Would love to see how that works out API wise.

Overall, it would have probably helped me if there would have been a more concrete mention that this is only part of the solution and other bits still have to follow (with a rough overview of what those pieces are).

@XanClic
Copy link
Contributor

XanClic commented Mar 1, 2024

introduce

I still think is unnecessary right now, and I'm not sure about the performance gain either, we will need to check an AtomicBool even if we don't have a InnerBitmap.

Sure. mark_dirty() does get a range, and its current implementation dirties every single bit separately, but I assume the range is a single page most of the time, so it’s just a single operation. OTOH, I’m sure that fetching an atomic bool is quicker than performing an atomic OR, and as I said, we could get away without using an atomic, too (let mut disabled_bmap = region().bitmap().inner().clone(); disabled_bmap.disabled = true; region().bitmap().replace(disabled_bmap);).

@germag
Copy link
Collaborator Author

germag commented Mar 1, 2024

o you already have actual starting of the logging implemented somewhere? I tried to look around in https://gitlab.com/virtiofsd-live-migration, but could not immediately spot it there. Would love to see how that works out API wise.

No, because that is part of vm-memory, it expects the bitmap exists at compile time, as I said before

Overall, it would have probably helped me if there would have been a more concrete mention that this is only part of the solution and other bits still have to follow (with a rough overview of what those pieces are).

I think I discussed it in the previous reviews

@germag
Copy link
Collaborator Author

germag commented Mar 1, 2024

introduce

I still think is unnecessary right now, and I'm not sure about the performance gain either, we will need to check an AtomicBool even if we don't have a InnerBitmap.

Sure. mark_dirty() does get a range, and its current implementation dirties every single bit separately, but I assume the range is a single page most of the time, so it’s just a single operation. OTOH, I’m sure that fetching an atomic bool is quicker than performing an atomic OR,

Probably, but currently in the most common case (i.e., without a bitmap) we don't access any of that

and as I said, we could get away without using an atomic, too (let mut disabled_bmap = region().bitmap().inner().clone(); disabled_bmap.disabled = true; region().bitmap().replace(disabled_bmap);).

Ok, I will implement the AtomicBool approach, otherwise I think we also need to take into account that we can receive a SET_FEATURE unrelated to live migration

@XanClic
Copy link
Contributor

XanClic commented Mar 1, 2024

I'm not sure if I understood this, but in SET_LOG_BASE we are just constructing the log, as I said before, we are not starting it

Ah... My brain assumed that that's what the replace would trigger :).

As far as I understand, it effectively does.

The vm-memory crate calls mark_dirty() for every write access, basically, so from one perspective, logging is active all the time, it’s just that mark_dirty() is effectively a no-op as long as the bitmaps don’t have an mmap-ed area to back them.

Specifically, <BitmapMmapRegion as Bitmap>::mark_dirty() is a no-op if BitmapMmapRegion.inner is None, but if let Some(bmap) = BitmapMmapRegion.inner, bmap.mark_dirty() will immediately start setting dirty bits in the mmap-ed area, i.e. as soon as we have called replace().

Do you already have actual starting of the logging implemented somewhere? I tried to look around in https://gitlab.com/virtiofsd-live-migration, but could not immediately spot it there.

Because so far I understood that attaching the bitmap to the mem regions would immediately start logging, there is nothing that virtiofsd does except to start aggregating its own internal state.

@germag
Copy link
Collaborator Author

germag commented Mar 1, 2024

The current draft for virtiofsd does use clearing of F_LOG_ALL to recognize a cancelled migration.

but that is only for the internal state, keeping marking dirty pages will not interfere with it
So, I'm still not sure if enabling/disabling the bitmap is it is worth doing it

@XanClic
Copy link
Contributor

XanClic commented Mar 1, 2024

introduce

I still think is unnecessary right now, and I'm not sure about the performance gain either, we will need to check an AtomicBool even if we don't have a InnerBitmap.

Sure. mark_dirty() does get a range, and its current implementation dirties every single bit separately, but I assume the range is a single page most of the time, so it’s just a single operation. OTOH, I’m sure that fetching an atomic bool is quicker than performing an atomic OR,

Probably, but currently in the most common case (i.e., without a bitmap) we don't access any of that

I was specifically talking about the cancelled migration case, and I did agree (“Sure.”) that we don’t necessarily need to address it now. If you want to implement disabling now, fine with me; if you don’t, also fine with me. I only said I believe this is something that should be looked into at some point, not necessarily now.

and as I said, we could get away without using an atomic, too (let mut disabled_bmap = region().bitmap().inner().clone(); disabled_bmap.disabled = true; region().bitmap().replace(disabled_bmap);).

Ok, I will implement the AtomicBool approach, otherwise I think we also need to take into account that we can receive a SET_FEATURE unrelated to live migration

I don’t understand. It’s easy to find out whether F_LOG_ALL is being toggled by a SET_FEATURE call or not: We can just store in VhostUserHandler whether logging is currently done or not, and find out whether the bit is being toggled this way; or we can even just check whether region().bitmap().inner().disabled matches features & VhostUserVirtioFeatures::LOG_ALL.bits() == 0.

The current draft for virtiofsd does use clearing of F_LOG_ALL to recognize a cancelled migration.

but that is only for the internal state, keeping marking dirty pages will not interfere with it

I only gave this as an example to show that clearing F_LOG_ALL does occur.

To be entirely clear, I still do not have any objection to this PR as-is.

But from what I understand, with the PR as-is, logging does start when SET_LOG_BASE is called, and will never stop. Functionally, I don’t see any problem with that, but it is not correct as per spec, and in the case of a cancelled migration, seems like an admittedly most likely absolutely negligible waste of CPU cycles. Therefore, I do believe this should be addressed at some point in time, but not necessarily now.

@germag
Copy link
Collaborator Author

germag commented Mar 1, 2024

I was specifically talking about the cancelled migration case, and I did agree (“Sure.”) that we don’t necessarily need to address it now. If you want to implement disabling now, fine with me; if you don’t, also fine with me. I only said I believe this is something that should be looked into at some point, not necessarily now.
...
To be entirely clear, I still do not have any objection to this PR as-is.

But from what I understand, with the PR as-is, logging does start when SET_LOG_BASE is called, and will never stop. Functionally, I don’t see any problem with that, but it is not correct as per spec, and in the case of a cancelled migration, seems like an admittedly most likely absolutely negligible waste of CPU cycles. Therefore, I do believe this should be addressed at some point in time, but not necessarily now.

Ahh ok, sorry for the confusion (you know Mondays), ok I take note, and I'll work on a solution for a future PR, and do some testing, also I'll measure the performance impact. Thanks

@XanClic
Copy link
Contributor

XanClic commented Mar 1, 2024

(Just for reference for a future PR, this is how I imagine it without using an atomic: https://czenczek.de/0001-Disable-bitmaps-unless-F_LOG_ALL.patch (github wouldn’t let me attach this file 🙁))

@stefano-garzarella stefano-garzarella force-pushed the vhost-add-bitmap-support branch from 99c035f to 492cd28 Compare March 8, 2024 14:07
@stefano-garzarella
Copy link
Member

@germag I just rebased it and everything looks fine.
IIUC we can merge it for now and improve later.

@XanClic @Ablu WDYT?

@XanClic
Copy link
Contributor

XanClic commented Mar 8, 2024

No objections from my side, looks good to me.

@Ablu
Copy link
Collaborator

Ablu commented Mar 8, 2024

Ah, when skimming the discussion above I assumed that changes were planned.

I still do not fully understand where we are and what is missing.

  • Why do we immediately enable logging if the specification says that it starting logging should happen as part of SET_FEATURES?
  • What exactly is missing here for full functionality?
  • Do we see ways to simplify the API?

Probably all of this is clear to you being more familiar with the specs and implementations. Chances are that I am also just misunderstanding bits. But if this is not usable as it is yet and we do not have a full understanding of what the API needs will be in the end, then what do we gain by merging now?

I expressed my worry about the additional trait complexities and I am happy to merge this if we have a road towards simplifying that, but it would be great to have the plans spelled out somewhere at least in a rough form.

@XanClic
Copy link
Contributor

XanClic commented Mar 8, 2024

Why do we immediately enable logging if the specification says that it starting logging should happen as part of SET_FEATURES?

German knows this better than me, but I assume just because it’s simpler: There currently is no way to attach dirty bitmaps to memory regions (which we must do on SET_LOG_BASE) but keep them disabled. So that would need to be implemented.
We can’t see an actual problem that would arise from starting too early or stopping too late.
I’ve posted a patch that implements disabling dirty bitmaps until SET_FEATURES sets F_LOG_ALL. German has said he’d prefer to incorporate that in a future PR rather than here, and because I don’t see a functional problem, I don’t disagree.

What exactly is missing here for full functionality?

Nothing. This MR works as-is for live migration with virtiofsd.
There is the F_LOG_ALL issue, but in practice that is not a functional issue but at worst a performance issue.

Do we see ways to simplify the API?

German knows this better, but if anything, the only problem I have with the API is that it requires specifying a bitmap type at all, because vhost-user should always use BitmapMmapRegion. I don’t have a problem with the traits, because they didn’t appear when I made use of this MR in virtiofsd. The only thing I needed to do was to replace the bitmap type from being implicitly () to explicitly BitmapMmapRegion.
Patch 2 in the virtiofsd migration MR makes virtiofsd ready for state-less migration*, and it really is just changing the bitmap type from () to BitmapMmapRegion, and setting the necessary feature bits.

*State-less migration means it cannot transfer its internal state yet, but that’s a completely separate topic.

@Ablu
Copy link
Collaborator

Ablu commented Mar 8, 2024

What exactly is missing here for full functionality?

Nothing. This MR works as-is for live migration with virtiofsd. There is the F_LOG_ALL issue, but in practice that is not a functional issue but at worst a performance issue.

OK. That's where I got confused. Thanks! I looked at https://gitlab.com/virtiofsd-live-migration and could not tell what exactly was missing.

Why do we immediately enable logging if the specification says that it starting logging should happen as part of SET_FEATURES?

German knows this better than me, but I assume just because it’s simpler: There currently is no way to attach dirty bitmaps to memory regions (which we must do on SET_LOG_BASE) but keep them disabled. So that would need to be implemented. We can’t see an actual problem that would arise from starting too early or stopping too late. I’ve posted a patch that implements disabling dirty bitmaps until SET_FEATURES sets F_LOG_ALL. German has said he’d prefer to incorporate that in a future PR rather than here, and because I don’t see a functional problem, I don’t disagree.

Then let's document that either in a code comment or the commit log and merge? 🤔

@germag
Copy link
Collaborator Author

germag commented Mar 11, 2024

V7:

  • Add a TODO comment to support LOG_ALL flag on SET_FEATURE

germag added 3 commits March 11, 2024 16:34
The vhost user protocol supports live migration, and during live
migration, the vhost-user frontend needs to track the modifications the
vhost-user backend makes to the memory mapped regions, marking the
dirty pages in a log (i.e., a bitmap).

If the backend has the VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature
it will receive the VHOST_USER_SET_LOG_BASE message with a file
descriptor of the dirty-pages log memory region. This log covers all
known guest addresses, and must be manipulated atomically.

For further info please see
https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#migration

This commit adds support for creating an atomic-memory-mapped bitmap,
and being able to replace it in runtime. The vhost user protocol does
not specify whether the previous bitmap is still active after replying
to the VHOST_USER_SET_LOG_BASE message, so we use an RwLock to be sure
that the in-flight requests are using the new bitmap after the message
reply.

Signed-off-by: German Maglione <[email protected]>
Any bitmap used in the vhost-user backend must implement the
BitmapReplace trait, which provides the functionality to replace the
internal bitmap in runtime.

This internal bitmap is required because in the vm-memory crate the
bitmap is expected to exist at the time of creating the memory regions,
and in the case of vhost-user the bitmap is added at runtime, also it
could be replaced at a later time. In addition, the vhost user protocol
does not specify whether the previous bitmap is still active after
replying to the VHOST_USER_SET_LOG_BASE message, so we must be sure
that the in-flight requests are using the new bitmap after the message
reply.

Signed-off-by: German Maglione <[email protected]>
If the backend has the VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature
it will receive the VHOST_USER_SET_LOG_BASE message with a file
descriptor of the dirty-pages log memory region. This log covers all
known guest addresses, and must be manipulated atomically.

For further info please see
https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#migration

Signed-off-by: German Maglione <[email protected]>
@germag germag force-pushed the vhost-add-bitmap-support branch from 7d7697a to 83bd2a0 Compare March 11, 2024 15:34
@germag
Copy link
Collaborator Author

germag commented Mar 11, 2024

Rebase

@Ablu Ablu merged commit 6ce9d36 into rust-vmm:main Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants