This repository has been archived by the owner on Mar 24, 2022. It is now read-only.

Add userfaultfd-based Region backend #492

Merged: 33 commits into master from tyler/uffd-integration, May 20, 2020

Conversation

@tyler (Member) commented Apr 8, 2020

This pull request integrates Fastly's userfaultfd-based memory management backend as an alternate Region implementation.

Userfaultfd is a Linux-specific mechanism for handling page faults within userspace. (See https://www.kernel.org/doc/html/latest/admin-guide/mm/userfaultfd.html for a good technical explanation of it.) For Lucet, the use case is that when an Instance is started, none of the linear memory has to be copied initially.

We register the entire region with userfaultfd. When an instance is started, we set up the stacks, metadata, etc as normal, but we leave the pages of the linear memory "missing". When the instance starts and tries to access one of the linear memory pages, this triggers a page fault, as there is no physical memory backing the virtual memory. Since the region is registered with userfaultfd, this triggers a message to be sent to the userfaultfd handler thread. The handler thread determines which instance and module has faulted, and copies the necessary memory into place, before reawakening the instance thread.

At its core, this provides a way of reducing startup time and increasing flexibility in how memory is handled, at the cost of increased runtime overhead (by way of context switches especially).
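For a sense of what that looks like in code, here is a rough sketch of the handler-thread loop, written against the userfaultfd crate this backend builds on. The AddrLocation enum and resolve_location helper are hypothetical stand-ins for UffdRegion's real address-lookup logic, and the error handling is simplified; this illustrates the flow rather than reproducing the code in this PR.

use std::os::raw::c_void;
use userfaultfd::{Event, Uffd};

// Hypothetical stand-ins for the region's real slot/address lookup, shown only
// to make the fault-resolution flow concrete.
enum AddrLocation {
    Heap { init_page: *const c_void },
    Stack,
    Other,
}

fn resolve_location(_fault_page: usize) -> AddrLocation {
    AddrLocation::Stack // placeholder: the real region knows the slot layout
}

// The whole region is registered with the uffd up front (roughly
// `uffd.register(start, len)`), so any access to a missing page in it shows up
// as an event on the file descriptor.
fn handle_faults(uffd: &Uffd, page_size: usize) -> Result<(), userfaultfd::Error> {
    loop {
        // Block until the kernel reports a page fault in the registered region.
        if let Some(Event::Pagefault { addr, .. }) = uffd.read_event()? {
            let fault_page = (addr as usize) & !(page_size - 1);
            match resolve_location(fault_page) {
                // Linear memory: copy the module's initial heap contents into
                // place, waking the faulted instance thread in the same ioctl.
                AddrLocation::Heap { init_page } => unsafe {
                    uffd.copy(init_page, fault_page as *mut c_void, page_size, true)?;
                },
                // Stack pages are zero-filled lazily, on first touch.
                AddrLocation::Stack => unsafe {
                    uffd.zeropage(fault_page as *mut c_void, page_size, true)?;
                },
                // Anything else is unexpected; just wake the faulted thread.
                AddrLocation::Other => uffd.wake(fault_page as *mut c_void, page_size)?,
            }
        }
    }
}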

(All credit for this actually goes to @acfoltzer.)

@fst-crenshaw fst-crenshaw self-requested a review April 8, 2020 19:46
@acfoltzer acfoltzer self-requested a review April 9, 2020 20:48
@acfoltzer (Contributor)

@tyler there are a few things I'd like to add to the public API surface and documentation to go with this. It would probably be easiest if I did that on a branch and then PRed into this, is that alright with you?

AddrLocation::Globals | AddrLocation::SigStack => {
    tracing::error!("UFFD pagefault at unexpected location: {:?}", loc);
    uffd.wake(fault_page as *mut c_void, host_page_size())
        .map_err(|e| Error::InternalError(e.into()))?;
Contributor

One of the things I've been seeing in the mmap-related PRs is a re-examination of errors like these. If there had been a UFFD pagefault at an unexpected location, should the lucet runtime return its generic "InternalError" or should it just panic?

See related PR: #486

Member Author

This is a good call, it should just panic. I think @acfoltzer wanted to take care of that in a second PR. Though tbh I think we should just do it here.
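For concreteness, the change being discussed would amount to swapping the error return in that arm for a panic, along these lines (a sketch, not the code as merged):

AddrLocation::Globals | AddrLocation::SigStack => {
    // Sketch of the panic-instead-of-InternalError behavior discussed above.
    panic!("UFFD pagefault at unexpected location: {:?}", loc);
}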

// zero the sigstack
(slot.sigstack, limits.signal_stack_size),
]
.into_iter()
Contributor

I am new to userfaultfd, and I'm wondering about a small difference between its implementation of new_instance_with and the mmap implementation.

// make the stack read/writable
(slot.stack, limits.stack_size),

Member Author

So yeah, that's intentional here. We're making the allocation of the stack lazy. If you look at the other lines changed in that commit, you'll see where I make page faults in the Stack area zero out the page on demand.

That said, it does point to a lack of documentation here that I should fix.

Member Author

Added some docs!

.map_err(|e| Error::InternalError(e.into()))?,
);

// map the chunk of virtual memory for all of the slots
Contributor

I'm guessing the answer to my question could be here. The memory is already marked as READ/WRITE for all the slots.

@acfoltzer (Contributor) commented Apr 10, 2020

AddrLocation::Stack => unsafe {
    uffd.zeropage(fault_page as *mut c_void, host_page_size(), true)
        .map_err(|e| Error::InternalError(e.into()))?;
},
This snippet is also a key part of why this works.

Contributor

Hm. If I'm reading that correctly, does that mean that the stack isn't zeroed out until it's actually needed?

Contributor

It may help to refer to the ioctl_userfaultfd(2) man page on this, since "zeropage" does a lot more than what it sounds like: http://man7.org/linux/man-pages/man2/ioctl_userfaultfd.2.html (search for UFFDIO_ZEROPAGE on that page).

Basically, zeropage is one of several ways the uffd worker thread can resolve a page fault in memory it has registered. The stack starts off mprotected read/write but madvised with MADV_DONTNEED, so accesses trigger a uffd fault. We then recognize that the fault address is in the stack, and call zeropage to both zero out the memory and resolve the fault, so that the instance's thread wakes up and can make progress once again.
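As a minimal sketch of the other half of that lifecycle, assuming the nix crate's madvise binding and hypothetical stack_ptr/stack_size arguments in place of the real slot layout, discarding the stack's backing pages is what arms the next fault:

use nix::sys::mman::{madvise, MmapAdvise};
use std::os::raw::c_void;

// Hedged sketch: when a slot's stack is (re)initialized, its physical backing
// can be dropped so that the next touch faults and the uffd handler supplies a
// fresh zero page via UFFDIO_ZEROPAGE. The mapping itself stays read/write and
// registered with userfaultfd.
unsafe fn reset_stack(stack_ptr: *mut c_void, stack_size: usize) -> nix::Result<()> {
    madvise(stack_ptr, stack_size, MmapAdvise::MADV_DONTNEED)
}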

@fst-crenshaw (Contributor)

Now that CircleCI is running mmap and userfaultfd tests, will we be able to merge this?

@acfoltzer (Contributor)

> Now that CircleCI is running mmap and userfaultfd tests, will we be able to merge this?

CircleCI experiments are currently only happening on branches, but I'm hoping to get a proper PR together today.

@fst-crenshaw (Contributor)

So excited!

tyler and others added 19 commits May 19, 2020 13:35
- Instantiates the suite in `lucet_runtime_internals::alloc::tests` for `UffdRegion`

- Fixes early return issues in `UffdRegion` similar to #455

- Adds a test to show that the per-instance heap limit applies to runtime expansions, not just
initial instantiation

- Refactors `validate_runtime_spec` to take the per-instance heap limit as an additional
argument. This centralizes the logic for rejecting initially-oversized heap limits, and makes it
clearer what's happening in each region's instantiation logic.

- Removes the `UffdRegion`'s assertion that signal stack size is a multiple of page size. Since the
user can now control this as a parameter, we reject it gracefully when validating `Limits` rather
than panicking (a rough sketch of this kind of check follows below).
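For illustration only, the kind of check that last bullet describes might look roughly like this; the actual field names and error type in lucet-runtime's Limits validation will differ:

// Hypothetical sketch of a graceful Limits check, replacing the old assertion.
fn validate_signal_stack_size(signal_stack_size: usize, host_page_size: usize) -> Result<(), String> {
    if signal_stack_size % host_page_size != 0 {
        return Err(format!(
            "signal stack size ({}) must be a multiple of the host page size ({})",
            signal_stack_size, host_page_size
        ));
    }
    Ok(())
}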
Leaving the question of errors in the handler alone for this commit, since that'll be a more major
change.
Notably, this should get us building and running uffd in Linux CI.

It turns out to be a tremendous pain to enable a feature flag for just one crate within a
workspace. The situation is [being addressed][1], but in the meantime I believe the best route
forward is to just have uffd on by default for Linux.

[1]: rust-lang/cargo#5364
@tyler (Member Author) commented May 19, 2020

Ready for review :)

@fst-crenshaw (Contributor) left a comment

Approve approve approve!

@fst-crenshaw (Contributor)

The GitHub Actions tests are failing since they can't run the userfaultfd tests. I thought I'd try a little PR to have the offending tests skipped by GitHub Actions, since they are already run by CircleCI. See: #528

@tyler tyler merged commit c557dd8 into master May 20, 2020
@tyler tyler deleted the tyler/uffd-integration branch May 20, 2020 17:18