
Commit 9e011ba

examples: Have UFFD handler kill Firecracker should it die
If the UFFD handler exits abnormally for some reason, have it take down Firecracker as well by SIGKILL-ing it from a panic hook. For this, reintroduce the "get peer creds" logic (SO_PEERCRED on the UDS connection) to learn Firecracker's PID.

We have to use SIGKILL because Firecracker could be inside the handler for a KVM-originated page fault that is not marked as interruptible, in which case every signal except SIGKILL is ignored. This happens, for example, during KVM_SET_MSRS when it triggers the initialization of a gfn_to_pfn_cache for the kvm-clock page, which uses GUP without FOLL_INTERRUPTIBLE.

While we're at it, add a hint to the generic "process not found" error message indicating that Firecracker may have died, and that the cause could be the UFFD handler crashing. For example, in #4601 the cause of the mystery hang was the UFFD handler crashing, but we were stumped by what was going on for over half a year. Let's avoid that going forward.

We can't enable this by default because it interferes with unit tests and with the "malicious_handler", so expose a function on `Runtime` (`install_panic_hook`) that enables it, and call it only in valid_handler and fault_all_handler.

Signed-off-by: Patrick Roy <[email protected]>
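The kill-by-PID step can be illustrated with a small standalone sketch. It spawns a stand-in peer process (`sleep 60`, an assumption standing in for Firecracker), sends it SIGKILL with kill(2), and reports which signal terminated it. To stay dependency-free it declares `kill` by hand instead of using the `libc` crate as the commit does, and it hardcodes SIGKILL as 9 (its Linux value); `kill_peer_demo` is a hypothetical name. It does not reproduce the non-interruptible KVM page-fault wait itself.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::Command;

// Declared by hand so the sketch needs only the standard library
// (assumption: Linux, where pid_t is an i32). The commit uses `libc::kill`.
extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

// SIGKILL is signal 9 on Linux.
const SIGKILL: i32 = 9;

/// Spawns a stand-in "peer" process, SIGKILLs it by PID, and returns the
/// signal number that terminated it (None if it exited normally).
fn kill_peer_demo() -> Option<i32> {
    // `sleep 60` plays the role of the Firecracker process.
    let mut child = Command::new("sleep")
        .arg("60")
        .spawn()
        .expect("failed to spawn stand-in peer");

    // Same shape as the panic hook's call: kill(pid, SIGKILL).
    let ret = unsafe { kill(child.id() as i32, SIGKILL) };
    if ret != 0 {
        eprintln!("Failed to kill peer process");
    }

    // Reap the child and report which signal terminated it.
    let status = child.wait().expect("failed to wait on peer");
    status.signal()
}

fn main() {
    match kill_peer_demo() {
        Some(sig) => println!("peer terminated by signal {}", sig),
        None => println!("peer exited normally"),
    }
}
```

SIGKILL cannot be caught, blocked, or ignored by the target, which is why the commit relies on it rather than a gentler signal such as SIGTERM.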
1 parent 3793b99 commit 9e011ba


4 files changed, 42 insertions(+), 0 deletions(-)


src/firecracker/examples/uffd/fault_all_handler.rs (+1)

```diff
@@ -24,6 +24,7 @@ fn main() {
     let (stream, _) = listener.accept().expect("Cannot listen on UDS socket");
 
     let mut runtime = Runtime::new(stream, file);
+    runtime.install_panic_hook();
     runtime.run(|uffd_handler: &mut UffdHandler| {
         // Read an event from the userfaultfd.
         let event = uffd_handler
```

src/firecracker/examples/uffd/uffd_utils.rs (+37)

```diff
@@ -208,6 +208,43 @@ impl Runtime {
         }
     }
 
+    fn peer_process_credentials(&self) -> libc::ucred {
+        let mut creds: libc::ucred = libc::ucred {
+            pid: 0,
+            gid: 0,
+            uid: 0,
+        };
+        let mut creds_size = size_of::<libc::ucred>() as u32;
+        let ret = unsafe {
+            libc::getsockopt(
+                self.stream.as_raw_fd(),
+                libc::SOL_SOCKET,
+                libc::SO_PEERCRED,
+                &mut creds as *mut _ as *mut _,
+                &mut creds_size as *mut libc::socklen_t,
+            )
+        };
+        if ret != 0 {
+            panic!("Failed to get peer process credentials");
+        }
+        creds
+    }
+
+    pub fn install_panic_hook(&self) {
+        let peer_creds = self.peer_process_credentials();
+
+        let default_panic_hook = std::panic::take_hook();
+        std::panic::set_hook(Box::new(move |panic_info| {
+            let r = unsafe { libc::kill(peer_creds.pid, libc::SIGKILL) };
+
+            if r != 0 {
+                eprintln!("Failed to kill Firecracker process from panic hook");
+            }
+
+            default_panic_hook(panic_info);
+        }));
+    }
+
     /// Polls the `UnixStream` and UFFD fds in a loop.
     /// When stream is polled, new uffd is retrieved.
     /// When uffd is polled, page fault is handled by
```
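`peer_process_credentials` relies on the `SO_PEERCRED` socket option, which on Linux reports the PID, UID, and GID of the process on the other end of a Unix domain socket, as captured when it called connect(2) or socketpair(2). The sketch below reproduces that query with only the standard library, assuming Linux on x86_64 or aarch64: `getsockopt` is declared by hand, `SOL_SOCKET` (1) and `SO_PEERCRED` (17) are hardcoded from the generic Linux socket headers, and `Ucred` is a hypothetical mirror of `struct ucred`. Unlike the commit, it returns an `io::Error` instead of panicking.

```rust
use std::mem::size_of;
use std::os::unix::io::AsRawFd;
use std::os::unix::net::UnixStream;

// Hypothetical mirror of Linux `struct ucred` (field order: pid, uid, gid).
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
struct Ucred {
    pid: i32,
    uid: u32,
    gid: u32,
}

// Hand-declared so the sketch needs no external crates (socklen_t is u32 on Linux).
extern "C" {
    fn getsockopt(
        sockfd: i32,
        level: i32,
        optname: i32,
        optval: *mut std::ffi::c_void,
        optlen: *mut u32,
    ) -> i32;
}

// Assumed values from the generic Linux socket headers (x86_64, aarch64).
const SOL_SOCKET: i32 = 1;
const SO_PEERCRED: i32 = 17;

/// Returns the credentials of the process on the other end of `stream`.
fn peer_process_credentials(stream: &UnixStream) -> std::io::Result<Ucred> {
    let mut creds = Ucred::default();
    let mut len = size_of::<Ucred>() as u32;
    let ret = unsafe {
        getsockopt(
            stream.as_raw_fd(),
            SOL_SOCKET,
            SO_PEERCRED,
            &mut creds as *mut Ucred as *mut std::ffi::c_void,
            &mut len,
        )
    };
    if ret != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(creds)
}

fn main() {
    // Both ends of a socketpair belong to this process, so the peer PID is our own.
    let (a, _b) = UnixStream::pair().expect("socketpair failed");
    let creds = peer_process_credentials(&a).expect("SO_PEERCRED failed");
    println!("peer pid = {}, uid = {}, gid = {}", creds.pid, creds.uid, creds.gid);
}
```

In the UFFD handler the peer is the Firecracker process that connected to the listening socket, so the returned PID is the one the panic hook later targets.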

src/firecracker/examples/uffd/valid_handler.rs (+1)

```diff
@@ -24,6 +24,7 @@ fn main() {
     let (stream, _) = listener.accept().expect("Cannot listen on UDS socket");
 
     let mut runtime = Runtime::new(stream, file);
+    runtime.install_panic_hook();
     runtime.run(|uffd_handler: &mut UffdHandler| {
         // Read an event from the userfaultfd.
         let event = uffd_handler
```
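Calling `install_panic_hook` before `runtime.run` arms the hook for the rest of the handler's lifetime. The hook follows the standard chaining pattern: `std::panic::take_hook` removes the current hook (initially the default one that prints the panic message), and the new hook runs its own action before delegating to it. The sketch below isolates that pattern; `install_chained_hook` is a hypothetical helper, and its `cleanup` closure stands in for the `kill(peer_pid, SIGKILL)` call so the demo does not kill anything.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Installs a hook that runs `cleanup` first and then defers to whichever
/// hook was installed before (normally the default message-printing hook).
fn install_chained_hook<F>(cleanup: F)
where
    F: Fn() + Send + Sync + 'static,
{
    let previous_hook = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // Stand-in for SIGKILL-ing the peer process.
        cleanup();
        // Keep the default behaviour (printing the panic message).
        previous_hook(info);
    }));
}

fn main() {
    let cleaned_up = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cleaned_up);
    install_chained_hook(move || flag.store(true, Ordering::SeqCst));

    // Panic on purpose; catch_unwind keeps the demo process alive.
    let result = panic::catch_unwind(|| {
        panic!("simulated handler failure");
    });

    assert!(result.is_err());
    println!("cleanup ran: {}", cleaned_up.load(Ordering::SeqCst));
}
```

Panic hooks are process-global and run before unwinding (or aborting) begins, so a hook installed once in `main` fires for a panic on any thread of the handler.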

tests/framework/microvm.py (+3)

```diff
@@ -310,6 +310,9 @@ def kill(self):
             if self.screen_pid:
                 os.kill(self.screen_pid, signal.SIGKILL)
         except:
+            LOG.error(
+                "Failed to kill Firecracker Process. Did it already die (or did the UFFD handler process die and take it down)?"
+            )
             LOG.error(self.log_data)
             raise
```
