Skip to content

Conversation

@FelixMcFelix
Copy link
Collaborator

This PR introduces a TokenLock type, which allows for a single thread to be in a critical section without actively holding a KMutex or a write on a KRwLock. This is used to ensure that we have at most one active ioctl handler performing management operations. Previously, we relied upon write locks to the underlay and xde_devs to enforce this constraint.

Underlay installation, port creation, and port destruction have been restructured to rely on the TokenLock as the main means of mutual exclusion, and then to access other locked resources (which may be shared with the datapath) more granularly. This allows us to ensure that any calls to DLS (link name resolution, device creation) are made without inappropriately holding any lock, but are still appropriately atomic with respect to one another.

Fixes #758.

@FelixMcFelix FelixMcFelix added this to the 16 milestone May 29, 2025
@FelixMcFelix
Copy link
Collaborator Author

FelixMcFelix commented Jun 3, 2025

optd.zip

Attached is the source of a simple test program which exercises the deadlock here by creating ports, deleting ports, and spawning new processes. On master, we make it around ~300 ports in before the deadlock is hit. With these bits, the program comfortably runs in excess of 45k ports (35s), I'll run it for a little longer after I rework one or two things.

@FelixMcFelix FelixMcFelix marked this pull request as ready for review June 3, 2025 11:48
Comment on lines +695 to +704
#[cfg(all(not(feature = "std"), not(test)))]
let curthread = unsafe {
NonNull::new(threadp())
.expect("current thread *must* be a valid pointer")
};

#[cfg(any(feature = "std", test))]
let curthread = std::thread::current().id();

*thread_lock = Some(curthread);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This differs slightly from other times this pattern occurs in the kernel (e.g., nvme.c) -- here we drop the lock as soon as possible and stake our claim purely using the thread ID, whereas in the prior art the thread ID is only stored when making an upcall.

If having the same behaviour is crucial, we could shift the thread ID logic into a Token::upcall_context(&mut self, f: impl FnOnce()) and have Token hold a lock guard instead -- this would also be useful in proving that none of the inner data/locks are held in a given closure.

@FelixMcFelix FelixMcFelix modified the milestones: 16, 15 Jun 3, 2025
@FelixMcFelix FelixMcFelix requested a review from pfmooney June 3, 2025 15:58
Still thinking on the token lock_sig...
@FelixMcFelix FelixMcFelix merged commit f5560fa into master Jun 4, 2025
10 checks passed
@FelixMcFelix FelixMcFelix deleted the fix-758 branch June 4, 2025 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

xde deadlock due to upcalls and sled-agent forks

3 participants