Use `Token`-based locking on XDE management operations #761

FelixMcFelix · 2025-05-29T10:47:24Z

This PR introduces a TokenLock type, which allows for a single thread to be in a critical section without actively holding a KMutex or a write on a KRwLock. This is used to ensure that we have at most one active ioctl handler performing management operations. Previously, we relied upon write locks to the underlay and xde_devs to enforce this constraint.

Underlay installation, port creation, and port destruction have been restructured to rely on the TokenLock as the main means of mutual exclusion, and then to access other locked resources (which may be shared with the datapath) more granularly. This allows us to ensure that any calls to DLS (link name resolution, device creation) are made without inappropriately holding any lock, but are still appropriately atomic with respect to one another.
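To illustrate the idea for readers, here is a minimal userland sketch of a token-style lock, assuming `std` primitives in place of `KMutex` and the kernel condition variable. The names `holder`, `cv`, `try_lock`, and the `Deref` access to the inner value are hypothetical and are not taken from the PR's implementation; only the overall shape (claim the token under a short-lived mutex, then release the mutex) follows the description above.

```rust
use std::ops::Deref;
use std::sync::{Condvar, Mutex};
use std::thread::{self, ThreadId};

/// Hypothetical userland stand-in for a token lock: at most one thread
/// holds the token, but no mutex is held for the critical section itself.
pub struct TokenLock<T> {
    /// Which thread currently holds the token, if any.
    holder: Mutex<Option<ThreadId>>,
    /// Signalled whenever the token is released.
    cv: Condvar,
    inner: T,
}

/// Proof that the current thread owns the critical section.
pub struct Token<'a, T> {
    lock: &'a TokenLock<T>,
}

impl<T> TokenLock<T> {
    pub fn new(inner: T) -> Self {
        Self { holder: Mutex::new(None), cv: Condvar::new(), inner }
    }

    /// Block until the token is free, then claim it for this thread.
    pub fn lock(&self) -> Token<'_, T> {
        let mut holder = self.holder.lock().unwrap();
        while holder.is_some() {
            holder = self.cv.wait(holder).unwrap();
        }
        *holder = Some(thread::current().id());
        // The mutex guard is dropped on return; only the recorded
        // thread ID keeps other callers out from here on.
        Token { lock: self }
    }

    /// Claim the token only if nobody holds it right now.
    pub fn try_lock(&self) -> Option<Token<'_, T>> {
        let mut holder = self.holder.lock().unwrap();
        if holder.is_some() {
            return None;
        }
        *holder = Some(thread::current().id());
        Some(Token { lock: self })
    }
}

impl<T> Deref for Token<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.lock.inner
    }
}

impl<T> Drop for Token<'_, T> {
    fn drop(&mut self) {
        // Release the token and wake one waiter.
        *self.lock.holder.lock().unwrap() = None;
        self.lock.cv.notify_one();
    }
}
```

In this sketch the holder can call into other subsystems (the analogue of DLS calls here) without any mutex held, while a second management caller still waits in `lock()` until the token is dropped.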

Fixes #758.

FelixMcFelix · 2025-06-03T11:19:22Z

optd.zip

Attached is the source of a simple test program which exercises the deadlock here by creating ports, deleting ports, and spawning new processes. On master, we make it around 300 ports in before the deadlock is hit. With these bits, the program comfortably runs in excess of 45k ports (35s). I'll run it for a little longer after I rework one or two things.

FelixMcFelix · 2025-06-03T12:21:07Z

lib/opte/src/ddi/sync.rs

+        #[cfg(all(not(feature = "std"), not(test)))]
+        let curthread = unsafe {
+            NonNull::new(threadp())
+                .expect("current thread *must* be a valid pointer")
+        };
+
+        #[cfg(any(feature = "std", test))]
+        let curthread = std::thread::current().id();
+
+        *thread_lock = Some(curthread);


This differs slightly from other times this pattern occurs in the kernel (e.g., nvme.c) -- here we drop the lock as soon as possible and stake our claim purely using the thread ID, whereas in the prior art the thread ID is only stored when making an upcall.

If having the same behaviour is crucial, we could shift the thread ID logic into a Token::upcall_context(&mut self, f: impl FnOnce()) and have Token hold a lock guard instead -- this would also be useful in proving that none of the inner data/locks are held in a given closure.
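As a rough sketch of that alternative, assuming `std` primitives rather than the kernel ones: the token keeps the mutex guard for its whole lifetime and only records the thread ID (and releases the mutex) for the duration of an upcall. `MgmtLock`, `upcall_owner`, and `try_lock` are hypothetical names used for illustration, and the closure here returns a value, which the signature proposed above does not.

```rust
use std::sync::{Condvar, Mutex, MutexGuard};
use std::thread::{self, ThreadId};

/// Hypothetical management lock. The stored thread ID is only set
/// while the token holder is inside an upcall.
pub struct MgmtLock {
    upcall_owner: Mutex<Option<ThreadId>>,
    cv: Condvar,
}

/// Holds the mutex guard except while an upcall is in progress.
pub struct Token<'a> {
    lock: &'a MgmtLock,
    guard: Option<MutexGuard<'a, Option<ThreadId>>>,
}

impl MgmtLock {
    pub fn new() -> Self {
        Self { upcall_owner: Mutex::new(None), cv: Condvar::new() }
    }

    /// Block until no other thread holds the mutex or is mid-upcall.
    pub fn lock(&self) -> Token<'_> {
        let mut guard = self.upcall_owner.lock().unwrap();
        while guard.is_some() {
            guard = self.cv.wait(guard).unwrap();
        }
        Token { lock: self, guard: Some(guard) }
    }

    /// Non-blocking variant: fails if the mutex is held or an upcall
    /// by the token holder is in progress.
    pub fn try_lock(&self) -> Option<Token<'_>> {
        let guard = self.upcall_owner.try_lock().ok()?;
        if guard.is_some() {
            return None;
        }
        Some(Token { lock: self, guard: Some(guard) })
    }
}

impl<'a> Token<'a> {
    /// Run `f` with the mutex released; the recorded thread ID keeps
    /// other management callers out until `f` returns.
    pub fn upcall_context<R>(&mut self, f: impl FnOnce() -> R) -> R {
        let lock = self.lock;
        let mut guard = self.guard.take().expect("guard held outside upcalls");
        *guard = Some(thread::current().id());
        drop(guard);

        let out = f();

        // Reacquire the mutex, clear the claim, and wake any waiters;
        // they proceed once this token is finally dropped.
        let mut guard = lock.upcall_owner.lock().unwrap();
        *guard = None;
        lock.cv.notify_all();
        self.guard = Some(guard);
        out
    }
}
```

Because `upcall_context` takes `&mut self` and passes nothing to the closure, the borrow checker rejects a closure that captures anything borrowed from the token, which is one way to get the "nothing is held in this closure" property mentioned above. This sketch does not restore the guard if `f` panics.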

FelixMcFelix added 3 commits May 28, 2025 12:10

Initial go at a Token type management lock

3fc1324

Initial reorganisation.

9c1fe7a

Merge branch 'master' into fix-758

3e931f6

FelixMcFelix added this to the 16 milestone May 29, 2025

FelixMcFelix added 2 commits June 2, 2025 12:14

Ensure that only management token holder can see underlay, write to devs

3c9284c

Tweaks.

5985d2b

FelixMcFelix mentioned this pull request Jun 2, 2025

xde deadlock due to upcalls and sled-agent forks #758

Closed

FelixMcFelix added 2 commits June 2, 2025 22:45

cv_wait_sig saves the day.

de91c80

Implement the other CV methods, return the wakup reason

821d5a1

FelixMcFelix marked this pull request as ready for review June 3, 2025 11:48

Underlay does not need double-locked.

dacc8ca

FelixMcFelix commented Jun 3, 2025

FelixMcFelix modified the milestones: 16, 15 Jun 3, 2025

FelixMcFelix requested a review from pfmooney June 3, 2025 15:58

pfmooney reviewed Jun 3, 2025

Most feedback.

260a40d

Still thinking on the token lock_sig...

FelixMcFelix mentioned this pull request Jun 4, 2025

TokenLock should bubble up EINTR to userland callers when awoken by a signal #766

Open

pfmooney approved these changes Jun 4, 2025

Typo.

282eb7f

FelixMcFelix merged commit f5560fa into master Jun 4, 2025
10 checks passed

FelixMcFelix deleted the fix-758 branch June 4, 2025 20:42
