@BusyJay BusyJay commented May 7, 2025

Under contention, almost all threads are active on the CPU, so unlocking quickly lets those threads make progress sooner. This improves global throughput under high contention considerably.

One shortcoming is that fair unlock must now be invoked explicitly.

This is an improvement to #418.
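
To make the trade-off concrete, here is a minimal single-threaded sketch of the difference between a fast unlock and a fair (hand-off) unlock. `ToyLock` and its methods are hypothetical names made up for illustration and are not parking_lot's implementation. In parking_lot itself, an explicit fair unlock is requested with `MutexGuard::unlock_fair(guard)`.

```rust
use std::collections::VecDeque;

/// Toy model of a lock with a wait queue. Hypothetical, for illustration only.
#[derive(Debug, Default)]
pub struct ToyLock {
    owner: Option<u32>,
    parked: VecDeque<u32>,
}

impl ToyLock {
    /// Take the lock if nobody holds it.
    pub fn try_lock(&mut self, thread: u32) -> bool {
        if self.owner.is_none() {
            self.owner = Some(thread);
            true
        } else {
            false
        }
    }

    /// Record that `thread` gave up spinning and went to sleep.
    pub fn park(&mut self, thread: u32) {
        self.parked.push_back(thread);
    }

    /// Fast unlock: release the lock and wake the oldest waiter.
    /// The woken thread still has to compete for the lock via `try_lock`.
    pub fn unlock_fast(&mut self) -> Option<u32> {
        self.owner = None;
        self.parked.pop_front()
    }

    /// Fair unlock: hand ownership directly to the oldest waiter,
    /// so no other thread can take the lock in between.
    pub fn unlock_fair(&mut self) -> Option<u32> {
        self.owner = self.parked.pop_front();
        self.owner
    }

    pub fn owner(&self) -> Option<u32> {
        self.owner
    }
}
```

With `unlock_fast`, a thread that is already running can take the lock before the woken waiter gets to it, which favors throughput. With `unlock_fair`, the waiter is guaranteed to be the next owner.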

Signed-off-by: Jay <[email protected]>

BusyJay commented May 7, 2025

Running cargo run --bin mutex --release -- 9:36:9 5 5 2 2:

Running with 9 threads

name                          average      median       std.dev.
parking_lot::Mutex (this pr)  477.862 kHz  478.851 kHz  11.738 kHz
parking_lot::Mutex (master)   364.841 kHz  365.127 kHz  11.269 kHz
std::sync::Mutex              769.754 kHz  767.714 kHz  23.908 kHz
pthread_mutex_t               982.966 kHz  989.991 kHz  31.816 kHz

Running with 18 threads

name                          average      median       std.dev.
parking_lot::Mutex (this pr)  219.059 kHz  218.991 kHz  4.160 kHz
parking_lot::Mutex (master)   82.786 kHz   82.975 kHz   2.435 kHz
std::sync::Mutex              389.199 kHz  394.549 kHz  21.358 kHz
pthread_mutex_t               482.005 kHz  489.225 kHz  26.404 kHz

Running with 27 threads

name                          average      median       std.dev.
parking_lot::Mutex (this pr)  164.219 kHz  164.298 kHz  1.971 kHz
parking_lot::Mutex (master)   28.246 kHz   28.306 kHz   0.443 kHz
std::sync::Mutex              280.553 kHz  280.014 kHz  10.115 kHz
pthread_mutex_t               311.815 kHz  311.582 kHz  8.409 kHz

Running with 36 threads

name                          average      median       std.dev.
parking_lot::Mutex (this pr)  112.111 kHz  112.127 kHz  0.856 kHz
parking_lot::Mutex (master)   22.055 kHz   22.059 kHz   0.150 kHz
std::sync::Mutex              193.672 kHz  195.078 kHz  10.369 kHz
pthread_mutex_t               224.436 kHz  225.767 kHz  12.334 kHz
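
As a reading aid, the ratio of the "this pr" average to the "master" average can be computed from the four mutex runs above. This is only a small arithmetic sketch: the function and constant names are hypothetical, and the figures are copied from the averages reported in this comment.

```rust
/// Throughput ratio of the PR build over master (values above 1.0 mean the PR is faster).
pub fn speedup(pr_khz: f64, master_khz: f64) -> f64 {
    pr_khz / master_khz
}

/// (threads, "this pr" average in kHz, "master" average in kHz),
/// copied from the benchmark output above.
pub const MUTEX_AVERAGES: [(u32, f64, f64); 4] = [
    (9, 477.862, 364.841),
    (18, 219.059, 82.786),
    (27, 164.219, 28.246),
    (36, 112.111, 22.055),
];

/// Speedup per thread count, in the same order as `MUTEX_AVERAGES`.
pub fn speedups() -> Vec<(u32, f64)> {
    MUTEX_AVERAGES
        .iter()
        .map(|&(threads, pr, master)| (threads, speedup(pr, master)))
        .collect()
}
```

On these numbers the ratio is roughly 1.3x at 9 threads, about 2.6x at 18 threads, and above 5x at 27 and 36 threads, so the gain grows with contention.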

BusyJay commented May 7, 2025

Running cargo run --bin rwlock --release -- 36 9 5 5 2 2

parking_lot::RwLock (this pr) - [write] 1102.323 kHz [read] 2943.833 kHz
parking_lot::RwLock (master) - [write] 628.062 kHz [read] 954.938 kHz
seqlock::SeqLock - [write] 648.979 kHz [read] 152225.000 kHz
pthread_rwlock_t - [write] 1678.253 kHz [read] 376.558 kHz

BusyJay commented May 7, 2025

Reimplemented the PR by maintaining the parked bit on the waker side. The new implementation is less error-prone and works with Condvar directly.

The benchmarks show even more positive results:

Running cargo run --bin mutex --release -- 9:36:9 5 5 2 2:

Running with 9 threads

name                          average      median       std.dev.
parking_lot::Mutex (this pr)  405.134 kHz  406.469 kHz  9.105 kHz
parking_lot::Mutex (master)   364.841 kHz  365.127 kHz  11.269 kHz
std::sync::Mutex              769.754 kHz  767.714 kHz  23.908 kHz
pthread_mutex_t               982.966 kHz  989.991 kHz  31.816 kHz

Running with 18 threads

name                          average      median       std.dev.
parking_lot::Mutex (this pr)  268.530 kHz  268.355 kHz  5.586 kHz
parking_lot::Mutex (master)   82.786 kHz   82.975 kHz   2.435 kHz
std::sync::Mutex              389.199 kHz  394.549 kHz  21.358 kHz
pthread_mutex_t               482.005 kHz  489.225 kHz  26.404 kHz

Running with 27 threads

name                          average      median       std.dev.
parking_lot::Mutex (this pr)  185.802 kHz  186.233 kHz  2.598 kHz
parking_lot::Mutex (master)   28.246 kHz   28.306 kHz   0.443 kHz
std::sync::Mutex              280.553 kHz  280.014 kHz  10.115 kHz
pthread_mutex_t               311.815 kHz  311.582 kHz  8.409 kHz

Running with 36 threads

name                          average      median       std.dev.
parking_lot::Mutex (this pr)  134.010 kHz  133.784 kHz  1.509 kHz
parking_lot::Mutex (master)   22.055 kHz   22.059 kHz   0.150 kHz
std::sync::Mutex              193.672 kHz  195.078 kHz  10.369 kHz
pthread_mutex_t               224.436 kHz  225.767 kHz  12.334 kHz

Running cargo run --bin rwlock --release -- 36 9 5 5 2 2

parking_lot::RwLock (this pr) - [write] 6121.347 kHz [read] 968.373 kHz
parking_lot::RwLock (master) - [write] 628.062 kHz [read] 954.938 kHz
seqlock::SeqLock - [write] 648.979 kHz [read] 152225.000 kHz
pthread_rwlock_t - [write] 1678.253 kHz [read] 376.558 kHz

{
let mut prev = self.state.load(Ordering::Relaxed);
let new_state = prev & !LOCKED_BIT;
prev = self.state.swap(new_state, Ordering::Release);

Owner

There's a bug here: you may "forget" a parked thread if another thread sets PARKED_BIT between the load and swap.

@BusyJay BusyJay May 7, 2025

In that case, prev must be set to PARKED_BIT | LOCKED_BIT at L104, so it can't pass the check at L105.
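
The exchange above can be replayed deterministically. The sketch below mirrors the load-then-swap shape of the reviewed snippet and injects a simulated "other thread" between the two operations. The bit values, the `unlock_observing` helper, and the `between` hook are assumptions made for illustration, and what the real code does with `prev` afterwards is not shown in this thread.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Bit values are assumed for illustration only.
pub const LOCKED_BIT: u8 = 0b01;
pub const PARKED_BIT: u8 = 0b10;

/// Same load-then-swap shape as the snippet under review.
/// `between` runs after the load and before the swap, standing in for
/// another thread that modifies the state in that window.
/// Returns the value the swap observed.
pub fn unlock_observing(state: &AtomicU8, between: impl FnOnce(&AtomicU8)) -> u8 {
    let prev = state.load(Ordering::Relaxed);
    let new_state = prev & !LOCKED_BIT;
    between(state);
    // `swap` stores `new_state` and returns whatever was there just before,
    // including any bit that was set after the load.
    state.swap(new_state, Ordering::Release)
}
```

If `between` sets PARKED_BIT, the swap overwrites it in memory (the state ends up without the bit, which is the reviewer's concern), but the returned value still contains PARKED_BIT | LOCKED_BIT (the author's point). Whether a parked thread is forgotten therefore depends on the unlock path inspecting the returned value.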

BusyJay commented May 7, 2025

Benchmarked with the command from #418, cargo run --release 32 2 10000 100; the regression seems to be resolved:

std::sync::Mutex avg 30.795793ms min 28.369313ms max 33.668656ms
parking_lot::Mutex (this PR) avg 40.800542ms min 37.16543ms max 44.621677ms
parking_lot::Mutex (master) avg 206.836045ms min 183.902676ms max 213.697023ms
spin::Mutex avg 63.898884ms min 58.45244ms max 74.323676ms
AmdSpinlock avg 70.131547ms min 65.356139ms max 83.456119ms

std::sync::Mutex avg 30.52266ms min 28.69828ms max 34.945486ms
parking_lot::Mutex (this PR) avg 41.146074ms min 38.453175ms max 42.433051ms
parking_lot::Mutex (master) avg 210.387478ms min 187.38791ms max 215.752182ms
spin::Mutex avg 62.823716ms min 54.801191ms max 74.31628ms
AmdSpinlock avg 68.937325ms min 55.406785ms max 80.83359ms

BusyJay added a commit to BusyJay/parking_lot that referenced this pull request May 8, 2025
This is an alternative implementation of the idea in Amanieu#461.

Compared to Amanieu#461, this PR maintains the parked bit on the waiter side, so
that the waker doesn't have to perform two atomic operations. The waker now also
resets all lock state back to 0 no matter what state it was in. This makes the
fast lock path more likely to succeed under high contention.

Signed-off-by: Jay <[email protected]>
BusyJay added a commit to BusyJay/parking_lot that referenced this pull request May 8, 2025
This is an alternative, more aggressive implementation of the idea in Amanieu#461.

Compared to Amanieu#461, this PR
- maintains the parked bit on the waiter side, so that the waker doesn't
  have to perform two atomic operations.
- resets all lock state back to 0 on unlock. This makes the fast lock
  path more likely to succeed under high contention.
- sets PARKED_BIT even when the waiter is prevented from sleeping, so that
  more threads can be woken up during contention to compete for progress.

Signed-off-by: Jay <[email protected]>
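
The first two points of the commit message can be sketched as a state machine on a single atomic byte. This is a minimal illustration under assumed names and bit values (`SketchLock`, `mark_parked`, `LOCKED_BIT = 1`, `PARKED_BIT = 2`), not the code from the referenced commit. It also leaves out the actual parking and waking, and unlike the third point it only sets the parked bit while the lock is held.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Assumed bit layout, for illustration only.
const LOCKED_BIT: u8 = 0b01;
const PARKED_BIT: u8 = 0b10;

pub struct SketchLock {
    state: AtomicU8,
}

impl SketchLock {
    pub const fn new() -> Self {
        Self { state: AtomicU8::new(0) }
    }

    /// Fast path: succeeds only when the state is exactly 0.
    pub fn try_lock_fast(&self) -> bool {
        self.state
            .compare_exchange(0, LOCKED_BIT, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Waiter side: set PARKED_BIT while the lock is still held.
    /// Returns true if the bit was set and the caller should go to sleep.
    pub fn mark_parked(&self) -> bool {
        self.state
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |s| {
                if s & LOCKED_BIT != 0 {
                    Some(s | PARKED_BIT)
                } else {
                    None
                }
            })
            .is_ok()
    }

    /// Waker side: a single atomic swap resets the whole state to 0.
    /// Returns true if a waiter had announced itself and must be woken.
    pub fn unlock(&self) -> bool {
        let prev = self.state.swap(0, Ordering::Release);
        prev & PARKED_BIT != 0
    }
}
```

Because `unlock` always leaves the state at 0, the next `try_lock_fast` can succeed immediately, which is the effect the commit message describes for the fast lock path.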