From 34bb5bc83e4b43299705890dfa6128ec27bb2a9f Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras
Date: Fri, 27 May 2016 15:51:54 +0100
Subject: [PATCH 1/4] Replace synchronization primitives with those from parking_lot

---
 text/0000-parking-lot.md | 102 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 text/0000-parking-lot.md

diff --git a/text/0000-parking-lot.md b/text/0000-parking-lot.md
new file mode 100644
index 00000000000..5b0b28e807c
--- /dev/null
+++ b/text/0000-parking-lot.md
@@ -0,0 +1,102 @@
+- Feature Name: parking_lot
+- Start Date: 2016-05-27
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes replacing the `Mutex`, `Condvar`, `RwLock` and `Once` types in
+the standard library with those from the [`parking_lot`](https://github.com/Amanieu/parking_lot) crate. The synchronization
+primitives in the `parking_lot` crate are smaller, faster and more flexible than
+those in the Rust standard library.
+
+# Motivation
+[motivation]: #motivation
+
+The primitives provided by `parking_lot` have several advantages over those
+in the Rust standard library:
+
+1. `Mutex` and `Once` only require 1 byte of storage space, while `Condvar`
+   and `RwLock` only require 1 word of storage space. On the other hand, the
+   standard library primitives require a dynamically allocated `Box` to hold
+   OS-specific synchronization primitives. The small size of `Mutex` in
+   particular encourages the use of fine-grained locks to increase
+   parallelism.
+2. Since they consist of just a single atomic variable, have constant
+   initializers and don't need destructors, these primitives can be used as
+   `static` global variables. The standard library primitives require
+   dynamic initialization and thus need to be lazily initialized with
+   `lazy_static!`.
+3. Uncontended lock acquisition and release is done through fast inline
+   paths which only require a single atomic operation.
+4. Microcontention (a contended lock with a short critical section) is
+   efficiently handled by spinning a few times while trying to acquire a
+   lock.
+5. The locks are adaptive and will suspend a thread after a few failed spin
+   attempts. This makes the locks suitable for both long and short critical
+   sections.
+6. `Condvar`, `RwLock` and `Once` work on Windows XP, unlike the standard
+   library versions of those types.
+7. `RwLock` takes advantage of hardware lock elision on processors that
+   support it, which can lead to huge performance wins with many readers.
+8. `MutexGuard` (and the `RwLock` equivalents) is `Send`, which means it can be
+   unlocked by a different thread than the one that locked it.
+9. `RwLock` will prefer writers, whereas the standard library version makes no
+   guarantees as to whether readers or writers are given priority.
+10. `Condvar` is guaranteed not to produce spurious wakeups. A thread will only
+    be woken up if it timed out or it was woken up by a notification.
+11. `Condvar::notify_all` will only wake up a single thread and requeue the rest
+    to wait on the associated `Mutex`. This avoids a thundering herd problem
+    where all threads try to acquire the lock at the same time.
+
+# Detailed design
+[design]: #detailed-design
+
+The API of `Mutex`, `Condvar`, `RwLock` and `Once` will mostly stay the same.
+The only user-visible API changes are the following:
+
+- `Once` is no longer required to be `'static`.
+- `MutexGuard`, `RwLockReadGuard` and `RwLockWriteGuard` will be `Send` if the
+  underlying type is also `Send`. This allows them to be unlocked from a
+  different thread than the one that created them.
+- `Condvar` is guaranteed not to produce any spurious wakeups. A thread will
+  only be woken up if its wait times out or if the `Condvar` is notified by
+  another thread.
+- `Condvar` is no longer restricted to being associated with a single `Mutex`
+  for its entire lifetime. The only restriction is that you cannot wait using
+  a `Mutex` if there are currently threads waiting on the `Condvar` with a
+  different `Mutex` (this is the same restriction that pthreads has). This
+  situation is detected and a panic will be generated.
+- `Mutex`, `Condvar` and `RwLock` will have `const fn` constructors and do not
+  require any drop glue. This makes them suitable for use in `static` variables.
+- Calling `RwLock::read` when already holding a read lock may result in a
+  deadlock if there is a writer thread waiting. Note that this was already the
+  case in the Windows `RwLock` but it is now explicitly documented.
+
+The internal parking lot APIs `park`, `unpark_one`, `unpark_all` and
+`unpark_requeue` are not publicly exposed in the standard library API. Users
+who wish to use these to create their own synchronization primitives should use
+the `parking_lot` crate directly.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+`Mutex`, `Condvar` and `RwLock` are no longer simple wrappers around OS primitives.
+
+The implementation of `parking_lot` is quite complicated because it needs to
+support many advanced features like thread requeuing, hardware lock elision and
+spin waiting.
+
+# Alternatives
+[alternatives]: #alternatives
+
+The main alternative is to keep the existing synchronization primitives as they
+are, which are essentially wrappers around OS synchronization primitives. This is
+undesirable since there are many issues with these, such as the lack of support
+for Windows XP or glibc's support for lock elision causing memory safety issues.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None

From e38efb66b48019319e1df52b931add95a1db95b3 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras
Date: Fri, 27 May 2016 18:02:57 +0100
Subject: [PATCH 2/4] Add parking_lot benchmark results

---
 text/0000-parking-lot.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/text/0000-parking-lot.md b/text/0000-parking-lot.md
index 5b0b28e807c..3ed948c772a 100644
--- a/text/0000-parking-lot.md
+++ b/text/0000-parking-lot.md
@@ -50,6 +50,11 @@ in the Rust standard library:
     to wait on the associated `Mutex`. This avoids a thundering herd problem
     where all threads try to acquire the lock at the same time.
 
+Here are some benchmark results of `parking_lot` synchronization primitives
+compared to those of the standard library, showing the rate of lock acquisition:
+- [x86_64 Linux](https://gist.github.com/Amanieu/6a4b4151b89b78224992106f9bc4374f)
+- [x86_64 Windows](https://gist.github.com/Amanieu/6812507e66c5cbaa6ab5ab04d9c71eac)
+
 # Detailed design
 [design]: #detailed-design
 

From 98e9d51c8f3bfe938149f2e5ebbf55772b472774 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras
Date: Fri, 27 May 2016 18:28:53 +0100
Subject: [PATCH 3/4] Add benchmark results for AArch64 Linux

---
 text/0000-parking-lot.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/text/0000-parking-lot.md b/text/0000-parking-lot.md
index 3ed948c772a..c92a3d5a1fa 100644
--- a/text/0000-parking-lot.md
+++ b/text/0000-parking-lot.md
@@ -54,6 +54,7 @@ Here are some benchmark results of `parking_lot` synchronization primitives
 compared to those of the standard library, showing the rate of lock acquisition:
 - [x86_64 Linux](https://gist.github.com/Amanieu/6a4b4151b89b78224992106f9bc4374f)
 - [x86_64 Windows](https://gist.github.com/Amanieu/6812507e66c5cbaa6ab5ab04d9c71eac)
+- [AArch64 Linux](https://gist.github.com/Amanieu/0f10ea5b2acb2819b75442390f2855f8)
 
 # Detailed design
 [design]: #detailed-design

From 2e35f7f78e89b3c3cb11530ffc2b9f5393702a1b Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras
Date: Fri, 3 Jun 2016 03:32:07 +0100
Subject: [PATCH 4/4] parking_lot now supports RwLock downgrading

---
 text/0000-parking-lot.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/text/0000-parking-lot.md b/text/0000-parking-lot.md
index c92a3d5a1fa..8ac3e838a0c 100644
--- a/text/0000-parking-lot.md
+++ b/text/0000-parking-lot.md
@@ -49,6 +49,7 @@ in the Rust standard library:
 11. `Condvar::notify_all` will only wake up a single thread and requeue the rest
     to wait on the associated `Mutex`. This avoids a thundering herd problem
     where all threads try to acquire the lock at the same time.
+12. `RwLock` supports atomically downgrading a write lock into a read lock.
 
 Here are some benchmark results of `parking_lot` synchronization primitives
 compared to those of the standard library, showing the rate of lock acquisition:
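
Points 1 to 5 of the Motivation list (one byte of lock state, a `const fn` constructor, a single atomic operation on the uncontended path, and brief spinning under contention) can be illustrated with a minimal sketch that uses only the standard library. `TinyLock`, `SPIN_LIMIT`, `with` and `count_with_threads` are hypothetical names invented for this example. This is not `parking_lot`'s implementation: `thread::yield_now()` merely stands in for real thread parking, which the sketch does not attempt.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Number of spin attempts before yielding (an arbitrary, assumed value).
const SPIN_LIMIT: u32 = 40;

/// Hypothetical spin-then-yield lock whose state is a single byte.
/// Illustration only; it is not how `parking_lot` is implemented.
pub struct TinyLock<T> {
    locked: AtomicBool, // the entire lock state: one byte
    value: UnsafeCell<T>,
}

// Safety: all access to `value` is serialised through `locked`.
unsafe impl<T: Send> Sync for TinyLock<T> {}

impl<T> TinyLock<T> {
    /// Constant initializer, so the lock could live in a `static`.
    pub const fn new(value: T) -> Self {
        TinyLock {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Fast path: one atomic compare-and-swap when uncontended.
    fn try_acquire(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Runs `f` with exclusive access to the protected value.
    /// Note: if `f` panics the lock stays held; a real lock would use a guard.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        let mut spins = 0u32;
        while !self.try_acquire() {
            if spins < SPIN_LIMIT {
                // Handle microcontention by spinning a few times.
                spins += 1;
                hint::spin_loop();
            } else {
                // Stand-in for suspending the thread after failed spins.
                thread::yield_now();
            }
        }
        // Safety: we hold the lock, so no other reference to `value` exists.
        let result = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}

/// Increments a shared counter from several threads and returns the total.
fn count_with_threads(threads: usize, per_thread: u64) -> u64 {
    let lock = Arc::new(TinyLock::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    lock.with(|n| *n += 1);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    lock.with(|n| *n)
}

fn main() {
    println!("total = {}", count_with_threads(4, 10_000));
}
```

Because `new` is a `const fn`, a declaration such as `static LOCK: TinyLock<u64> = TinyLock::new(0);` would also compile without lazy initialization, which is the property the RFC asks for in the standard library types.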