
Replace synchronization primitives with those from parking_lot #1632

Closed
wants to merge 4 commits

Conversation

Amanieu
Member

@Amanieu Amanieu commented May 27, 2016

@alilleybrinker

Just as a first comment, I'm not sure how the parking_lot primitives can be adopted in favor of the current standard library primitives without violation of Rust's stability guarantees.

@ticki
Contributor

ticki commented May 27, 2016

If this can be done in a way that is compatible, I would absolutely love to do this.

@Amanieu
Member Author

Amanieu commented May 27, 2016

> Just as a first comment, I'm not sure how the parking_lot primitives can be adopted in favor of the current standard library primitives without violation of Rust's stability guarantees.

As mentioned in the RFC, the parking_lot primitives have the same API as the standard ones, so there is no compatibility issue if we replace one with the other.

@nagisa
Member

nagisa commented May 27, 2016

I expected to see benchmarks on all tier 1 and at least some tier 2 platforms included in the RFC text as the reason to/not to accept the RFC.

@Amanieu
Member Author

Amanieu commented May 27, 2016

@nagisa I added some benchmark results for Linux and Windows. Unfortunately I don't have any other platforms to test on.

@hanna-kruppe

hanna-kruppe commented May 27, 2016

I seem to recall from the Webkit article that the parking-lot-style locks are not entirely fair. While I also recall good reasons why this isn't so bad, if this is also true of the Rust crate, I would expect it (as well as any other downsides that may exist — for example, does the crate leak memory as Webkit does?) to be discussed in the RFC.

@nagisa
Member

nagisa commented May 27, 2016

I personally haven’t come across a single case where entirely-fair locks were necessary or even desirable. Is there one? I don’t recall us ever guaranteeing this for any locking mechanism in the standard library so far.

Either way, from the same webkit blog post it seems like making locks fair is also an option.

@hanna-kruppe

hanna-kruppe commented May 28, 2016

I'm not particularly keen on fair locks either (then again, I'm not sure I'm qualified to judge anyway). I just want all the differences, including the ones that may be irrelevant or drawbacks, documented in the RFC to make it easier to form an opinion without scavenging N articles across the internet. In particular if there are so many minor design decisions where parking_lot may differ from Webkit's implementation.

@Amanieu
Member Author

Amanieu commented May 28, 2016

Neither pthread mutexes nor Windows SRWLOCK, which is what the standard library primitives are based on, makes any fairness guarantees, so there is no difference here.

@comex

comex commented May 29, 2016

👍 needing an allocation for each mutex is awful.

@oli-obk
Contributor

oli-obk commented May 29, 2016

The RFC does not mention that Mutex no longer gets poisoned when the thread panics. This can probably be abused to exhibit UB, and it definitely changes the UX.

@Amanieu
Member Author

Amanieu commented May 29, 2016

I have poisoning support for parking_lot in a branch, and this is the version that would be integrated into the standard library.

I don't plan on adding poisoning support to the parking_lot crate itself because poisoning has a significant performance cost and I feel that it makes the API uglier. By the way, lack of poisoning doesn't make anything unsafe, it just means that user code needs to be aware that locked data may be left in an inconsistent (but safe) state after a panic. This is why AssertUnwindSafe does not require unsafe.

@Amanieu
Member Author

Amanieu commented May 29, 2016

As an example, on AArch64, adding support for poisoning makes Mutex 70% slower. Most of this overhead comes from the need to call std::thread::panicking which in turn accesses TLS.

@kennytm
Member

kennytm commented May 29, 2016

Is it possible to turn off poisoning when we are linking to the abort panic runtime?

@nagisa
Member

nagisa commented May 29, 2016

Abort still runs various handlers before actually quitting the process (to my knowledge, at least), so any data behind a mutex that may be accessed there should also be poisoned.

@ticki
Contributor

ticki commented May 29, 2016

@nagisa It only runs the panic hook, nothing else.

@ticki
Contributor

ticki commented May 29, 2016

So, it should be possible (abort doesn't call destructors anyways).

@nagisa
Member

nagisa commented May 29, 2016

You may have mutexes in statics and the panic hook may run arbitrary code.

@Amanieu
Member Author

Amanieu commented May 29, 2016

@nagisa With panic = abort a mutex can never become poisoned because that is done in the destructor of MutexGuard, which will poison the mutex if the current thread is unwinding due to a panic.

@ticki
Contributor

ticki commented May 29, 2016

@nagisa The point is that the inner value cannot become poisoned because the destructor won't be called.

@nrc nrc added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label May 30, 2016
@eternaleye

eternaleye commented May 31, 2016

One thing I feel is worth noting - the WordLock used per-bucket in the implementation of the parking lot is a traditional test-and-compare-and-swap lock, a type which degrades badly as cores increase due to cache-coherency traffic[1]. This can be somewhat ameliorated by adding exponential backoff, but the worst case is actually unaffected by that. In the worst-case, where lock hold time is short and core count is above a (realistically reachable) threshold, TACAS locks suffer catastrophic throughput failure[2] compared to MCS, CLH, or other "scalable" locks.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.8168&rep=rep1&type=pdf
[2] https://pdos.csail.mit.edu/papers/linux:lock.pdf

@Amanieu
Member Author

Amanieu commented May 31, 2016

@eternaleye WordLock uses the same basic algorithm as Mutex, which is to spin a few times trying to acquire the lock (with exponential backoff) but then fall back to adding the thread to a queue so that it is woken up later.

@eternaleye

@Amanieu: Ah, I had missed the backoff. However, I do strongly suggest reading my second link - it shows catastrophic throughput loss for the ticket spinlocks in the Linux kernel of the time, and compares against their MCS lock implementations patched into Linux.

@Amanieu
Member Author

Amanieu commented May 31, 2016

@eternaleye

One big assumption that this paper makes is that all "threads" are always running and never scheduled out. This is a perfectly fine assumption to make in the context of a kernel where a "thread" is a raw CPU. However this doesn't necessarily apply in the context of a userspace process, where the scheduler will arbitrarily suspend threads for a (relatively) long duration.

And this leads to the big issue with queue-based locks: you have no way of knowing whether the next thread in the queue has been scheduled out. If it has then you have one whole timeslice during which nobody owns the lock. If you instead have threads spinning on the lock word waiting for it to get unlocked then you are pretty much guaranteed to pass the lock to a thread that is running.

Now, with that said, the spin waiting strategy in parking_lot is a bit ad-hoc and could use some fine-tuning (read: I made up some numbers and it seemed to give good results). In particular I'm considering reducing the number of times I yield the thread in favor of just queuing up the thread and having it sleep until explicitly woken up.

@arthurprs

In order to really discuss this we need benchmarks with the poisoned branch on most tier 1 platforms.

@Amanieu
Member Author

Amanieu commented May 31, 2016

The benchmarks linked in the RFC were done on the poison branch.

@Amanieu
Member Author

Amanieu commented Jun 1, 2016

> To what extent would this allow the synchronization primitives to be made available with #[no_std]/libcore-only configurations?

There are two reasons why this won't work on #[no_std]:

  1. Condvar waits with timeout use the std::time::{Instant,Duration} types.
  2. Putting a thread to sleep requires an implementation of ThreadParker, which needs to call OS-specific APIs.

@gereeter

gereeter commented Jun 1, 2016

A more conservative change would be to just move the OS-specific implementations of ThreadParker into the standard library. Currently, the standard library always uses the generic implementation of ThreadParker to implement thread::park and thread::Thread::unpark, and Once and channels would be sped up significantly by having faster park implementations.

@gereeter

gereeter commented Jun 1, 2016

Additionally, I'd like to see benchmarks against the standard library implementation of Once. The standard library already uses a parking-based algorithm, and it'd be nice to see the difference between using a centralized parking lot and a queue for each Once.

@Amanieu
Member Author

Amanieu commented Jun 1, 2016

I expect the performance of Once to be roughly the same in both cases, and also mostly irrelevant since parking threads is only done in the rare slow path.

As for ThreadParker, there are no performance differences between my implementations and just using a Mutex + Condvar. Again, any performance benefit is vastly outweighed by the process of putting a thread to sleep. I only have custom implementations for two reasons: to support Windows XP and to avoid a dependency on the standard library Mutex and Condvar types.

@briansmith

Why not make each mutex a full page, and force them to be page-aligned, to minimize cache contention? Shouldn't this be provided, at least, as an option?

Sorry, I meant to write "full cache line" and "cache-line-aligned." AFAICT, it should be really easy to provide both aligned-and-padded and single-byte variants.

@Amanieu
Member Author

Amanieu commented Jun 1, 2016

@briansmith What you really want to do is align the mutex and the data it is protecting to a cacheline. This can easily be done by marking Mutex<T> with #[aligned = "64"]. However this has not been implemented in rustc yet.

@Ericson2314
Contributor

This would be a nice thing to optionally enable if we gave std some features. It would also make battle-testing much easier.

@CodesInChaos

What seems a bit risky with this change is that it not only replaces the implementation, but that it also makes additional promises. This restricts alternative implementations and prevents backing out of the change.

@arthurprs

@CodesInChaos the guarantees can always be reduced later, and this would still give good improvements on the performance side.

Anyway, I feel like this is a great proposal; we just need to wait a few months (let parking_lot prove itself) before really making a decision.

@Amanieu
Member Author

Amanieu commented Jun 3, 2016

The RwLock type in parking_lot now supports atomically downgrading a write lock into a read lock. This fixes rust-lang/rust#32527.

@bstrie
Contributor

bstrie commented Jul 12, 2016

Regarding real-world usage of parking_lot, today has brought us http://plhk.ru/trash/netmap-20160710.pdf .

@alexcrichton alexcrichton added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jul 19, 2016
@alexcrichton
Member

🔔 This RFC is now entering its week-long final comment period 🔔

The libs team felt that this RFC's discussion has reached a standstill now, and as a result it's ready to move into FCP. Our feeling matches what I wrote earlier: parking_lot may not be quite ready just yet to replace all the primitives in the standard library, as we feel it needs some more time to prove itself out across platforms and implementations.

@ghost

ghost commented Jul 25, 2016

Having an implementation of the synchronization primitives directly in the standard
library also has the additional benefit of more consistent behaviour
across supported platforms. This is not true when building them on top of
OS-specific primitives, for example:

  • On some platforms pthread_mutex_t contains the mutex inline and
    pthread_mutex_init never fails (unless misused). On others it is merely a
    pointer, and initialization requires an additional memory allocation (on top of
    the one required to box the handle itself). This is the case, for example,
    on FreeBSD, OpenBSD, and NetBSD.
  • Statically allocated pthread mutexes are lazily initialized in
    pthread_mutex_lock, which means that this function can also fail, for the same
    reasons that initialization may fail.
  • Non-error-checking mutexes are allowed to perform deadlock detection,
    and pthread_mutex_lock could fail with EDEADLK.
  • pthread_cond_timedwait uses the system clock by default, and doesn't match the
    documented semantics of Condvar::wait_timeout when the system time is changed
    (the workaround in the standard library doesn't really help when time is
    moved backwards).

... (and many others, not to mention the differences in the Windows API).

It seems that at some point it would be a really nice thing to have.

@aturon
Member

aturon commented Jul 25, 2016

@tmiasko

> It seems that at some point it would be a really nice thing to have.

I agree, there are lots of potential benefits to offering these constructs. But it's a pretty big move to make, and for the time being gaining more experience with them externally seems prudent. (In general, it's not 100% clear to me what direction the concurrency story within std should go long term.)

@alexcrichton
Member

The libs team got a chance to talk about this RFC this week at libs triage, and our conclusion remains the same as before in that while parking_lot seems like a great crate with some really nice benefits, we'd prefer to wait for it to get more battle tested before moving it wholesale into the standard library. Thanks though for the RFC @Amanieu!

@techtonik

Can this still be included as a fallback mechanism to allow Rust to run not only on shiny new Windows?

@bstrie
Contributor

bstrie commented Apr 2, 2017

Another blog post mentioning parking_lot: https://blog.mozilla.org/nfroyd/2017/03/29/on-mutex-performance-part-1/

It's been about a year since the crate was first published, so at some point it would be worth thinking about a timeline or otherwise mulling over specific criteria for potentially moving forward on this. Rather than trying to go into the stdlib directly, maybe a more reasonable first step would be to get this into rust-lang-nursery and see where it goes from there.

@Ericson2314
Contributor

I can easily imagine adding some Cargo features to make this opt-in (c.f. allocator choice).

@sanmai-NL

@aturon: it seems @bstrie’s last post warrants a comment/update from your team’s side, now that about 1.5 years have passed since its decision to hold this off.

@TerjeBr

TerjeBr commented Feb 15, 2018

It has been a long time now. Should this pull request be reopened?

@Ericson2314
Contributor

The up-and-coming portability work this year will hopefully allow this for free.

@faern
Contributor

faern commented Apr 22, 2019

Work on this has been going on for quite a while over at rust-lang/rust#56410.
