Implement ctl::shared_ptr #1229
base: master
Conversation
This code compiles cleanly and does not segfault, and it took a while to get here.
__::shared_pointer::make takes ownership of p the same way as shared_ptr does: either the allocation succeeds and the returned control block owns it, or the allocation throws and the unique_ptr frees it. Note that this construction is safe since C++17, in which the evaluation of constructor arguments is sequenced after a new-expression allocation.
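The ownership hand-off described above can be sketched roughly as follows. The names `ControlBlock` and `make_control_block` are illustrative, not the actual ctl internals: the raw pointer is guarded by a `unique_ptr` until the control-block allocation succeeds, so a throwing `new` cannot leak it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical sketch of the exception-safe ownership hand-off; the real
// ctl control block carries a refcount and a deleter, omitted here.
struct ControlBlock {
    void* managed;  // the object this control block owns
    explicit ControlBlock(void* p) : managed(p) {}
};

template <typename T>
ControlBlock* make_control_block(T* p) {
    std::unique_ptr<T> guard(p);             // frees p if the next line throws
    ControlBlock* cb = new ControlBlock(p);  // may throw std::bad_alloc
    guard.release();                         // success: the control block owns p
    return cb;
}
```

Either the `new` succeeds and the guard is released, or it throws and the guard's destructor frees `p`; there is no path on which `p` leaks.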
ctl is overloaded...
}

// TODO(mrdomino): exercise more of API
// TODO(mrdomino): threading stress-test
You'll find numerous threading torture tests throughout the codebase, since it's sadly the best thing we have without TSAN.
incref(_Atomic(size_t)* r)
{
  size_t r2 = atomic_fetch_add_explicit(r, 1, memory_order_relaxed);
  if (r2 > ((size_t)-1) >> 1)
Do we realistically expect 2**63 references to happen? Probably not worth the bloat. This function is basically just an XADD instruction. I'd say put it in the header. The atomics header in cosmo libc is pretty lightweight. It's a natural dependency of this class.
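A minimal sketch of what the header-inline version might look like, using C++ `std::atomic` rather than the patch's C11 atomics (the name and signature are assumptions, not the actual diff): a single relaxed `fetch_add` that a compiler can lower to one XADD.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical header-inline incref: one relaxed fetch_add, no overflow
// check. Returns the previous count, matching fetch_add semantics.
inline std::size_t incref(std::atomic<std::size_t>& r) noexcept {
    return r.fetch_add(1, std::memory_order_relaxed);
}
```

Relaxed ordering suffices for incrementing because taking a new reference requires already holding one, so no synchronization edge is needed on this path.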
decref(_Atomic(size_t)* r)
{
  if (!atomic_fetch_sub_explicit(r, 1, memory_order_release)) {
    atomic_thread_fence(memory_order_acquire);
I'm not sure how I feel about atomic_thread_fence(). I just tried using a similar trick to this for my syncthreads() function on an M2 processor and it made things slower. For example:
void
syncthreads(void)
{
  static atomic_uint count;
  static atomic_uint phaser;
  int phase = atomic_load_explicit(&phaser, memory_order_relaxed);
  if (atomic_fetch_add_explicit(&count, 1, memory_order_acq_rel) + 1 == nth) {
    atomic_store_explicit(&count, 0, memory_order_relaxed);
    atomic_store_explicit(&phaser, phase + 1, memory_order_release);
  } else {
    int backoff = 0;
    while (atomic_load_explicit(&phaser, memory_order_acquire) == phase)
      backoff = delay(backoff);
  }
}
Became:
void
syncthreads(void)
{
  static atomic_uint count;
  static atomic_uint phaser;
  int phase = atomic_load_explicit(&phaser, memory_order_relaxed);
  if (atomic_fetch_add_explicit(&count, 1, memory_order_release) + 1 == nth) {
    atomic_thread_fence(memory_order_acquire);
    atomic_store_explicit(&count, 0, memory_order_relaxed);
    atomic_store_explicit(&phaser, phase + 1, memory_order_release);
  } else {
    int backoff = 0;
    while (atomic_load_explicit(&phaser, memory_order_acquire) == phase)
      backoff = delay(backoff);
  }
}
One nasty thing about atomic_thread_fence() is that it isn't supported by TSAN at all, so it's harder to prove the code is correct.
How certain are you that this decref() implementation is optimal? Could you whip up a torture test + benchmark that demonstrates its merit in comparison to alternatives? I would have assumed that doing an acquire load beforehand to check for zero would be faster, since it'd let you avoid the costly XADD instruction in many cases.
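As a concrete starting point for such a comparison, here is a hedged sketch of the two orderings side by side, in C++ `std::atomic` form (the function names are mine, not the patch's). Both follow the diff's convention that a stored count of 0 means one outstanding reference, i.e. `fetch_sub` returning 0 signals the last release.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Fence-based variant, mirroring the diff: release decrement, then an
// acquire fence only on the final release. TSAN does not model the fence.
inline bool decref_fence(std::atomic<std::size_t>& r) noexcept {
    if (r.fetch_sub(1, std::memory_order_release) == 0) {
        std::atomic_thread_fence(std::memory_order_acquire);
        return true;  // last reference gone; caller may destroy the object
    }
    return false;
}

// Alternative: acq_rel on the RMW itself. Slightly stronger on every
// decrement, but fully visible to TSAN.
inline bool decref_acq_rel(std::atomic<std::size_t>& r) noexcept {
    return r.fetch_sub(1, std::memory_order_acq_rel) == 0;
}
```

A benchmark would time both under contention; whether the fence pays for itself is exactly the open question above, and the M2 syncthreads() result suggests it may not on all hardware.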