-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use safepoint to deliver SIGINT #16174
Conversation
// that throws a SIGINT. | ||
jl_gc_state_save_and_set(eh->gc_state); | ||
if (eh->defer_signal == 0) { | ||
jl_sigint_safepoint(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm fairly certain we don't actually want catch
to be a safepoint (now that it's avoidable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sigint safepoint is kind of avoidable (by reenable the gc safepoint, edit: however, this would be expensive...) I don't think it's valid to avoid the GC safepoint in the gc state restore though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fine. i just don't think we want to throw an InterruptException before the catch block has a chance to install a try/catch of its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't really provide the grantee that an async signal will not be thrown between a catch
and a try
though, unless we garentee it can only be compiled and not interpreted. The GC safepoint can also throw the exception even when there's not a sigint safepoint.
Suppose I have a Also, if you have multiple |
A callback can still throw exception so sigatomic_begin/end are still useful. I made it fast because of this... The sigint is always delivered to current task on the master thread if it doesn't block sigint (I realized that I might have missed a pending sigint check there, similar to the one in en_restore_state). However, you can block the sigint on the task you don't want it to be delivered by calling sigatomic_begin, is this good enough? |
I'm still confused by the callback situation in this PR. If I want |
For julia callback from C, I think a much faster sigatomic is as far as what this approach can provide. For the task issue, I'm wondering if we can make non-root tasks sigatomic by default? This way any frontend will need to explicitly reenable sigint on tasks that want to handle sigint. |
A sigint can be delivered in the callback if the C caller wasn't called in sigatomic region. It is a little unreliably though since in principle a sigint can be delivered before the try block starts so I think to do this reliably you need to call the C function with sigatomic and reenable sigatomic in the callback after you started the try block. |
I'm not sure if non-root tasks should be sigatomic by default. But it would be nice to have an option to make the task sigint-unaware when it is created. e.g. I'm not sure how calling And what if several tasks are blocking on I/O, and I want a SIGINT to wake up a certain task to catch the n |
Yes, that's what I'm doing now. |
You can call sigatomic_begin without calling end to permanently block sigint on a task. Doing that at initialization time is also possible since I accidentally made defer_signal a property of the task (instead of just save it on the stack on task switch....) |
1579cfe
to
033cfc1
Compare
What is the sigatomic state of a new Task? e.g. sigatomic_begin()
@async println("am I sigatomic?")
sigatomic_end()
yield() |
@yuyichao, |
sigatomic is now thread and task local. New task currently (on this branch) inherit the state from when it's started... since I haven't set fix start_task for this yet... |
23088cc
to
1b9c1a2
Compare
Travis linux32 failure is #15920 |
246654c
to
d6035ef
Compare
while (!ptls->gc_state) { | ||
// FIXME: The acquire load pairs with the release store | ||
// in the signal handler of safepoint so we are sure that | ||
// all the memory write on those threads are visible. However, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memory writes? memory written ... is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe store
is the right word?
d6035ef
to
88f7a52
Compare
Depending on the caller these functions can be called from unmanaged threads that cannot touch GC state or having GC safepoint.
Add note and function prototypes about using safepoint to deliver SIGINT.
Preparing for using the safepoint to deliver SIGINT, which requires distinguish between master and worker threads.
* Remove unnecessary sigatomic * Make flisp calls sigatomic * Make type inference calls sigatomic * Refactor interthread communication through signal * Make sure `sleep` is aborted on `SIGINT` on Linux to deliver the exception faster * Implement force signal throwing when `SIGINT` arrives too frequently * Hack to abort io syscall on `SIGINT` Fix #1468; Fix #2622; Towards #14675
Also add tests.
88f7a52
to
c589e45
Compare
It seems like the documentation of the |
accum_weight = 0; | ||
return 0; | ||
} | ||
double new_weight = accum_weight * exp(-(dt / 1e9)) + 0.3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearly written by a physicist :)
I've also noticed one case where |
@@ -267,6 +268,7 @@ static jl_ast_context_t *jl_ast_ctx_enter(void) | |||
|
|||
static void jl_ast_ctx_leave(jl_ast_context_t *ctx) | |||
{ | |||
JL_SIGATOMIC_END(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks --lisp
; segfaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... ok.... will fix later today....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in case you need it now, I think making jl_sigint_safepoint noop should work around it for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 417e49a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be tested if breaking it matters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run(pipeline(DevNull,
$(Base.julia_cmd()) --lisp, DevNull))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like that. might need to be unix only, would need to run it through CI first
This PR broke the Windows buildbots: https://build.julialang.org/builders/package_win6.2-x64/builds/247/steps/make/logs/stdio Similar to #11727, but now bootstrap freezes when output is redirected instead of just being unable to run tests. @vtjnash @yuyichao please advise, we'll have to revert this if no solution can be found. |
The only commit that touches libuv is the last one and reverting it seems to get rid of the problem for me locally (tested by piping it through |
Something is fishy here. The buildbot seems to have been printing |
If you follow that issue, you'll discover it's an upstream bug where libuv forgot to initialize the synchronization object. I think it then corrupts its internal state and tries to report an error that didn't happen by forgetting about a write req that did happen. |
Which issue? I didn't see any mentioning of synchronization object in #11727? |
you have to follow it somewhat far. there's a more concise explanation at libuv/libuv#629, in which I demonstrate how it would make nodejs to hang (for the same reason) in a one-liner. |
Ah, that one, I somehow think you are talking about |
I think this broke the arm build too: https://build.julialang.org/builders/package_tarballarm/builds/227/steps/make%20binary-dist/logs/stdio |
I believe it's caused by a llvm bug (probably similar to https://llvm.org/bugs/show_bug.cgi?id=27545) easy to workaround by removing an optimization, need to first check if it's fixed on llvm svn... |
This implements @vtjnash 's idea of delivering sigint in a safer way, without having to add sigatomic to everywhere. This also solves a few performance issues along the way. A brief summary of the differences,
Make GC safepoint thread local
And create 3 signal pages. See the comment in
safepoint.c
for detail.This have a very minor performance hit for accessing safepoint. However, I think we can avoid the safepoint and GC transition around gcframe setup and gcframe pop on x86 so this shouldn't be a big issue. (On ARM, the atomic release store is more expensive than two GC safepoint so we may use a normal store with two safepoint instead, the performance hit of making the safepoint thread local is still very minor in this case though (~5-10% for empty GC frame push/pop, cheaper than the tls getter call....)).
Use the GC safepoint mechanism (i.e. SegFault) to deliver InterruptException at known points
This automatically makes
ccall
of external librarysigatomic
. (Fix Segfault when aborting matrix multiplication #1468, Fix make ccall sigatomic (defer SIGINT handling) #2622). In order to avoid waiting too long before the exception is delivered when running unmanaged code, this also implement waking up libuv, abort syscall and abort libsupport io (this use a hack but can be generalized if needed) when sigint arrives. This also implement force throwing of exception if SIGINT arrives too frequently so that one can useCtrl-C
to force abort some dead loop. A warning will be printed in such case since it bypass the safe path and even sigatomic.The most important consequence IMHO is that pressing
Ctrl-C
during sysimg compilation should no longer segfault =)This should also make it easier to implement signal handling #14675.
Make parser and type inference calls sigatomic
sigatomic is much cheaper (see below) now so it is now used to protect important runtime code to not be interrupted by
Ctrl-C
.Clean up safepoint and sigatomic, optimize try-catch and
sigatomic
Split out
safepoint.c
, makedefer_signal
thread local and avoid the expensive atomic opsMake
defer_signal
task local and automatically restore on expection. This eliminate the try-catch needed before indisable_sigint
.This is technically breaking but
sigatomic_begin
andsigatomic_end
aren't exported.Inline
jl_sigatomic_begin
andjl_sigatomic_end
in codegen.Benchmarking try-catch and
disable_sigint
Timing before
Timing after
Close #12333
Close #12309