-
-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async widget methods deadlock due to on_msg not being called #646
Comments
I believe this issue is root caused by the mentioned issue / PR and that threading against messages won't solve the issue because the Comms message buffer is deadlocked with the cell execution await. I also was digging through some issues one of our clients had around widgets and they eventually accidentally solved a similar in-line result wait problem by separating the widget value fetch and the request into two cell like above in a more complicated setting. However it's easy to fall back into the deadlock if their notebook widget use is ever refactored or shared in a new notebook. @SylvainCorlay @jasongrout @Carreau Since this overlaps with widget interactions a bit. Can we determine if the two issues are the same / what in the proposed PR linked needs additional attention to potentially address the root problem? If there's testing or communication patterns that need to be evaluated / documented around particular widgets that might be impacted I could put a little time to contribute there or in ipykernel PRs if it helps. |
This possibility of a deadlock is somewhat intrinsic to the protocol. You won't process a comm message until you are finished processing an execution request... In #589, @sonthonaxrk proposed that comm messages could be handled right away, even before other shell messages are finished processing. However, this is probably too disruptive of the protocol. We had a couple of discussions about this lately, and another idea that was envisioned was to use the same mechanism as the input request to make really blocking calls to the front-end. |
Thanks @SylvainCorlay! Could you elaborate on how really blocking calls would work? Would it solve our issue? For our API we would actually prefer if the calls were blocking until the results are there so that the users don't need to use |
They would use the same principle as when you type |
Would this be applied to any function returning a future? |
@ilyabo what were the traitlet errors? I’m a bit busy with real work right now, but I’ve been thinking of a way this could be solved with greenlet, and just doing a hard switch of the stack when the comm message returns. |
Thanks @sonthonaxrk! You can try for yourself, our widget is public:
|
Cheers, I'll probably have the chance to take a look this week if it's related to the async kernel changes. |
@sonthonaxrk Did you have a chance to look into it? |
@sonthonaxrk could you elaborate a bit on how issues like this might be worked around with greenlet? |
This library offers a solution which can be used until the issue is solved in ipykernel: |
We are developing a Jupyter widget which makes it possible to embed interactive Unfolded maps into notebooks:
Our widget is embedded via an iframe and so the communications with the JavaScript side are asynchronous. Some of the functions we want to add to the widget need to return data asynchronously. To communicate between Jupyter and JS we are using widget's
send
andon_msg
methods. Our methods returnfuture
s. It works fine - the messages are being sent and received, but only if we do it in two steps:When we are trying to use
await
so that we can have the results and do something with them within one cell, the execution is blocked forever:We tried to investigate the cause. It turns out that the callback we register with
widget.on_msg
only gets called once the execution of the cell is finished. Hence, if we wait until we obtain the result of the calculation within the same cell where the method was called, we end up in a deadlock.So far we've had no success in finding a workaround, despite trying many things. For instance, here we run the communication in a separate thread and wait until the query response is added to a blocking queue:
Calling the above function fails (after the 2 seconds timeout) with an error saying that the queue is empty:
And we don't see the
_receive
function being called.However, if we check the queue immediately after getting this error, we can see that the queue is not empty - the response was added to it, apparently, just after the error caused the cell execution to be cancelled:
It might be the same issue as this one: ipython/ipython#12786
but I am not completely sure. I tried installing and running the kernel with the fix from this PR, but I was getting errors related to traitlet messaging when trying to instantiate our widget.
The text was updated successfully, but these errors were encountered: