Async widget methods deadlock due to on_msg not being called #646

Open
ilyabo opened this issue Apr 22, 2021 · 11 comments

@ilyabo

ilyabo commented Apr 22, 2021

We are developing a Jupyter widget which makes it possible to embed interactive Unfolded maps into notebooks:

image

Our widget is embedded via an iframe, so communication with the JavaScript side is asynchronous. Some of the functions we want to add to the widget need to return data asynchronously. To communicate between Jupyter and JS we use the widget's send and on_msg methods, and our methods return futures. This works fine (the messages are sent and received), but only if we do it in two steps:

image
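
Since the screenshot is not reproduced here, the following is a minimal sketch of the two-step pattern described above, not our actual code. FakeWidget and MapClient are hypothetical names; FakeWidget only mimics the send and on_msg methods of a widget, and its deliver method stands in for the kernel dispatching a reply from the front end.

```python
import asyncio
import itertools


class FakeWidget:
    """Hypothetical stand-in for a widget: mimics only send() and on_msg()."""

    def __init__(self):
        self._callbacks = []
        self.outbox = []  # messages "sent" to the JavaScript side

    def on_msg(self, callback):
        self._callbacks.append(callback)

    def send(self, content):
        self.outbox.append(content)

    def deliver(self, content):
        # Simulates the kernel dispatching a custom message from the front end.
        for callback in self._callbacks:
            callback(self, content, [])


class MapClient:
    """Hypothetical wrapper that pairs each request with a future."""

    def __init__(self, widget):
        self._widget = widget
        self._pending = {}
        self._ids = itertools.count()
        widget.on_msg(self._receive)

    def get_layers(self):
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        self._widget.send({"id": request_id, "type": "get_layers"})
        return future

    def _receive(self, widget, content, buffers):
        # Resolve the future that belongs to this reply, if it is still pending.
        future = self._pending.pop(content.get("id"), None)
        if future is not None and not future.done():
            future.set_result(content.get("data"))


async def demo():
    widget = FakeWidget()
    client = MapClient(widget)
    future = client.get_layers()   # "cell 1": request sent, future still pending
    assert not future.done()
    widget.deliver({"id": 0, "data": ["layer-a"]})  # reply handled after cell 1
    return await future            # "cell 2": the future is already resolved


result = asyncio.run(demo())
print(result)  # ['layer-a']
```

In this sketch the reply is delivered by hand between the two steps, which is what the kernel effectively does between two cells.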

When we try to use await so that we can get the results and work with them within a single cell, execution blocks forever:

image

We tried to investigate the cause. It turns out that the callback we register with widget.on_msg only gets called once the execution of the cell is finished. Hence, if we wait until we obtain the result of the calculation within the same cell where the method was called, we end up in a deadlock.
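
The ordering problem can be illustrated with a toy model. This is a sketch under simplifying assumptions, not ipykernel code: run_kernel is a hypothetical loop that handles one message at a time, and a short timeout is used so that the example terminates instead of hanging.

```python
import asyncio


async def run_kernel(messages, reply):
    """Toy kernel loop: handles one message at a time, strictly in order."""
    log = []
    for kind in messages:
        if kind == "execute_request":
            # The "cell" awaits a reply that only a later comm_msg can supply.
            try:
                value = await asyncio.wait_for(asyncio.shield(reply), timeout=0.2)
                log.append(f"cell got {value}")
            except asyncio.TimeoutError:
                log.append("cell timed out")
        elif kind == "comm_msg":
            # This is the point where the on_msg callback would fire.
            if not reply.done():
                reply.set_result("response")
            log.append("comm_msg handled")
    return log


async def demo():
    reply = asyncio.get_running_loop().create_future()
    # The front end's answer is already queued, but behind the running cell.
    return await run_kernel(["execute_request", "comm_msg"], reply)


log = asyncio.run(demo())
print(log)  # ['cell timed out', 'comm_msg handled']
```

Without the timeout, the first iteration would never finish and the loop would never reach the comm_msg, which is the deadlock we observe.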

So far we've had no success in finding a workaround, despite trying many things. For instance, here we run the communication in a separate thread and wait until the query response is added to a blocking queue:
image

Calling the above function fails (after the 2-second timeout) with an error saying that the queue is empty:
image
And we don't see the _receive function being called.

However, if we check the queue immediately after getting this error, we can see that the queue is not empty - the response was added to it, apparently, just after the error caused the cell execution to be cancelled:
image
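
The screenshots are not reproduced here, so the following toy model sketches the behaviour we observed. All names are hypothetical. It assumes that incoming messages are only dispatched by the main loop once the cell has returned, which is why sending from a helper thread does not help.

```python
import queue
import threading

responses = queue.Queue()  # filled by the stand-in for our _receive callback
inbox = queue.Queue()      # replies waiting for the kernel's main loop


def send_from_thread():
    # The helper thread sends the query; the front end's reply reaches the
    # kernel right away, but it only lands in the inbox.
    inbox.put("query result")


def cell():
    worker = threading.Thread(target=send_from_thread)
    worker.start()
    worker.join()
    try:
        # Blocks the main thread, so nothing drains the inbox in the meantime.
        return responses.get(timeout=0.2)
    except queue.Empty:
        return "queue is empty"


def main_loop_step():
    # Runs only after cell() has finished; stands in for calling _receive.
    while not inbox.empty():
        responses.put(inbox.get())


outcome = cell()
main_loop_step()
after = responses.get_nowait()
print(outcome)  # queue is empty
print(after)    # query result
```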

It might be the same issue as this one: ipython/ipython#12786
but I am not completely sure. I tried installing and running the kernel with the fix from this PR, but I was getting errors related to traitlet messaging when trying to instantiate our widget.

@MSeal
Contributor

MSeal commented Apr 23, 2021

I believe the root cause of this issue is the one described in the mentioned issue / PR, and that threading against messages won't help because the comm message buffer is deadlocked with the cell execution await.

I was also digging through some issues one of our clients had around widgets. In a more complicated setting, they eventually (and accidentally) solved a similar in-line result wait problem by separating the request and the widget value fetch into two cells, like above. However, it's easy to fall back into the deadlock if their notebook's widget use is ever refactored or shared in a new notebook.

@SylvainCorlay @jasongrout @Carreau This overlaps with widget interactions a bit, so can we determine whether the two issues are the same, and what in the linked PR needs additional attention to potentially address the root problem? If there are testing or communication patterns that need to be evaluated or documented around particular widgets that might be impacted, I could put in a little time to contribute there or in ipykernel PRs if it helps.

@SylvainCorlay
Member

This possibility of a deadlock is somewhat intrinsic to the protocol. You won't process a comm message until you are finished processing an execution request...

In #589, @sonthonaxrk proposed that comm messages could be handled right away, even before other shell messages are finished processing. However, this is probably too disruptive of the protocol.

We have had a couple of discussions about this lately, and another idea that came up was to use the same mechanism as the input request to make really blocking calls to the front end.

@ilyabo
Author

ilyabo commented Apr 23, 2021

Thanks @SylvainCorlay! Could you elaborate on how really blocking calls would work? Would it solve our issue? For our API we would actually prefer if the calls were blocking until the results are there so that the users don't need to use await.

@SylvainCorlay
Member

They would use the same principle as when you type input() in a cell currently.
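
As a rough illustration of that principle: with input(), the kernel sends an input_request on a separate stdin channel and blocks until the front end answers with an input_reply, so the answer does not have to wait behind the running cell. The toy model below uses two queues and a thread to stand in for that channel. The names are hypothetical and this is not an existing widget API, only a sketch of the idea.

```python
import queue
import threading

stdin_requests = queue.Queue()  # kernel -> front end, separate from the shell queue
stdin_replies = queue.Queue()   # front end -> kernel


def frontend():
    # The front end answers requests on the dedicated channel while the
    # kernel is still busy executing the cell.
    prompt = stdin_requests.get(timeout=2)
    stdin_replies.put(f"answer to {prompt!r}")


def blocking_call(prompt):
    # Sends a request and blocks the cell until the reply arrives.
    stdin_requests.put(prompt)
    return stdin_replies.get(timeout=2)


def cell():
    return blocking_call("get_layers")


frontend_thread = threading.Thread(target=frontend)
frontend_thread.start()
value = cell()
frontend_thread.join()
print(value)  # answer to 'get_layers'
```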

@ilyabo
Author

ilyabo commented Apr 23, 2021

Would this be applied to any function returning a future?

@sonthonaxrk

@ilyabo what were the traitlet errors?

I’m a bit busy with real work right now, but I’ve been thinking of a way this could be solved with greenlet, and just doing a hard switch of the stack when the comm message returns.

@ilyabo
Author

ilyabo commented Apr 24, 2021

Thanks @sonthonaxrk!

Here's the error:

You can try it yourself; our widget is public:

pip install unfolded.map-sdk

from unfolded.map_sdk import UnfoldedMap
unfolded_map = UnfoldedMap(
  mapUUID='ae4f5345-8507-49ec-85f6-a1f8c7bc2219'
)
unfolded_map

@sonthonaxrk

sonthonaxrk commented Apr 25, 2021

Cheers, I'll probably have the chance to take a look this week if it's related to the async kernel changes.

@ilyabo
Author

ilyabo commented May 10, 2021

@sonthonaxrk Did you have a chance to look into it?

@saraedum

saraedum commented Oct 7, 2021

@sonthonaxrk could you elaborate a bit on how issues like this might be worked around with greenlet?

@ilyabo
Author

ilyabo commented Oct 7, 2021

This library offers a solution which can be used until the issue is solved in ipykernel:
https://github.com/Kirill888/jupyter-ui-poll
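
A minimal sketch of how that library might be used here, assuming the method returns a future that the on_msg callback resolves. wait_for_future is a hypothetical helper, not part of jupyter-ui-poll; ui_events is the library's context manager, whose poll function processes pending UI events while the cell is still running. It only works inside a running ipykernel with jupyter-ui-poll installed.

```python
import time


def wait_for_future(future, timeout=10.0, interval=0.1):
    """Block the cell until `future` is done, processing UI events meanwhile.

    Hypothetical helper: only meaningful inside a running ipykernel.
    """
    from jupyter_ui_poll import ui_events  # imported lazily: needs a kernel

    deadline = time.monotonic() + timeout
    with ui_events() as poll:
        while not future.done():
            if time.monotonic() > deadline:
                raise TimeoutError("no response from the front end")
            poll(10)              # handle up to 10 pending UI events
            time.sleep(interval)  # avoid a busy loop
    return future.result()
```

A cell could then call a future-returning method and pass the result to wait_for_future instead of using await.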
