Broadcast Comm shell messages on iopub #263

Open
vidartf opened this issue May 24, 2017 · 13 comments

vidartf commented May 24, 2017

According to the overarching messaging philosophy as laid out in the docs,

> IOPub: this socket is the ‘broadcast channel’ where the kernel publishes all side effects (stdout, stderr, etc.) as well as the requests coming from any client over the shell socket and its own requests on the stdin socket. [...] In a multi-client scenario, we want all frontends to be able to know what each other has sent to the kernel (this can be useful in collaborative scenarios, for example). This socket allows both side effects and the information about communications taking place with one client over the shell channel to be made available to all clients in a uniform manner.

Following this logic, it would make sense if any incoming Comm messages on the shell socket were broadcast on the IOPub socket as well. @jasongrout pointed out that this might be useful for collaborative ipywidgets.

In chat, @minrk suggested a message type of something like comm_echo.

@jasongrout commented

> @jasongrout pointed out that this might be useful for collaborative ipywidgets.

The idea is that when a client's model state is synced to the kernel, it is automatically rebroadcast out to all the other clients, so everyone gets the state update.

@maartenbreddels commented

See also jupyter-widgets/ipywidgets#1218

@maartenbreddels commented

@vidartf, as you mentioned over video, sending the echo message back to the original front-end source cannot be avoided. Why is that the case?

vidartf commented Oct 31, 2017

Basically, when broadcasting on IOPub you cannot choose which clients to include/exclude. At least if I remember the discussion with @minrk correctly.

@maartenbreddels commented

It would be good to know whether that is a design issue in 0mq (in which case there is almost zero probability of it being solvable), or something that could be solved on the Python side.

minrk commented Oct 31, 2017

zmq subscriptions are a prefix-matching whitelist. Each SUB socket can subscribe to one or more topics. In Jupyter, we generally subscribe to '', which will receive everything because we don't use zmq topics. What you cannot do is subscribe to "everything BUT x", which is what would be needed for this. If zmq adopted a subscription blacklist, we could do it without much difficulty.
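To make the whitelist behavior concrete, here is a minimal pyzmq sketch (the endpoint and topic names are illustrative, not Jupyter's actual wire setup):

```python
import zmq

ctx = zmq.Context.instance()
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5556")  # hypothetical PUB endpoint

# What Jupyter clients effectively do today: subscribe to the empty
# prefix, which matches every topic.
sub.setsockopt(zmq.SUBSCRIBE, b"")

# Opting in to specific prefixes is the only alternative:
sub.setsockopt(zmq.SUBSCRIBE, b"stream")
sub.setsockopt(zmq.SUBSCRIBE, b"execute_input")

# There is no blacklist: UNSUBSCRIBE only removes a previously added
# prefix, so "everything except comm_echo" cannot be expressed.
sub.setsockopt(zmq.UNSUBSCRIBE, b"stream")
```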

What could theoretically work, given these constraints, would be for IOPub clients to opt in to every topic they should receive, rather than to everything (a rough sketch follows the list):

  1. initial subscription explicitly opting in to every existing topic except comm echo
  2. notify all peers of new connections
  3. on connection of a peer (we can't really detect connections, but we can detect requests), each other peer subscribes to the topic that the new peer's comm-echo messages will use
  4. each comm-echo message is sent with a special topic derived from the parent_header's session id.
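A rough pyzmq sketch of steps 3 and 4 (the comm_echo topic scheme is hypothetical; nothing like it exists in the spec):

```python
import zmq

ctx = zmq.Context.instance()

# Step 4, kernel side: publish each comm echo under a topic derived from
# the originating client's session id (taken from parent_header).
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:5556")
origin_session = "client-A"  # would be parent_header["session"]
pub.send_multipart([("comm_echo." + origin_session).encode(), b"<message>"])

# Step 3, client side: every *other* peer must learn that client-A exists
# and subscribe to its echo topic, over and over as peers come and go.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5556")
sub.setsockopt(zmq.SUBSCRIBE, b"comm_echo.client-A")
```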

This is not really feasible, for a variety of reasons:

  1. tracking peers is antithetical to zmq design, and the library makes this very difficult. The kernel would need to keep track of all connected peers. zmq does not notify about connect/disconnect, so we can't track this rigorously. We can track requests, but we cannot know when a peer is gone, so this would be an ever-growing list.
  2. subscriptions take a finite amount of time to propagate, so requiring subscriptions to continually occur over time would inevitably result in lost messages and tricky debugging. Subscribing once at connection time has caused enough difficulty already.
  3. We would need a significant update of the message spec and all kernels just to get started because zmq topics are not part of the current spec, and that first step of "subscribe to everything but comm-echo" is undefined.

But...

The websocket connections between the server and the browsers do not have this limitation, though. We do know about connected peers at this level, and could choose to avoid sending specific messages to specific endpoints based on their content and destination. This would be a significant deviation from the current notebook application spec, however, where the websocket messages map 1:1 onto the zmq messages. Doing so would make this not a part of the Jupyter protocol, but an optimization that the notebook server itself makes.
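A sketch of what such server-side filtering could look like (a Tornado-style write_message is assumed; the session_id bookkeeping is hypothetical, and this is not what the notebook server actually does):

```python
import json

def relay_iopub(msg: dict, ws_clients: list) -> None:
    """Relay one deserialized IOPub message to all websocket clients,
    skipping the originating client for comm echoes."""
    origin = msg.get("parent_header", {}).get("session")
    for ws in ws_clients:
        if msg["header"]["msg_type"] == "comm_echo" and ws.session_id == origin:
            continue  # the server knows its peers, so it can skip the sender
        ws.write_message(json.dumps(msg, default=str))
```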

minrk commented Oct 31, 2017

We also already have a message with this behavior: execute_input is an IOPub message that all peers receive on every execute_request, including the peer that sent the original request. It's a lot harder for these messages to be large, of course, but they have never been a source of trouble that I know of.

Since comm messages coming from the client are much less likely to be large than messages coming from the kernel, what @vidartf discussed with me today sounds good to me:

  1. comm_echo is sent to everybody by default (just like execute_input already is)
  2. kernels can provide a mechanism to opt out of comm_echo altogether (e.g. comm_manager.echo_enabled = False) for the cases where comm_echo is not needed and is a problem (e.g. when we know there is only one frontend); a sketch follows below.
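In kernel-side terms, the proposal might look roughly like this (comm_echo and echo_enabled are only the suggested names; none of this exists in ipykernel today):

```python
class CommManager:
    """Sketch only: `session` and `iopub_socket` stand in for the
    kernel's real plumbing."""
    echo_enabled = True  # proposed opt-out for single-frontend setups

    def __init__(self, session, iopub_socket):
        self.session = session
        self.iopub_socket = iopub_socket

    def comm_msg(self, stream, ident, msg):
        # ... dispatch msg to the target comm as usual ...
        if self.echo_enabled:
            # Rebroadcast the client's message on IOPub so every frontend
            # (including the sender) sees the update, like execute_input.
            self.session.send(self.iopub_socket, "comm_echo",
                              content=msg["content"], parent=msg["header"])
```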

@maartenbreddels commented

Ok, I like the idea of seeing it as an optimization.
@vidartf I doubt there are many situations where large amounts of data flow from the frontend to the backend; it's more likely to go the other way around, right?

However, if this echoing is implemented, would it mean that every update from the frontend will be echoed back all the time (also in the case of just 1 frontend and 1 backend)?

minrk commented Nov 2, 2017

> would it mean that every update from the frontend will be echoed back all the time (also in the case of just 1 frontend and 1 backend)?

Yes, because it cannot be known at the kernel level how many frontends there are.

vidartf commented Nov 2, 2017

> I doubt there are many situations where large amounts of data flow from the frontend to the backend

There are some exceptions I can think of (e.g. a video stream synced back to the kernel). However, I think most of these scenarios are not very suitable for having many clients connected (several potentially competing data sources would all be trying to sync back to the kernel). I'm guessing that is one scenario where you would want to disable echoes.

> every update from the frontend will be echoed back all the time

Yes, and this could be optimized by turning off echoes, but the overhead of keeping them should be reasonably low as long as the messages are small (they will/should be discarded on the receiving side after message-level deserialization).
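For example, a frontend could discard its own echoes cheaply right after deserialization (a sketch; comm_echo and the session check describe the proposed behavior, not an existing API):

```python
def should_process(msg: dict, own_session: str) -> bool:
    """Return False for our own comm updates echoed back to us."""
    return not (msg["header"]["msg_type"] == "comm_echo"
                and msg["parent_header"].get("session") == own_session)
```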

@maartenbreddels commented

A video stream would indeed be a really strong case for having a way to not echo back to the originating frontend before implementing this.

@maartenbreddels commented

Some thoughts: what about including a 'session_exclude' key in the header (http://jupyter-client.readthedocs.io/en/latest/messaging.html#general-message-format)?
For the Comm.send method, we could have an extra echo=True/False argument. The echo argument can be passed on to the Session.send method, which would then include a 'session_exclude' entry containing its own session value. When the message arrives at the notebook server, I guess we can then check for each websocket connection whether we should send it or not. There is still a bit of overhead between the notebook server and the kernel, but that would be about as bad as the overhead between the notebook server and the browser, I think.
This would require (sketched below):

  • An API change, introducing an echo=True/False argument in quite a few methods in ipykernel and jupyter_client.
  • An extra (optional?) entry in the message format: header.session_exclude.
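Concretely, the proposed shape might be something like this (neither the echo argument nor header.session_exclude exists today; everything here is hypothetical):

```python
# A frontend "frontend-A" calling something like comm.send(data, echo=False)
# would produce a header such as:
header = {
    "msg_id": "...",                    # as usual
    "msg_type": "comm_msg",
    "session": "frontend-A",
    "session_exclude": ["frontend-A"],  # the proposed new (optional) entry
}

# Notebook-server side: forward only to websockets whose session id is
# not excluded.
def should_forward(header: dict, ws_session: str) -> bool:
    return ws_session not in header.get("session_exclude", [])
```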

@jasongrout commented

Another problem with comm messages and multiple clients is that the spec says:

> If the target_name key is not found on the receiving side, then it should immediately reply with a comm_close message to avoid an inconsistent state.

However, if you have two connections to the kernel, and one has the target name and the other doesn't, the one that doesn't have the target name will close the comm down in the kernel. Unfortunately, since comm messages aren't rebroadcast, this also means that the client that did open a comm will have no clue the comm was closed.
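Sketched, the spec-mandated behavior that causes the problem (a hypothetical per-client handler; the point is that any one client lacking the target closes the comm for all of them):

```python
def handle_comm_open(msg: dict, comm_targets: dict, send) -> None:
    """Per-client comm_open handling as the spec prescribes."""
    content = msg["content"]
    if content["target_name"] not in comm_targets:
        # Spec: reply immediately with comm_close. With a notebook and a
        # console on one kernel, the console (no widget target) closes
        # the notebook's comm in the kernel too.
        send("comm_close", {"comm_id": content["comm_id"], "data": {}})
        return
    comm_targets[content["target_name"]](msg)
```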

jasongrout added a commit to jasongrout/jupyterlab that referenced this issue Aug 1, 2019
The comm message protocol currently has implicit assumptions that only
one kernel connection is handling comm messages. This option allows a
kernel connection to opt out of handling comms.

See jupyter/jupyter_client#263

This fixes the error where ipywidgets stops working if you open a notebook and a console to the same kernel, since the console automatically closes all comms since it does not have the comm target registered.