Machinery for translating pipe stream into messages #251

Open · wants to merge 2 commits into main
Conversation

takluyver (Member)
This is the initial piece for capturing output from the kernel at the OS level, by hooking stdout and stderr up to pipes.

Kernels wanting to make use of this machinery will need to delimit the stream output resulting from executing a cell with markers containing the execute message ID:

b"""
\0JUPYTER STREAM BEGIN d853e19e-012f-438b-ac71-6970f49a53b7
bar
baz
\0JUPYTER STREAM END d853e19e-012f-438b-ac71-6970f49a53b7
"""

These markers are used to set the message ID in the parent header of the messages generated, so that a frontend can associate them with the relevant cell. The end marker is translated into an EndOfOutput object: once the frontend has seen one of these from each captured stream, plus the kernel-idle message on IOPub, it knows it has all of the output produced while that cell was executing.

The cell may have started background threads, and output from those may be lost or misdirected, but that's already the case today, and I think it's unavoidable for piped data. I believe this machinery should also recover if, e.g., one of the markers it looks for never arrives.
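To make the translation concrete, here is a minimal, runnable sketch of the kind of parser described above. The names (StreamParser, EndOfOutput) and the assumption that markers arrive on their own, complete lines are illustrative; this is not the PR's actual implementation, which also has to cope with data split mid-line across reads.

```python
import re

# Marker format from the PR description: a NUL byte, a fixed tag, and the
# execute message ID, each on its own line.
BEGIN = re.compile(rb'\x00JUPYTER STREAM BEGIN ([0-9a-f-]+)\n')
END = re.compile(rb'\x00JUPYTER STREAM END ([0-9a-f-]+)\n')

class EndOfOutput:
    """Signals that all piped output for one execute request has been seen."""
    def __init__(self, parent_id):
        self.parent_id = parent_id

class StreamParser:
    """Translate raw pipe bytes into (parent_id, text) chunks."""
    def __init__(self):
        self.parent_id = None  # msg_id of the block we are currently inside

    def feed(self, data):
        events = []
        for line in data.splitlines(keepends=True):
            m = BEGIN.match(line)
            if m:
                self.parent_id = m.group(1).decode()
                continue
            m = END.match(line)
            if m:
                events.append(EndOfOutput(m.group(1).decode()))
                self.parent_id = None
                continue
            if self.parent_id is not None:
                events.append((self.parent_id, line.decode()))
            # Output outside the marked blocks is currently discarded,
            # which is exactly the open question below.
        return events
```

Feeding it the example above would yield two text chunks tagged with the execute message ID, followed by an EndOfOutput for that ID.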

Question: at present, output outside the marked blocks (after one END, before the next BEGIN) is silently discarded. Do we want to expose it to the frontend in some way, e.g. as stream output messages without a parent header?

minrk (Member) commented Apr 4, 2017

> Do we want to expose it to the frontend in some way, e.g. as stream output messages without a parent header?

I think so, or preserve the current behavior of associating it with the most recently seen parent.

def new_stream_msg(self, text, parent_id):
    msg = {}
    header = msg_header(new_id(), 'stream', self.username, self.session_id)
    parent_header = msg_header(parent_id, 'execute_request',
minrk (Member):

I don't think we should be constructing parent_headers here. The actual request header should be arriving here somehow.

takluyver (Member, Author):

I deliberately did it this way so that this machinery doesn't need to see the messages going to the kernel. I think it's going to be simpler to integrate into applications that way, and as far as I can tell, frontends only rely on the parent message ID to know what a message is a response to.

If the application does need the full parent header, a separate piece further up the stack could keep a cache of message headers sent to the kernel, and apply them to the messages coming from this machinery by matching the IDs.
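The "separate piece further up the stack" suggested here could be sketched roughly as follows. This is illustrative only, assuming the hypothetical names ParentHeaderCache, record_request, and apply; it caches the full header of each request sent to the kernel and patches it onto outgoing messages by matching msg_id:

```python
from collections import OrderedDict

class ParentHeaderCache:
    """Cache full request headers, keyed by msg_id, and apply them to
    messages whose parent_header carries only the matching msg_id."""

    def __init__(self, maxsize=100):
        self._headers = OrderedDict()
        self.maxsize = maxsize

    def record_request(self, msg):
        """Call this for every request sent to the kernel."""
        header = msg['header']
        self._headers[header['msg_id']] = header
        while len(self._headers) > self.maxsize:
            self._headers.popitem(last=False)  # evict oldest entry

    def apply(self, msg):
        """Replace a minimal parent_header with the full cached one."""
        parent_id = msg.get('parent_header', {}).get('msg_id')
        if parent_id in self._headers:
            msg['parent_header'] = self._headers[parent_id]
        return msg
```

This would also address the msg_type mismatch raised in the next comment, since the cached header carries the request's real msg_type and session.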

minrk (Member):

'execute_request' is often the wrong value for msg_type, though. The parent_header needs to match the msg_type and session of the original request (really it should match all fields, but those two are especially important).

Maybe when a new request is seen, it gets forwarded on this pipe?

I thought that with the kernel nanny, the Nanny was going to proxy all requests, so it would necessarily see every request as it passed by, and the msg_id could be looked up among the recent requests that passed through.

takluyver (Member, Author):

I've lost track of what we actually wanted from the kernel nanny, so I thought I'd work on this bit separately, because at least I understand it. I thought we were heading in the direction of allowing a Python-based process such as the notebook server to integrate the nanny functionality without running a separate process, so I'd start by doing output capturing without the nanny inside one or more of our frontends. At the moment, though, I can't think of an easy way to integrate it, because it requires changes across multiple repositories.

    msg['msg_type'] = header['msg_type']
    msg['parent_header'] = parent_header
    msg['content'] = {u'name': self.stream_name, u'text': text}
    msg['metadata'] = {}
minrk (Member):
I think this can be simplified with msg = session.msg(msg_type, content=content, parent_header=parent_header)
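For context, here is a runnable sketch of the message construction being discussed, written as a plain function so it needs no live session. The helper name make_msg is illustrative; the field names follow the Jupyter messaging spec, and jupyter_client's Session helper does the same job with proper signing and serialization.

```python
import uuid
from datetime import datetime, timezone

def make_msg(msg_type, content, parent_header, username, session_id):
    """Build a Jupyter-style message dict with a fresh header and the
    given parent_header (e.g. the full header of the execute_request)."""
    header = {
        'msg_id': uuid.uuid4().hex,
        'msg_type': msg_type,
        'username': username,
        'session': session_id,
        'date': datetime.now(timezone.utc).isoformat(),
    }
    return {
        'header': header,
        'msg_type': msg_type,
        'parent_header': parent_header,
        'content': content,
        'metadata': {},
    }

# e.g. a stream message carrying captured stdout text:
# make_msg('stream', {'name': 'stdout', 'text': 'bar\n'},
#          parent_header, 'user', 'session-1')
```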
