
Conversation

@parente (Member) commented May 17, 2017

Additional logic to support culling or skipping kernels that are busy or that have connections.

Follow-on to #2215.

/cc @kevin-bates

@rgbkrk (Member) commented May 17, 2017

What a nice self-cleaning notebook server.

@parente (Member, Author) commented May 17, 2017

✨ 📓 ✨

@kevin-bates (Member) commented:

@parente - thanks for adding this. I'm pretty sure I understand the use case for the cull-busy flag (to allow a single cell's execution to exceed the idle period - correct?) but I guess I'm not following the cull-connected case. I figured the kernel was always in a connected state. Seems like this case would be covered by the prior culling feature - in conjunction with cull-busy. Could you please clarify the use cases for these flags? Just trying to minimize knobs here. Thanks!

@parente (Member, Author) commented May 18, 2017

The connection count tracked by the API is actually how many clients are connected to the kernel via the notebook API. My thinking is that in some situations, you may not want to cull a kernel that has a browser tab open pointing at the associated notebook, even if the kernel is idle (e.g., kiosk mode). In others, you may want to cull regardless of whether there are connected clients or not (e.g., users at work forget they have tabs open). cull_connected is the knob to handle these two use cases. cull_busy handles the situation you described above.

At work, I plan to configure our JupyterHub instances to cull_busy=False, cull_connected=True such that users with long running jobs (e.g., Spark) won't lose work if they exceed the idle period, but browser tabs left open over the weekend won't keep kernels alive once they go idle.
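
For concreteness, here is a minimal sketch of that setup as it could appear in jupyter_notebook_config.py, assuming these options live on MappingKernelManager as in this PR's diff; the numeric values are placeholders for illustration, not recommendations:

```python
# Hypothetical jupyter_notebook_config.py excerpt; option names follow this
# PR, numeric values are placeholders.

# Enable culling: cull kernels idle for more than 12 hours.
c.MappingKernelManager.cull_idle_timeout = 43200

# Check for culling candidates every 5 minutes.
c.MappingKernelManager.cull_interval = 300

# Never cull a busy kernel, so long-running jobs (e.g., Spark) survive
# the idle window.
c.MappingKernelManager.cull_busy = False

# Cull idle kernels even if a browser tab is still connected, so tabs
# left open over the weekend don't keep kernels alive.
c.MappingKernelManager.cull_connected = True
```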

@kevin-bates (Member) commented:

Thank you for the great explanation. In light of this, I'm wondering if cull_busy=False isn't a better default. I understand that would be contrary to the current functionality, but given that culling isn't officially available yet, it seems like a good time to make that kind of change.

@parente (Member, Author) commented May 18, 2017

I was thinking the same thing about the new cull_busy default, and I agree: since this isn't even released yet, now's the time to change it. I'll send a new commit in a moment.

@parente (Member, Author) commented May 22, 2017

Thanks for the review @rgbkrk and @kevin-bates.

I'm not up to speed on how we're merging for 5.1 (or not) in the notebook repo. Label it 5.1 and wait for a merge?

@takluyver added this to the 5.1 milestone May 22, 2017
@takluyver (Member) commented:

@parente yes, that's pretty much it. And if it goes a few days without further review, remind people about it.

If you're confident enough in a change and you have merge permissions, you can also ask for more comments and announce that you'll merge it yourself in a couple of days unless people ask you to wait longer.


```python
cull_connected = Bool(True, config=True,
    help="""Whether to consider culling kernels which have one or more connections.
    Only effective if cull_interval is not 0."""
)
```
Review comment (Member):

Do you mean cull_idle_timeout in both of these messages? Looking at the code, setting cull_interval to 0 produces a warning and resets it to 300.

Reply (Member):

Good catch - yes, the descriptions of each should reference cull_idle_timeout.

@parente (Member, Author) commented May 22, 2017

Thanks for the process heads-up. I've fixed the command line help. I'll leave the PR open for a couple days like you suggest and then merge it.
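
For reference, a sketch of what the corrected declarations presumably look like after that fix (the exact help wording is assumed; the substantive change is that each help string now references cull_idle_timeout instead of cull_interval):

```python
# Assumed post-fix traitlet declarations; defaults reflect the PR state at
# this point in the discussion (cull_busy already flipped to False).
cull_busy = Bool(False, config=True,
    help="""Whether to consider culling kernels which are busy.
    Only effective if cull_idle_timeout is not 0."""
)

cull_connected = Bool(True, config=True,
    help="""Whether to consider culling kernels which have one or more connections.
    Only effective if cull_idle_timeout is not 0."""
)
```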

@minrk (Member) left a comment:

Great, thanks! I'd have a slight inclination to default both of the flags to False, so that by default only truly idle kernels are culled (no connections and no executions), and each more aggressive knob is opt-in.

help="""The interval (in seconds) on which to check for idle kernels exceeding the cull timeout value."""
)

cull_connected = Bool(True, config=True,
Review comment (Member):

Maybe default to the most conservative False for both of these?

Reply (Member):

@minrk I understand taking a conservative approach, but I believe that a majority of administrators who enable culling in the first place (the feature is off by default) would want connected kernels culled - provided they've exceeded the idle duration and are not busy for that period. That said, I suppose administrators will review the four culling parameters when enabling the feature, but I still feel cull_connected = True is the appropriate default - IMHO.

@minrk (Member) commented May 25, 2017

A kernel that isn't executing code but still has active users looking at it doesn't seem idle to me. Killing the kernel behind a currently open and active page seems pretty aggressive for a default.

@kevin-bates (Member) commented:

Given that the idle period will typically be on the order of multiple hours, I seriously doubt users will still be looking at the page having done nothing during that period. However, since we don't set a default idle time (we only recommend one), and given that admins may choose a timeout value in line with a human's window of patience, your example makes sense.
Thanks for the discussion.

@minrk merged commit 1e18e05 into jupyter:master May 27, 2017
@github-actions bot locked as resolved and limited conversation to collaborators Apr 7, 2021
