Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated spawner logging to improve debugging and traceability. #8

Merged
merged 2 commits into from
Mar 20, 2020

Conversation

arthurian
Copy link
Member

This PR updates spawner.py. The changes are related to logging and checking whether a notebook is running.

When issues occur, it can be difficult to determine the cause since logging is a bit ad-hoc and has been added over time. The goal is to streamline the logging in such a way that it is easier to trace/debug spawner activity on a per user basis and make it possible to correlate the data with other metrics.

Streamlining logging

  • Added self.log_user() to the spawner, which automatically logs the user ID. By default, this will log an INFO message unless otherwise specified.
  • Updated the existing logging messages to make it easier to trace the various asynchronous calls that are happening for every user. In particular, there's a lot of noise around the poll() function.

Checking if a worker notebook is running on the server:

This change is slightly speculative, but given that the manager must SSH into every worker instance to verify that jupyter is running, the is_notebook_running() check should be as efficient as possible.

Currently, we run ps -ef | grep jupyterhub-singleuser which spawns 2 processes joined by a pipeline. That should work fine 99.999% of the time, but since we're seeing high loads when users are training their models, it might be worth switching to nice -5 pgrep -a -f jupyterhub-singleuser. This should be functionally equivalent, except that it executes one process with a slightly higher priority.

@joshuagetega @dodget

@arthurian arthurian requested a review from joshuagetega March 18, 2020 18:26
Copy link

@dodget dodget left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @arthurian 👍

Comment on lines +124 to +127
def log_user(self, message='', level=logging.INFO):
user = self.user.name if self.user else None
log_message = "[user:%s] %s" % (user, message)
self.log.log(level, log_message)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What an excellent, simple helper.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very handy, indeed!

return None #its up!
else:
self.log.debug("Poll, notebook is not running for user %s" % self.user.name)
self.log_user("poll: notebook is NOT running")
return "server up, no instance running for user %s" % self.user.name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arthurian, I just noticed a possible typo here. It should probably say "Server up, no notebook running for user ..." instead of "Server up, no instance running for user...". This was there from before, not introduced by you. Just pointing it out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think you're right about that.

Copy link

@joshuagetega joshuagetega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great, @arthurian. I have successfully tested it on the miniconda sandbox cluster.

Changes include increasing the poll interval and increasing the number of attempts the is_notebook_running() function will be called with from within the poll() function. This is meant to reduce the chances of the poll() function wrongly determining that the jupyterhub-singleuser process is not running on a worker instance when it actually is.
@joshuagetega joshuagetega merged commit 9942107 into develop Mar 20, 2020
@joshuagetega joshuagetega deleted the spawner-logging branch March 20, 2020 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants