Updated spawner logging to improve debugging and traceability. #8

arthurian · 2020-03-18T18:26:36Z

This PR updates spawner.py. The changes are related to logging and checking whether a notebook is running.

When issues occur, it can be difficult to determine the cause because logging has been added ad hoc over time. The goal is to streamline the logging so that it is easier to trace and debug spawner activity on a per-user basis, and to make it possible to correlate the data with other metrics.

Streamlining logging

Added self.log_user() to the spawner, which automatically logs the user ID. By default, this will log an INFO message unless otherwise specified.
Updated the existing logging messages to make it easier to trace the various asynchronous calls that are happening for every user. In particular, there's a lot of noise around the poll() function.
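
A minimal sketch of how a per-user logging helper of this kind could behave, using only the standard logging module. DemoSpawner, its user_name attribute, and format_user_message() are hypothetical stand-ins for illustration; the real spawner reads the name from self.user.name and logs through its own self.log.

```python
import logging


class DemoSpawner:
    """Hypothetical stand-in for the spawner; only the logging helper is shown."""

    def __init__(self, user_name=None):
        self.user_name = user_name
        self.log = logging.getLogger("demo.spawner")

    def format_user_message(self, message=""):
        # Prefix every message with the user so log lines can be filtered per user.
        return "[user:%s] %s" % (self.user_name, message)

    def log_user(self, message="", level=logging.INFO):
        # INFO by default; callers pass another level (e.g. logging.DEBUG) when needed.
        self.log.log(level, self.format_user_message(message))
```

Calling DemoSpawner("alice").log_user("poll: notebook is NOT running") emits a record whose message is "[user:alice] poll: notebook is NOT running", so all activity for one user can be pulled out of the log by searching for the [user:alice] prefix.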

Checking if a worker notebook is running on the server:

This change is slightly speculative, but given that the manager must SSH into every worker instance to verify that jupyter is running, the is_notebook_running() check should be as efficient as possible.

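Currently, we run ps -ef | grep jupyterhub-singleuser, which spawns two processes joined by a pipe. That should work fine 99.999% of the time, but since we're seeing high loads when users are training their models, it might be worth switching to nice -5 pgrep -a -f jupyterhub-singleuser. This should be functionally equivalent, except that it executes a single process with an adjusted scheduling priority. Note that nice -5 is the legacy spelling of nice -n 5, which raises the niceness and therefore lowers the priority; a slightly higher priority would need a negative adjustment such as nice -n -5, which normally requires root.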
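
As a rough illustration of the single-process check, the sketch below builds the command string quoted in this description and interprets a pgrep result. The function names are hypothetical, and how the spawner actually runs the command over SSH and reads the result is not shown in this PR excerpt, so that part is left out.

```python
import shlex

PROCESS_PATTERN = "jupyterhub-singleuser"


def build_check_command(pattern=PROCESS_PATTERN):
    # One pgrep process instead of the ps | grep pipeline.
    # -f matches against the full command line; -a prints it next to the PID.
    return "nice -5 pgrep -a -f %s" % shlex.quote(pattern)


def is_running_from_pgrep(exit_status, stdout):
    # pgrep exits with 0 when at least one process matched and 1 when none did.
    if exit_status != 0:
        return False
    # Treat any non-empty output line as a matching process.
    return any(line.strip() for line in stdout.splitlines())
```

Keeping command construction and result parsing as separate pure functions makes both easy to test without an SSH connection or a running notebook.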

@joshuagetega @dodget

dodget

Looks good @arthurian 👍

dodget · 2020-03-18T18:41:06Z

jupyterhub_files/spawner.py

+    def log_user(self, message='', level=logging.INFO):
+        user = self.user.name if self.user else None
+        log_message = "[user:%s] %s" % (user, message)
+        self.log.log(level, log_message)


What an excellent, simple helper.

Very handy, indeed!

joshuagetega · 2020-03-19T13:27:52Z

jupyterhub_files/spawner.py

                        return None #its up!
                    else:
-                        self.log.debug("Poll, notebook is not running for user %s" % self.user.name)
+                        self.log_user("poll: notebook is NOT running") 
                        return "server up, no instance running for user %s" % self.user.name


Hi @arthurian, I just noticed a possible typo here. It should probably say "Server up, no notebook running for user ..." instead of "Server up, no instance running for user...". This was there from before, not introduced by you. Just pointing it out.

Yeah I think you're right about that.

joshuagetega

Works great, @arthurian. I have successfully tested it on the miniconda sandbox cluster.

Changes include increasing the poll interval and increasing the number of attempts passed to is_notebook_running() when it is called from poll(). This reduces the chance that poll() wrongly concludes the jupyterhub-singleuser process is not running on a worker instance when it actually is.
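
The retry idea can be sketched as follows. This is an assumption-laden illustration, not the spawner's code: check_with_retries, its parameters, and the default values are hypothetical, and the sketch is synchronous for brevity whereas the real poll() may be asynchronous.

```python
import time


def check_with_retries(check, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call check() up to `attempts` times; return True on the first success."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        # Wait between attempts, but not after the final one.
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

With more attempts, a single failed or slow check on a heavily loaded worker is less likely to be reported as "notebook not running"; the trade-off is that a genuinely stopped notebook takes longer to detect.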

Updated spawner logging to improve debugging and traceability.

bd8d8da

arthurian requested a review from joshuagetega March 18, 2020 18:26

dodget approved these changes Mar 18, 2020

joshuagetega reviewed Mar 19, 2020

joshuagetega approved these changes Mar 19, 2020

joshuagetega merged commit 9942107 into develop Mar 20, 2020

joshuagetega deleted the spawner-logging branch March 20, 2020 11:57

arthurian commented Mar 18, 2020

dodget left a comment

dodget Mar 18, 2020

joshuagetega Mar 19, 2020

joshuagetega Mar 19, 2020

arthurian Mar 19, 2020

joshuagetega left a comment
