Adjust Idle culler settings and add internal culling #1133

Merged
merged 10 commits into from
Mar 3, 2022

viniciusdc
Contributor

@viniciusdc viniciusdc commented Mar 1, 2022

Fixes | Closes | Resolves #974

This PR adds some extra configuration to the JupyterHub idle culling system. Right now it is enabled by default through the values.yaml of the JupyterHub Helm chart. This PR includes a new block of config for the culling system in that same YAML file.

The user's JupyterLab server behaves oddly when running alongside JupyterHub: the WebSocket connection that the browser window opens to the user server is reported as activity, which produces false positives and keeps the server from being considered idle (see related topic here). To fix this, I've enabled the internal culling system of the JupyterLab user server, which culls idle kernels and terminals (see here for relevant information on why kernel activity sometimes cannot be tracked correctly).

For more background on this process, I recommend this comment, which explains why using both the internal and the external idle culling services is a good approach (see).

Changes introduced in this PR:

  • Add explicit configuration to adjust the default behavior of the idle culler
  • Add JupyterHub internal culler setting

Types of changes

What types of changes does your PR introduce?

Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features to not work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Testing

Requires testing

  • Yes
  • No

In case you checked yes, did you write tests?

Documentation

Does your contribution include breaking changes or deprecations?
If so have you updated the documentation?

  • Yes, docstrings
  • Yes, main documentation
  • Yes, deprecation notices

Further comments (optional)

  • Needs further testing to confirm the maximum time until pod deletion for the configured timeouts

Extra outputs from user pod, internal idle culling system:

  • Because c.AppServer.... was used, the pod was terminated after the idle timeout was reached:
notebook [I 2022-03-01 08:06:11.171 SingleUserNotebookApp manager:91] [nb_conda_kernels] enabled, 2 kernels found
notebook [I 2022-03-01 08:06:11.191 SingleUserNotebookApp notebookapp:1593] Authentication of /metrics is OFF, since other authentication is disabled.
...
notebook [I 2022-03-01 08:06:12.514 SingleUserNotebookApp notebookapp:2075] Will shut down after 240 seconds with no kernels or terminals.
notebook [I 2022-03-01 08:06:12.518 SingleUserNotebookApp mixins:576] Starting jupyterhub-singleuser server version 1.5.0
...
notebook [I 2022-03-01 08:06:12.556 SingleUserNotebookApp mixins:556] Updating Hub with activity every 300 seconds
notebook [I 2022-03-01 08:06:20.643 LabApp] Build is up to date
notebook [I 2022-03-01 08:06:30.761 SingleUserNotebookApp kernelmanager:179] Kernel started: 089428ef-4e49-46c9-9e70-82ca723c9b11, name: python3
notebook [I 2022-03-01 08:06:30.762 SingleUserNotebookApp kernelmanager:442] Culling kernels with idle durations > 120 seconds at 30 second intervals ...
notebook [I 2022-03-01 08:06:30.762 SingleUserNotebookApp kernelmanager:447] Culling kernels even with connected clients
notebook [I 2022-03-01 08:06:31.821 SingleUserNotebookApp kernelmanager:222] Starting buffering for 089428ef-4e49-46c9-9e70-82ca723c9b11:619bced9-9cb3-45d2-b960-1d5761d5539d
notebook [I 2022-03-01 08:06:37.725 SingleUserNotebookApp kernelmanager:179] Kernel started: a75e3125-02ef-4be6-acc3-a9d87f27fc02, name: conda-env-filesystem-dask-py
notebook CONDA_PREFIX=/home/conda/filesystem/envs/dask
notebook [I 2022-03-01 08:06:42.008 SingleUserNotebookApp kernelmanager:222] Starting buffering for a75e3125-02ef-4be6-acc3-a9d87f27fc02:81585be1-0c11-499c-973f-28c871a05940
notebook [I 2022-03-01 08:06:46.027 SingleUserNotebookApp handlers:169] Saving file at /Untitled2.ipynb
notebook [W 2022-03-01 08:09:00.763 SingleUserNotebookApp kernelmanager:482] Culling 'idle' kernel 'python3' (089428ef-4e49-46c9-9e70-82ca723c9b11) with 1 connections due to 147 seconds of inactivity.
notebook [I 2022-03-01 08:09:00.764 SingleUserNotebookApp multikernelmanager:258] Kernel shutdown: 089428ef-4e49-46c9-9e70-82ca723c9b11
notebook [W 2022-03-01 08:09:00.989 SingleUserNotebookApp kernelmanager:482] Culling 'idle' kernel 'conda-env-filesystem-dask-py' (a75e3125-02ef-4be6-acc3-a9d87f27fc02) with 2 connections due to 137 seconds of inactivity.
notebook [I 2022-03-01 08:09:00.990 SingleUserNotebookApp multikernelmanager:258] Kernel shutdown: a75e3125-02ef-4be6-acc3-a9d87f27fc02
notebook [I 2022-03-01 08:11:12.516 SingleUserNotebookApp notebookapp:2069] No kernels or terminals for 263 seconds; shutting down.
notebook [I 2022-03-01 08:11:12.536 SingleUserNotebookApp notebookapp:2166] Shutting down 0 kernels
notebook [I 2022-03-01 08:11:12.536 SingleUserNotebookApp notebookapp:2181] Shutting down 0 terminals
notebook stream closed
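
For readers who want to reproduce the timeouts reported in the log above (120 s kernel idle timeout, 30 s culling interval, culling with connected clients, 240 s server shutdown), here is a minimal sketch that builds a JSON config of the kind a single-user server could read from /etc/jupyter. The trait names are the classic Notebook server ones and are an assumption on my side, since the actual file added by this PR is not shown here; the terminal values are assumed to mirror the kernel values, and `build_culling_config` is a hypothetical helper.

```python
import json


def build_culling_config(kernel_timeout=120, cull_interval=30, server_timeout=240):
    """Return a dict shaped like a jupyter_notebook_config.json file.

    Class and trait names are assumptions (classic Notebook server);
    a jupyter_server based setup would use ServerApp instead of NotebookApp.
    """
    return {
        "MappingKernelManager": {
            # Cull kernels idle for longer than this many seconds.
            "cull_idle_timeout": kernel_timeout,
            # How often (in seconds) to look for idle kernels.
            "cull_interval": cull_interval,
            # Cull even when a browser tab is still connected.
            "cull_connected": True,
        },
        "TerminalManager": {
            # Assumed to mirror the kernel values.
            "cull_inactive_timeout": kernel_timeout,
            "cull_interval": cull_interval,
        },
        "NotebookApp": {
            # Shut the server down after this long with no kernels or terminals.
            "shutdown_no_activity_timeout": server_timeout,
        },
    }


config = build_culling_config()
print(json.dumps(config, indent=2))
```

Setting `cull_connected` matters for the WebSocket issue described in the PR description: without it, a kernel with an open browser connection would not be culled.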

Hub idle culling system when the user logs out:

[I 2022-03-01 14:14:44.037 JupyterHub proxy:289] Adding user bob to proxy /user/bob/ => http://10.244.1.12:8888
[I 2022-03-01 14:14:44.040 JupyterHub users:677] Server bob is ready
[I 2022-03-01 14:14:44.041 JupyterHub log:189] 200 GET /hub/api/users/bob/server/progress ([email protected]) 6401.87ms
[I 2022-03-01 14:14:44.228 JupyterHub log:189] 302 GET /hub/spawn-pending/bob -> /user/bob/ ([email protected]) 3.59ms
[I 2022-03-01 14:14:44.770 JupyterHub log:189] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-bob&redirect_uri=%2Fuser%2Fbob%2Foauth_callback&response_type=code&state=[secret] -> /user/bob/oauth_callback?code=[secret]&stat
[I 2022-03-01 14:14:44.992 JupyterHub log:189] 200 POST /hub/api/oauth2/token ([email protected]) 45.49ms
[I 2022-03-01 14:14:45.022 JupyterHub log:189] 200 GET /hub/api/authorizations/token/[secret] ([email protected]) 23.19ms
[I 2022-03-01 14:14:46.743 JupyterHub log:189] 200 GET /hub/api/authorizations/token/[secret] ([email protected]) 21.60ms
[I 2022-03-01 14:14:56.697 JupyterHub proxy:347] Checking routes
[I 2022-03-01 14:14:56.824 JupyterHub log:189] 200 GET /hub/api/users ([email protected]) 28.59ms
[I 2022-03-01 14:15:10.891 JupyterHub login:44] User logged out: bob
[I 2022-03-01 14:15:10.914 JupyterHub log:189] 302 GET /hub/logout -> https://qhubstages.qhub.dev/auth/realms/qhub/protocol/openid-connect/logout?redirect_uri=https%3A%2F%2Fqhubstages.qhub.dev%2Fhub%2Flogin (@10.244.0.1) 25.84ms
[I 2022-03-01 14:15:11.334 JupyterHub log:189] 200 GET /hub/login (@10.244.0.1) 12.80ms
[I 2022-03-01 14:15:56.698 JupyterHub proxy:347] Checking routes
[I 2022-03-01 14:15:56.814 JupyterHub log:189] 200 GET /hub/api/users ([email protected]) 17.92ms
[I 2022-03-01 14:16:56.681 JupyterHub proxy:347] Checking routes
[I 2022-03-01 14:16:56.818 JupyterHub log:189] 200 GET /hub/api/users ([email protected]) 22.15ms
[I 2022-03-01 14:17:56.681 JupyterHub proxy:347] Checking routes
[I 2022-03-01 14:17:56.814 JupyterHub log:189] 200 GET /hub/api/users ([email protected]) 17.72ms
[I 2022-03-01 14:18:56.681 JupyterHub proxy:347] Checking routes
[I 2022-03-01 14:18:56.815 JupyterHub log:189] 200 GET /hub/api/users ([email protected]) 18.92ms
[I 220301 14:18:56 __init__:192] Culling server bob (inactive for 00:03:45)
[I 2022-03-01 14:18:56.822 JupyterHub proxy:309] Removing user bob from proxy (/user/bob/)
[I 2022-03-01 14:18:56.825 JupyterHub spawner:2620] Deleting pod dev/jupyter-bob
[I 2022-03-01 14:19:00.304 JupyterHub base:1116] User bob server took 3.483 seconds to stop
[I 2022-03-01 14:19:00.305 JupyterHub log:189] 204 DELETE /hub/api/users/bob/server ([email protected]) 3487.17ms
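
To make the timing in the hub log above easier to follow, here is a rough sketch of a periodic idle check: the culler polls the users API at a fixed interval and culls a server at the first poll where the inactivity exceeds the timeout. The 180 s timeout is only an assumption that happens to be consistent with the log (the configured value is not shown), the last activity is taken as roughly the last request after logout (14:15:11), and `first_cull_check` is a hypothetical helper, not part of the idle culler itself.

```python
from datetime import datetime, timedelta


def first_cull_check(last_activity, first_check, every, timeout):
    """Return the first periodic check at which inactivity exceeds `timeout`."""
    check = first_check
    while check - last_activity <= timeout:
        check += every
    return check


# Timestamps rounded to whole seconds from the hub log.
last_activity = datetime(2022, 3, 1, 14, 15, 11)  # assumed: last request after logout
first_check = datetime(2022, 3, 1, 14, 15, 56)    # first "GET /hub/api/users" after logout

culled_at = first_cull_check(
    last_activity,
    first_check,
    every=timedelta(seconds=60),     # polls appear once per minute in the log
    timeout=timedelta(seconds=180),  # assumed value, not taken from the PR
)
inactive_for = culled_at - last_activity
print(culled_at.time(), inactive_for)
```

Under these assumptions the server is culled at the 14:18:56 poll after 3 minutes 45 seconds of inactivity, which matches the "inactive for 00:03:45" line. The gap between the timeout and the actual culling time comes from the polling interval.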

@viniciusdc viniciusdc requested a review from costrouc March 1, 2022 15:02
@magsol magsol added this to the Release v0.4.0 milestone Mar 1, 2022
@@ -107,6 +107,11 @@ module "jupyterhub" {
name = "dask-etc"
namespace = var.environment
kind = "configmap"
},
Member

I'd prefer if extra mounts was instead added within the module so that we can directly reference the config map name. There is a terraform concat function that can be used.

Contributor Author

@viniciusdc viniciusdc Mar 2, 2022

Hi @costrouc, just to be sure... you are saying something like this

        extra-mounts      = merge(
          var.extra-mounts,
          {
            "/etc/jupyter" = {
              name = "server-idle-culling"
              namespace = var.namespace
              kind = "configmap"
            }
          }
        )

in main.tf? I switched to merge, since concat only works on lists.

@viniciusdc
Contributor Author

Finished testing with the config below:

  • Kernel and Terminal timeout of 10m and culling period of 1m
  • Notebook server timeout of 10m

Note that a kernel/terminal shutdown itself counts as user activity, so the total time until pod deletion is approximately the sum of the kernel/terminal timeout and the server timeout.

  • Spawned a default pod and launched a terminal and a notebook instance (just some simple imports), then minimized the tab
hub      [I 2022-03-03 11:35:26.303 JupyterHub users:677] Server bob is ready
notebook [I 2022-03-03 11:35:39.624 SingleUserNotebookApp kernelmanager:179] Kernel started: 6ae0155b-7abc-4d85-b9f7-0893041d5ec2, name: python3
notebook [I 2022-03-03 11:35:43.907 SingleUserNotebookApp management:365] New terminal with automatic name: 1  
  • After the timeout passed (plus extra culling period)
notebook [W 2022-03-03 11:46:39.626 SingleUserNotebookApp kernelmanager:482] Culling 'idle' kernel 'python3' (6ae0155b-7abc-4d85-b9f7-0893041d5ec2) with 2 connections due to 643 seconds of inactivity
notebook [I 2022-03-03 11:46:39.626 SingleUserNotebookApp multikernelmanager:258] Kernel shutdown: 6ae0155b-7abc-4d85-b9f7-0893041d5ec2
notebook [W 2022-03-03 11:46:43.911 SingleUserNotebookApp terminalmanager:160] Culling terminal '1' due to 654 seconds of inactivity.
notebook [I 2022-03-03 11:46:44.012 SingleUserNotebookApp management:382] Terminal 1 closed
notebook Websocket closed  
  • Server shutting down
notebook [I 2022-03-03 11:57:23.850 SingleUserNotebookApp notebookapp:2069] No kernels or terminals for 638 seconds; shutting down.
notebook [I 2022-03-03 11:57:23.862 SingleUserNotebookApp notebookapp:2166] Shutting down 0 kernels
notebook [I 2022-03-03 11:57:23.862 SingleUserNotebookApp notebookapp:2181] Shutting down 0 terminals
hub      [I 2022-03-03 11:57:26.295 JupyterHub spawner:2620] Deleting pod dev/jupyter-bob
hub      [W 2022-03-03 11:57:26.349 JupyterHub base:1073] User bob server stopped, with exit code: 1
hub      [I 2022-03-03 11:57:26.349 JupyterHub proxy:309] Removing user bob from proxy (/user/bob/)  
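
As a quick sanity check of the "approximately the sum" statement, the sketch below compares a simple estimate (kernel/terminal timeout plus one culling period plus server timeout, using the 10m / 1m / 10m values from this test) with the elapsed time between two timestamps copied from the logs above. The formula is my own simplification and `estimated_seconds_to_pod_deletion` is a hypothetical helper.

```python
from datetime import datetime


def estimated_seconds_to_pod_deletion(kernel_timeout, cull_interval, server_timeout):
    """Rough estimate: idle timeout, plus one culling period, plus server timeout."""
    return kernel_timeout + cull_interval + server_timeout


# Values from the test config: 10m timeouts and a 1m culling period.
estimate = estimated_seconds_to_pod_deletion(600, 60, 600)

# Timestamps copied from the logs: terminal creation and server shutdown.
fmt = "%Y-%m-%d %H:%M:%S.%f"
terminal_started = datetime.strptime("2022-03-03 11:35:43.907", fmt)
server_shutdown = datetime.strptime("2022-03-03 11:57:23.850", fmt)
observed = (server_shutdown - terminal_started).total_seconds()

print(f"estimate: {estimate} s, observed: {observed:.1f} s")
```

The observed time is about 1300 s against an estimate of 1260 s. The difference is in line with the server reporting 638 s (rather than exactly 600 s) without kernels or terminals before shutting down.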

@viniciusdc viniciusdc requested a review from costrouc March 3, 2022 15:00
@viniciusdc
Contributor Author

I will soon add docs to this

@@ -30,7 +41,16 @@ resource "helm_release" "jupyterhub" {
shared-pvc = var.shared-pvc
conda-store-pvc = var.conda-store-pvc
conda-store-mount = var.conda-store-mount
extra-mounts = var.extra-mounts
extra-mounts = merge(
Member

I think this should be concat, no? merge will combine the dicts: https://www.terraform.io/language/functions/merge.

Contributor Author

image

Contributor Author

@viniciusdc viniciusdc Mar 3, 2022

as var.extra-mounts is already a dict of the form

var.extra-mounts = {
   "/dask-etc" = {}
}

and extra-mounts expects a dict for jsonencode, so we are indeed merging those dicts, but only the new key idle-culler-... is added and the existing values stay the same. This results in

extra-mounts = {
   "/dask-etc" = {},
   "/jupyter" = {}
}
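
For anyone less familiar with Terraform's merge, its behavior on maps is close to a shallow dict merge: keys from later arguments are added, and on a key clash the later argument wins. The sketch below mimics that in Python. `merge_maps` is a hypothetical stand-in, not a Terraform API, and the mount entries are illustrative values based on the snippets in this thread.

```python
def merge_maps(*maps):
    """Shallow merge in argument order; later maps win on duplicate keys."""
    merged = {}
    for m in maps:
        merged.update(m)
    return merged


# Illustrative stand-ins for var.extra-mounts and the new idle-culling mount.
existing = {"/dask-etc": {"name": "dask-etc", "kind": "configmap"}}
added = {"/etc/jupyter": {"name": "server-idle-culling", "kind": "configmap"}}

result = merge_maps(existing, added)
print(sorted(result))
```

Because the two maps share no keys, the existing mount is left untouched and the new one is simply added. concat would not apply here since it operates on lists rather than maps.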

Successfully merging this pull request may close these issues.

[bug] Idle culler doesn't appear to be working properly
3 participants