
Add persistent volume configuration to jupyter #78

Open: wants to merge 4 commits into base: main

wabscale

I was having issues with Jupyter: any saved notebooks were lost whenever the pod restarted. I've added some very simple configuration that attaches a persistent volume claim to the Jupyter instance (disabled by default).

jupyter:
  persistent:
    enable: false
    storageClass: "default"
    size: 1Gi
    path: /home/jovyan/persistent

I've made it so that you can specify where the volume gets mounted, the size, and the storage class. When the storage class name is "default", it will use whatever the default storage class is.
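
As a rough illustration of the "default" behaviour described above, a claim template could leave out `storageClassName` so that the cluster's default storage class applies. This is a minimal sketch, not the template from this PR; the claim name and the `ReadWriteOnce` access mode are assumptions.

```yaml
{{- if .Values.jupyter.persistent.enable }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Hypothetical claim name, chosen for illustration only.
  name: {{ .Release.Name }}-jupyter-pvc
spec:
  accessModes:
    - ReadWriteOnce   # assumption: a single Jupyter pod mounts the volume
  resources:
    requests:
      storage: {{ .Values.jupyter.persistent.size }}
  {{- if ne .Values.jupyter.persistent.storageClass "default" }}
  # Only set when a specific class is requested; omitting the field
  # lets Kubernetes fall back to the cluster's default storage class.
  storageClassName: {{ .Values.jupyter.persistent.storageClass | quote }}
  {{- end }}
{{- end }}
```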

initContainers:
  - name: {{ .Release.Name }}-jupyter-volume-permissions
    image: busybox:1.31.1
    command: ["sh", "-c", "chmod -Rv 777 /data/ && chown 0:100 -R /data/"]
Member

Setting permissions to 777 is generally a bad idea. You may find 755 is more appropriate.

However, could you briefly explain why this is necessary at all?

Author

The storage class that I am using is Longhorn. The volumes it provisions are owned by root by default. That init container sets the permissions so that the jovyan user can read and write to the PV.

I'm not aware of any storage classes that let you explicitly set the default ownership. I will add an option so it's easy to disable this init container if desired.
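
One way the optional init container could look, with tighter permissions than 777, is sketched below. The `fixPermissions` flag and the `jupyter-persistent` volume name are hypothetical, and the sketch assumes jovyan runs as UID 1000 with GID 100, as in the Jupyter Docker Stacks images. Because the volume is chowned to jovyan rather than root, 755 on the mount root is enough for read and write access.

```yaml
{{- if and .Values.jupyter.persistent.enable .Values.jupyter.persistent.fixPermissions }}
initContainers:
  - name: {{ .Release.Name }}-jupyter-volume-permissions
    image: busybox:1.31.1
    # Assumption: jovyan is UID 1000 / GID 100 inside the notebook image.
    # Hand the volume to jovyan, then restrict the mount root to 755.
    command: ["sh", "-c", "chown -R 1000:100 /data && chmod 755 /data"]
    volumeMounts:
      - name: jupyter-persistent   # hypothetical volume name
        mountPath: /data
{{- end }}
```

Depending on the storage driver, setting `fsGroup` in the pod's `securityContext` may achieve the same result without an init container; whether that works with a given storage class would need to be checked.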

dask/values.yaml Outdated
enable: false
storageClass: "default"
size: 1Gi
path: /home/jovyan/persistent
Member

I would recommend making this just /home/jovyan

Author

I was explicitly trying not to overwrite the examples directory with the PV. I see your point though: it would be much clearer to the user if the entire home directory is what's saved.

Member

The examples are created at runtime, so this shouldn't be a problem.

Author

When I make the mount path /home/jovyan, the examples do not show up.

Checking the Dockerfile from the dask-docker repo shows they are added at build time.

https://github.com/dask/dask-docker/blob/master/notebook/Dockerfile#L47
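
For context, a volume mounted over a directory hides whatever the image placed there at build time, which would explain why the examples disappear. A minimal deployment fragment is sketched below; the container, volume, and claim names are hypothetical.

```yaml
# Fragment of a Deployment pod template; names are illustrative only.
spec:
  template:
    spec:
      containers:
        - name: jupyter
          volumeMounts:
            - name: jupyter-persistent
              # With path set to /home/jovyan, the (initially empty) volume
              # shadows the examples baked into the image at that location.
              mountPath: {{ .Values.jupyter.persistent.path }}
      volumes:
        - name: jupyter-persistent
          persistentVolumeClaim:
            claimName: {{ .Release.Name }}-jupyter-pvc
```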

Member

Oh interesting. I must be thinking of the Pangeo helm chart.

Author

I'm wondering how you want to handle this. I can submit a PR to the docker-helm repo to copy the examples to another directory (say /usr/share/doc/dask-examples). That way they would be independent of the PV, and always available.

Author

Hey @jacobtomlinson, how would you want to handle this?

Member

jacobtomlinson commented Sep 9, 2020

> I can submit a PR to the docker-helm repo to copy the examples to another directory (say /usr/share/doc/dask-examples).

The trouble with this is discoverability. Really they need to show up in the sidebar.

Perhaps adding them to that location and then adding a line to the prepare.sh script to create a symbolic link would work?

@jacobtomlinson
Member

This looks like a good change. Just a couple of comments.

@daddydrac

I have a way to do this on AWS S3.

@jacobtomlinson
Member

That's great @joehoeller! Would you mind sharing your method? Perhaps we could add it to the docs.

@daddydrac

daddydrac commented Dec 2, 2020 via email

@jacobtomlinson
Member

Yeah I spotted the SO question. I've answered it.

Are you able to share things publicly here? It would be great for the community.

@daddydrac

daddydrac commented Dec 2, 2020

Persistent Volumes for Jupyter using RAPIDS and Dask with Amazon S3

The instructions for Dask are not obvious; however, they do say to install on both the worker and Jupyter server nodes (screenshot IMG-0991, not reproduced here).

In values.yaml, place this under both the worker and jupyter sections:

    env:
      - name: EXTRA_PIP_PACKAGES
        value: "hybridcontents s3fs s3contents --use-feature=2020-resolver --upgrade"
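
Put together, the relevant part of values.yaml might look roughly like the sketch below. It assumes the chart exposes an `env` list under both the `worker` and `jupyter` keys; every other key is omitted.

```yaml
# Sketch of values.yaml with the same packages on workers and Jupyter.
worker:
  env:
    - name: EXTRA_PIP_PACKAGES
      value: "hybridcontents s3fs s3contents --use-feature=2020-resolver --upgrade"

jupyter:
  env:
    - name: EXTRA_PIP_PACKAGES
      value: "hybridcontents s3fs s3contents --use-feature=2020-resolver --upgrade"
```

Keeping the two lists identical avoids version mismatches between the notebook environment and the workers.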

Then, in /dask/templates, find the file called dask-jupyter-config.yaml and replace its contents with the ConfigMap in this gist: https://gist.github.com/joehoeller/063d055cdaf7d92455ee6d695ead8d0a

You will see these variables:

  1. "access_key_id": AWS access key
  2. "secret_access_key": AWS secret key
  3. "bucket": The name of the bucket you created in the AWS console or CLI
  4. "region_name": The region you are making the request from
  5. "endpoint_url": The S3 endpoint, which varies by region, for example https://s3.us-west-2.amazonaws.com or https://s3.us-gov-east-1.amazonaws.com. For FIPS-enabled security (which will only accept connections if Jupyter is served over HTTPS), use https://s3-fips.us-gov-east-1.amazonaws.com.
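
To show where these variables end up, here is a minimal sketch of such a ConfigMap. It is not the content of the gist: it uses `S3ContentsManager` from s3contents directly rather than hybridcontents, the ConfigMap name is hypothetical, and reading the values from environment variables (with the names shown) is an assumption made to keep credentials out of the template.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: jupyter-config   # hypothetical name; keep the chart's own name in practice
data:
  jupyter_notebook_config.py: |
    import os
    from s3contents import S3ContentsManager

    c = get_config()
    # Store notebooks in S3 instead of on the pod's local filesystem.
    c.NotebookApp.contents_manager_class = S3ContentsManager
    # Assumption: these environment variables are set on the Jupyter pod.
    c.S3ContentsManager.access_key_id = os.environ.get("AWS_ACCESS_KEY_ID")
    c.S3ContentsManager.secret_access_key = os.environ.get("AWS_SECRET_ACCESS_KEY")
    c.S3ContentsManager.bucket = os.environ.get("S3_BUCKET")
    c.S3ContentsManager.region_name = os.environ.get("AWS_REGION")
    c.S3ContentsManager.endpoint_url = os.environ.get("S3_ENDPOINT_URL")
```

Any settings the chart's original file defines would need to be carried over when replacing it.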

@daddydrac

@jacobtomlinson did this work for you? I noticed the version I forked internally has a slightly different file structure.

@daddydrac

@jacobtomlinson can I close this?

Base automatically changed from master to main February 11, 2021 01:59
Labels
chart/dask Related to the dask chart
4 participants