PySpark Notebook with DeltaLake for production

This repo tries to replicate the Databricks runtime, plus the feature-rich jupyter/docker-stacks.

Base image: rapidsai/rapidsai:22.02-cuda11.5-runtime-ubuntu20.04-py3.8

Additional packages (planned):

  • All additional packages from jupyter/r-notebook.
  • All additional packages on top of the Databricks runtime dependency tree (10.3 ML GPU runtime)
  • xgboost and the Spark distribution of xgboost (waiting for this PR)
  • hyperopt

Starting Docker

Generate environment variables

Check .env.template for an environment variable template, or modify and copy these lines:

echo "JUPYTER_PATH=<path-to-notebook-directory>" > .env
echo "NB_UID=`id -u`" >> .env
echo "NB_GID=`id -g`" >> .env

Get <path-to-notebook-directory> by running pwd in the notebook directory.
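Since pwd already prints the notebook directory, the three echo lines above can be collapsed into one step by substituting $(pwd) directly. This is a minimal sketch, assuming you run it from inside the notebook directory:

```shell
# Generate .env from the current directory in one pass.
# Assumes the shell's working directory IS the notebook directory.
echo "JUPYTER_PATH=$(pwd)" > .env
echo "NB_UID=$(id -u)" >> .env
echo "NB_GID=$(id -g)" >> .env

# Inspect the result
cat .env
```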

Docker Compose

docker-compose up -d
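For context, the variables in .env are typically consumed by the compose file through variable substitution. The snippet below is a hypothetical sketch of how such a service could be wired up, not the repo's actual docker-compose.yml; the service name, container paths, and port mapping are all assumptions:

```yaml
# Hypothetical docker-compose.yml sketch; the repo's real file may differ.
services:
  notebook:
    image: rapidsai/rapidsai:22.02-cuda11.5-runtime-ubuntu20.04-py3.8
    # Run Jupyter as the host user so files created in the mounted
    # notebook directory keep the right ownership.
    user: "${NB_UID}:${NB_GID}"
    volumes:
      # JUPYTER_PATH comes from .env, generated in the previous step.
      - "${JUPYTER_PATH}:/home/jovyan/work"
    ports:
      - "8888:8888"
```

After the stack is up, docker-compose logs usually shows the tokenized Jupyter URL to open in a browser.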