This is a minimal cookiecutter template that creates a project structure for a data science project. The project can be used for data exploration and model development, and can be containerized with Docker as a deployable stand-alone application.
The command below creates a project from this cookiecutter template on your development machine. Visit the Cookiecutter website for more details.
pipx run cookiecutter gh:srinathh/minimal_data_science_cookiecutter
You may want to use Conda, venv, pyenv, or a similar tool to create an isolated environment for your project.
Change to the project directory and run the command below to install the packages and the executable for local development. See the README.md inside the project for more information. Note that the Dockerfile does the same to build an application image for deployment.
pip install -e .
If you will be using Jupyter notebooks and chose the option to add development dependencies during project creation, use the following:
pip install -e ".[dev]"
Create an empty Git repository on your platform of choice and push the contents so that the project is under version control.
git init
git add .
git commit -m "first commit"
git branch -M main
git remote add origin <remote path>
git push -u origin main
To install development dependencies like Jupyter, use the pip install -e ".[dev]" command shown above.
The Python ecosystem has standardized on pyproject.toml as a build-system-independent format for packaging specification, through PEP 517 and PEP 518. While legacy packages still use a mishmash of setup.py and requirements.txt to specify dependencies, the modern Pythonic approach is to encode dependencies and all other package information, such as entry points, the preferred build system, and the required Python version, directly in pyproject.toml. For more information on how to customize pyproject.toml, see the official tutorial.
The template adds the following standard data science packages as project dependencies in pyproject.toml: pandas, numpy, scikit-learn, matplotlib, python-dotenv.
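After installing the project, one quick way to confirm that these dependencies were picked up is to query the installed distributions. The sketch below uses importlib.metadata from the standard library; the helper name installed_versions is hypothetical and not part of the template.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed above.
EXPECTED_PACKAGES = ["pandas", "numpy", "scikit-learn", "matplotlib", "python-dotenv"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(EXPECTED_PACKAGES).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Running the same check inside the built Docker image is a cheap smoke test, since it exercises the environment that the Dockerfile produced rather than your local one.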
The official Python Docker image comes in two flavors, regular and slim. The slim versions are much lighter but may be missing some Debian packages. Always test before deploying.
├── .gitignore         <- Indicates which files Git should ignore
├── LICENSE            <- Open-source license, if one is chosen
├── README.md          <- The top-level README for developers using this project.
├── data               <- Folder for data. Any file inside this is ignored by Git
├── notebooks          <- Jupyter notebooks. Suggested naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
├── Dockerfile         <- Dockerfile to build the package with app as a Docker image
├── pyproject.toml     <- Project configuration file with package metadata and dependency management
└── src                <- Source code for use in this project.
    │
    ├── app
    │   ├── __init__.py
    │   └── app.py     <- Contains the entry point for the sample Hello World application
    │
    ├── {{ cookiecutter.project_slug }}   <- The package for this project
    │   └── __init__.py
    │
    └── utils
        ├── __init__.py
        ├── sample.py      <- A sample utility function example
        ├── singleton.py   <- A utility implementation of the Singleton pattern for Python
        └── test_utils.py  <- An example of how to write tests
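For readers unfamiliar with the Singleton pattern mentioned for singleton.py, the following is one common way to express it in Python, using a metaclass, together with a test function in the style that pytest collects. This is an illustrative sketch only: SingletonMeta, Settings, and the test are hypothetical names, and the code is not taken from the template's own singleton.py or test_utils.py.

```python
import threading


class SingletonMeta(type):
    """Metaclass that returns the same instance on every call (hypothetical sketch)."""

    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # Hold the lock so concurrent first calls cannot create two instances.
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Settings(metaclass=SingletonMeta):
    """Hypothetical shared-configuration class, used only for illustration."""

    def __init__(self, env="dev"):
        self.env = env


def test_settings_is_singleton():
    first = Settings(env="prod")
    second = Settings(env="test")  # arguments are ignored: the instance already exists
    assert first is second
    assert second.env == "prod"
```

A design consequence worth noting is that constructor arguments only take effect on the first call, which is why the second call in the test still reports the first environment.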