geoaigroup/ml_repository_structure

Project Repository Structure 📁

This document outlines the standardized structure for all project repositories to ensure consistency and ease of collaboration.

Repository Structure

project-root/
├── media/                # Static assets (non-Python)
│   ├── image1.png        # Documentation/images
│   └── image2.jpg        # Screenshots/visuals
│
├── data/                 # Datasets (non-Python)
│   ├── raw/              # Original, immutable data
│   ├── processed/        # Cleaned and transformed data
│   └── external/         # Third-party data
│
├── notebooks/            # Jupyter notebooks (non-Python)
│
├── src/                  # Source code (Python package)
│   ├── __init__.py       
│   ├── data/             # Data processing
│   ├── models/           # Model code
│   ├── evaluation/       # Evaluation logic
│   ├── visualization/    # Visualization tools
│   ├── utils/            # Utilities
│   ├── processing/       # Pre- and post-processing
│   ├── train.py          # Main training script
│   └── eval.py           # Main evaluation script
│
├── experiments/          # Experiment results (non-Python)
│   ├── experiment_1/
│   │   ├── logs/
│   │   └── checkpoints/
│   └── experiment_2/
│       ├── logs/
│       └── checkpoints/
│
├── scripts/              # Helper scripts (.sh/.bash)
├── tests/                # Test cases
├── config/               # Configuration files
├── requirements.txt      # Python dependencies
├── Dockerfile            # Environment setup
├── .gitignore            # Git exclusion rules
└── README.md             # Project documentation
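
The layout above can be scaffolded with a short script. The following is a minimal sketch and not part of the original repository: the function name `scaffold` and the choice to create empty placeholder files are assumptions made for illustration. Experiment folders are left out because they are created per training run.

```python
from pathlib import Path

# Directories taken from the tree above (experiment_N/ subfolders omitted).
DIRECTORIES = [
    "media",
    "data/raw",
    "data/processed",
    "data/external",
    "notebooks",
    "src/data",
    "src/models",
    "src/evaluation",
    "src/visualization",
    "src/utils",
    "src/processing",
    "experiments",
    "scripts",
    "tests",
    "config",
]

# Files from the tree above, created empty if they do not exist yet.
FILES = [
    "src/train.py",
    "src/eval.py",
    "requirements.txt",
    "Dockerfile",
    ".gitignore",
    "README.md",
]


def scaffold(root: str) -> Path:
    """Create the standard layout under `root` without overwriting anything."""
    root_path = Path(root)
    for directory in DIRECTORIES:
        (root_path / directory).mkdir(parents=True, exist_ok=True)
    # src/ and each of its subfolders get an __init__.py so they are importable.
    src = root_path / "src"
    for package in [src, *(p for p in src.iterdir() if p.is_dir())]:
        (package / "__init__.py").touch(exist_ok=True)
    for file in FILES:
        (root_path / file).touch(exist_ok=True)
    return root_path
```

Calling `scaffold("project-root")` creates the folders and empty files in the current working directory. Running it again on an existing project leaves existing files untouched.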

Key Files & Folders 🔑

Required = Must exist from Day 1 (can be empty initially)

Core Structure

| Folder/File | Status | Contents Description |
|---|---|---|
| media/ | Recommended | Documentation images (*.png, *.jpg) |
| data/ | Required | Dataset storage: raw/, processed/, external/ |
| notebooks/ | Required | Jupyter notebooks for exploration & demos |
| src/ | Required | Main Python package with modules: data/, models/, evaluation/, visualization/, utils/, processing/ |
| experiments/ | Required | Training runs (logs/checkpoints) |
| config/ | Required | Configuration files (*.yaml, *.json) |
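
Whether a repository has every folder marked Required can be checked automatically, for example in a CI step. The sketch below is illustrative only: the helper name `missing_required_folders` is hypothetical, and the folder list is copied from the Required rows of the table above.

```python
from pathlib import Path

# Folders marked "Required" in the Core Structure table.
REQUIRED_FOLDERS = ("data", "notebooks", "src", "experiments", "config")


def missing_required_folders(root: str) -> list[str]:
    """Return the required folders that do not exist under `root`."""
    root_path = Path(root)
    return [name for name in REQUIRED_FOLDERS if not (root_path / name).is_dir()]
```

An empty result means the repository satisfies the Day 1 requirement; any returned names are the folders still to be created.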

Essential Files

| File | Command/Usage |
|---|---|
| requirements.txt | pip install -r requirements.txt |
| Dockerfile | docker build -t project-name . |
| README.md | Project documentation hub |

Supplementary Folders

| Folder | Usage |
|---|---|
| scripts/ | Shell scripts for automation |
| tests/ | Unit and integration tests (test_*.py) |

Python Module Requirements 🐍

All subfolders under src/ must be importable Python packages, so each one needs its own __init__.py:

src/
├── __init__.py          # Required for root package
└── data/
    ├── __init__.py      # Required for module
    └── loader.py        # Import via: from src.data import loader
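
A missing __init__.py is easy to overlook when adding a new subfolder. As a rough sketch (the function name `packages_missing_init` is hypothetical, not something the repository provides), the requirement can be checked like this:

```python
from pathlib import Path


def packages_missing_init(src_root: str) -> list[str]:
    """List folders under `src_root` (itself included) that lack an __init__.py."""
    root = Path(src_root)
    candidates = [root, *(p for p in root.rglob("*") if p.is_dir())]
    return sorted(
        folder.relative_to(root.parent).as_posix()
        for folder in candidates
        # Bytecode caches are not packages, so they are skipped.
        if folder.name != "__pycache__" and not (folder / "__init__.py").is_file()
    )
```

Calling `packages_missing_init("src")` from the project root returns paths such as `src/data` for every folder that still needs an __init__.py, or an empty list when the package is complete.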

Repository Rules 🚥

Folder Naming

  • ❌ Never rename core folders (e.g., src/data/ → src/datasets/)
  • ✅ Keep original names

Module Management

  • ➕ Add new modules as needed (e.g., src/inference/)
  • 🗑️ Delete unused modules (e.g., empty src/visualization/)

Code Location

  • 🐍 Python code only in src/ and tests/ (with __init__.py)
  • 📁 Non-Python files in data/, notebooks/, experiments/
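
The code-location rule can be spot-checked by listing Python files that sit outside src/ and tests/. This is only a sketch under the assumption that every `*.py` file counts as Python code; the helper name `misplaced_python_files` is hypothetical.

```python
from pathlib import Path

# Top-level folders where Python code is allowed, per the rule above.
ALLOWED_CODE_DIRS = {"src", "tests"}


def misplaced_python_files(root: str) -> list[str]:
    """Return .py files located outside the allowed code folders."""
    root_path = Path(root)
    misplaced = []
    for path in root_path.rglob("*.py"):
        relative = path.relative_to(root_path)
        # A file directly in the root has a single part (its own name).
        top_level = relative.parts[0] if len(relative.parts) > 1 else ""
        if top_level not in ALLOWED_CODE_DIRS:
            misplaced.append(relative.as_posix())
    return sorted(misplaced)
```

Note that this simple version also reports files in folders such as a local virtual environment, so those may need to be excluded before using it in practice.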

Real-World Scenarios 🧪

✅ Allowed:

  • Creating src/inference/ for prediction logic
  • Deleting empty src/visualization/ folder

❌ Not Allowed:

  • Renaming src/models/ → src/networks/
  • Adding Python files to data/ without __init__.py

Important Notes 📝

  1. Must-Have Folders: Create these even if empty:
    data/, notebooks/, src/, experiments/, config/
  2. Other folders (scripts/, tests/) can be added later
  3. Maintain identical structure across all repositories
  4. Update .gitignore to exclude large data files and credentials
