A beginner-friendly implementation of GPT-2 for learning purposes.
A small but powerful language model that can:
- ✍️ Continue writing stories you start
- 🎵 Generate creative text
- 📝 Help with writing tasks
- 🤔 Respond to prompts in a coherent way
```
# Each word becomes numbers the AI can understand
"Hello world!" → [15496, 995, 0]
```
- Tokenizer: Turns text into numbers (and back!)
- Attention: Helps AI understand which words are related
- Neural Network: The AI's "brain" for processing information
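You can try the tokenizer yourself with tiktoken (listed in `requirements.txt`); a minimal sketch:

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")    # the standard 50,257-token GPT-2 vocabulary
tokens = enc.encode("Hello world!")    # text → token ids
print(tokens)                          # [15496, 995, 0]
print(enc.decode(tokens))              # token ids → "Hello world!"
```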
Think of it like this:
- 📚 AI reads lots of text (like learning from books)
- 🧩 Breaks text into small pieces (tokens)
- 🔍 Learns patterns (like how words go together)
- ✍️ Uses patterns to write new text
```bash
# Clone the repository
git clone https://github.com/AIMLRLab/gpt2.git
cd gpt2

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Train the model
python train.py

# Chat with your model
python chat.py
```
Our GPT-2 implementation features:
- Vocabulary: 50,257 tokens (standard GPT-2 vocabulary)
- Embedding Size: 768 (determines model's capacity)
- Attention Heads: 4 (for parallel processing)
- Layers: 8 transformer blocks
- Context: 512 tokens (text window size)
- Parameters: ~124M
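In code, these settings correspond to a configuration roughly like the sketch below (field names are illustrative; the real ones live in `gpt2.py`):

```python
from dataclasses import dataclass

@dataclass
class GPT2Config:
    vocab_size: int = 50257   # standard GPT-2 BPE vocabulary
    n_embd: int = 768         # embedding size
    n_head: int = 4           # attention heads per layer
    n_layer: int = 8          # transformer blocks
    block_size: int = 512     # context window in tokens
```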
Default hyperparameters:
- Batch Size: 16 sequences per batch
- Learning Rate: 0.001 (AdamW optimizer)
- Epochs: 5 passes through data
- Validation Split: 90/10 train/val
- Checkpointing: Saves best model
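The training loop then looks roughly like this sketch; the helper names (`model`, `train_loader`, `val_loader`, `evaluate`) and the `(logits, loss)` return convention are placeholders, so check `train.py` for the actual code:

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # AdamW at 0.001
best_val_loss = float("inf")

for epoch in range(5):                       # 5 passes through the data
    for x, y in train_loader:                # batches of 16 sequences
        logits, loss = model(x, y)           # assumes the model computes its own loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    val_loss = evaluate(model, val_loader)   # 10% held-out validation split
    if val_loss < best_val_loss:             # checkpoint only the best model
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pt")
```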
Features:
- Temperature control (0.9) for balanced output
- Token filtering for better quality
- Special token handling
- Graceful interruption
- Error recovery
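Temperature control and token filtering boil down to a few lines. The sketch below is illustrative: only the 0.9 temperature comes from this project's defaults, while the top-k value and function name are assumptions:

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=0.9, top_k=50):
    # Lower temperature sharpens the distribution, higher flattens it
    logits = logits / temperature
    # Token filtering: keep only the top-k most likely tokens
    top_logits, top_idx = torch.topk(logits, top_k)
    probs = F.softmax(top_logits, dim=-1)
    # Sample one token id from the filtered distribution
    choice = torch.multinomial(probs, num_samples=1)
    return top_idx[choice]
```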
Example prompts:
- "Tell me a story about..."
- "What are your thoughts on..."
- "Write a poem about..."
- `train.py`: Training loop with educational logging and visualization
- `gpt2.py`: GPT-2 model architecture with detailed mathematical implementations
- `chat.py`: Interactive chat interface with token-by-token generation display
- `requirements.txt`: Core dependencies (PyTorch, tiktoken, etc.)
- `data.txt`: Training data file (not included)
- `MATH.md`: Mathematical foundations and detailed explanations
The training process logs:
- Training/validation loss per epoch
- Model architecture details
- Batch statistics and sample inputs
- Hyperparameter configurations
- Checkpoint information
- Token-level generation details (in chat mode)
Logs are saved in:
- Training logs: `logs/training_YYYYMMDD_HHMMSS.log`
- Chat logs: `logs/chat_YYYYMMDD_HHMMSS.log`
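Creating a timestamped log file like these takes only a few lines; an illustrative sketch (the real setup lives in `train.py` and `chat.py`):

```python
import logging
from datetime import datetime
from pathlib import Path

# Write to a file named like logs/training_YYYYMMDD_HHMMSS.log
Path("logs").mkdir(exist_ok=True)
log_path = f"logs/training_{datetime.now():%Y%m%d_%H%M%S}.log"
logging.basicConfig(filename=log_path, level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logging.info("Training run started")
```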
Key logging features:
- Detailed progress tracking
- Educational explanations
- Debug mode for attention patterns
- Token probability visualization
- Error handling and recovery
After training, chat with your model:

```bash
python chat.py
```
Example prompts:
- "Once upon a time"
- "The future of AI is"
- "The meaning of life is"
```
Input  →  Embeddings  →  Transformer Blocks  →  Output
              ↑        ↑           ↑
          Words to  Position   12 layers of smart
          numbers   info       pattern recognition
```
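In PyTorch, that flow can be sketched as below. It uses a stock `nn.TransformerEncoderLayer` as a stand-in for the real causal GPT-2 block, so treat it as an illustration of the data flow, not the code in `gpt2.py`:

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Illustrative skeleton: Input → Embeddings → Transformer Blocks → Output."""

    def __init__(self, vocab_size=50257, n_embd=768, n_head=4, n_layer=8, block_size=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)    # words → numbers
        self.pos_emb = nn.Embedding(block_size, n_embd)    # position info
        self.blocks = nn.Sequential(*[
            nn.TransformerEncoderLayer(n_embd, n_head, batch_first=True)
            for _ in range(n_layer)
        ])                                                  # stacked pattern-recognition layers
        self.lm_head = nn.Linear(n_embd, vocab_size)        # scores for the next token

    def forward(self, idx):                                 # idx: (batch, seq_len) token ids
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)
        return self.lm_head(x)                              # logits over the vocabulary
```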
- Batch Size: 16 (processes 16 text chunks at once)
- Context Length: 512 (can "remember" 512 tokens)
- Learning Rate: 0.001 (how fast it learns)
- Model Size: 124M parameters (like 124M knobs to tune)
- Batch Size: Like solving 16 math problems at once
- Context Length: How many words it reads at once (like your short-term memory)
- Learning Rate: How big steps it takes when learning (too big = stumble, too small = slow)
- Model Size: How many "brain cells" it has (more = smarter but slower)
Try these prompts:
- "The future of AI is"
- "Once upon a time"
- "The secret to happiness"
Want to learn more? Check out MATH.md for the mathematical foundations and detailed explanations.
Remember:
- Needs lots of training data
- Can make mistakes
- Learns patterns but doesn't truly "understand"
- Works best with topics it's trained on
Join our learning journey!
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Add your improvements
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
MIT License - See LICENSE file
- Open an issue for bugs
- Start a discussion for questions
- PRs welcome!
- Join our Discord community
Remember: The goal is learning - don't be afraid to experiment and make mistakes!
- Training might seem to hang after "Training Progress"; this is normal, as it's generating visualizations
- First generation might be slow as the model loads
- GPU recommended for faster training