🤖 Mini GPT-2 Implementation

A beginner-friendly implementation of GPT-2 for learning purposes.

🎯 What Are We Building?

A small but powerful language model that can:

  • ✍️ Continue writing stories you start
  • 🎵 Generate creative text
  • 📝 Help with writing tasks
  • 🤔 Respond to prompts in a coherent way

🎓 Learning Journey

1️⃣ The Building Blocks

# Each word becomes numbers the AI can understand
"Hello world!" → [3748, 995, 0]
  • Tokenizer: Turns text into numbers (and back!)
  • Attention: Helps AI understand which words are related
  • Neural Network: The AI's "brain" for processing information
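
You can try the tokenizer step yourself with tiktoken (listed in requirements.txt); the IDs below are the standard GPT-2 encoding:

# Round-trip text through the GPT-2 tokenizer
import tiktoken

enc = tiktoken.get_encoding("gpt2")
tokens = enc.encode("Hello world!")
print(tokens)              # [15496, 995, 0]
print(enc.decode(tokens))  # Hello world!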

2️⃣ How It Works

Think of it like this:

  1. 📚 AI reads lots of text (like learning from books)
  2. 🧩 Breaks text into small pieces (tokens)
  3. 🔍 Learns patterns (like how words go together)
  4. ✍️ Uses patterns to write new text
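
As a rough sketch (not necessarily how train.py structures it), step 3 boils down to turning the token stream into input/next-token pairs:

# Illustrative only: build (input, next-token target) pairs from a token stream
import torch

tokens = torch.arange(20)   # stand-in for real token IDs from the tokenizer
context_length = 4

inputs, targets = [], []
for i in range(len(tokens) - context_length):
    inputs.append(tokens[i : i + context_length])           # what the model sees
    targets.append(tokens[i + 1 : i + context_length + 1])  # the token it should predict next

x = torch.stack(inputs)   # shape: (num_examples, context_length)
y = torch.stack(targets)  # the model learns to map x -> y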

🚀 Quick Start

# Clone the repository
git clone https://github.com/AIMLRLab/gpt2.git
cd gpt2

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Train the model
python train.py

# Chat with your model
python chat.py

📊 Model Architecture

Our GPT-2 implementation features:

  • Vocabulary: 50,257 tokens (standard GPT-2 vocabulary)
  • Embedding Size: 768 (determines model's capacity)
  • Attention Heads: 4 (for parallel processing)
  • Layers: 8 transformer blocks
  • Context: 512 tokens (text window size)
  • Parameters: ~124M
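
A hypothetical config object mirroring these numbers (gpt2.py may organize them differently):

# Hypothetical configuration matching the numbers above
from dataclasses import dataclass

@dataclass
class GPT2Config:
    vocab_size: int = 50257   # standard GPT-2 BPE vocabulary
    n_embd: int = 768         # embedding size
    n_head: int = 4           # attention heads
    n_layer: int = 8          # transformer blocks
    block_size: int = 512     # context window in tokens

config = GPT2Config()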

🎓 Training Details

Default hyperparameters:

  • Batch Size: 16 sequences per batch
  • Learning Rate: 0.001 (AdamW optimizer)
  • Epochs: 5 passes through data
  • Validation Split: 90/10 train/val
  • Checkpointing: Saves best model
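
In PyTorch terms, the recipe looks roughly like this (a sketch only; train.py's actual structure will differ):

# Sketch of the training recipe above, not a copy of train.py
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the GPT-2 model defined in gpt2.py
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # learning rate 0.001

best_val_loss = float("inf")
for epoch in range(5):                # 5 passes through the data
    # ... run training batches of 16 sequences, then evaluate on the 10% validation split ...
    val_loss = 1.0 / (epoch + 1)      # placeholder value for illustration
    if val_loss < best_val_loss:      # checkpoint only the best model
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pt")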

💬 Chat Interface

Features:

  • Temperature control (0.9) for balanced output
  • Token filtering for better quality
  • Special token handling
  • Graceful interruption
  • Error recovery
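
The temperature of 0.9 works like this (a standalone sketch; chat.py wires it into full generation):

# How temperature shapes sampling of the next token
import torch

logits = torch.tensor([2.0, 1.0, 0.1, -1.0])          # raw scores for 4 candidate tokens
probs = torch.softmax(logits / 0.9, dim=-1)           # temperature < 1 sharpens, > 1 flattens
next_token = torch.multinomial(probs, num_samples=1)  # sample instead of always taking the top token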

Example prompts:

"Tell me a story about..."
"What are your thoughts on..."
"Write a poem about..."

📝 Files Overview

  • train.py: Training loop with educational logging and visualization
  • gpt2.py: GPT-2 model architecture with detailed mathematical implementations
  • chat.py: Interactive chat interface with token-by-token generation display
  • requirements.txt: Core dependencies (PyTorch, tiktoken, etc.)
  • data.txt: Training data file (not included)
  • MATH.md: Mathematical foundations and detailed explanations

🔍 Logging

The training process logs:

  • Training/validation loss per epoch
  • Model architecture details
  • Batch statistics and sample inputs
  • Hyperparameter configurations
  • Checkpoint information
  • Token-level generation details (in chat mode)

Logs are saved in:

  • Training logs: logs/training_YYYYMMDD_HHMMSS.log
  • Chat logs: logs/chat_YYYYMMDD_HHMMSS.log
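
A minimal way to produce timestamped log files like these (the repository's own setup may differ):

# Write to a timestamped log file such as logs/training_YYYYMMDD_HHMMSS.log
import logging
import os
from datetime import datetime

os.makedirs("logs", exist_ok=True)
log_file = datetime.now().strftime("logs/training_%Y%m%d_%H%M%S.log")
logging.basicConfig(filename=log_file, level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logging.info("Starting epoch 1/5")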

Key logging features:

  • Detailed progress tracking
  • Educational explanations
  • Debug mode for attention patterns
  • Token probability visualization
  • Error handling and recovery

🎯 Usage

After training, chat with your model:

python chat.py

Example prompts:

  • "Once upon a time"
  • "The future of AI is"
  • "The meaning of life is"

📊 Model Details

Architecture (The AI's Brain)

Input → Embeddings → Transformer Blocks → Output
      ↑          ↑                    ↑
Words to    Position     8 layers of smart
numbers     info         pattern recognition
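
The two embedding steps can be sketched like this (layer names are illustrative, not necessarily those in gpt2.py):

# "Words to numbers" plus "position info", as tensors
import torch
import torch.nn as nn

tok_emb = nn.Embedding(50257, 768)   # one 768-dim vector per vocabulary token
pos_emb = nn.Embedding(512, 768)     # one 768-dim vector per position in the context window

ids = torch.randint(0, 50257, (1, 512))      # a batch of token IDs
positions = torch.arange(512).unsqueeze(0)   # positions 0..511
x = tok_emb(ids) + pos_emb(positions)        # input to the transformer blocks, shape (1, 512, 768)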

Training Settings

  • Batch Size: 16 (processes 16 text chunks at once)
  • Context Length: 512 (can "remember" 512 tokens)
  • Learning Rate: 0.001 (how fast it learns)
  • Model Size: 124M parameters (like 124M knobs to tune)

What These Numbers Mean

  • Batch Size: Like solving 16 math problems at once
  • Context Length: How many words it reads at once (like your short-term memory)
  • Learning Rate: How big steps it takes when learning (too big = stumble, too small = slow)
  • Model Size: How many "brain cells" it has (more = smarter but slower)
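
Concretely, one training batch is just a 16 × 512 grid of token IDs:

# One training batch: 16 sequences of 512 token IDs each
import torch

batch = torch.randint(0, 50257, (16, 512))
print(batch.shape)   # torch.Size([16, 512])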

🎮 Playground

Try these prompts:

  1. "The future of AI is"
  2. "Once upon a time"
  3. "The secret to happiness"

📚 Learning Resources

Want to learn more? Start with MATH.md in this repository, which walks through the mathematical foundations behind the model.

⚠️ Limitations

Remember:

  • Needs lots of training data
  • Can make mistakes
  • Learns patterns but doesn't truly "understand"
  • Works best with topics it's trained on

🤝 Contributing

Join our learning journey!

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Add your improvements
  4. Commit your changes (git commit -m 'Add some AmazingFeature')
  5. Push to the branch (git push origin feature/AmazingFeature)
  6. Open a Pull Request

📄 License

MIT License - See LICENSE file

🙋‍♂️ Support

  • Open an issue for bugs
  • Start a discussion for questions
  • PRs welcome!

🙋‍♂️ Questions?

Remember: The goal is learning - don't be afraid to experiment and make mistakes!

⚠️ Known Issues

  1. Training might seem to hang after "Training Progress" - this is normal, it's generating visualizations
  2. First generation might be slow as the model loads
  3. GPU recommended for faster training
