A beginner-friendly implementation of GPT-2 for learning purposes.
A small but powerful language model that can:
- ✍️ Continue writing stories you start
- 🎵 Generate creative text
- 📝 Help with writing tasks
- 🤔 Respond to prompts in a coherent way
```
# Each word becomes numbers the AI can understand
"Hello world!" → [15496, 995, 0]
```
- Tokenizer: Turns text into numbers (and back!)
- Attention: Helps AI understand which words are related
- Neural Network: The AI's "brain" for processing information
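You can try the tokenizer yourself with tiktoken (listed in `requirements.txt`); a minimal sketch:

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")    # the standard 50,257-token GPT-2 vocabulary
tokens = enc.encode("Hello world!")    # text → token ids
print(tokens)                          # [15496, 995, 0]
print(enc.decode(tokens))              # token ids → "Hello world!"
```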
Think of it like this:
- 📚 AI reads lots of text (like learning from books)
- 🧩 Breaks text into small pieces (tokens)
- 🔍 Learns patterns (like how words go together)
- ✍️ Uses patterns to write new text
```bash
# Clone the repository
git clone https://github.com/AIMLRLab/gpt2.git
cd gpt2

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Train the model
python train.py

# Chat with your model
python chat.py
```
Our GPT-2 implementation features:
- Vocabulary: 50,257 tokens (standard GPT-2 vocabulary)
- Embedding Size: 768 (determines model's capacity)
- Attention Heads: 4 (for parallel processing)
- Layers: 8 transformer blocks
- Context: 512 tokens (text window size)
- Parameters: ~124M
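In code, these settings correspond to a configuration roughly like the sketch below (field names are illustrative; the real ones live in `gpt2.py`):

```python
from dataclasses import dataclass

@dataclass
class GPT2Config:
    vocab_size: int = 50257   # standard GPT-2 BPE vocabulary
    n_embd: int = 768         # embedding size
    n_head: int = 4           # attention heads per layer
    n_layer: int = 8          # transformer blocks
    block_size: int = 512     # context window in tokens
```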
Default hyperparameters:
- Batch Size: 16 sequences per batch
- Learning Rate: 0.001 (AdamW optimizer)
- Epochs: 5 passes through data
- Validation Split: 90/10 train/val
- Checkpointing: Saves best model
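The training loop then looks roughly like this sketch; the helper names (`model`, `train_loader`, `val_loader`, `evaluate`) and the `(logits, loss)` return convention are placeholders, so check `train.py` for the actual code:

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # AdamW at 0.001
best_val_loss = float("inf")

for epoch in range(5):                       # 5 passes through the data
    for x, y in train_loader:                # batches of 16 sequences
        logits, loss = model(x, y)           # assumes the model computes its own loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    val_loss = evaluate(model, val_loader)   # 10% held-out validation split
    if val_loss < best_val_loss:             # checkpoint only the best model
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pt")
```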
Features:
- Temperature control (0.9) for balanced output
- Token filtering for better quality
- Special token handling
- Graceful interruption
- Error recovery
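Temperature control and token filtering boil down to a few lines. The sketch below is illustrative: only the 0.9 temperature comes from this project's defaults, while the top-k value and function name are assumptions:

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=0.9, top_k=50):
    # Lower temperature sharpens the distribution, higher flattens it
    logits = logits / temperature
    # Token filtering: keep only the top-k most likely tokens
    top_logits, top_idx = torch.topk(logits, top_k)
    probs = F.softmax(top_logits, dim=-1)
    # Sample one token id from the filtered distribution
    choice = torch.multinomial(probs, num_samples=1)
    return top_idx[choice]
```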
Example prompts:
- "Tell me a story about..."
- "What are your thoughts on..."
- "Write a poem about..."
- `train.py`: Training loop with educational logging and visualization
- `gpt2.py`: GPT-2 model architecture with detailed mathematical implementations
- `chat.py`: Interactive chat interface with token-by-token generation display
- `requirements.txt`: Core dependencies (PyTorch, tiktoken, etc.)
- `data.txt`: Training data file (not included)
- `MATH.md`: Mathematical foundations and detailed explanations
The training process logs:
- Training/validation loss per epoch
- Model architecture details
- Batch statistics and sample inputs
- Hyperparameter configurations
- Checkpoint information
- Token-level generation details (in chat mode)
Logs are saved in:
- Training logs: `logs/training_YYYYMMDD_HHMMSS.log`
- Chat logs: `logs/chat_YYYYMMDD_HHMMSS.log`
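Creating a timestamped log file like these takes only a few lines; an illustrative sketch (the real setup lives in `train.py` and `chat.py`):

```python
import logging
from datetime import datetime
from pathlib import Path

# Write to a file named like logs/training_YYYYMMDD_HHMMSS.log
Path("logs").mkdir(exist_ok=True)
log_path = f"logs/training_{datetime.now():%Y%m%d_%H%M%S}.log"
logging.basicConfig(filename=log_path, level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logging.info("Training run started")
```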
Key logging features:
- Detailed progress tracking
- Educational explanations
- Debug mode for attention patterns
- Token probability visualization
- Error handling and recovery
After training, chat with your model:

```bash
python chat.py
```
Example prompts:
- "Once upon a time"
- "The future of AI is"
- "The meaning of life is"
```
Input  →  Embeddings  →  Transformer Blocks  →  Output
              ↑        ↑           ↑
          Words to  Position   12 layers of smart
          numbers   info       pattern recognition
```
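In PyTorch, that flow can be sketched as below. It uses a stock `nn.TransformerEncoderLayer` as a stand-in for the real causal GPT-2 block, so treat it as an illustration of the data flow, not the code in `gpt2.py`:

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Illustrative skeleton: Input → Embeddings → Transformer Blocks → Output."""

    def __init__(self, vocab_size=50257, n_embd=768, n_head=4, n_layer=8, block_size=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)    # words → numbers
        self.pos_emb = nn.Embedding(block_size, n_embd)    # position info
        self.blocks = nn.Sequential(*[
            nn.TransformerEncoderLayer(n_embd, n_head, batch_first=True)
            for _ in range(n_layer)
        ])                                                  # stacked pattern-recognition layers
        self.lm_head = nn.Linear(n_embd, vocab_size)        # scores for the next token

    def forward(self, idx):                                 # idx: (batch, seq_len) token ids
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)
        return self.lm_head(x)                              # logits over the vocabulary
```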
- Batch Size: 16 (processes 16 text chunks at once)
- Context Length: 512 (can "remember" 512 tokens)
- Learning Rate: 0.001 (how fast it learns)
- Model Size: 124M parameters (like 124M knobs to tune)
- Batch Size: Like solving 16 math problems at once
- Context Length: How many words it reads at once (like your short-term memory)
- Learning Rate: How big steps it takes when learning (too big = stumble, too small = slow)
- Model Size: How many "brain cells" it has (more = smarter but slower)
Try these prompts:
- "The future of AI is"
- "Once upon a time"
- "The secret to happiness"
Want to learn more? Check out MATH.md for the mathematical foundations and detailed explanations.
Remember:
- Needs lots of training data
- Can make mistakes
- Learns patterns but doesn't truly "understand"
- Works best with topics it's trained on
Join our learning journey!
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Add your improvements
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
MIT License - See LICENSE file
- Open an issue for bugs
- Start a discussion for questions
- PRs welcome!
- Join our Discord community
Remember: The goal is learning - don't be afraid to experiment and make mistakes!
- Training might seem to hang after "Training Progress"; this is normal, as it's generating visualizations
- First generation might be slow as the model loads
- GPU recommended for faster training