A simplified yet functional implementation of the GPT-2 language model written entirely in C.
This project demonstrates the core concepts of transformer-based language models with a focus on educational clarity and minimal dependencies.
- Pure C Implementation: No external dependencies beyond standard C libraries
- Complete Transformer Architecture: Includes multi-head attention, feed-forward networks, and layer normalization
- Training Capabilities: Basic training loop with cross-entropy loss and gradient updates (see the sketch after this list)
- Text Generation: Interactive text completion with temperature-controlled sampling
- Memory Management: Proper allocation and deallocation of matrix structures
- Educational Focus: Well-commented code with debug output for learning purposes
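As a rough sketch of the training item above, cross-entropy for a single predicted token and a plain gradient-descent update could be written as follows; the function names are illustrative and not taken from `gpt2.c`:

```c
#include <math.h>
#include <stddef.h>

/* Sketch: cross-entropy loss of the correct next token given raw logits.
 * Uses the usual max-subtraction trick so expf() does not overflow. */
static float cross_entropy(const float *logits, size_t vocab_size, size_t target)
{
    float max_logit = logits[0];
    for (size_t i = 1; i < vocab_size; i++)
        if (logits[i] > max_logit) max_logit = logits[i];

    float sum_exp = 0.0f;
    for (size_t i = 0; i < vocab_size; i++)
        sum_exp += expf(logits[i] - max_logit);

    /* -log softmax(logits)[target] */
    return -(logits[target] - max_logit - logf(sum_exp));
}

/* Sketch: plain gradient-descent update, w -= lr * grad. */
static void sgd_update(float *w, const float *grad, size_t n, float lr)
{
    for (size_t i = 0; i < n; i++)
        w[i] -= lr * grad[i];
}
```

In a full training loop this per-token loss would typically be averaged over all positions of a sequence before the update is applied.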
The implementation includes:
- Token and Positional Embeddings: Convert input tokens to dense vector representations
- Multi-Head Self-Attention: Core attention mechanism for modeling token relationships
- Feed-Forward Networks: Position-wise fully connected layers with GELU activation
- Layer Normalization: Stabilizes training and improves convergence (a minimal sketch follows this list)
- Language Modeling Head: Projects hidden states to vocabulary logits
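As an example of the layer-normalization step listed above, a minimal version over one embedding vector might look like this in C (illustrative names and signature, not the exact code in `gpt2.c`):

```c
#include <math.h>
#include <stddef.h>

/* Sketch: layer normalization over one embedding vector of length dim.
 * gamma/beta are the learned scale and shift parameters. */
static void layer_norm(float *x, const float *gamma, const float *beta,
                       size_t dim, float eps)
{
    float mean = 0.0f;
    for (size_t i = 0; i < dim; i++)
        mean += x[i];
    mean /= (float)dim;

    float var = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = x[i] - mean;
        var += d * d;
    }
    var /= (float)dim;

    float inv_std = 1.0f / sqrtf(var + eps);
    for (size_t i = 0; i < dim; i++)
        x[i] = gamma[i] * (x[i] - mean) * inv_std + beta[i];
}
```

In the full model this is applied at every sequence position, typically with `eps` on the order of 1e-5.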
Default hyperparameters (defined in `gpt2.h`):
- Vocabulary Size: 50,257 (standard GPT-2 vocabulary)
- Embedding Dimension: 768
- Number of Layers: 12
- Number of Attention Heads: 12
- Feed-Forward Dimension: 3072
- Maximum Sequence Length: 1024
- Learning Rate: 0.0001
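Assuming these values are exposed as preprocessor constants, the relevant part of the header might look roughly like the sketch below; the macro names are an assumption, so check `gpt2.h` for the actual identifiers:

```c
/* Hypothetical constant names -- see gpt2.h for the real ones. */
#define VOCAB_SIZE     50257   /* standard GPT-2 vocabulary          */
#define EMBED_DIM      768     /* embedding dimension                */
#define NUM_LAYERS     12      /* transformer blocks                 */
#define NUM_HEADS      12      /* attention heads per block          */
#define FFN_DIM        3072    /* feed-forward hidden dimension      */
#define MAX_SEQ_LEN    1024    /* maximum sequence length            */
#define LEARNING_RATE  0.0001f /* base learning rate                 */
```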
```
├── gpt2.h      # Header file with structure definitions and constants
├── gpt2.c      # Core implementation (matrix operations, model architecture)
├── main.c      # Training loop and text generation interface
├── train.txt   # Training data file (user-provided)
└── README.md   # This file
```
- C compiler (GCC, Clang, or MSVC)
- Standard C libraries (`stdio.h`, `stdlib.h`, `string.h`, `math.h`, `time.h`)
```sh
gcc -o gpt2 main.c gpt2.c -lm
```
Or using a Makefile:
```makefile
CC = gcc
CFLAGS = -Wall -O2
LIBS = -lm

gpt2: main.c gpt2.c gpt2.h
	$(CC) $(CFLAGS) -o gpt2 main.c gpt2.c $(LIBS)

clean:
	rm -f gpt2
```
Create a `train.txt` file with your training text (or simply use the one provided). Each line is treated as a separate training example:
```
Hello world, this is a sample sentence.
Machine learning is fascinating.
Transformers have revolutionized NLP.
```
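One way such a file could be loaded, one example per line, is sketched below; the function name, buffer size, and the commented-out training hook are assumptions rather than the actual code in `main.c`:

```c
#include <stdio.h>
#include <string.h>

/* Sketch: read training examples line by line from train.txt.
 * In the real program each line would be tokenized and fed to the model;
 * here we just strip the newline and count non-empty lines. */
static int load_training_lines(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (!fp) {
        perror("fopen");
        return -1;
    }

    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, fp)) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip trailing newline */
        if (line[0] == '\0')
            continue;                        /* skip empty lines */
        /* tokenize_and_train(line); -- hypothetical hook into the model */
        count++;
    }

    fclose(fp);
    return count;
}
```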
```sh
./gpt2
```
The program will:
- Load training data from `train.txt`
- Initialize the GPT-2 model with random weights
- Train for a specified number of epochs
- Generate text completions for test prompts
- Enter interactive mode for custom text generation
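The interactive mode in the last step boils down to a prompt loop. The sketch below uses a stub in place of the real generation call, so the actual `generate_text` in this project will differ:

```c
#include <stdio.h>
#include <string.h>

/* Placeholder standing in for the real model call; the project's
 * generate_text runs the forward pass and sampling instead. */
static void generate_text_stub(const char *prompt, char *out, size_t out_size)
{
    snprintf(out, out_size, "%s ...", prompt);   /* echo only, no model */
}

/* Sketch of the interactive loop: read a prompt, print a completion. */
static void interactive_mode(void)
{
    char prompt[1024];
    char output[2048];
    for (;;) {
        printf("Enter prompt: ");
        fflush(stdout);
        if (!fgets(prompt, sizeof prompt, stdin))
            break;                                /* EOF ends the loop */
        prompt[strcspn(prompt, "\r\n")] = '\0';
        if (prompt[0] == '\0')
            continue;                             /* ignore empty input */
        generate_text_stub(prompt, output, sizeof output);
        printf("Generated: \"%s\"\n", output);
    }
}
```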
```
Enter prompt: Altherya
(...)
[forward_pass] ▶ exiting
[generate_text] Forward pass completed
[generate_text] Sampled token: 32 (' ')
[generate_text] Generation complete, final seq_len=104
[generate_text] Final text: 'Altherya nel '
Generated: "Altherya nel "
```
Clearly it still has many issues, but this project is intended for exercise purposes only.
This implementation makes several simplifications for educational clarity:
- Character-level Tokenization: Uses simple character mapping instead of BPE (sketched after this list)
- Simplified Attention: Multi-head attention is partially implemented
- Basic Optimizer: Uses simple gradient updates instead of Adam
- Limited Gradient Computation: Backward pass is simplified
- No Regularization: Missing dropout and weight decay
- Fixed Hyperparameters: No dynamic learning rate scheduling
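For the character-level tokenization mentioned in the first bullet above, the mapping can be as simple as treating each byte value as a token id; the sketch below is illustrative and may not match the exact scheme in `gpt2.c`:

```c
#include <stddef.h>

/* Sketch: character-level "tokenization" -- each byte becomes a token id. */
static size_t tokenize_chars(const char *text, int *tokens, size_t max_tokens)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)text;
         *p != '\0' && n < max_tokens; p++)
        tokens[n++] = (int)*p;   /* token id = raw byte value (0..255) */
    return n;                    /* number of tokens produced */
}
```

The sample output earlier, where token 32 decodes to a space, is consistent with this kind of byte-value mapping.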
This implementation is designed to help understand:
- Transformer Architecture: How attention, feed-forward layers, and normalization work together
- Matrix Operations: Low-level implementation of neural network computations
- Memory Management: Proper handling of dynamic memory in C
- Training Loop: How language models learn from sequential data
- Text Generation: Autoregressive sampling strategies
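The autoregressive, temperature-controlled sampling mentioned above can be sketched as follows, assuming the logits for the next token have already been computed (names are illustrative):

```c
#include <math.h>
#include <stdlib.h>
#include <stddef.h>

/* Sketch: sample one token id from raw logits with temperature scaling.
 * Call srand() once at startup to seed the generator. */
static size_t sample_token(const float *logits, size_t vocab_size, float temperature)
{
    /* Softmax over temperature-scaled logits, max-subtracted for stability. */
    float max_logit = logits[0];
    for (size_t i = 1; i < vocab_size; i++)
        if (logits[i] > max_logit) max_logit = logits[i];

    float sum = 0.0f;
    for (size_t i = 0; i < vocab_size; i++)
        sum += expf((logits[i] - max_logit) / temperature);

    /* Draw u in [0, 1) and walk the cumulative distribution. */
    float u = (float)rand() / ((float)RAND_MAX + 1.0f);
    float cum = 0.0f;
    for (size_t i = 0; i < vocab_size; i++) {
        cum += expf((logits[i] - max_logit) / temperature) / sum;
        if (u < cum)
            return i;
    }
    return vocab_size - 1;  /* numerical fallback */
}
```

Lower temperatures make the distribution sharper and the output more deterministic; `temperature = 1.0` samples from the unmodified softmax.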
- Low Memory Usage
- Training Speed: Significantly slower than optimized implementations
- Numerical Stability: Uses basic floating-point arithmetic
- Scalability: Not optimized for large datasets or models
Potential enhancements:
- Better Tokenization: Implement BPE or WordPiece tokenization
- Optimized Attention: Full multi-head attention with proper reshaping
- Advanced Optimizers: Adam, AdamW, or other modern optimizers
- Regularization: Dropout, layer dropout, attention dropout
- Mixed Precision: Half-precision training for memory efficiency
- Parallel Processing: Multi-threading or GPU acceleration
- Model Checkpointing: Save/load trained models
- Evaluation Metrics: Perplexity calculation and validation
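For the evaluation-metrics item above, perplexity follows directly from the mean per-token cross-entropy already needed for training (a sketch):

```c
#include <math.h>
#include <stddef.h>

/* Sketch: perplexity is the exponential of the mean per-token
 * cross-entropy (in nats) over a held-out set. */
static float perplexity(const float *token_losses, size_t n_tokens)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_tokens; i++)
        sum += token_losses[i];
    return (float)exp(sum / (double)n_tokens);
}
```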
To better understand the concepts implemented here:
- Attention Is All You Need - Original Transformer paper
- Language Models are Unsupervised Multitask Learners - GPT-2 paper
- The Illustrated Transformer - Visual explanation
- GPT-2 Architecture - GPT-2 specific details
This is an educational implementation. Contributions that improve clarity, add comments, or fix bugs are welcome. Please maintain the focus on readability and educational value.
This is a simplified implementation for educational purposes and should not be used for production applications. The model architecture and training procedures are significantly simplified compared to the original GPT-2.