A minimal implementation of a decoder-only Transformer (GPT-like) language model trained on a small custom corpus, built in plain PyTorch for learning purposes.
Perfect for beginners who want to understand how GPT models work under the hood.
- From-Scratch Implementation: no high-level libraries like Hugging Face, just plain PyTorch
- Multi-Head Self-Attention with causal masking (see the attention sketch after this list)
- Pre-Norm Transformer Blocks
- Positional Embeddings (block and embeddings sketched together after this list)
- Train/Validation Split with loss evaluated on held-out data
- Top-k Sampling + Temperature for better text generation (see the sampling sketch below)
- Gradient Clipping and Dropout for training stability (clipping is shown in the training-step snippet below)
- Colab Compatible (single notebook version available)
- Model saving and easy inference
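
To give a sense of the causal multi-head self-attention, here is a minimal sketch. Names such as `d_model`, `n_head`, and `block_size` are illustrative and may differ from the identifiers used in the repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal (lower-triangular) mask."""
    def __init__(self, d_model, n_head, block_size, dropout=0.1):
        super().__init__()
        assert d_model % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)     # output projection
        self.dropout = nn.Dropout(dropout)
        # Lower-triangular mask: position i may only attend to positions <= i
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)  # scaled dot-product
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = self.dropout(F.softmax(att, dim=-1))
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```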
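A pre-norm block applies LayerNorm *before* each sublayer (rather than after, as in the original Transformer), which tends to train more stably. The sketch below reuses the `CausalSelfAttention` class from above, and the trailing comment shows how token and positional embeddings are typically summed; learned positional embeddings are assumed here, as is common in GPT-style models:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm Transformer block: LayerNorm runs before each sublayer,
    with residual connections around both."""
    def __init__(self, d_model, n_head, block_size, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = CausalSelfAttention(d_model, n_head, block_size, dropout)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # pre-norm attention + residual
        x = x + self.mlp(self.ln2(x))   # pre-norm MLP + residual
        return x

# Token + positional embeddings are summed before the first block,
# e.g. in the model's forward pass (idx: (B, T) batch of token ids):
#   tok = nn.Embedding(vocab_size, d_model)   # token embeddings
#   pos = nn.Embedding(block_size, d_model)   # learned positional embeddings
#   x = tok(idx) + pos(torch.arange(T, device=idx.device))
```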
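Generation with temperature and top-k works roughly as sketched below: logits are divided by the temperature (lower = more deterministic), everything outside the k most likely tokens is masked out, and the next token is sampled from the remaining distribution. This assumes the model's forward pass returns logits of shape `(B, T, vocab_size)`; the repo's actual `generate` function may differ:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, temperature=1.0, top_k=50):
    """Autoregressive sampling with temperature scaling and top-k filtering."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                    # crop to context window
        logits = model(idx_cond)[:, -1, :] / temperature   # last position, scaled
        if top_k is not None:
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = float("-inf")    # drop below the k-th logit
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```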
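Gradient clipping fits into a single training step as shown below. The `max_norm=1.0` value is a common default rather than necessarily the repo's setting, and `model`, `optimizer`, `xb`, `yb` stand in for the actual training-loop variables:

```python
import torch
import torch.nn.functional as F

# One training step with gradient norm clipping
optimizer.zero_grad(set_to_none=True)
logits = model(xb)  # xb: input token ids, yb: shifted targets
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap gradient norm
optimizer.step()
```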