TRANSFORMER MODEL - GPT-2
Ongoing project
Timeline: 6th April, 2025 - PRESENT (Weekends only)
Introduction
An end-to-end PyTorch implementation of a GPT-2 style language model, based on OpenAI's 124M-parameter GPT-2 release and Andrej Karpathy's NanoGPT. The project builds all major components of the GPT-2 architecture from first principles, including tokenization, positional embeddings, multi-head self-attention, feedforward layers, and transformer blocks. Key ML concepts such as autoregressive language modeling, causal masking, residual connections, and layer normalization are implemented and explained in detail. The final model is trained on real-world text data to generate coherent natural language.
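To give a flavor of how these pieces fit together, here is a minimal sketch of a single GPT-2 style transformer block in PyTorch. It assumes the 124M-model hyperparameters (embedding size 768, 12 heads) and illustrative module names (`CausalSelfAttention`, `Block`) rather than this repository's actual class names; it shows causal masking, multi-head self-attention, the feedforward MLP, residual connections, and pre-norm layer normalization.

```python
# Minimal sketch of a GPT-2 style transformer block (not the repo's exact code).
# Hyperparameter defaults (n_embd=768, n_head=12) match the 124M configuration.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd=768, n_head=12, block_size=1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.n_embd = n_embd
        # one projection producing queries, keys, and values for all heads
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)
        self.c_proj = nn.Linear(n_embd, n_embd)
        # causal mask: token i may only attend to tokens at positions <= i
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # split channels into heads: (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                   # (B, n_head, T, head_dim)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)


class Block(nn.Module):
    """Pre-norm transformer block: LayerNorm -> attention -> residual,
    then LayerNorm -> feedforward MLP -> residual, as in GPT-2."""

    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))   # residual connection around attention
        x = x + self.mlp(self.ln_2(x))    # residual connection around the MLP
        return x


if __name__ == "__main__":
    # quick shape check on a dummy batch of 16 token embeddings
    x = torch.randn(1, 16, 768)
    print(Block()(x).shape)  # torch.Size([1, 16, 768])
```

The full model stacks 12 such blocks on top of token and positional embeddings, followed by a final layer norm and a language-modeling head that predicts the next token autoregressively.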