Building everything from Scratch

Language Models
- Attention Mechanisms
- BatchNorm & LayerNorm
- Simple Tokenizer
- Activation Functions
- Positional Encodings
- GPT2
- GPT2 Pretraining

Machine Learning
- Linear Regression
- Logistic Regression
- K Nearest Neighbours (KNN)
- K Means
- Multi Layered Perceptron
- Regression Losses
- Classification Losses

More to come soon!