1 week ago
Alignment & RLHF
Part 13 of the AI/LLM mastery series — how a model learns not just to answer, but to answer the way humans prefer. RLHF explained:…
Part 12 of the AI/LLM mastery series — turning a knowledgeable base model into a helpful assistant by changing its behaviour, not …
Part 11 of the AI/LLM mastery series — the unglamorous machinery that decides model quality. How a raw web scrape (mostly junk) be…
Part 10 of the AI/LLM mastery series — the maths of "bigger, more data, more compute". The three levers (parameters, tok…
Part 9 of the AI/LLM mastery series — how a randomly-initialised GPT becomes one that knows the world. Pretraining: the trillions …