The Transformer Principles Series is a three-volume graduate-level treatise that builds a complete mathematical and engineering understanding of modern AI systems, from the foundational attention mechanism to large language models and multimodal architectures.
Volume II - Training, Aligning, and Deploying Large Language Models addresses the full lifecycle of LLMs: web-scale data curation and deduplication, scaling laws and compute-optimal training, the GPT and BERT families of architectures, mixed-precision distributed training, supervised instruction tuning, reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), safety and alignment, evaluation methodology, in-context learning, agentic reasoning, and production deployment.