LLM101n is a course by Andrej Karpathy that builds a large language model (LLM) called "Storyteller" from scratch. What sets the project apart is that the model not only writes and refines short stories, but also lets users collaborate with the AI, sharing in the creative process and experiencing what it is like to have an AI tell stories. Unlike existing AI models, Storyteller focuses on interactivity and creativity.
https://github.com/karpathy/LLM101n
The course has not been released yet; this page will be updated as soon as it is.
Syllabus
- Chapter 01 Bigram Language Model (language modeling; a minimal sketch follows this list)
- Chapter 02 Micrograd (machine learning, backpropagation)
- Chapter 03 N-gram model (multi-layer perceptron, matmul, gelu)
- Chapter 04 Attention (attention, softmax, positional encoder)
- Chapter 05 Transformer (transformer, residual, layernorm, GPT-2)
- Chapter 06 Tokenization (minBPE, byte pair encoding)
- Chapter 07 Optimization (initialization, optimization, AdamW)
- Chapter 08 Need for Speed I: Device (device, CPU, GPU, ...)
- Chapter 09 Need for Speed II: Precision (mixed precision training, fp16, bf16, fp8, ...)
- Chapter 10 Need for Speed III: Distributed (distributed optimization, DDP, ZeRO)
- Chapter 11 Datasets (datasets, data loading, synthetic data generation)
- Chapter 12 Inference I: kv-cache (kv-cache)
- Chapter 13 Inference II: Quantization (quantization)
- Chapter 14 Finetuning I: SFT (supervised finetuning SFT, PEFT, LoRA, chat)
- Chapter 15 Finetuning II: RL (reinforcement learning, RLHF, PPO, DPO)
- Chapter 16 Deployment (API, web app)
- Chapter 17 Multimodal (VQVAE, diffusion transformer)
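
As a flavor of where the syllabus begins, below is a minimal character-level bigram language model. The toy corpus, the count-based sampling, and all names here are illustrative assumptions rather than code from the course, which has not been released.

```python
# A minimal character-level bigram language model, sketched only to illustrate
# the idea behind Chapter 01; the corpus and counting approach are assumptions.
import random
from collections import defaultdict

corpus = "once upon a time there was a tiny language model"

# Count how often each character follows each other character.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(prev: str) -> str:
    """Sample the next character in proportion to its bigram counts."""
    options = counts[prev]
    chars = list(options.keys())
    weights = list(options.values())
    return random.choices(chars, weights=weights, k=1)[0]

# Generate a short continuation starting from 'o'.
ch, out = "o", ["o"]
for _ in range(40):
    if not counts[ch]:  # dead end: no observed successor for this character
        break
    ch = sample_next(ch)
    out.append(ch)
print("".join(out))
```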