Sveriges mest populära poddar

Large Language Model (LLM) Talk

DeepSeek-V2

11 min • 10 februari 2025

DeepSeek-V2 is a Mixture-of-Experts (MoE) language model that balances strong performance with economical training and efficient inference. It uses a total of 236B parameters, with 21B activated for each token, and supports a context length of 128K tokens. Key architectural innovations includeMulti-Head Latent Attention (MLA), which compresses the KV cache for faster inference, andDeepSeekMoE, which enables economical training through sparse computation. Compared to DeepSeek 67B, DeepSeek-V2 saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts maximum generation throughput by 5.76 times. It is pre-trained on 8.1T tokens of high-quality data and further aligned through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL).

Kategorier
Förekommer på
00:00 -00:00