
Large Language Model (LLM) Talk

DeepSeek v3

16 min • 20 January 2025

DeepSeek-V3 is a large Mixture-of-Experts (MoE) language model, trained at roughly one-tenth the cost of comparable models, with 671 billion total parameters, of which 37 billion are activated for each token. It uses Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture. Key features of DeepSeek-V3 are its auxiliary-loss-free load-balancing strategy and its multi-token prediction training objective. The model was pre-trained on 14.8 trillion tokens and then underwent supervised fine-tuning and reinforcement learning. It has demonstrated strong performance on various benchmarks, achieving results comparable to leading closed-source models while maintaining economical training costs.
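To make the two routing ideas named above concrete, here is a minimal NumPy sketch, not DeepSeek's implementation: each token activates only a top-k subset of experts (the sparse-activation idea behind "37B of 671B parameters"), and balancing is done by a per-expert bias that steers selection instead of an auxiliary loss term. The expert counts, bias update rate, and affinity function are placeholder assumptions for illustration.

```python
# Toy sketch of sparse top-k MoE routing with bias-based
# ("auxiliary-loss-free") load balancing. Not DeepSeek's code.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # toy size; DeepSeek-V3 uses far more routed experts
TOP_K = 2         # toy size; each token activates only k experts

# Hypothetical per-expert bias, adjusted outside the gradient path.
expert_bias = np.zeros(NUM_EXPERTS)

def route(token_affinity: np.ndarray, k: int = TOP_K) -> np.ndarray:
    """Select top-k experts by biased score, but weight outputs with
    the unbiased affinities: the bias only steers which experts fire."""
    biased = token_affinity + expert_bias
    chosen = np.argsort(biased)[-k:]          # indices of top-k experts
    gates = np.zeros_like(token_affinity)
    gates[chosen] = token_affinity[chosen]
    return gates / gates.sum()                # normalized gating weights

# Simulate a batch of tokens and count how often each expert is used.
loads = np.zeros(NUM_EXPERTS)
for _ in range(1000):
    affinity = rng.random(NUM_EXPERTS)  # stand-in for learned token-expert scores
    loads += route(affinity) > 0

# Balancing step: nudge biases down for overloaded experts and up for
# underloaded ones (0.001 is a made-up update rate for this sketch).
expert_bias -= 0.001 * np.sign(loads - loads.mean())
print(loads, expert_bias)
```

Because only the selected experts' parameters are touched per token, compute scales with the activated 37B parameters rather than the full 671B, and the bias update keeps expert loads even without adding a balancing term to the training loss.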
