
Kimi k1.5

22 min • 23 January 2025

Kimi k1.5 is a multimodal LLM trained with reinforcement learning (RL) jointly on text and vision data. Key aspects include scaling the RL context window to 128k tokens, with performance improving as context length increases; improved policy optimization based on a variant of online mirror descent; and a simple framework that enables planning and reflection without more complex methods. Its off-policy RL approach uses a reference policy, and long2short methods such as model merging and DPO transfer knowledge from long-CoT to short-CoT models, achieving state-of-the-art reasoning performance.

