Qwen-1, also known as QWEN, is a series of large language models comprising base pretrained models, chat models, and specialized models for coding and math. The models are pretrained on up to 3 trillion tokens, tokenized with byte pair encoding, and use a modified Transformer architecture with untied input/output embeddings and rotary positional embeddings (RoPE). The chat models (QWEN-CHAT) are aligned to human preferences with supervised finetuning (SFT) and reinforcement learning from human feedback (RLHF). QWEN models perform strongly, outperforming many open-source models, but they generally lag behind proprietary models such as GPT-4.
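To make the rotary positional embedding idea concrete, here is a minimal NumPy sketch of RoPE in its standard form: each pair of embedding dimensions is rotated by an angle proportional to the token's position, so attention scores between rotated queries and keys depend on relative position. This is an illustrative sketch of the general technique, not QWEN's exact implementation; the function name and `base` parameter are assumptions.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Each dimension pair (2i, 2i+1) at position p is rotated by the angle
    p * base**(-2i/dim). Illustrative sketch only; real implementations
    fuse this into the attention kernel for queries and keys.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "embedding dim must be even"
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # (dim/2,)
    angles = positions * freqs                     # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                # split into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin             # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair undergoes a pure rotation, vector norms are preserved, and position 0 is left unchanged (all angles are zero there).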