
Large Language Model (LLM) Talk

GPT-2

20 min • January 14, 2025

GPT-2 is a large, transformer-based language model built on a decoder-only architecture. It predicts the next word in a sequence, much like an advanced keyboard autocomplete. GPT-2 is auto-regressive: each predicted token is appended to the input before the next prediction step. It uses masked self-attention, which lets each token attend only to earlier positions, unlike BERT's bidirectional self-attention. Input tokens pass through a stack of decoder blocks, each containing a self-attention layer followed by a feed-forward neural network layer. The self-attention mechanism computes query, key, and value vectors to incorporate context from earlier tokens. GPT-2 has applications in machine translation, summarization, and music generation.
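The masked self-attention described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not GPT-2's actual implementation: the projection matrices `Wq`, `Wk`, `Wv` are randomly initialized here (in GPT-2 they are learned), and multi-head attention, layer norm, and the feed-forward sublayer are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(x, Wq, Wk, Wv):
    """Single-head masked (causal) self-attention sketch.

    x: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_head) projections, random here for illustration.
    """
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)            # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions <= i,
    # hiding future tokens (unlike BERT's bidirectional attention).
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = softmax(scores, axis=-1)            # rows sum to 1
    return weights @ V                            # context-mixed values

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 8, 4, 5
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = masked_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

Because of the causal mask, changing the last token leaves the outputs at all earlier positions unchanged, which is exactly what makes auto-regressive generation work: each new token can be appended without recomputing or contaminating earlier context.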
