Llama 2 is a family of large language models (LLMs), with pretrained and fine-tuned variants ranging from 7 billion to 70 billion parameters. The fine-tuned models, called Llama 2-Chat, are optimized for dialogue and outperform other open-source chat models on most benchmarks tested. The base models were pretrained on 2 trillion tokens of publicly available data, and the chat models were aligned for both helpfulness and safety using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). Llama 2 also introduces a technique called Ghost Attention (GAtt), which helps the model follow an initial instruction consistently across multiple turns of dialogue.
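Because the chat models are dialogue-tuned, user input is normally wrapped in a specific prompt template before inference. Below is a minimal sketch in Python of the single-turn format commonly used with Llama 2-Chat; the `[INST]`/`<<SYS>>` markers reflect the widely circulated template, not something stated in this text, so treat the exact strings as an assumption.

```python
# Sketch of the single-turn Llama 2-Chat prompt template (assumed format:
# user turns wrapped in [INST] ... [/INST], an optional system prompt
# wrapped in <<SYS>> ... <</SYS>> inside the first instruction).

def format_llama2_prompt(system: str, user: str) -> str:
    """Build a single-turn Llama 2-Chat prompt string."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        f"{user} [/INST]"
    )

prompt = format_llama2_prompt(
    system="You are a helpful, honest assistant.",
    user="Explain what RLHF is in one sentence.",
)
print(prompt)
```

Keeping the system prompt inside the first `[INST]` block is also what GAtt is meant to reinforce: the instruction given at the start of the conversation should continue to shape responses in later turns.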