MiniMax-01 is a series of large language and vision-language models that combine lightning attention with a mixture-of-experts (MoE) architecture to process long contexts efficiently. The two models, MiniMax-Text-01 and MiniMax-VL-01, match the performance of top-tier models such as GPT-4o and Claude-3.5-Sonnet while offering context windows 20 to 32 times longer, reaching up to 4 million tokens at inference time. The architecture is hybrid: most layers use lightning attention, a linear attention mechanism, and softmax attention layers are interleaved among them. The models are trained on large datasets of text, code, and image-caption pairs. A multi-stage post-training process with supervised fine-tuning and reinforcement learning then optimizes their long-context and real-world capabilities.
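The following minimal pure-Python sketch shows why linear attention scales to very long contexts. It computes causal linear attention in two ways: a quadratic reference form, and a tiled form in the spirit of lightning attention, where tokens inside a block attend to each other with a causal mask and earlier blocks are summarized by a fixed-size running state. It deliberately omits the decay factor, normalization, feature maps, and GPU tiling a real kernel would use, so it is not MiniMax's implementation. All function names are hypothetical, and the `softmax_every=8` layer schedule (one softmax layer after every seven linear layers) is an illustrative assumption.

```python
from typing import List

Matrix = List[List[float]]


def dot(x: List[float], y: List[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def causal_linear_attention_quadratic(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference form: O = (Q K^T * M) V with a causal mask M and no softmax.

    Cost grows as O(n^2 d) in the sequence length n.
    """
    d_v = len(v[0])
    out: Matrix = []
    for i in range(len(q)):
        row = [0.0] * d_v
        for j in range(i + 1):  # causal mask: only positions j <= i contribute
            score = dot(q[i], k[j])
            for c in range(d_v):
                row[c] += score * v[j][c]
        out.append(row)
    return out


def causal_linear_attention_blockwise(q: Matrix, k: Matrix, v: Matrix, block_size: int) -> Matrix:
    """Tiled form (simplified: no decay, no normalization).

    Inside a block: masked quadratic attention. Across blocks: a running
    d x d_v state S = sum_j outer(k_j, v_j), so for a fixed block size the
    cost is linear in n and memory does not grow with context length.
    """
    d, d_v = len(q[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d)]  # accumulated K^T V of all earlier blocks
    out: Matrix = []
    for start in range(0, len(q), block_size):
        end = min(start + block_size, len(q))
        for i in range(start, end):
            # Inter-block term: q_i S sees every earlier block through the fixed-size state.
            row = [sum(q[i][a] * state[a][c] for a in range(d)) for c in range(d_v)]
            # Intra-block term: masked attention over positions start..i.
            for j in range(start, i + 1):
                score = dot(q[i], k[j])
                for c in range(d_v):
                    row[c] += score * v[j][c]
            out.append(row)
        # Fold this block's keys and values into the state before the next block.
        for j in range(start, end):
            for a in range(d):
                for c in range(d_v):
                    state[a][c] += k[j][a] * v[j][c]
    return out


def hybrid_layer_types(num_layers: int, softmax_every: int = 8) -> List[str]:
    """Hypothetical hybrid schedule: every `softmax_every`-th layer uses softmax attention."""
    return ["softmax" if (i + 1) % softmax_every == 0 else "linear" for i in range(num_layers)]


if __name__ == "__main__":
    n, d, d_v = 10, 4, 3
    # Deterministic toy inputs (no randomness needed for the comparison).
    q = [[((i * 7 + a * 3) % 5 - 2) / 3 for a in range(d)] for i in range(n)]
    k = [[((i * 5 + a * 2) % 7 - 3) / 4 for a in range(d)] for i in range(n)]
    v = [[((i * 3 + c) % 4 - 1.5) / 2 for c in range(d_v)] for i in range(n)]
    ref = causal_linear_attention_quadratic(q, k, v)
    tiled = causal_linear_attention_blockwise(q, k, v, block_size=3)
    max_err = max(abs(a - b) for r, t in zip(ref, tiled) for a, b in zip(r, t))
    print(f"max abs difference: {max_err:.2e}")
    print(hybrid_layer_types(16))
```

Both functions return the same outputs up to floating-point rounding. The tiled version only ever keeps a `d x d_v` state for the past, which is the property that lets linear-attention layers stretch to millions of tokens. The interleaved softmax layers in a hybrid stack are generally motivated by the fact that a fixed-size state compresses the history, while full softmax attention can still retrieve exact earlier tokens.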