"Inside the labs, we have these capable models and they're not that far ahead from what the public has access to for free." Boom! That’s a direct quote from Mira Murati, CTO of OpenAI, and it sets the stage for this episode of The Prompt. Jim Carter covers the release of GPT-4o-mini, a smaller, faster, and way cheaper version of the flagship GPT-4o model. This isn't just another upgrade; it's a paradigm shift.
Jim explores how GPT-4o-mini, with its impressive 82% score on the MMLU benchmark and the same 128,000-token context window as its bigger sibling, is making waves by being both efficient and accessible. And the kicker? It does all this at a fraction of the cost: 15 cents per million input tokens and 60 cents per million output tokens. For anyone running API calls around the clock, these savings are huge.
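To put those rates in perspective, here's a quick back-of-the-envelope sketch using the published prices. The workload numbers (calls per day, tokens per call) are purely illustrative assumptions, not figures from the episode:

```python
# GPT-4o-mini's published rates: $0.15 per 1M input tokens,
# $0.60 per 1M output tokens.
INPUT_RATE = 0.15 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.60 / 1_000_000  # dollars per output token

def monthly_cost(calls_per_day, in_tokens, out_tokens, days=30):
    """Rough monthly spend for a round-the-clock workload (illustrative)."""
    per_call = in_tokens * INPUT_RATE + out_tokens * OUTPUT_RATE
    return calls_per_day * per_call * days

# Hypothetical example: 10,000 calls/day, each with 1,000 input
# and 500 output tokens.
print(f"${monthly_cost(10_000, 1_000, 500):,.2f}")  # → $135.00
```

At those assumed volumes, the whole month comes in around the price of a nice dinner, which is the kind of math that makes "cheap and fast" more than a slogan.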
But it’s not just about the money. Jim argues that this shift towards efficiency over raw power signifies a new phase in AI development. We’re moving from pushing the boundaries of what's possible to making those capabilities practical and accessible. This is echoed in Murati’s admission that the most exciting stuff in the labs isn’t that far ahead of what’s publicly available.
Jim also touches on how GPT-4o-mini holds its own against other models like Claude, Llama, and Gemma. While it’s not blowing them out of the water, it’s competitive, signaling a saturation point in AI development. This new phase is all about optimization, practical applications, and making powerful tech more accessible.