MMLU is not a dedicated translation benchmark. It is, however, a good indicator of a model’s ability to understand and generate accurate text in a given language.
The company reportedly used professional human translators to translate the MMLU test set into 13 languages in order to evaluate knowledge and problem-solving capabilities across languages. The system card states that language understanding is “generally on par” with existing models.
The published results indicate, however, that when compared to OpenAI’s o3-high model, the company’s GPT-5-main model shows marginally weaker performance across all 13 languages.
Results for the GPT-5-thinking model show performance that is on par in Brazilian Portuguese, slightly worse in Arabic, French, German, Italian, and Spanish, and marginally better in Bengali, Chinese, Hindi, Indonesian, Japanese, Korean, Swahili, and Yoruba.
Implications for the Language Industry
While the MMLU metrics do not represent a significant advance in multilingual performance, GPT-5 comes with expanded context windows and the ability to adjust reasoning effort, giving API users more control over latency.
Michelle Pokrass, an OpenAI researcher, explained, “For the first time, releasing a new parameter option for reasoning effort called ‘minimal’. This is so that you can use these reasoning models, but with minimal reasoning so that they can slot into the very fastest and most latency sensitive applications.”
“Now you don’t actually have to choose between a bunch of models, and you can use GPT-5 for all of your use cases and just dial in the reasoning effort,” she concluded.
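For readers building on the API, the idea of dialing in reasoning effort per use case can be illustrated with a minimal Python sketch. It only assembles request parameters and makes no network call. The helper name `build_request`, the example prompts, and the payload field names (`reasoning`, `effort`, `input`) are assumptions modeled on OpenAI's public API documentation rather than details given in the announcement, as is the set of effort levels other than “minimal”. They should be checked against the current API reference before use.

```python
# Illustrative sketch only: builds request parameters and does NOT call any API.
# Field names and effort levels (other than "minimal") are assumptions.

ALLOWED_EFFORTS = ("minimal", "low", "medium", "high")


def build_request(prompt: str, effort: str = "medium", model: str = "gpt-5") -> dict:
    """Return a request payload with the chosen reasoning effort (hypothetical helper)."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},
    }


# Same model for both requests; only the reasoning effort changes.
# Latency-sensitive use case, e.g. a live interpreting turn.
fast = build_request("Translate into Spanish: Where does it hurt?", effort="minimal")

# Slower, more deliberate use case, e.g. a terminology review.
careful = build_request("Check this discharge summary for terminology errors.", effort="high")
```

Under these assumptions, a latency-sensitive application and a review workflow would send requests to the same model and differ only in the effort setting.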
In addition, OpenAI has focused on improving accuracy in health interactions and has improved the expressiveness and accuracy of its voice capabilities. Ruochen Wang, Multimodal Researcher at OpenAI, said in the release livestream that the company has “been steadily improving voice over the past year to make it more useful for everyone.”
He added: “First, it sounds incredibly natural just like you’re talking to a real person. Second, we’ve added the video so that it sees what you see while chatting with you. Third, you also translate between languages consistently and smoothly across turns.”
OpenAI’s GPT-5 release notes state, however, that “Voice mode is still powered by GPT-4o.”
While Wang did not elaborate further on the ability to translate between languages, the new model’s focus on health use cases could have an impact on healthcare AI interpreting applications.
OpenAI anticipates “early adoption to drive industry leadership on what’s possible with AI powered by GPT‑5, leading to better decision-making, improved collaboration, and faster outcomes on high-stakes work for organizations.”
The company confirmed that GPT-5 is being rolled out to all users on ChatGPT Plus, Pro, Team, and Free plans worldwide across web, mobile, and desktop.