The benchmark, referred to as “machine translation (MT) from one book,” or MTOB, appears to predate Gemini 1.5 Pro, albeit not by much: it was introduced in a 2023 paper.
The paper’s conclusions echo claims of human parity that continue to crop up in MT-related headlines from time to time, specifically estimating that “when given a grammar manual for Kalamang […] the model learns to translate English to Kalamang at a similar level to a person learning from the same content.”
“This sounds extremely dubious,” one skeptic retorted on X. “Aren’t there only 200 people who could say whether the translation was any good? Did they weigh in?”
But fans were undeterred. One observer asked whether the LLM might be made available for trial runs in other languages, such as Icelandic. Others called Gemini 1.5 Pro “really impressive” and “mind-blowing, even in a post-GPT4 world.” (The release of Gemini 1.5 Pro comes on the heels of a February 2024 research paper highlighting Gemini as a “valuable tool” for MT.)
Google practically invited such comparisons, specifically stating in its paper that Gemini 1.5 Pro outperformed specialist models, such as OpenAI’s Whisper, at audio comprehension, including tasks with longer-context audio.
The model’s predecessor similarly outperformed Whisper on this task, but the latest experiments also tested Gemini 1.5 Pro’s main claim to fame: the ability to handle long context, defined here as 700,000 words for text and 40 to 105 minutes of video. For the authors, the more significant finding was that Gemini 1.5 Pro’s long-context capabilities did not compromise its audio comprehension.