In both cases, Gemini Pro outperformed Whisper. (Gemini comes in three “sizes.” They are, from largest to smallest: Ultra, for highly complex tasks; Pro, for scaling across many tasks; and Nano, for working efficiently on devices.)
On the 21-language CoVoST 2 benchmark for automatic speech translation, Gemini’s BLEU score was 40.1, compared to Whisper v2’s 29.1.
Google researchers evaluated ASR performance on the 62-language FLEURS benchmark using word error rate (for which lower scores indicate better performance). Whisper v3’s word error rate was 17.6%, while Gemini’s was 7.6%.
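For readers unfamiliar with the metric, word error rate is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. The function name `word_error_rate` and the plain whitespace tokenization are assumptions made for illustration; it does not reproduce the text normalization or scoring setup behind the FLEURS figures quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length.

    Illustrative only: splits on whitespace and applies no casing or
    punctuation normalization, which real benchmarks usually do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because the denominator is the reference length, a transcript with many inserted words can score above 100%, which is one reason lower is better but the scale is not capped.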
The Best of Bard
In a Google product update, Google VP and General Manager Sissie Hsiao explained that Gemini Pro is now integrated into Bard, Google’s GenAI chatbot.
While Gemini 1.0 was trained to respond to a range of inputs, including text, images, and audio, Bard with Gemini Pro can currently handle only text-based prompts, “with support for other modalities coming soon.” Confusingly, Bard with Gemini Pro is also limited linguistically for the time being: it is reportedly accessible in English only, albeit in more than 170 countries and territories. Google plans to expand coverage to “more languages and places, like Europe, in the near future.”
At the time of writing, Bard was still able to respond to prompts in multiple languages, including prompts to translate, and in one instance even provided a list of “new words” in a non-English language, along with (mostly correct) transliterated pronunciations.
Bard’s responses regarding its linguistic offerings, however, were inconsistent. Bard was also unable to process non-English audio prompts for ASR, but helpfully recommended other online tools that (it said) could.
Fans and Critics
Gemini has already inspired pundits on social media to wax poetic about AI in general and about Google in particular. (Investors also reacted, with Google’s stock price jumping the day of the release.)
“If you were impressed by OpenAI’s ChatGPT, prepare to have your mind blown by Google’s Gemini,” Aaron Francesconi, IRS Director of Data Management Services and Support, wrote on LinkedIn.
Linus Ekenstam praised Google’s strategy in a thread on X with more than 10,000 likes: “Instead of chasing hype, they have been laser-focused on certain things. Maybe much like Amazon, they win not by being the first, but by being the best.”
However, TechCrunch has reported that the duck-drawing demo Ekenstam called “jaw-dropping” was “faked.” And as soon as Gemini launched, MIT Technology Review suggested that it “could signal peak AI hype.”
Wharton professor Ethan Mollick took a similarly measured approach, writing on X, “We really don’t know anything about Gemini Ultra. Does it beat GPT-4 for real? If so, why by such a small amount?”
Mollick went on to wonder aloud whether “the failure to crush GPT-4 shows limits of LLMs approaching.”
It seems, though, that nothing can dampen Bard’s enthusiasm, with the chatbot itself gushing that “initial testing and user feedback suggest that Gemini significantly improves the quality of Bard’s translations. As Gemini continues to evolve, we can expect further improvements in accuracy, fluency, and overall translation quality.”