“The code switching support of SeamlessM4T is pretty cool!” shared a fan with a sense of humor. “It doesn’t do very well with my French or Japanese, but then again neither is very good.”
One Dr. Hubertus Becker questioned the model’s reliability for critical translations, noting, “It’s concerning that an experimental demo can alter the meaning of input words.”
Kalev Leetaru, reporting on SeamlessM4T’s performance in translating Weibo social media posts, cited inconsistent results.
“For some posts it yields translations that compare favorably to both NMT and LLM translations, but with the added cost of having to use language-specific punctuation rules to split into sentences to translate a sentence at a time,” Leetaru explained. “For other posts, it yields subpar translations that can remove or truncate key details, suggesting promise but that it is not quite ready for production use.”
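To illustrate the extra step Leetaru describes, here is a minimal Python sketch of splitting a Chinese post on sentence-ending punctuation and translating it one sentence at a time. It is not Leetaru's pipeline or SeamlessM4T's API: the punctuation set, the function names, and the `translate` callable are all assumptions made for illustration, and real language-specific rules are more involved.

```python
import re
from typing import Callable, List

# Assumed sentence-ending marks for Chinese text (full-width and ASCII).
# Real segmentation rules are language-specific and more involved.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping trailing punctuation attached.

    Punctuation at the very start of the text is dropped, since it does
    not belong to any sentence.
    """
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def translate_by_sentence(text: str, translate: Callable[[str], str]) -> str:
    """Translate one sentence at a time and join the results with spaces.

    `translate` is a hypothetical stand-in for whatever model call is used.
    """
    return " ".join(translate(s) for s in split_sentences(text))


if __name__ == "__main__":
    post = "今天天气很好。你去哪里？我去北京！"
    # A stub translator that only tags each sentence, to show the flow.
    print(translate_by_sentence(post, lambda s: f"<{s}>"))
```

Translating sentence by sentence keeps each input short, but it also means the model never sees context from neighboring sentences, which is part of the cost Leetaru points to.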
Better than Whisper?
Of course, the more than 60 authors behind the August 22, 2023 paper introducing SeamlessM4T believe in what they dubbed “the first multilingual system” to translate from and into English for both speech and text.
If the stats behind SeamlessM4T’s training seem somewhat disparate, that might be because the model required training in so many (formerly) separate and siloed tasks. Similarly, the number of languages handled by the model varies by task.
SeamlessM4T can provide automatic speech recognition (ASR) for almost 100 languages; speech-to-text (STT) translation for nearly 100 input and output languages; speech-to-speech translation and text-to-speech translation for nearly 100 input languages and 36 output languages (including English); and traditional “text” translation for close to 100 languages.
According to the authors, Meta’s motivation for the new model was to move beyond existing separate systems, which can complete the above tasks but generally perform well in only one modality each.
SeamlessM4T, by contrast, reportedly achieves state-of-the-art results for all these languages while offering “multitask support” in a single model. The paper also asserts that SeamlessM4T outperforms its previous SOTA competitors, namely Whisper and AudioPaLM-2.
Meta has publicly released the contributions to its new model and encourages researchers and developers to build on this first iteration.