Meta Targets 1,600 Languages in AI Translation

On March 17, 2026, Meta introduced Omnilingual Machine Translation (OMT), a suite of models, datasets, and evaluation tools that extends AI translation support to more than 1,600 languages. That is a significant jump from the roughly 200 languages covered by its earlier No Language Left Behind (NLLB) models and by most existing AI translation systems.

The work points to a broader shift in multilingual AI: from expanding language coverage on paper to improving the ability to generate reliable translations in low-resource languages.

That distinction is central to the paper. Recent large language models (LLMs) have shown strong cross-lingual understanding, meaning they can process many languages even with limited data. However, Meta argues that generating fluent and accurate text in those languages remains a challenge. In practice, a language may be listed as supported while translation quality remains inconsistent or too low for real-world use. Meta describes this as a “generation bottleneck.”

Meta aims to expand the range of languages for which translation output is not just possible but usable, particularly when translating from English into so-called long-tail languages (i.e., languages with limited digital and parallel data that AI translation systems have historically struggled to support).

This relies on a “comprehensive data strategy” that combines large public multilingual corpora with newly created and mined datasets and synthetic data generation, alongside two distinct architectural approaches to specializing LLMs for translation:

  • OMT-LLaMA: a decoder-only model built on LLaMA 3 and available in multiple sizes (1B, 3B, and 8B), and
  • OMT-NLLB: an encoder-decoder model built on Meta’s multilingual representation layer, OmniSONAR, and rooted in the company’s earlier NLLB translation approach.

Extending Into the Long Tail — With Limits

Evaluation results show that coverage and quality do not scale evenly. The researchers estimate that Omnilingual MT can “understand sufficiently well” more than 400 languages, meaning it can convey the core meaning of a sentence in most cases. That is roughly double the figure for earlier systems, which reached similar performance in about 200 languages.

Beyond this subset, the system extends to a much broader range of languages, showing what Meta describes as “non-trivial” translation performance when translating from around 1,600 languages and into approximately 1,200. This reinforces the central point that understanding languages is easier than generating them.

The improvement is most visible in how far the system can extend into low-resource languages before performance degrades. Meta reports that baseline systems tend to break down after roughly 300-400 languages, while Omnilingual MT maintains meaningful output across about 1,200 languages.
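
To make the gap between coverage and quality concrete, the short Python sketch below counts how many languages clear a given quality bar. The per-language scores, the 0-to-100 scale, and both threshold values are hypothetical placeholders chosen for illustration; they are not metrics or results from Meta's paper.

```python
from typing import Mapping


def usable_coverage(scores: Mapping[str, float], threshold: float) -> int:
    """Count languages whose (hypothetical) quality score meets the threshold."""
    return sum(1 for score in scores.values() if score >= threshold)


def coverage_curve(scores: Mapping[str, float], thresholds: list[float]) -> dict[float, int]:
    """Map each threshold to the number of languages that clear it."""
    return {t: usable_coverage(scores, t) for t in thresholds}


# Toy per-language scores on an arbitrary 0-100 scale (illustrative only,
# not real evaluation data).
toy_scores = {
    "lang_a": 82.0,
    "lang_b": 64.5,
    "lang_c": 41.0,
    "lang_d": 23.5,
    "lang_e": 9.0,
}

if __name__ == "__main__":
    # A stricter "usable" bar versus a looser "non-trivial" bar.
    print(coverage_curve(toy_scores, [60.0, 20.0]))  # {60.0: 2, 20.0: 4}
```

Lowering the bar raises the count, which illustrates in miniature why a headline number of supported languages says little without the quality threshold behind it.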

The researchers suggest that Omnilingual MT is “close to solving the ‘understanding’ part of the puzzle in MT” as it doubles the number of reasonably well-understood source languages compared to previous massively multilingual models. At the same time, it “substantially expands the set of languages for which coherent generation is feasible.” However, they caution that “we remain far from completely ‘solving’ machine translation for the long tail of languages.”

A small-scale human evaluation covering only 57 translation directions confirms what the researchers call the “significant progress Omnilingual MT made in challenging language pairs,” but, in their words, also “shows that much more progress is yet to follow.”

Meta also reports efficiency gains from specialization. Its specialized models in the 1B–8B parameter range match or exceed the translation performance of a 70B general-purpose LLM, which the researchers take to suggest that “specialization, not scale, is perhaps a more reliable path to high-quality multilingual translation.”

The researchers further note that translation quality can be improved for specific languages through targeted techniques such as fine-tuning and retrieval-augmented generation, pointing to customization paths beyond out-of-the-box performance.
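
Retrieval-augmented generation for translation typically means pulling similar, already-translated sentence pairs into the model's prompt as examples. The following is a minimal sketch of that general idea, assuming a small in-memory translation memory and a simple word-overlap (Jaccard) similarity. The names `Pair`, `retrieve`, and `build_prompt`, the prompt format, and the placeholder data are all hypothetical and do not describe Meta's actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One source/target example from a hypothetical translation memory."""
    source: str
    target: str


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve(query: str, memory: list[Pair], k: int = 2) -> list[Pair]:
    """Return the k entries whose source side is most similar to the query."""
    ranked = sorted(memory, key=lambda p: jaccard(query, p.source), reverse=True)
    return ranked[:k]


def build_prompt(query: str, memory: list[Pair], tgt_lang: str, k: int = 2) -> str:
    """Assemble a few-shot translation prompt from the retrieved examples."""
    blocks = [f"Translate English to {tgt_lang}."]
    for ex in retrieve(query, memory, k):
        blocks.append(f"English: {ex.source}\n{tgt_lang}: {ex.target}")
    blocks.append(f"English: {query}\n{tgt_lang}:")
    return "\n\n".join(blocks)


# Placeholder memory: target sides are stand-ins, not real translations.
example_memory = [
    Pair("the river is high today", "<target 1>"),
    Pair("we sold fish at the market", "<target 2>"),
    Pair("the market opens today", "<target 3>"),
]

if __name__ == "__main__":
    print(build_prompt("the market is busy today", example_memory, "TargetLang"))
```

The word-overlap metric is chosen only to keep the sketch dependency-free; a practical system would more likely use embedding-based search for the retrieval step.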

Building the Evaluation Layer

Alongside the models, the team introduced a set of supporting resources aimed at improving training and evaluation at scale:

  • BOUQuET, a benchmark covering 275 languages (first introduced in early 2025 with 9 pivot languages and since expanded)
  • Met-BOUQuET, a large-scale human evaluation dataset
  • MeDLEy, a training dataset focused on extremely low-resource languages
  • BLASER 3, a reference-free quality estimation model
  • OmniTOX, a multilingual toxicity classifier

These resources address a persistent challenge in multilingual AI translation: the lack of standardized evaluation data across a wide range of languages. 

The researchers noted that “large-scale evaluation remains a major bottleneck for multilingual MT,” adding that “without reliable evaluation for long-tail languages, progress becomes difficult to measure and even harder to compare across systems.” 

By releasing these datasets and tools, Meta aims to improve comparability and measurement in large-scale multilingual translation, while highlighting the need for evaluation frameworks that can scale alongside increasingly broad language coverage.

Toward a Multilingual AI Foundation

While Omnilingual MT is designed for translation, Meta positions it as a potential foundation for broader multilingual AI. The system could be extended to support reasoning, dialogue, and multimodal applications across thousands of languages. Combined with large-scale speech recognition systems, such as Meta’s Omnilingual ASR, the approach could also enable speech-to-text translation at unprecedented scale.

More broadly, the work reflects a shift in how multilingual systems are developed. As Meta puts it, “Omnilingual MT demonstrates that scaling multilingual translation is not simply a matter of increasing the number of supported languages, but of rethinking how MT systems are built, trained, and evaluated.” 

Authors: The Omnilingual MT Team, Belen Alastruey, Niyati Bafna, Andrea Caciolai, Kevin Heffernan, Artyom Kozhevnikov, Christophe Ropers, Eduardo Sánchez, Charles-Eric Saint-James, Ioannis Tsiamas, Chierh Cheng, Joe Chuang, Paul-Ambroise Duquenne, Mark Duppenthaler, Nate Ekberg, Cynthia Gao, Pere Lluís Huguet Cabot, João Maria Janeiro, Jean Maillard, Gabriel Mejia Gonzalez, Holger Schwenk, Edan Toledo, Arina Turkatenko, Albert Ventayol-Boada, Rashel Moritz, Alexandre Mourachko, Surya Parimi, Mary Williamson, Shireen Yates, David Dale, and Marta R. Costa-jussà