Europe’s Large Multilingual Vision-Language Models Hit the Stage

On June 10, 2025, Andre Martins, Head of Research at language solutions integrator Unbabel, announced preview releases of two large multilingual vision-language models (VLMs), EuroVLM-1.7B and EuroVLM-9B, developed in collaboration with Instituto Superior Técnico at the University of Lisbon.

The models support 35 languages, including all 24 official EU languages, along with Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.

Built on EuroLLM, a text-only multilingual model, the new VLMs combine strong multilingual language understanding with vision encoding. This enables them to process images alongside text and perform a range of multilingual vision-language tasks, such as:

generating image descriptions in multiple languages,
answering questions about visual content,
following complex instructions that involve both visual analysis and text generation,
translating image captions and descriptions across supported languages,
processing and analyzing documents, charts, and diagrams with multilingual text.

The researchers caution that the models are still under development and should not be deployed in production without appropriate safety measures.

The EuroVLM launch closely follows the public release of a technical report for EuroLLM-9B.

Combining Multilinguality and Multimodality

In a comment shared with Slator, Martins emphasized the strategic importance of combining multilinguality and multimodality. “I believe the future of AI will be both multilingual and multimodal,” he said.

“Relying on text-only models today is like watching black-and-white television in a world that’s rapidly shifting to full color,” he explained. Most current vision-language models, Martins noted, remain heavily English-centric, which can reinforce Anglo-Saxon cultural norms and limit their ability to reflect diverse real-world contexts. “That’s why the intersection of multilinguality and multimodality is so powerful — and why we’re so excited about the new EuroVLM model,” he added.

According to Martins, EuroVLM is a step toward “cultural intelligence at scale,” capable of understanding context-rich visuals and describing them in culturally relevant language.

And That’s Not All

The EuroVLM release comes amid a flurry of updates under the EuroLLM project. A preview version of EuroLLM-22B (base and instruct), trained on 3 trillion tokens, was also released this week; according to Martins, it outperforms the earlier 9B model.

Also new is EuroMoE (base and instruct), a lightweight mixture-of-experts (MoE) model, a type of architecture that activates only parts of the model for each input during inference, with just 600 million active parameters. EuroMoE outperforms EuroLLM-1.7B, making it a strong candidate for deployment on edge devices.

Martins said the final versions of EuroLLM-22B and EuroMoE are expected in the coming weeks.

Next Stop: Speech and Video

Looking ahead, the consortium plans to expand into speech and video, Martins told Slator, with the aim of building systems that "can reason across languages and modalities."

“We’re on a mission to build open, multilingual AI models with ever-expanding capabilities — accessible to everyone,” he said.

As part of this mission, the EuroLLM team is partnering with NVIDIA to bring these models into real-world applications. EuroLLM models are now also available as NVIDIA NIM (Inference Microservices), simplifying integration and deployment in production use cases.

“We believe these models will be instrumental in addressing real-world challenges across a variety of sectors — including localization, healthcare, finance, legal, and public administration,” Martins concluded.
