- generating image descriptions in multiple languages,
- answering questions about visual content,
- following complex instructions that involve both visual analysis and text generation,
- translating image captions and descriptions across supported languages,
- processing and analyzing documents, charts, and diagrams with multilingual text.
The researchers caution that the models are still under development and should not be deployed in production without appropriate safety measures.
The EuroVLM launch closely follows the public release of a technical report for EuroLLM-9B.
“We’re on a mission to build open, multilingual AI models with ever-expanding capabilities — accessible to everyone,” — Andre Martins, Head of Research, Unbabel
Combining Multilinguality and Multimodality
In a comment shared with Slator, Martins emphasized the strategic importance of combining multilinguality and multimodality. “I believe the future of AI will be both multilingual and multimodal,” he said.
“Relying on text-only models today is like watching black-and-white television in a world that’s rapidly shifting to full color,” he explained. Most current vision-language models, Martins noted, remain heavily English-centric, which can reinforce Anglo-Saxon cultural norms and limit their ability to reflect diverse real-world contexts. “That’s why the intersection of multilinguality and multimodality is so powerful — and why we’re so excited about the new EuroVLM model,” he added.
According to Martins, EuroVLM is a step toward “cultural intelligence at scale,” capable of understanding context-rich visuals and describing them in culturally relevant language.
And That’s Not All
The EuroVLM release comes amid a flurry of updates under the EuroLLM project. A preview version of EuroLLM-22B (base and instruct) was also released this week, trained on 3 trillion tokens and outperforming the earlier 9B model, according to Martins.
Also new is EuroMoE (base and instruct), a lightweight mixture-of-experts (MoE) model with only 600 million active parameters. An MoE architecture activates only part of the model for each input during inference, so the compute cost per token tracks the active parameters rather than the full parameter count. EuroMoE outperforms EuroLLM-1.7B, making it a strong candidate for edge-device deployment.
Martins said the final versions of EuroLLM-22B and EuroMoE are expected in the coming weeks.
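For readers unfamiliar with mixture-of-experts, the idea of activating only parts of the model can be illustrated with a toy top-k router. The sketch below is a generic illustration, not EuroMoE's actual architecture: the linear gate, the choice of k = 2, the four toy experts, and the function names `route_top_k` and `moe_forward` are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate_weights, k=2):
    """Run one input vector through a toy MoE layer.

    Only the k selected experts are evaluated; the others are skipped,
    which is where the inference savings come from.
    """
    # Hypothetical linear gate: one score per expert (dot product with x).
    gate_scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    chosen = route_top_k(gate_scores, k)

    out = [0.0] * len(x)
    for idx, weight in chosen:
        y = experts[idx](x)  # only the chosen experts run
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in chosen]


# Four toy "experts", each just scaling the input by a different factor.
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0, 4.0)]

# One gate row per expert (illustrative values only).
gate_weights = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]

output, used = moe_forward([1.0, 0.0], experts, gate_weights, k=2)
print("experts used:", used)
print("output:", output)
```

Here only two of the four experts are evaluated for the input, which mirrors how an MoE model can hold many parameters in total while keeping the number of active parameters per token small.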
Next Stop: Speech and Video
Looking ahead, the consortium plans to expand into speech and video, Martins told Slator, with the aim of building systems that “can reason across languages and modalities.”
“We’re on a mission to build open, multilingual AI models with ever-expanding capabilities — accessible to everyone,” he said.
As part of this mission, the consortium is partnering with NVIDIA to bring these models into real-world applications. EuroLLM models are also now available as NVIDIA NIM inference microservices, which simplifies integration and deployment in production use cases.
“We believe these models will be instrumental in addressing real-world challenges across a variety of sectors — including localization, healthcare, finance, legal, and public administration,” Martins concluded.