Many real-world translation tasks go beyond simple language conversion. They often require following specific guidelines and rules, such as maintaining consistent terminology, adapting date and currency formats, or converting units of measurement to align with local standards. As a result, translation models must not only produce accurate translations but also reliably follow instructions, which demands both translation and general capabilities.
Speaking to Slator, Ricardo Rei, formerly Senior Research Scientist at Unbabel and now Head of Research at Sword Health, said that TOWER+ is the first family of models that try to go beyond AI translation into AI localization.
“What I mean by this is that the model is trained to not only translate a text from one language to another but also to follow instructions while doing so — at the document or paragraph level,” he explained. To achieve this, they had to “make sure that the model was a strong ‘generalist’ while improving the ‘base’ models capabilities on translation and multilinguality.”
According to the researchers, this is “the first systematic study on balancing translation quality and general-purpose capabilities in open weight LLMs.”
New Benchmark
To evaluate the model’s ability to follow instructions during translation, the researchers introduced IF-MT, a new benchmark that tests both AI translation quality and instruction-following.
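To make the instruction-following dimension concrete, here is a minimal Python sketch of how compliance with a few localization rules could be checked automatically. It is not the IF-MT implementation, whose scoring method is not described here. The function name `follows_instructions`, the regular expressions, and the English-to-German example are all assumptions chosen for illustration.

```python
import re

# Hypothetical rules for an English -> German localization request.
US_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")     # e.g. 07/04/2025
DE_DATE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")   # e.g. 04.07.2025
IMPERIAL = re.compile(r"\b\d+(?:[.,]\d+)?\s*(?:miles?|Meilen?)\b", re.IGNORECASE)


def follows_instructions(source: str, translation: str, glossary: dict) -> dict:
    """Return one pass/fail flag per (illustrative) localization rule."""
    src_lower = source.lower()
    out_lower = translation.lower()

    # Rule 1: no US-style dates may remain; if the source had one,
    # the output must contain a German-style date instead.
    dates_ok = US_DATE.search(translation) is None and (
        US_DATE.search(source) is None or DE_DATE.search(translation) is not None
    )

    # Rule 2: no distances in miles may remain in the output.
    units_ok = IMPERIAL.search(translation) is None

    # Rule 3: every glossary term found in the source must be rendered
    # with its required target term.
    glossary_ok = all(
        target.lower() in out_lower
        for term, target in glossary.items()
        if term.lower() in src_lower
    )

    return {
        "dates_localized": dates_ok,
        "units_converted": units_ok,
        "glossary_respected": glossary_ok,
    }


source = "The warranty ends on 07/04/2025 and covers 5 miles of cable."
glossary = {"warranty": "Garantie"}

compliant = "Die Garantie endet am 04.07.2025 und deckt 8 km Kabel ab."
literal = "Die Gewährleistung endet am 07/04/2025 und deckt 5 Meilen Kabel ab."

print(follows_instructions(source, compliant, glossary))
print(follows_instructions(source, literal, glossary))
```

Checks like these only cover whether the rules were obeyed. Translation quality would still need to be scored separately, which is why a benchmark of this kind reports the two dimensions side by side.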
TOWER+ achieved top scores on both the translation and instruction-following dimensions, outperforming earlier TOWER models and all open-weight competitors.
The 72B variant performs competitively with GPT-4o and Claude on instruction-following and general capabilities, while maintaining top-tier AI translation performance. The 9B version outperforms much larger open models like Qwen 2.5 72B and LLaMA 3.3 70B. Even the smallest model, TOWER+ 2B, matches or surpasses the general capabilities of models over 30 times its size.
“Our findings highlight that it is possible to rival frontier models in general capabilities, while optimizing for specific business domains, such as translation and localization,” the researchers said.
Rei noted that this opens the door to more complex workflows. “This allows a much more flexible use for the translation industry that is more and more tackling more challenging tasks such as transcreation, content adaptation, etc.”
The researchers described the TOWER+ training pipeline as a “blueprint for adapting LLMs to domain- or task-specific business use cases while preserving general capabilities.” They also called for more research into handling multilingual and instruction-heavy use cases, especially in low-resource language settings.
All TOWER+ models are publicly available on Hugging Face.
Authors: Ricardo Rei, Nuno M. Guerreiro, José Pombal, João Alves, Pedro Teixeirinha, Amin Farajian, and André F. T. Martins