He explained that single-agent workflows rely on one AI agent to complete tasks independently, for example by calling a translation system API or prompting a large language model (LLM) for translation. Multi-agent workflows, by contrast, bring together several specialized agents, each with a distinct role, that collaborate toward a shared goal.
Although single-agent systems work well for simpler translation tasks, Briva-Iglesias argues that multi-agent systems are particularly promising for complex scenarios that require high accuracy, domain-specific knowledge, and contextual awareness.
He highlights four core strengths of AI agents for AI translation:
- Autonomy — AI agents operate independently once given clear instructions.
- Tool use — they can integrate glossaries, translation memories, or retrieval-augmented generation for better accuracy.
- Memory — through feedback loops, AI agents refine output over time.
- Customizable workflows — flexible architectures allow adaptation to domain-specific needs, scalability, and rigorous quality control.
Briva-Iglesias also describes five distinct multi-agent workflow architectures.
Greater Domain Adaptation and Content Preservation
To test the potential of multi-agent workflows in AI translation, he conducted a pilot study in legal translation using a multi-agent system of four specialized AI agents: one for initial translation, one for adequacy review, one for fluency review, and one for final editing.
“This structure simulates real-world translation processes in legal settings, where consistency, terminology accuracy, and compliance are paramount,” he explained.
He compared six setups: four multi-agent configurations (using either the larger DeepSeek R1 model or the smaller GPT-4o-mini model, with variations in temperature settings to balance creativity and precision) and two industry-standard AI translation systems, DeepL and Google Translate.
He found that multi-agent setups with larger models clearly outperformed setups with smaller models, and the best-performing configurations — combining creative generation (high-temperature) with precise review (low-temperature) — outperformed DeepL and Google Translate in human evaluation, despite having no access to external resources like translation memories or legal termbases.
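The four-agent structure and the high/low temperature split described above can be sketched roughly as follows. This is an illustration only, not the code used in the study: the `Agent` class, the `run_pipeline` function, the prompt wording, and the temperature values (1.0 and 0.2) are all assumptions, and `stub_llm` stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: (instructions, user_text, temperature) -> output text.
LLM = Callable[[str, str, float], str]


@dataclass
class Agent:
    name: str
    instructions: str
    temperature: float  # illustrative value; the article reports no exact settings


def run_pipeline(source: str, agents: List[Agent], llm: LLM) -> str:
    """Pass a draft through each agent in order and return the final text."""
    draft = ""
    for agent in agents:
        # Each agent sees the source text plus the current draft.
        user_text = f"SOURCE:\n{source}\n\nDRAFT:\n{draft}"
        draft = llm(agent.instructions, user_text, agent.temperature)
    return draft


# Roles follow the article; prompts and temperatures are assumptions.
AGENTS = [
    Agent("translator", "Translate the legal source text.", 1.0),
    Agent("adequacy_reviewer", "Revise the draft so its meaning matches the source.", 0.2),
    Agent("fluency_reviewer", "Revise the draft for natural target-language style.", 0.2),
    Agent("editor", "Produce the final edited translation.", 0.2),
]

# Stand-in for a real model call: records each call and returns a placeholder draft.
calls = []


def stub_llm(instructions: str, user_text: str, temperature: float) -> str:
    calls.append((instructions, temperature))
    return f"draft v{len(calls)}"


final = run_pipeline("The lessee shall pay EUR 1,000.", AGENTS, stub_llm)
```

Replacing `stub_llm` with a function that calls an actual model would turn the sketch into a working chain. Keeping the model behind a plain callable also makes it straightforward to swap models or temperature settings between configurations, which is the kind of comparison the study describes.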
“The inclusion of memory, RAG, domain- and client-specific databases, and more granular agent role customization could further improve performance,” Briva-Iglesias noted, suggesting that future studies explore how such additional tools and refined role assignments impact translation quality.
The multi-agent system consistently delivered output that was more contextually and terminologically accurate, formatted currency figures properly, and maintained conceptual consistency.
“This suggests that integrating multiple specialized agents into MT workflows may allow for greater domain adaptation and content preservation, particularly in high-stakes fields such as legal and medical translation,” he said.
The paper, he notes, sets the stage for further research into multi-agent applications in AI translation, from identifying the optimal multi-agent configurations to integrating these setups into professional translation workflows, and balancing cost and performance.
A public demo of the multi-agent system — which supports different language combinations, models, and files — is available here for further analysis and replication.