Despite these advancements, MT researchers still struggle to find tools that are user-friendly yet can fully interpret and evaluate the performance of MT models at a granular level. In an October 20, 2024, paper, researchers from the University of California, Santa Barbara, and Carnegie Mellon University underscored the "necessity for an integrated solution that combines comprehensive model evaluation with user-friendly interfaces and advanced analytical capabilities."
Translation Canvas
To address this need, they developed Translation Canvas, an evaluation toolkit focused on explainability, accessibility, and flexibility.
Translation Canvas offers an intuitive interface and supports fine-grained evaluations, pinpointing specific error spans and providing natural language explanations. The toolkit currently incorporates three evaluation metrics — BLEU, COMET, and InstructScore — giving researchers multiple perspectives on model performance.
The Translation Canvas dashboard provides a comprehensive view of MT model performance, detailing the distribution of errors and enabling comparative analysis between MT systems. This helps researchers quickly identify areas where a model underperforms relative to others. Additionally, the tool includes a robust search function that enables researchers to filter results by error type, severity, or content, making targeted analysis more efficient.
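The article does not describe how Translation Canvas implements its search or comparison features. As a rough illustration only, the following minimal Python sketch shows the kind of filtering by error type, severity, or content, and the per-system error counting, that such a dashboard performs. It assumes each error span records the system that produced it, an error type, a severity, and the highlighted text; the names `ErrorSpan`, `filter_spans`, and `error_distribution`, and the example labels, are hypothetical and are not the toolkit's actual API.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record for one highlighted error; not Translation Canvas's real schema.
@dataclass
class ErrorSpan:
    system: str      # name of the MT system whose output contains the error
    error_type: str  # illustrative label, e.g. "mistranslation" or "omission"
    severity: str    # illustrative label, e.g. "major" or "minor"
    text: str        # the highlighted span in the model output


def filter_spans(spans, error_type=None, severity=None, contains=None):
    """Return the spans that match every criterion that is not None.

    The content search is a simple case-insensitive substring match.
    """
    result = []
    for span in spans:
        if error_type is not None and span.error_type != error_type:
            continue
        if severity is not None and span.severity != severity:
            continue
        if contains is not None and contains.lower() not in span.text.lower():
            continue
        result.append(span)
    return result


def error_distribution(spans):
    """Count errors per (system, error_type) pair for side-by-side comparison."""
    return Counter((span.system, span.error_type) for span in spans)
```

For example, calling `filter_spans(spans, severity="major")` keeps only major errors, and comparing `error_distribution(spans)[("sys_a", "omission")]` with the same key for `"sys_b"` shows which hypothetical system omits content more often. A `Counter` is used because it returns 0 for pairs that never occur, which keeps system-to-system comparisons simple.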
The researchers noted that Translation Canvas is designed specifically for the translation research community, “where understanding the nuances of model errors and performance is vital for further improvements.”
Previous tools, such as Ghent University's MATEO project, offered a web-based platform for a range of metrics. Translation Canvas goes further by adding natural language error explanations, powered by InstructScore, and advanced instance-level analysis.
Useful, Enjoyable, and Easy To Use
To assess the effectiveness of Translation Canvas, the researchers conducted a user evaluation study with participants experienced in MT and existing MT evaluation metrics.
Users rated it highly for both enjoyability and usability, particularly appreciating the highlighting of error types and the speed of the analysis process. They reported that the graphs and error sorting saved significant time in fine-grained analysis, and they singled out support for multi-system analysis as a key usability feature.
“Our evaluation shows that users find the system to be useful, enjoyable and as easy to use as command-line evaluation tools,” the researchers said.
Newcomers to the tool could also get started quickly: first-time users reported needing only ten minutes to begin working with a custom dataset. This ease of use reflects the system's balance between functionality and user-friendliness, meeting the need for tools that support both rapid onboarding and sophisticated analysis.
The researchers acknowledge that human evaluation remains essential for capturing the subtleties of translation quality, and they plan to further improve Translation Canvas based on user feedback. With user permission, they will collect feedback on source texts, references, model outputs, and rankings to continuously refine the tool. Users can revoke permission at any time, ensuring control over their data and feedback.
Authors: Chinmay Dandekar, Wenda Xu, Xi Xu, Siqi Ouyang, and Lei Li