The working group itself will not test or benchmark QE systems directly. Instead, it aims to develop evaluation guidance and testing “recipes” that enterprises, Language Solutions Integrators (LSIs), and researchers can apply when assessing QE systems on their own data and workflows.
The effort is intended to produce a structured evaluation approach similar in spirit to Multidimensional Quality Metrics (MQM), the widely used framework for assessing translation quality.
The initiative also aims to give decision-makers greater clarity and to support localization professionals in conversations with internal stakeholders about QE adoption and evaluation.
The working group operates under the auspices of AMTA and is explicitly user-led. Membership is limited to QE users and researchers, and employees of companies that sell QE systems are excluded in order to maintain neutrality.
The group has been meeting since December 2025 and currently includes around 20 members from translation buyers, LSIs, and research organizations, according to working group coordinator Evelyn Yang Garland.
“All members are QE users or researchers and have pledged to be neutral and fair to all QE systems,” Garland told Slator. “QE vendors are not eligible for membership, but we plan to release our work products publicly later this year and welcome everyone’s feedback at that time,” she added.
Garland also said that the working group would welcome additional participants in the coming months, particularly translation buyers and LSIs interested in applying the framework to their own data and real-world projects.
The initiative runs from December 2025 through June 2026, with the group planning to present its results at the AMTA Conference in Québec City from August 31 to September 2, 2026. The outputs, including methodology recommendations and testing guidance, will be owned by AMTA and released under an open Creative Commons Attribution (CC BY 4.0) license, allowing others to reuse and build upon the work as long as proper attribution is given.
“Assessing the quality of a translation, whether produced manually or automatically, has always been complex, challenging, and prone to subjectivity,” said Jay Marciano, President of AMTA, in a statement shared with Slator.
“With the advent of automatic quality assessment technologies, we […] now need unbiased methods for evaluating the success of those tools. And that is the aim of this working group,” he added.
The launch comes at a time when evaluation methods themselves are evolving, with growing research into LLM- and agent-based approaches to quality assessment.
As automated evaluation grows more sophisticated and QE systems are embedded more deeply in enterprise AI translation workflows, the absence of standardized evaluation methods for these systems has become more visible.
The AMTA initiative could help bring greater transparency and comparability to a growing segment of the language AI stack.