Furthermore, the availability of “several existing reliable scoring and evaluation methods”, including MetricX, BLEURT, and COMET, which can serve as reward models, makes it possible to evaluate the effectiveness of ReST objectively and adds to the credibility of the research.
To test the versatility of their approach, the researchers evaluated ReST on diverse benchmark datasets (IWSLT 2014, WMT 2020, and an internal Web Domain dataset) and across different language pairs. “We selected a different language pair for each dataset to test the generality of the results,” they said.
In addition to automated metrics, the researchers conducted human evaluations to check that ReST’s improvements align with human preferences. Human raters scored translations on a scale from 0 to 6, adding a qualitative dimension to the assessment.
ReST Improves Translation Quality
The results showed that ReST significantly improves translation quality on MT benchmarks, according to both automated metrics and human evaluation.
According to the researchers, what sets ReST apart is its efficiency. Because it generates its training data offline, the same data can be reused across multiple training steps, which makes ReST more sample- and compute-efficient than online reinforcement learning methods.
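To make the offline data reuse concrete, here is a minimal toy sketch, assuming the two-loop structure described in the ReST paper: a Grow step samples outputs from the current policy once and scores them with a reward model, and several Improve steps then filter that same dataset with rising reward thresholds and fine-tune on what remains. Everything in it is hypothetical and for illustration only: the `CANDIDATES` and `TOY_REWARD` tables stand in for a real translation model and for a learned metric such as MetricX, BLEURT, or COMET, and "fine-tuning" is replaced by a simple update that moves the policy toward the filtered samples.

```python
import random
from collections import Counter

# Hypothetical toy "policy": for each source sentence, a categorical
# distribution over a few fixed candidate translations.
CANDIDATES = {
    "guten Morgen": ["good morning", "good tomorrow", "morning good"],
    "danke": ["thanks", "thank you", "danke"],
}

# Hypothetical reward table standing in for a learned metric (scores in [0, 1]).
TOY_REWARD = {
    "good morning": 0.95, "good tomorrow": 0.4, "morning good": 0.3,
    "thanks": 0.9, "thank you": 0.92, "danke": 0.1,
}


def reward_model(source: str, translation: str) -> float:
    # A real reward model would score the (source, translation) pair;
    # this toy version only looks up the translation.
    return TOY_REWARD[translation]


def init_policy() -> dict[str, dict[str, float]]:
    # Start from a uniform distribution over each source's candidates.
    return {src: {c: 1 / len(cs) for c in cs} for src, cs in CANDIDATES.items()}


def grow(policy, samples_per_source, rng):
    """Grow step: sample from the current policy and score each sample once."""
    dataset = []
    for src, dist in policy.items():
        outs = rng.choices(list(dist), weights=list(dist.values()), k=samples_per_source)
        dataset.extend((src, out, reward_model(src, out)) for out in outs)
    return dataset


def improve(policy, dataset, threshold, lr=0.5):
    """Improve step: keep samples with reward >= threshold, then "fine-tune".

    Fine-tuning is mimicked by moving each source's distribution toward the
    empirical distribution of its filtered samples (a toy stand-in for
    minimizing negative log-likelihood on the filtered data).
    """
    kept = [(s, o) for s, o, r in dataset if r >= threshold]
    new_policy = {}
    for src, dist in policy.items():
        counts = Counter(o for s, o in kept if s == src)
        total = sum(counts.values())
        if total == 0:  # nothing passed the filter: leave this source unchanged
            new_policy[src] = dict(dist)
            continue
        new_policy[src] = {c: (1 - lr) * p + lr * counts[c] / total for c, p in dist.items()}
    return new_policy


def rest(num_grow_steps=2, thresholds=(0.0, 0.5, 0.8), samples_per_source=50, seed=0):
    rng = random.Random(seed)
    policy = init_policy()
    for _ in range(num_grow_steps):
        dataset = grow(policy, samples_per_source, rng)  # generated offline, scored once
        for tau in thresholds:  # the same dataset is reused with rising thresholds
            policy = improve(policy, dataset, tau)
    return policy
```

Calling `rest()` returns a policy that has shifted probability toward high-reward candidates such as "good morning" and away from low-reward ones such as the untranslated "danke". The point of the design is visible in the loop: each Grow step pays the sampling and scoring cost once, and every Improve step after it reuses those scored samples instead of querying the policy and reward model again, which is the kind of reuse online RL methods do not get.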
As highlighted by techno-optimist and AI accelerationist Far El in a tweet, ReST represents “1 more step towards fully autonomous machines and the beginning of the end of manual finetuning”.
Beyond MT, the authors note that ReST shows promise in other generative learning settings, including summarization, turn-based dialogue, and generative audio and video models.
This adaptability positions ReST as a versatile methodology for advancing reinforcement learning from human feedback (RLHF) across a broad spectrum of language-related tasks, they concluded.
Authors: Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas