Applying Word-Level Dictionary Data Directly
The methodology they used, dubbed “Dictionary-based Prompting for Machine Translation,” or DIPMT, goes a step further than previous research into low-resource translation and domain transfer because it requires no model training. Instead, DIPMT uses prompting-based translation, in which word-level dictionary data is inserted directly into the prompt.
The researchers based their experiments on two large language models, OPT for English and BLOOM for the multilingual set. For the out-of-domain evaluation, they used data from Aharoni and Goldberg (medical, law, IT, and Koran). They also removed from the training set any sentences longer than 250 tokens and any sentence pairs with a source/target length ratio above 1.5.
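The two filtering rules can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the function name `keep_pair`, the whitespace tokenization, and the choice to apply the ratio in both directions (longer side divided by shorter side) are all assumptions.

```python
def keep_pair(source, target, max_tokens=250, max_ratio=1.5):
    """Return True if a sentence pair passes both length filters.

    Assumptions (not from the article): tokens are whitespace-separated,
    and the length ratio is measured as longer side / shorter side.
    """
    src_len = len(source.split())
    tgt_len = len(target.split())
    if src_len == 0 or tgt_len == 0:
        return False  # drop empty sides so the ratio is well defined
    if src_len > max_tokens or tgt_len > max_tokens:
        return False  # drop overly long sentences
    ratio = max(src_len, tgt_len) / min(src_len, tgt_len)
    return ratio <= max_ratio


# Hypothetical toy pairs, for illustration only.
pairs = [
    ("the patient was given aspirin", "der Patient erhielt Aspirin"),
    ("yes", "ja , das ist in der Tat vollkommen richtig"),
]
filtered = [pair for pair in pairs if keep_pair(*pair)]
```

In this toy run, the first pair is kept and the second is dropped because one side is many times longer than the other.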
Each prompt in the experiments had three parts: (a) the source sentence (along with the translation instruction and the target language); (b) the dictionary-based word-level translations; and (c) the translation to the target language, which the model is expected to generate.
For the baseline, researchers used a prompt format without dictionary-based word-level translations. The baseline had two parts: (a) the source sentence (along with the translation instruction and the target language); and (b) the translation into the target language.
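The difference between the two prompt formats can be illustrated with a short Python sketch. The function name `build_prompt`, the `hints` parameter, and the exact wording of the instruction and hint lines are assumptions made for illustration; they are not the template used in the study.

```python
def build_prompt(source, target_language, hints=None):
    """Assemble a translation prompt, optionally with dictionary hints.

    The phrasing of each line is hypothetical, not the researchers' template.
    """
    # Part (a): instruction, target language, and source sentence.
    lines = [f"Translate the following sentence to {target_language}: {source}"]
    # Part (b): word-level dictionary translations (omitted for the baseline).
    if hints:
        for word, translations in hints.items():
            options = ", ".join(f'"{t}"' for t in translations)
            lines.append(f'In this context, the word "{word}" may mean {options}.')
    # Final part: the cue after which the model generates the translation.
    lines.append(f"The full translation to {target_language} is:")
    return "\n".join(lines)


# Hypothetical example sentence and dictionary entries.
sentence = "Der Patient hat hohes Fieber."
baseline_prompt = build_prompt(sentence, "English")
dipmt_prompt = build_prompt(
    sentence, "English", hints={"Fieber": ["fever", "temperature"]}
)
```

The only difference between `baseline_prompt` and `dipmt_prompt` is the hint line, which mirrors the comparison described above.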
Further Improvement with Domain-Specific Dictionaries
The researchers experimented with translation to and from English using multiple languages and the above-mentioned language models, as well as out-of-domain translation. Out-of-domain data is known to impact translation quality precisely because it would not typically be included in model pre-training.
Domain-specific bilingual dictionaries are not as easy to find as general dictionaries, if they even exist. Likewise, not all source word types have a dictionary equivalent. To account for these factors, the researchers employed parallel data available for each domain and created domain-specific dictionaries.
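The article does not describe how the dictionaries were extracted from the parallel data. As a rough stand-in, the sketch below scores word pairs by how often they co-occur in aligned sentences (a Dice coefficient). The function name `build_dictionary`, the `min_score` threshold, and the toy sentences are all assumptions, and a real system would more likely rely on a dedicated word aligner.

```python
from collections import Counter
from itertools import product


def build_dictionary(pairs, min_score=0.5):
    """Derive a rough word-level dictionary from aligned sentence pairs.

    Stand-in technique (not from the article): sentence-level co-occurrence
    scored with the Dice coefficient, 2 * joint / (count_src + count_tgt).
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for source, target in pairs:
        src_words = set(source.lower().split())
        tgt_words = set(target.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))

    candidates = {}
    for (src, tgt), n in joint.items():
        score = 2 * n / (src_count[src] + tgt_count[tgt])
        if score >= min_score:
            candidates.setdefault(src, []).append((tgt, score))

    # Order each word's candidate translations from highest to lowest score.
    return {
        src: [tgt for tgt, _ in sorted(scored, key=lambda item: -item[1])]
        for src, scored in candidates.items()
    }


# Hypothetical medical-domain toy data.
medical_pairs = [
    ("fever", "fieber"),
    ("high fever", "hohes fieber"),
    ("high pressure", "hoher druck"),
]
medical_dictionary = build_dictionary(medical_pairs)
```

With so little data the output is noisy, which is why a stricter `min_score` or a proper alignment tool would be preferable in practice.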
The experiments showed that the dictionary-based methodology, which adds possible word-level equivalents to the prompt, outperformed the baseline by an average of 9.4 BLEU points.
The DIPMT approach appears to be promising as a way to improve MT quality, especially in domain-specific content (compared to other methodologies, such as domain-specific data augmentation through back translation).
The next step would be to include human experts in the evaluation phase. This would be a “data full circle” of sorts since the dictionaries and the parallel data used for prompting were all produced by humans.