More specifically, the investigation analyzed three sub-tasks in the clinical domain: the models’ accuracy on clinical cases, tested by translating 202 Covid-19 reports; their accuracy in using clinical terminology; and their accuracy in capturing ontological concepts.
The investigation analyzed each sub-task against automated quality estimation scores, which were then verified with a separate human evaluation step. This human evaluation was carried out by five experts with backgrounds in translation and biomedical research.
Their conclusions underlined that bigger is not better. In fact, the researchers were able to raise output quality “much higher” once the data had been cleaned for fine-tuning.
Logrus Global’s CEO, Serge Gladkoff, told Slator that “this is a very important conclusion. […] In reality, a large language model gives you worse quality, probably because LLM is trained on much bigger corpus, with conflicting domains, and fine-tuning is weaker […] than on smaller models.”
Gladkoff also concluded that “in neural technology, data is king. The data is more important than the model.”
Insight into Low-Resource Languages
As part of the experiments, the researchers also set out to understand the impact of transfer learning to low-resource languages across models.
While one of the study’s massive multilingual pre-trained language models, “NLLB-200”, contained data from 200+ language pairs, the other Meta AI model, “WMT21fb”, had only 14 language pairs, and English<>Spanish was not among them.
The researchers therefore treated English<>Spanish as a low-resource language pair for the WMT21fb model, and analyzed the potential for transfer learning from the Spanish dataset in the NLLB-200 model to the WMT21fb model, specifically in the clinical domain.
The study showed that transfer learning is possible, demonstrating that the WMT21fb model was able to obtain clean segments in a new, low-resource language and that it was capable of producing a “good enough” engine in this language.
Logrus Global’s Gladkoff told Slator that this was “a striking and unexpected side effect” of the study, concluding that these large models “demonstrated amazing ability to learn the language which they were not trained upon.”
The study is the latest piece of research carried out by the authors, who recently analyzed the impact on quality estimation when fine-tuning large language models with post-editing data.