Clinical Translations Better with Smaller Language Models, Research Finds

Researchers at Logrus Global, Ocean Translations, and the University of Manchester have found that fine-tuning small language models on clinical-domain data produces significantly better translations than extra-large language models.

In a paper published on December 12, 2023, on the preprint server arXiv, the group stated that this conclusion is the first of its kind in the clinical field.

The researchers set out to compare machine translation quality for clinical texts between English and Spanish across three models: one small multilingual pre-trained language model, fine-tuned with clinical data, and two massive-sized multilingual pre-trained language models, both by Meta AI.

More specifically, the investigation analyzed three sub-tasks specific to the clinical domain: the models’ accuracy within a specific clinical case by translating 202 Covid-19 reports, their accuracy in using specific clinical terminology, and their accuracy in capturing ontological concepts.

The investigation analyzed each sub-task against automated quality estimation scores, which were then verified with a separate human evaluation step. This human evaluation was carried out by five experts with backgrounds in translation and biomedical research.

Their conclusions underlined that bigger is not better. In fact, the researchers were able to raise output quality “much higher” once the data had been cleaned for fine-tuning.

Logrus Global’s CEO, Serge Gladkoff, told Slator that “this is a very important conclusion. […] In reality, a large language model gives you worse quality, probably because LLM is trained on much bigger corpus, with conflicting domains, and fine-tuning is weaker […] than on smaller models.”

Gladkoff also concluded from the research that “in neural technology, data is king. The data is more important than the model.”

Insight into Low-Resource Languages

As part of the experiments, the researchers also set out to understand the impact of transfer learning to low-resource languages across models.

While one of the study’s massive-sized multilingual pre-trained language models, “NLLB-200”, contained data from 200+ language pairs, the other Meta AI model, “WMT21fb”, only had 14 language pairs, of which English<>Spanish was not included.

The researchers therefore treated English<>Spanish as a low-resource language pair for the WMT21fb model and analyzed the potential for transfer learning from the Spanish dataset in the NLLB-200 model to the WMT21fb model, specifically in the clinical domain.

The study showed that transfer learning is possible: the WMT21fb model was able to obtain clean segments in a new, low-resource language and was capable of producing a “good enough” engine in this language.

Logrus Global’s Gladkoff told Slator that this was “a striking and unexpected side effect” of the study, concluding that these large models “demonstrated amazing ability to learn the language which they were not trained upon.”

The study is the latest piece of research carried out by the authors, who recently analyzed the impact on quality estimation when fine-tuning large language models with post-editing data.
