Moreover, they underscored that current MT research with LLMs primarily focuses on pre-training for zero-shot MT or fine-tuning to enhance zero-shot capabilities. While some works have explored pre-training or fine-tuning encoder-decoder MT models for adaptive MT, there is a need for research specifically on “fine-tuning available open-source models to enhance their in-context learning ability for real-time adaptive MT,” they said.
These models can be fine-tuned to perform better at in-context learning scenarios, where specific prompt templates include in-domain sentences, phrases, or terminology. “This direction can improve both translation quality and efficiency, especially as fewer examples might be required for in-context learning,” they highlighted.
For fine-tuning, the authors used 20,000 segments: 10,000 with zero-shot translation prompts and 10,000 with one-shot translation prompts. Zero-shot prompts represent regular translation without any context, while one-shot prompts add a fuzzy match to improve adherence to domain terminology and style. They focused on the Spanish-to-English language pair and evaluated the results with BLEU, chrF++, TER, and COMET.
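To make the difference between the two prompt types concrete, here is a minimal Python sketch. It is an illustration only, not the authors' code: the prompt layout, the function names, and the sample sentences are assumptions, and the standard library's character-level `difflib.SequenceMatcher` stands in for whatever fuzzy-match retrieval the authors actually used, which the article does not specify.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(source, memory):
    """Return the (source, target) pair whose source side is most similar.

    `memory` is a hypothetical translation memory: a list of
    (source_sentence, target_sentence) tuples. Similarity here is plain
    character-level matching, used only as a stand-in.
    """
    return max(memory, key=lambda pair: SequenceMatcher(None, source, pair[0]).ratio())


def zero_shot_prompt(source, src_lang="Spanish", tgt_lang="English"):
    """Regular translation prompt with no extra context (assumed layout)."""
    return f"{src_lang}: {source}\n{tgt_lang}:"


def one_shot_prompt(source, fuzzy, src_lang="Spanish", tgt_lang="English"):
    """Prepend one fuzzy match (source and its translation) as an example."""
    fuzzy_src, fuzzy_tgt = fuzzy
    return (
        f"{src_lang}: {fuzzy_src}\n{tgt_lang}: {fuzzy_tgt}\n"
        + zero_shot_prompt(source, src_lang, tgt_lang)
    )


# Invented sample data, for illustration only.
memory = [
    ("El paciente tiene fiebre alta.", "The patient has a high fever."),
    ("Tome una tableta cada ocho horas.", "Take one tablet every eight hours."),
]
new_segment = "El paciente tiene fiebre leve."

print(zero_shot_prompt(new_segment))
print(one_shot_prompt(new_segment, best_fuzzy_match(new_segment, memory)))
```

In the one-shot case the model sees a closely related in-domain sentence with its approved translation before the new segment, which is how the fuzzy match nudges it toward the domain's terminology and style.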
Quality Gains and Efficient Self-Hosting
The experiments for the Spanish-to-English medical domain demonstrated that, with a relatively small dataset of 20,000 segments, fine-tuning significantly enhanced Mistral’s in-context learning ability, especially for real-time adaptive MT.
In comparison with GPT-3.5-turbo and NLLB 3.3B, the fine-tuned Mistral 7B outperformed GPT-3.5-turbo in zero-shot translation while achieving comparable one-shot translation quality. Additionally, the zero-shot translation of the fine-tuned Mistral matched NLLB 3.3B’s performance, with its one-shot translation quality surpassing that of NLLB 3.3B.
“These findings emphasize the significance of fine-tuning efficient LLMs like Mistral 7B,” the authors said. They also highlighted that a fine-tuned small “standalone” LLM can be more efficient than using two models — conventional MT and LLM — at translation time.
Furthermore, fine-tuning open-source LLMs offers the benefit of “efficient self-hosting,” allowing individuals to deploy their own LLMs for privacy while achieving quality gains comparable to those of commercial models.
Finally, the authors expressed the intention to experiment with other domains and language pairs, including low-resource languages, and with other multilingual LLMs.
Note: The code used for these experiments is publicly available on GitHub.