e-Commerce Peculiarities
According to the researchers, conventional MT methods can cause problems when translating e-commerce content, such as low accuracy and the omission or duplication of keywords. By contrast, they argue, their “general-to-specific” (G2ST) approach to model training obtains better results.
The G2ST methodology combines two-phase fine-tuning with a contrastive enhancement step, in which different candidate translations are compared and the model learns to choose the better one. It works by “transferring” a general MT model to an e-commerce-specific MT model.
To improve the models’ translation performance, the researchers first collected domain-related resources. These included aligned Chinese-English terms and a parallel corpus annotated specifically for e-commerce.
The first preparatory task consisted of expanding the model’s vocabulary, particularly with domain-related word pairs, the researchers explained. Chinese-English aligned term pairs were sourced from Alibaba.com and ChatGPT and were used for the first fine-tuning phase.
The resulting parallel corpus was in turn annotated for the second fine-tuning phase. The next step, contrastive enhancement, allowed the researchers to improve the lexical representation capability of the model.
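The idea behind contrastive enhancement, comparing candidate translations and rewarding the model for preferring the better one, can be illustrated with a pairwise margin ranking loss. This is only a minimal sketch of the general technique, not the loss the researchers used, which the article does not specify. The function names, the margin value, and the example scores are all hypothetical.

```python
def margin_ranking_loss(score_better, score_worse, margin=1.0):
    """Hinge-style pairwise loss (a generic stand-in, not the G2ST loss).

    The loss is zero once the better candidate outscores the worse one
    by at least `margin`; otherwise it grows linearly with the shortfall.
    """
    return max(0.0, margin - (score_better - score_worse))


def batch_loss(pairs, margin=1.0):
    """Average the pairwise loss over (better_score, worse_score) tuples."""
    return sum(margin_ranking_loss(b, w, margin) for b, w in pairs) / len(pairs)


# Hypothetical model scores for three (better, worse) candidate pairs.
pairs = [
    (2.5, 0.5),   # correctly ranked with a wide gap: no penalty
    (0.25, 0.0),  # correctly ranked, but gap is below the margin
    (-1.0, 1.0),  # wrongly ranked: largest penalty
]
loss = batch_loss(pairs)
```

Minimizing a loss of this kind pushes the model to assign higher scores to the preferred translation in each pair, which is the behavior the contrastive step is described as encouraging.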
For their experiments, the Alibaba researchers applied their newly curated Chinese-English corpora to state-of-the-art (SOTA) NMT models and LLMs, including LLaMA, Qwen, GPT-3.5, and GPT-4.
Results showed that with the G2ST methodology, LLaMA2 outperformed LLaMA. The researchers evaluated the translations using the SacreBLEU, ROUGE-1, ROUGE-2, and ROUGE-L metrics and found the performance of the other models to be comparable. The Qwen-14B model, however, achieved the best translation scores.
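To give a sense of what a metric such as ROUGE-1 measures, the sketch below computes a unigram-overlap F1 score between a candidate translation and a reference. It is a simplified illustration that assumes lowercased, whitespace-separated English tokens; it is not the evaluation code used in the study, and the function name and example product titles are made up.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: unigram overlap with clipped counts.

    Assumes plain whitespace tokenization, which is a simplification.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Counter intersection keeps the minimum count per token, so a word
    # repeated in the candidate is only credited as often as it appears
    # in the reference.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Keyword omission: two reference words are missing from the candidate.
omitted = rouge1_f1(
    "wireless bluetooth headphones",
    "wireless bluetooth noise cancelling headphones",
)

# Keyword duplication: the repeated word lowers precision.
duplicated = rouge1_f1("headphones headphones", "headphones")
```

In this sketch, omitting keywords lowers recall and duplicating them lowers precision, so both of the e-commerce translation problems the researchers describe reduce the score.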
The researchers intend to further test their methodology on multilingual machine translation.