eBay’s New In-House Large Language Model for E-commerce Can Also Translate

The authors explained that using foundation models that can be accessed and tuned for specific use cases, such as the LLaMA-2 models, “poses a risk in terms of licensing, data safety and future proofing among other things.” Additionally, they noted that “these models are very generic and mostly trained on English-centric data.”

They developed LLMs entirely in-house, from scratch, on a dataset of 3 trillion tokens comprising both general and e-commerce-specific texts in multiple languages. They used the ParaCrawl corpus along with a smaller in-house corpus from the e-commerce domain. This approach is intended to make the models robust in handling diverse languages and domain-specific tasks.

Additionally, eBay developed their own tokenizer and model vocabulary, customized towards e-commerce. “This gives us several advantages, namely (i) full control over the vocabulary including special tokens (ii) better support for multilinguality (iii) better adaptation to e-commerce specific use-cases,” they said.

Eliminating Dependencies

According to the authors, their models perform on par with, or better than, the popular LLaMA-2 models, particularly excelling in non-English machine translation, as well as natural language understanding (NLU) tasks and e-commerce-specific applications.

The authors attributed this performance boost to the inclusion of significant amounts of non-English and e-commerce-specific data during pretraining, which enhances the models’ understanding and performance on tasks in languages other than English. Moreover, the vocabulary customized for e-commerce tasks resulted in a significant speed-up in text generation, of up to 34% over LLaMA-2.
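
One plausible reading of the speed-up is that a domain-specific vocabulary encodes the same text in fewer tokens, and an autoregressive model needs one decoding step per token. The sketch below illustrates that effect with a toy greedy longest-match tokenizer. It is not eBay's tokenizer: the two vocabularies, the sample listing, and the function name `greedy_tokenize` are all hypothetical, and production tokenizers typically use subword algorithms such as BPE rather than this simplification.

```python
def greedy_tokenize(text, vocab):
    """Split text into the longest pieces found in vocab.

    Falls back to a single character when no vocabulary piece matches,
    so the loop always makes progress.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


# Hypothetical toy vocabularies, invented purely for illustration.
GENERIC_VOCAB = {"free", " ship", "ping", " on", " re", "fur", "bish", "ed", " lap", "top"}
ECOMMERCE_VOCAB = GENERIC_VOCAB | {" shipping", " refurbished", " laptop"}

listing = "free shipping on refurbished laptop"

generic_tokens = greedy_tokenize(listing, GENERIC_VOCAB)
domain_tokens = greedy_tokenize(listing, ECOMMERCE_VOCAB)

# Fewer tokens means fewer decoding steps to generate the same text.
reduction = 1 - len(domain_tokens) / len(generic_tokens)
print(len(generic_tokens), len(domain_tokens), f"{reduction:.0%} fewer tokens")
```

The token-count reduction in this toy case says nothing about the reported 34% figure; it only shows the direction of the effect when frequent domain terms get their own vocabulary entries.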

The authors expect these models “to be used as a foundation for fine-tuning and instruction-tuning, eliminating dependencies to external models.”

Future efforts will focus on enhancing the data pipeline, incorporating more eBay-specific data, training larger models, and exploring the Mixture-of-Experts architecture for improved efficiency.

Authors: Christian Herold, Michael Kozielski, Leonid Ekimov, Pavel Petrushkov, Pierre-Yves Vandenbussche, and Shahram Khadivi
