These entities often present significant challenges, even for human translators, due to cultural nuances, domain-specific references, or their rarity in training data. “The named entities may be entities that are rare, ambiguous, or unknown to the machine translation system,” the researchers explained.
Named entities play a crucial role in real-world MT applications, from translating official documents to localizing media content. Errors in translating these entities can lead to miscommunication, cultural misunderstandings, or even reputational damage. This task aims to close this gap, enabling MT systems to handle such entities more effectively across diverse domains and use cases.
“We believe that the ability to accurately translate named entities is crucial for machine translation systems to be effective in real-world scenarios,” they highlighted.
Participation
The Sapienza NLP group and Apple are encouraging participants to propose novel approaches to address these challenges. Submissions for the EA-MT task are open until January 31, 2025, at 23:59 UTC-12.
The EA-MT task is part of SemEval-2025, a long-standing series of international NLP workshops dedicated to advancing semantic analysis through high-quality annotated datasets and groundbreaking research.
Participants will also have the opportunity to submit a paper and showcase their work at the SemEval workshop, taking place from July 31 to August 1, 2025, during the 63rd Annual Meeting of the Association for Computational Linguistics in Vienna, Austria.
The task offers multiple avenues for participation:
- Fine-tuning Pre-Trained Models: Leverage popular MT models like MarianMT, M2M-100, or T5, or fine-tune LLMs such as Llama-3 or Qwen-2.
- Developing New MT Systems: Build systems incorporating named entity recognition (NER), entity linking (EL), or data augmentation techniques.
- Using External Systems: Enhance models with APIs or commercial LLMs, such as GPT-4 or Gemini.
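As a rough illustration of the second avenue, the sketch below shows one simple way an entity-aware pipeline could be wired together: known entity names are masked with placeholders, the remaining text is translated, and the target-language names are restored afterward. This is a minimal sketch under stated assumptions, not the organizers' method. The dictionary `ENTITY_DICT`, the helper names, and the `toy_translate` stand-in are all hypothetical; a real system would likely draw entity names from a knowledge base and call an actual MT model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical English -> Italian entity dictionary (an assumption for this
# sketch); a real system might populate it via entity linking.
ENTITY_DICT: Dict[str, str] = {
    "The Catcher in the Rye": "Il giovane Holden",
}


def mask_entities(
    text: str, entity_dict: Dict[str, str]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace known entity mentions with placeholders.

    Returns the masked text and a list of (placeholder, target_name) pairs.
    """
    mapping: List[Tuple[str, str]] = []
    # Check longer names first so they win over any shorter names they contain.
    for source_name in sorted(entity_dict, key=len, reverse=True):
        if source_name in text:
            placeholder = f"__ENT{len(mapping)}__"
            text = text.replace(source_name, placeholder)
            mapping.append((placeholder, entity_dict[source_name]))
    return text, mapping


def translate_with_entities(
    text: str,
    translate: Callable[[str], str],
    entity_dict: Dict[str, str],
) -> str:
    """Mask entities, translate the rest, then restore target-language names."""
    masked, mapping = mask_entities(text, entity_dict)
    translated = translate(masked)
    for placeholder, target_name in mapping:
        translated = translated.replace(placeholder, target_name)
    return translated


def toy_translate(text: str) -> str:
    """Stand-in for a real MT model; it only handles the example sentence."""
    return text.replace("Who wrote", "Chi ha scritto")


result = translate_with_entities(
    "Who wrote The Catcher in the Rye?", toy_translate, ENTITY_DICT
)
print(result)
```

In practice, a real MT model may alter or drop placeholder tokens, so an approach like this would need a check that every placeholder survives translation before the names are restored.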
Beyond the Best Model
The shared task emphasizes learning and collaboration over competition. Participants are encouraged to explore innovative techniques, even if they don’t result in the “best” model.
For example, a fast, resource-efficient model that performs slightly below the best system might be more practical for real-world scenarios. Similarly, some techniques may be more effective for certain types of named entities or language pairs.
Researchers are also urged to share negative results — approaches that didn’t work — to help the community avoid similar pitfalls.
The EA-MT task will focus on translating from English into 10 target languages: Arabic, Chinese, French, German, Italian, Japanese, Korean, Spanish, Thai and Turkish. Future editions may expand this list to include additional language pairs.