Another significant obstacle is the scarcity of diverse training data containing accurate gender forms.
To address these challenges, the researchers proposed a two-fold solution: leveraging large language models (LLMs) to rectify gender-specific translations in training data and fine-tuning ST models with this corrected dataset.
Gender-Debiased Translations
Specifically, they used GPT-4 to generate both masculine and feminine speaker-gender versions of the target translations for a subset of the training data (2 million utterances). This allowed them to create “gender-debiased training targets,” ensuring the outputs aligned with the speaker’s identity. With this enhanced dataset, they fine-tuned the ST models to infer the speaker’s gender directly from the audio input.
To give users greater flexibility, the researchers also introduced a “three-mode” training setup:
- Masculine Mode — produces translations exclusively in the masculine form.
- Feminine Mode — produces translations exclusively in the feminine form.
- Auto Mode — automatically infers and applies gender-specific translations based on audio cues.
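How the three modes are signalled to the model is not described here, so the following is only a minimal sketch under stated assumptions: each utterance is assumed to carry both a masculine and a feminine target (such as the GPT-4 rewrites described above) plus a speaker-gender label used only at training time, and the mode is assumed to be passed as a hypothetical tag such as `<masc>`. The names `Utterance`, `MODE_TAGS`, and `make_training_example` are illustrative, not the authors’ code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical mode tags; the actual conditioning mechanism is an assumption.
MODE_TAGS = {"masculine": "<masc>", "feminine": "<fem>", "auto": "<auto>"}


@dataclass
class Utterance:
    audio_path: str
    masculine_target: str          # masculine-form translation of the utterance
    feminine_target: str           # feminine-form translation of the utterance
    speaker_gender: Optional[str]  # assumed training label: "masculine" or "feminine"


def make_training_example(utt: Utterance, mode: str) -> Tuple[str, str, str]:
    """Return (audio_path, mode_tag, target) for the requested mode."""
    if mode == "masculine":
        target = utt.masculine_target
    elif mode == "feminine":
        target = utt.feminine_target
    elif mode == "auto":
        # In Auto mode the target follows the speaker's gender, so the model
        # has to learn to recover that gender from the audio itself.
        if utt.speaker_gender == "feminine":
            target = utt.feminine_target
        elif utt.speaker_gender == "masculine":
            target = utt.masculine_target
        else:
            raise ValueError("Auto mode needs a speaker-gender label at training time")
    else:
        raise ValueError(f"unknown mode: {mode}")
    return utt.audio_path, MODE_TAGS[mode], target


# English "I am tired" spoken by a woman, translated into Spanish.
utt = Utterance("clip_001.wav", "Estoy cansado.", "Estoy cansada.", "feminine")
print(make_training_example(utt, "auto"))       # ('clip_001.wav', '<auto>', 'Estoy cansada.')
print(make_training_example(utt, "masculine"))  # ('clip_001.wav', '<masc>', 'Estoy cansado.')
```

In this sketch, the forced modes simply ignore the speaker label, while Auto mode is the only one in which the target depends on who is speaking.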
“Our work proposes to adapt the ST model architecture that can generate accurate speaker gender forms from audio inputs in an ‘Auto’ mode or allow the user to choose the desired speaker gender form in a ‘Masculine’ or ‘Feminine’ mode, respecting the diversity of speakers,” they explained.
The researchers tested their method on English-to-Spanish and English-to-Italian translation tasks, achieving over 90% accuracy in gender-specific translations. This represents a significant improvement compared to existing systems like Meta’s SeamlessM4T and Nvidia’s Canary, they noted.
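The article does not say exactly how the “over 90% accuracy” figure was computed. As a rough illustration only, a simplified word-level gender accuracy of the kind used in gender-bias evaluations can be sketched as follows, assuming each test sentence is annotated with (correct form, wrong form) pairs for its gender-marked words, and counting a term only when one of the two forms appears in the output. The function name and annotation format are hypothetical.

```python
import re
from typing import List, Tuple


def gender_accuracy(hypotheses: List[str],
                    references: List[List[Tuple[str, str]]]) -> float:
    """Share of gender-marked terms produced in the correct form.

    references[i] lists (correct_form, wrong_form) pairs for sentence i.
    Terms whose forms appear in neither variant are skipped.
    """
    correct = wrong = 0
    for hyp, terms in zip(hypotheses, references):
        tokens = set(re.findall(r"\w+", hyp.lower()))  # strip punctuation
        for correct_form, wrong_form in terms:
            if correct_form.lower() in tokens:
                correct += 1
            elif wrong_form.lower() in tokens:
                wrong += 1
    total = correct + wrong
    return correct / total if total else 0.0


# Female speaker: two of three gendered adjectives come out feminine.
hyps = ["Estoy cansada.", "Soy muy feliz y contento.", "Estoy lista."]
refs = [[("cansada", "cansado")], [("contenta", "contento")], [("lista", "listo")]]
print(gender_accuracy(hyps, refs))
```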
Looking ahead, the researchers plan to expand their work to include non-binary and transgender speakers. “Future work may involve exploring bias of other types in large-scale ST systems, reducing bias for non-binary speakers, and better fine-tuning approaches,” they concluded.
Authors: Shubham Bansal, Vikas Joshi, Harveen Chadha, Rupeshkumar Mehta, and Jinyu Li