No Free Lunch
To enlist clients and partners in accelerating progress, Microsoft released a 2GB Speech Language Translation Corpus that gives users of the Microsoft Translator Speech API a baseline “to evaluate end-to-end conversational speech translation quality.”
Applications for the API range from making large repositories of audio files searchable by transcribing them into text, to real-time subtitling and machine translation of those subtitles, to live one-to-one translation, in person or remote (coming full circle back to the universal translator).
Users of the technology include Lionbridge (automatic subtitling), telecom provider Tele2 (live translation of phone conversations), and ProDeaf (multilingual support of speech-to-sign scenarios). Microsoft wants the corpus to become the “gold standard…for speech language translation.”
Microsoft is not providing the corpus purely for the greater good, of course. Using the Speech Translation API to transcribe and translate 1,000 hours of audio per month costs USD 7,000 per month, and 10,000 hours leaves you with a USD 35,000 bill.
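To put those two price points side by side, here is a minimal sketch that works out the effective cost per hour of audio at each volume. It uses only the two figures quoted above. The names `QUOTED_TIERS` and `effective_hourly_rate` are illustrative and not part of any Microsoft API, and the sketch says nothing about how volumes between or beyond these two points are billed.

```python
# Monthly price points quoted in the article: (hours of audio, total USD).
QUOTED_TIERS = [(1_000, 7_000), (10_000, 35_000)]


def effective_hourly_rate(hours: int, total_usd: int) -> float:
    """Return the average cost in USD per hour of audio at a quoted volume."""
    return total_usd / hours


# Effective USD-per-hour rate at each quoted volume.
rates = [effective_hourly_rate(hours, usd) for hours, usd in QUOTED_TIERS]

for (hours, usd), rate in zip(QUOTED_TIERS, rates):
    print(f"{hours:>6,} hours/month: USD {usd:,} total, USD {rate:.2f} per hour")
```

At the quoted figures, the per-hour cost at 10,000 hours is half that at 1,000 hours.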
Habeas Corpus
The corpus was created from actual conversations over Skype to “capture the typical side-effects of Skype’s transport layer.” It contains around 3,000 end-to-end speech translation sets for English, and 2,100 for French and German.
Each set consists of an audio file, a verbatim transcription, a cleaned-up transcription, and a translation based on the cleaned-up transcription. The average length of the audio sequence is 4.7 seconds in English, 5.4 seconds in French, and 6.7 seconds in German.
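As a rough illustration of what one such set contains, the sketch below models the four parts as a small Python record. The class name, field names, and placeholder values are assumptions made for illustration only. The article does not describe the corpus's actual file layout or naming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationSet:
    """One end-to-end speech translation set (hypothetical field names)."""

    audio_path: str           # the recorded audio file
    verbatim_transcript: str  # transcription exactly as spoken
    clean_transcript: str     # cleaned-up version of the transcription
    translation: str          # translation of the cleaned-up transcription

    def is_complete(self) -> bool:
        """Return True when all four parts of the set are non-empty."""
        return all(
            (
                self.audio_path,
                self.verbatim_transcript,
                self.clean_transcript,
                self.translation,
            )
        )


# Placeholder values only; these are not taken from the corpus.
example = TranslationSet(
    audio_path="<audio file>",
    verbatim_transcript="<verbatim transcription>",
    clean_transcript="<cleaned-up transcription>",
    translation="<translation>",
)
print(example.is_complete())
```

A check like `is_complete` would be one simple way to confirm that every set has all four parts before using it for evaluation.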
The nature of the content is conversational (e.g., “And I mean on WeChat you always have updates of new emoticons that you can download”). The audio was transcribed and translated by human linguists. Microsoft recorded 100 speakers for each language with 50-plus pairings.
To simulate the eventual use case (i.e., two people speaking over Skype in two different languages), Microsoft asked bilingual participants to hold a 30-minute conversation in which one spoke either German or French and the other responded in English.
In its blog post, Microsoft said it plans to release an updated version of Skype Translator in 2017 and expand language coverage.