In less than a year, Baidu has taken the idea from research paper to API, releasing a speech-to-speech “machine simultaneous interpretation service” on August 16, 2019. However, STACL is not the major breakthrough Baidu’s researchers claimed it to be. STACL is based on what its researchers call a wait-k model, in which users specify how many words (k) STACL should wait for before it begins translating.
Based on this partial input, STACL attempts to predict the words a speaker may utter next. The longer the wait, the more accurate STACL’s output, and the further the system drifts from its claim of being simultaneous.
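To make the trade-off concrete, here is a minimal sketch of a wait-k reading schedule. It assumes the schedule described in the STACL research paper, where the system waits for k source words and then emits one target word for each further source word it hears. The function name `wait_k_schedule` and its parameters are hypothetical illustrations, not part of Baidu's API.

```python
from typing import List


def wait_k_schedule(source_len: int, target_len: int, k: int) -> List[int]:
    """Return how many source words are visible when each target word is produced.

    Assumed rule: target word t (1-based) is generated after
    min(k + t - 1, source_len) source words have been heard.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


# Illustration: a 6-word source sentence translated into 6 target words.
low_latency = wait_k_schedule(source_len=6, target_len=6, k=1)
higher_latency = wait_k_schedule(source_len=6, target_len=6, k=3)
print(low_latency)
print(higher_latency)
```

With a small k, the first target word has to be produced from a single source word, so the system relies heavily on prediction. With a larger k, each target word is based on more context, but the output starts later.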
On the Baidu product page, it is not immediately clear whether the product, which is available for Chinese and English, is indeed based on STACL, though some of the terminology is similar (“low latency” and “predictive modeling based on semantic units information transfer and speaker synchronization” as translated by Google Translate). There has also been no further news on STACL, so it is unclear whether the model has been developed further.
In other Baidu language tech news, the Internet giant released a natural language understanding framework, ERNIE 2.0, based on what it calls “continual pre-training.” In a paper published on arXiv.org on July 29, 2019, Baidu claims that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks, including English tasks on GLUE benchmarks and several common tasks in Chinese.
Of course, Baidu is not alone among big tech in overselling its translation technology capabilities. For example, in a July 2019 demo, Microsoft hyped its mixed reality, AI, and translation technologies, creating the appearance that it can now have a hologram of a speaker give a Japanese presentation based on English input without a human in the loop.
At the time, we contacted Microsoft to ask whether the raw machine translation output of Azure (Microsoft) Translate was edited by a human before being used in the demo. Microsoft said it did not have any comment at that time.