Academia continues to ramp up its research into neural machine translation (NMT). Five months into the year, the number of papers published in the open-access science archive, arXiv.org, nearly equals the research output for the entire year 2016. The spike confirms a trend Slator reported in late 2016, when we pointed out how NMT steamrolls SMT.
As of May 7, 2017, the Cornell University-run arXiv.org held a total of 137 papers with NMT in their titles or abstracts. From only seven documents published in 2014, output went up to 11 in 2015. But the breakthrough year was 2016, with research output hitting 67 contributions.
NMT, an approach to machine translation based on neural networks, is seen as the next evolution after phrase-based statistical machine translation (SMT) and the earlier rules-based approach.
While many studies and comparative evaluations have pointed to NMT’s advantages in achieving more fluent translations, the technology is still in its nascent stage and interesting developments in the research space continue to unfold.
Most Prolific
At press time, NMT papers submitted in 2017 were authored by 173 researchers from across the world, the largest group of them (63 researchers) being affiliated with universities and research institutes in the US.
The most prolific contributor is Kyunghyun Cho, Assistant Professor at the Department of Computer Science, Courant Institute of Mathematical Sciences, and the Center for Data Science, New York University. Cho logged 14 citations last year.
Aside from Cho, 62 other researchers with an interest in NMT have published their work on arXiv under the auspices of eight American universities: UC Berkeley, Carnegie Mellon, NYU, the MIT Computer Science and Artificial Intelligence Laboratory (Cambridge), Stanford, the Georgia Institute of Technology (Atlanta), Johns Hopkins University, and Harvard.
Sixty-one researchers from Europe have also substantially contributed to the collection, with authors from the UK (18), Ireland (13), Germany (11), and the Netherlands (7) submitting the most papers.
There were also 58 NMT academic papers from Asia, authored by researchers mostly from China, Hong Kong and Taiwan (31), Japan (22), South Korea (3), and Singapore (2).
Tech Firms in the Mix
Research teams from US tech giants such as Facebook Research, Google Brain, IBM Watson, NVIDIA (on whose GPU chips NMT runs), and translation technology pioneer SYSTRAN have also been increasingly contributing their research to arXiv.
A paper from a team of researchers from Google Brain, for example, offers insights on building and extending NMT architectures and includes an open-source NMT framework to experiment with results.
Researchers from Harvard and SYSTRAN introduced an open-source NMT toolkit, OpenNMT, which provides a library for training and deploying neural machine translation models. They said the toolkit will be further developed “to maintain strong MT results at the research frontier” and provide a stable framework for production use.
The company’s own researchers have collaborated with other scientists from the University of Science and Technology of China, Sun Yat-sen University (Taiwan), Guangdong Key Laboratory of Information Security Technology, Tsinghua University, UESTC, and Johns Hopkins University.
Surge Will Last
As early as February 2016, an informal survey conducted by Cho indicated that the NMT research boom would have legs.
In a blog post dated February 13, 2016, Cho said he conducted the informal (he admits highly biased) poll mainly to determine researchers’ opinion about contributing to arXiv. Rather than being a peer-reviewed journal or online platform, arXiv is an automated online distribution system for research papers (e-prints).
“In total, 203 people participated, and they were either machine learning or natural language processing researchers. Among them, 64.5% said their major area of research is machine learning, and the rest natural language processing,” Cho wrote.
That is a large pool of scholars and scientists who could feed the NMT research funnel for years, whether, as Cho puts it, they choose “to arXiv” or “not to arXiv” their work right away.