“Intelligence analysts study activity in countries all around the world and must read copious documents in many foreign languages. As it is now, analysts must wade through documents manually or use a computer system unable to translate uncommonly spoken languages into English. And current software systems don’t provide good translations of low-resource languages.”
According to the same post, SCRIPTS will transcribe text documents as well as speech from media like videos and news broadcasts in low-resource languages like Hausa and Uyghur. Data analysts will then be able to query the system, which will find and translate relevant material and provide English summaries of the information it contains.
The solution will combine elements of machine translation (MT) and related natural language processing (NLP) and information retrieval technologies such as text-to-speech (TTS). The system is expected to be able to translate 750 million words per day.
The US faces ever-increasing volumes of multilingual information in its foreign operations. Just last month, the US Department of Defense awarded Virginia-based Multilingual Solutions a USD 39m contract to help with translation work.
Quite a Team
Research team head McKeown is an authority in the field of NLP and is no stranger to US government-funded projects for organizations like IARPA and the Defense Advanced Research Projects Agency (DARPA).
McKeown assembled a team of researchers in relevant fields such as machine translation and text-to-speech across Columbia University, Cambridge University, the University of Maryland, Edinburgh University, and Yale.
Many members of the research team, like McKeown, have worked on similar government-funded projects before, such as DARPA’s LORELEI (Low Resource Languages for Emergent Incidents), a USD 26m MT project for lesser-known languages.
Additionally, Columbia University recently sought out an Associate Research Scientist to specifically work on SCRIPTS. The job posting is still up as of this writing, though the earliest proposed start date is November 6, 2017.