The CSAIL research team was led by MIT Professor Regina Barzilay, who has spent a couple of decades on language-related research, among other data science-driven topics. The team’s ultimate goal is for the system to be able to decipher lost languages that have eluded linguists for decades, using just a few thousand words,” wrote Adam Conner-Simons, CSAIL Communications Manager, on the lab’s blog.
The study was, in part, supported by US spy agency IARPA. Over the years, IARPA has invested in finding low-resource language models that can be queried in English by holding conferences, offering grants, and running contests.
Ultimate Low-Resource Challenge for Humans and Machines
The new CSAIL study builds on a 2019 paper, where the authors, including Barzilay, propose a new approach to the automatic decipherment of lost languages. “Decipherment is an ultimate low-resource challenge for both humans and machines. The lack of parallel data and scarce quantities of ancient text complicate the adoption of neural methods that dominate modern machine translation,” the researchers wrote.
Slator 2021 Data-for-AI Market Report
44-pages on how LSPs enter and scale in AI Data-as-a-service. Market overview, AI use cases, platforms, case studies, sales insights.
However, while in the 2019 paper the languages used were known to be related to early forms of Hebrew and Greek, in the new CSAIL study — which evaluated the model on Gothic, Ugaritic, and Iberian — the relationship between languages is inferred by algorithm and thus, applicable to more undeciphered scripts than prior work.
Using the algorithm, the team was able to confirm recent scholarship that suggests Iberian is not actually related to Basque as previously believed.
According to Conner-Simons, “The team hopes to expand their work beyond the act of connecting texts to related words in a known language — an approach referred to as ‘cognate-based decipherment.’ This paradigm assumes that such a known language exists, but the example of Iberian shows that this is not always the case. The team’s new approach would involve identifying semantic meaning of the words, even if they don’t know how to read them.”
Image: Phaistos Disk, an undecoded clay disk from the Minoan palace of Phaistos, Crete dating to the Minoan Bronze Age; exhibited in the Heraklion Archaeological Museum, Crete, Greece.