Technology Brings New Tools to Revitalizing Endangered Languages

Heritage as Data

University of Arizona scholars are collaborating with Indigenous communities to fortify endangered languages. Combining data science, machine learning and other technologies, the work offers a model for safeguarding cultural heritage and creates tools for self-directed language work in communities around the world.

One project supports revitalizing the ancestral language of the Schitsu’umsh, also known as the Coeur d’Alene tribe. The last elder who grew up speaking it as their first language died in 2018.

Linguists in the College of Social and Behavioral Sciences, working with the university’s American Indian Language Development Institute, have created a web application that is being trained on Coeur d’Alene words as a path to mastering the language more broadly.

With machine learning, the application will eventually be able to independently build its command of the language by scanning texts to learn more advanced grammar and syntax. Those scanned materials also become part of a master archive, opening doors to a trove of potential research that relies on databases of digitized text.

A separate project, in partnership with Tohono O’odham Community College in Sells, Arizona, uses similar technologies to develop speech recognition for the O’odham language. One goal is to train the algorithms well enough that they can automatically transcribe tribal audio archives, such as decades of recorded community meetings and events.


Data Connects Us Magazine