Microsoft Makes In-Kind Offer to Boost European Language AI Data

Building on its existing “European Digital Commitments,” communicated in April 2025, Microsoft is launching two key initiatives: supporting the development of more multilingual large language models (LLMs) through innovation centers and new data programs, and expanding its Culture AI program to digitally preserve Europe‘s rich heritage.

The language data program, described in a July 20, 2025 blog article, aims primarily to expand the supply of multilingual data for AI model training. To accomplish this, Microsoft plans on having dedicated staff at its Strasbourg, France, innovation centers: the Microsoft Open Innovation Center (MOIC) and the AI for Good Lab.

Leveraging Microsoft Azure’s capabilities and the company’s technical expertise, the initiative involves strategic partnerships across Europe, including data collaboration that makes multilingual datasets — such as GitHub text and various voice datasets — more accessible and transparent.

As part of this effort, Microsoft is also partnering with Hugging Face to host and distribute the resulting data. Additionally, it is funding work with Common Crawl, an open repository of web data, to enable native speakers to annotate and seed European language data into their publicly available datasets with authentic linguistic nuances.

Grants for Data and Academic Partnerships

Both MOIC and the AI for Good Lab are “also issuing a call for proposals to help expand the supply of digital content for 10 European languages,” focusing on languages with low online representation, such as Estonian, Alsatian, Slovak, Greek, and Maltese. 

Chosen proposals will receive grants for Azure credits worth up to USD 1m and essential engineering and technical support. Applications will open on September 1, 2025, on the AI for Good Lab website. Both MOIC and the lab will prioritize projects that aim to make data available for languages currently underrepresented online.

Microsoft also plans on continuing to back initiatives like those at the Barcelona Supercomputing Center, the Basque Center for Language Technology, and the University of Santiago de Compostela, related to AI models trained in Spanish, Catalan, Basque, and Galician.

Additionally, the company is forging new partnerships with the University of Strasbourg and IE University School of Science & Technology in Spain, providing Azure grants for joint research that focuses on low-resource languages.

While the announcement does not expressly mention a direct benefit to Microsoft in terms of new intellectual property or data from this initiative, in the European Digital Commitments announcement from April, the company did state they would “start with an expansion of our cloud and AI infrastructure in Europe,” signaling an alignment with business goals.