Cohere Labs Launches Vision-Language Dataset for African Languages

On December 16, 2025, Cohere Labs announced the release of AfriAya, a new vision-language dataset aimed at improving how AI models understand African languages and cultural contexts.

The dataset was developed through a collaborative effort of researchers and institutions, with Cohere Labs playing a supportive role by facilitating connections and providing an environment for collaboration.

AfriAya consists of image-text pairs grounded in African environments, objects, and everyday scenarios, and is intended to support use cases such as image captioning, visual question answering, and multimodal assistants tailored to African users.

At launch, AfriAya covers 13 African languages. Cohere said the project is moving toward version 2, which aims to expand to 25 languages and support fine-tuning of Aya Vision for African-specific use cases.

The dataset builds on Cohere’s broader Aya multilingual initiative and, according to Cohere, emerged from collaboration within the Cohere Labs open science community led by Ugandan engineers Kato Steven Mubiru and Bronson Bakunga.

From ‘Surface Translation’ to ‘Visual Sovereignty’

Speaking to Slator, Mubiru, co-lead of the project and CEO of Crane AI Labs, explained that “AfriAya was born out of a challenge common to the industry: how to build high-fidelity data when traditional crowdsourcing hits a bottleneck.”

According to Mubiru, AfriAya represents a shift from “surface translation” to “visual sovereignty,” addressing how most vision-language models still view Africa “through a Western lens” and often misidentify local foods, clothing, and everyday environments.

He said AfriAya provides the “foundational infrastructure” needed to fix that and enable more culturally grounded multimodal AI. 

To maintain quality at scale, the team combined large language models for initial data verification with native-speaker review for culturally sensitive corrections, an approach Mubiru said is essential for low-resource languages.

Cohere positioned its support for AfriAya as part of its open science approach, making the dataset available as a community resource for research and development. “While AfriAya is still early in its journey, its story already reveals what’s possible when open science, local expertise, and global collaboration meet,” Cohere said.

Cohere’s work with AfriAya also aligns with wider efforts to address AI’s global language gap. Projects like African Next Voices are gathering extensive linguistic corpora to improve coverage of African languages in speech recognition and translation systems.