african-voice-lab

non-profit




Voices For All: Open Voice AI for Low-Resource Languages

Mission: Every language deserves a voice. We build open, state-of-the-art speech AI for the world's 7,000+ languages, starting with those left behind by Big Tech.


🌍 Why This Matters

Over 90% of the world's languages have zero access to modern text-to-speech (TTS), speech recognition (ASR), or voice assistants. The biggest AI labs focus on English, Mandarin, Spanish, and a handful of other high-resource languages. Meanwhile:

  • Amharic (~60M speakers) has only a handful of open TTS models
  • Wolof (~12M speakers) has virtually no open TTS with voice cloning
  • Hausa (~90M speakers) is critically underserved by speech tools
  • Swahili (~200M speakers) has basic TTS but no open voice cloning
  • Somali, Zulu, Igbo, Yoruba: all left behind

This is not a technical limitation. It is a funding and prioritization gap. We exist to close it.


πŸ—οΈ Our Approach

  • Open Weights: Every model is released under Apache 2.0 or MIT
  • Open Data: Curated, documented training datasets published on Hugging Face
  • Open Benchmarks: Language-specific evaluation protocols (MOS, CMOS, WER, MCD)
  • Community-Driven: Native speakers validate quality; local researchers lead expansion
  • Efficient Architecture: We fine-tune lightweight 600M-param diffusion models, making inference affordable on CPU or GPU
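Of the benchmark metrics listed above, WER (word error rate) is the one that is fully defined by a formula: the word-level edit distance between a reference transcript and a hypothesis (substitutions + deletions + insertions), divided by the number of reference words. MOS and CMOS come from listener ratings, and MCD needs aligned mel-cepstral features, so neither is shown here. The sketch below is a minimal, self-contained implementation of the standard WER definition, assuming plain whitespace tokenization; it is an illustration, not this project's evaluation protocol, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumes whitespace tokenization; real protocols usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6 words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.3f}")  # 0.333
```

Because insertions count against the reference length, WER can exceed 1.0 when a system hallucinates many extra words; that is expected behavior of the metric, not a bug.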

🎯 Proof of Concept: Amharic OmniVoice

Our first model is a fully functional, high-quality Amharic TTS + Voice Cloning system:

  • Model: african-low-resource/omnivoice-amharic
  • Architecture: Non-autoregressive discrete diffusion (OmniVoice, 612M params)
  • Training Data: 81K samples, 331 hours across 4 curated Amharic datasets
  • Capabilities: Text-to-speech, zero-shot voice cloning, multi-speaker synthesis
  • Best Loss: 3.9518 (the lowest we have reached with this architecture on Amharic)
  • License: Apache 2.0
  • Demo: Live Gradio Space

Why This Proves We Can Scale

  • Data pipeline is reusable: Tokenization, cleaning, and training scripts work for any language with ~100h of audio
  • Model is lightweight: At 612M params, it runs on a free Colab T4 GPU, with no API keys and no cloud bills
  • Voice cloning works: A speaker can clone their voice with 10 seconds of audio
  • Community validated: Native Amharic speakers confirm natural prosody
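To check whether a new language's corpus clears the ~100-hour bar mentioned above, it helps to total per-clip durations. The sketch below is a minimal illustration, assuming clip durations in seconds are already available (for example from a dataset manifest); `summarize_corpus`, `CorpusSummary`, and `MIN_HOURS` are hypothetical names, not part of the project's pipeline. As a sanity check on the Amharic figures, 331 hours over 81K samples works out to roughly 14.7 seconds per clip.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold, taken from the "~100h of audio" figure in the text.
MIN_HOURS = 100.0


@dataclass
class CorpusSummary:
    num_clips: int
    total_hours: float
    mean_clip_seconds: float
    meets_threshold: bool


def summarize_corpus(clip_seconds: List[float], min_hours: float = MIN_HOURS) -> CorpusSummary:
    """Total up clip durations and compare the corpus size against a minimum-hours bar."""
    if not clip_seconds:
        raise ValueError("corpus contains no clips")
    total_seconds = sum(clip_seconds)
    total_hours = total_seconds / 3600.0
    return CorpusSummary(
        num_clips=len(clip_seconds),
        total_hours=total_hours,
        mean_clip_seconds=total_seconds / len(clip_seconds),
        meets_threshold=total_hours >= min_hours,
    )


if __name__ == "__main__":
    # 81,000 clips averaging ~14.7 s approximates the Amharic training set.
    summary = summarize_corpus([14.7] * 81_000)
    print(f"{summary.total_hours:.0f} h, meets threshold: {summary.meets_threshold}")  # 331 h, meets threshold: True
```

Total hours is only a first filter: speaker diversity, transcript quality, and domain coverage matter as much as raw duration.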

πŸ—ΊοΈ Roadmap: 2026–2028

  • Phase 1 (Now, Q2–Q3 2026): Amharic ✅, Wolof. Deliverables: TTS + voice cloning, live demo, benchmark
  • Phase 2 (Q4 2026 – Q1 2027): Hausa, Swahili. Deliverables: ASR + TTS pipeline, mobile app prototype
  • Phase 3 (2027): Somali, Zulu, Igbo, Yoruba. Deliverables: Full speech stack (ASR, TTS, voice search)
  • Phase 4 (2028): 20+ African and Asian low-resource languages (LRLs). Deliverables: Self-service fine-tuning toolkit for any community

🤝 Partners & Funders

We are actively seeking partnerships with:

  • Grant bodies: Mozilla Common Voice, Lacuna Fund, Gates Foundation, IDRC
  • Research networks: Masakhane, AI4D Africa, GalsenAI, Google Research Africa
  • Local institutions: Ethiopian AI institutes, East African universities
  • Industry: Telecom, banking, and agritech companies building voice interfaces

📊 Impact Metrics We Track

  • Languages with open TTS: 1 → 20+ by 2028
  • Hours of open training data released: 331 → 5,000+
  • Community contributors: 5 → 200+
  • Voice-enabled apps built on our models: 0 → 50+
  • Cost to deploy TTS for a new language: $10K+ → <$1K

📬 Contact


"Technology should not be a privilege of the languages with the most data. It should be a right for every community that speaks."
