🎙️ Uzbek Voice AI

Real-time Uzbek-language voice assistant for farmers — TTS research, model fine-tuning & a streaming voice agent.

Text-to-Speech · VITS / MMS · Speech Recognition · LLM dialog · FastAPI · WebSocket · PyTorch · Kaggle GPU · Low-resource NLP

I built a live, streaming voice assistant that lets Uzbek-speaking farmers ask agriculture questions and get spoken answers. I then ran a full research track to find (or build) the best Uzbek text-to-speech voice, since Uzbek is a low-resource language with very few options.

5 TTS models benchmarked
60 h studio corpus mined
7,080 fine-tune steps trained
<2 s target voice latency

🔊 Listen — Uzbek TTS demos

The same sentences, synthesized by the best off-the-shelf model and by my fine-tuned voice. All audio is generated from text, with no human recording.

Agronomist greeting

«Ассалому алайкум, соғ-саломатмисиз? Мен агрономман, қандайсиз?» ("Peace be upon you, are you in good health? I am an agronomist, how are you?")

Best quality · MMS
My fine-tune · FeruzaSpeech

Wheat-rust disease

«Буғдой майдонида қўнғир занг касаллиги кўринди.» ("Brown rust disease has appeared in the wheat field.")

MMS · Fine-tune

Dosage with numbers

«Фунгицидни йигирма беш фоиз концентрацияда, бир гектарга икки литр солинг.» ("Apply the fungicide at twenty-five percent concentration, two liters per hectare.")

MMS · Fine-tune
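The dosage sentence spells its numbers out as words, which is the form a TTS front end usually needs. The sketch below shows one way digits could be expanded into Uzbek Cyrillic words before synthesis. It is an illustration only: the project description does not say whether or how numbers were normalized, and the helper names `spell_uz` and `normalize_numbers` are hypothetical. It covers 0 to 99 and leaves larger numbers untouched.

```python
import re

# Uzbek (Cyrillic) number words for units and tens.
ONES = ["", "бир", "икки", "уч", "тўрт", "беш", "олти", "етти", "саккиз", "тўққиз"]
TENS = ["", "ўн", "йигирма", "ўттиз", "қирқ", "эллик", "олтмиш", "етмиш", "саксон", "тўқсон"]


def spell_uz(n: int) -> str:
    """Spell an integer in the range 0..99 as Uzbek Cyrillic words (hypothetical helper)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    if n == 0:
        return "нол"
    tens, ones = divmod(n, 10)
    # Join the tens word and the units word, skipping whichever is empty.
    return " ".join(word for word in (TENS[tens], ONES[ones]) if word)


def normalize_numbers(text: str) -> str:
    """Replace each run of digits in 0..99 with its spelled-out form."""
    def replace(match: re.Match) -> str:
        value = int(match.group())
        # Numbers outside the supported range are left as digits.
        return spell_uz(value) if value <= 99 else match.group()

    return re.sub(r"\d+", replace, text)
```

Applied to the demo sentence written with digits, `normalize_numbers("Фунгицидни 25 фоиз концентрацияда, бир гектарга 2 литр солинг.")` yields the spelled-out sentence shown above.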

🧭 The approach

📊 Models evaluated

Model | Approach | Result
Meta MMS (Uzbek, Cyrillic) | Off-the-shelf VITS | ✅ Best quality
MMS + FeruzaSpeech | My VITS fine-tune | ◑ Adapted voice, slightly rougher
Chatterbox v3 | LoRA fine-tune | ✗ Poor (no Uzbek base)
Community Latin VITS | Off-the-shelf | ✗ Garbled
Yandex SpeechKit «nigora» | Commercial API | ✅ Production choice
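For readers who want to reproduce the off-the-shelf MMS baseline, a minimal sketch using the VITS classes in HuggingFace Transformers might look like the following. The checkpoint name `facebook/mms-tts-uzb-script_cyrillic` is an assumption about which Hub model corresponds to "Meta MMS (Uzbek, Cyrillic)", and the `synthesize` and `save_wav` helpers are hypothetical; this is not the project's actual evaluation code.

```python
# Assumed Hub checkpoint for Meta MMS Uzbek (Cyrillic script); verify before use.
MODEL_ID = "facebook/mms-tts-uzb-script_cyrillic"


def synthesize(text: str, model_id: str = MODEL_ID):
    """Return (waveform as a 1-D NumPy array, sampling rate in Hz) for `text`."""
    # Heavy dependencies are imported lazily so that importing this module stays cheap.
    import torch
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = VitsModel.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform  # shape: (batch, samples)
    return waveform.squeeze(0).cpu().numpy(), model.config.sampling_rate


def save_wav(path: str, text: str) -> None:
    """Synthesize `text` and write it to `path` as a WAV file."""
    import soundfile as sf

    audio, sampling_rate = synthesize(text)
    sf.write(path, audio, sampling_rate)
```

Calling `save_wav("rust.wav", "Буғдой майдонида қўнғир занг касаллиги кўринди.")` would download the checkpoint on first use and write one of the demo sentences to disk.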

🛠️ Tech stack

Python · PyTorch · HuggingFace Transformers · VITS / Meta MMS · Chatterbox · FastAPI + WebSockets · gRPC (Yandex SpeechKit STT/TTS v3) · Silero/energy VAD · Kaggle GPU automation (Kaggle API) · soundfile / datasets.
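The stack lists an energy-based VAD as one way to detect when the farmer is speaking. A minimal sketch of that idea, in pure Python, is shown below: it computes the RMS energy of fixed-size frames and reports spans where energy stays above a threshold, with a short hangover so brief pauses do not split an utterance. The function names, the 20 ms frame size, the 0.02 threshold, and the hangover length are all illustrative assumptions, not the project's settings.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive non-overlapping frames (a trailing partial frame is dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def detect_speech(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 20,
    threshold: float = 0.02,
    hangover_frames: int = 5,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame energy is at or above `threshold`."""
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_rms(samples, frame_len)

    segments: List[Tuple[float, float]] = []
    start_frame = None  # index of the first frame of the current segment
    silent_run = 0      # consecutive quiet frames seen inside a segment

    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start_frame is None:
                start_frame = i
            silent_run = 0
        elif start_frame is not None:
            silent_run += 1
            if silent_run > hangover_frames:
                # Close the segment at the first quiet frame of this run.
                end_frame = i - silent_run + 1
                segments.append((start_frame * frame_len / sample_rate,
                                 end_frame * frame_len / sample_rate))
                start_frame, silent_run = None, 0

    if start_frame is not None:
        # Audio ended while a segment was still open.
        end_frame = len(energies) - silent_run
        segments.append((start_frame * frame_len / sample_rate,
                         end_frame * frame_len / sample_rate))
    return segments
```

In a streaming agent the same logic would run incrementally on incoming audio chunks; a fixed threshold is fragile in noisy field conditions, which is one reason a learned VAD such as Silero is listed alongside it.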

💡 Key takeaways

⚠️ Research/educational project. The open Uzbek datasets and models used here carry non-commercial licenses (CC-BY-NC / academic); the audio demos are for evaluation only.