Exciting news on open-source neural voices!
Our first experiment is complete with fantastic results! Check out the audio sample attached to this post.
For this month, @pneumasolutions provided GPU resources for training. I really appreciate their contribution.
This is just the beginning. To keep training going, I'm still accepting donations. Any amount helps.
I'm happy to receive your donations via PayPal:
paypal.me/geotts
Please mention mush42/tts in the notes.
#SpeechSynthesis #AI #ML

in reply to Musharraf :verified:

Wow, that sounds way better. I hope to see this working with NVDA in the future.
in reply to Musharraf :verified:

does this use piper, or something else? it sounds a bit flatter than other piper voices
in reply to the esoteric programmer

@esoteric_programmer
It uses OptiSpeech, developed by me based on recent advances in neural TTS technology.
Piper is based on Vits, which dates back to 2021.
github.com/mush42/optispeech/
in reply to Musharraf :verified:

is that capable of working in low resource environments? would the resulting thing be able to generate samples at a speed good enough for most screenreader use? or, is this not meant for that use case?
in reply to the esoteric programmer

@esoteric_programmer
The model is designed from the ground up to be used with a screen reader running on the CPU.
It takes a lot of experimentation to strike the right balance between model efficiency and output quality. But I'm getting there!
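
For a concrete sense of what "fast enough for a screen reader" means, the usual yardstick is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced. A minimal sketch, where synthesize() and the sample rate are hypothetical stand-ins rather than OptiSpeech's actual API:

```python
# Rough sketch of measuring real-time factor (RTF) for a CPU TTS model.
# synthesize() is a hypothetical stand-in for the model call.
import time

SAMPLE_RATE = 22050  # assumed output sample rate

def real_time_factor(synthesize, text: str) -> float:
    start = time.perf_counter()
    audio = synthesize(text)           # returns a 1-D array of samples
    elapsed = time.perf_counter() - start
    audio_seconds = len(audio) / SAMPLE_RATE
    return elapsed / audio_seconds     # < 1.0 means faster than real time
```

Screen-reader use needs an RTF well below 1 on a single CPU core, plus low latency to the first audio chunk.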
in reply to Musharraf :verified:

wow it sounds so much better than vits! It sounds like it's not even based on hifigan stuff
in reply to the pup of space

@spacepup
It is not based on Vits.
While the underlying repo supports multiple model architectures, this particular run is based on the ConvNeXt-TTS architecture:
ieeexplore.ieee.org/document/1…
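
For readers unfamiliar with ConvNeXt-style TTS: the core building block is a 1-D ConvNeXt block, a depthwise convolution over the time axis followed by a per-frame feed-forward layer, adapted from the original ConvNeXt image model. A minimal PyTorch sketch (dimensions and kernel size are illustrative, not taken from this run):

```python
# Minimal 1-D ConvNeXt block as used in ConvNeXt-based TTS stacks.
# Hyperparameters here are illustrative, not from the OptiSpeech checkpoint.
import torch
import torch.nn as nn

class ConvNeXtBlock1d(nn.Module):
    def __init__(self, dim: int, expansion: int = 3):
        super().__init__()
        # Depthwise conv mixes information along the time axis only.
        self.dwconv = nn.Conv1d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        # Pointwise layers act per frame, like a transformer FFN.
        self.pwconv1 = nn.Linear(dim, expansion * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        residual = x
        x = self.dwconv(x)
        x = x.transpose(1, 2)  # (batch, time, channels) for LayerNorm
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.transpose(1, 2)
        return x + residual
```

Because each block is just convolutions and per-frame linear layers, the whole stack parallelizes well and runs cheaply on a CPU, which is part of why it suits screen-reader use.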
in reply to Musharraf :verified:

may i have a link to your optispeech hfc female checkpoint? I'd like to test this, i am really intrigued
in reply to the pup of space

@spacepup
Glad to hear this!
For now, I'm not making the checkpoints available because they're still unstable.
While the initial results are promising, I discovered an issue with the FFT parameters. I've fixed it, and I'm currently waiting for the server to become available so I can fine-tune with the corrected data.
Once I get consistent output quality, I'll publish an online demo and make the pretrained checkpoints available via HuggingFace.
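
To illustrate what "FFT parameters" refers to here: mel-spectrogram extraction is governed by a handful of settings that have to match between the training features and the model. A minimal sketch using common defaults (these values are assumptions, not the ones from this run):

```python
# Sketch of the FFT/mel settings that must be consistent across a TTS
# pipeline. Values below are common defaults, not this run's actual config.
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050,
    n_fft=1024,        # FFT size: frequency resolution of each frame
    win_length=1024,   # analysis window length in samples
    hop_length=256,    # frame shift; determines frames per second
    n_mels=80,         # mel filterbank channels the model predicts
)
frames = mel(torch.randn(1, 22050))  # one second of audio -> (1, 80, ~87)
```

A checkpoint trained on features extracted with different n_fft or hop_length values won't line up with corrected features, which is why fine-tuning on the fixed data is needed.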