Exciting news on open-source neural voices!
Our first experiment is complete with fantastic results! Check out the audio sample attached to this post.
This month, @pneumasolutions provided the GPU resources for training. I really appreciate their contribution.
This is just the beginning. To keep training going, I'm still accepting donations. Any amount helps.
I'm happy to receive your donations via PayPal:
paypal.me/geotts
Please mention mush42/tts in the notes.
#SpeechSynthesis #AI #ML

in reply to the esoteric programmer

@esoteric_programmer
It uses OptiSpeech, which I developed based on recent advances in neural TTS technology.
Piper is based on VITS, which dates back to 2021.
github.com/mush42/optispeech/
in reply to spacedoggy

@spacepup
It is not based on VITS.
While the underlying repo supports multiple model architectures, this particular run uses the ConvNeXt-TTS architecture:
ieeexplore.ieee.org/document/1…
in reply to spacedoggy

@spacepup
Glad to hear this!
For now, I'm not making checkpoints available because they are still unstable.
While the initial results are promising, I discovered an issue with the FFT parameters. I've fixed it, and I'm currently waiting for the server to become available so I can fine-tune with the corrected data.
Once the output quality is consistent, I'll publish an online demo and make the pretrained checkpoints available via HuggingFace.