Exciting news on open-source neural voices!
Our first experiment is complete with fantastic results! Check out the audio sample attached to this post.
For this month, @pneumasolutions provided GPU resources for training. I really appreciate their contribution.
This is just the beginning. To keep training going, I'm still accepting donations. Any amount helps.
I'm happy to receive your donations via PayPal:
paypal.me/geotts
Please mention mush42/tts in the notes.
#SpeechSynthesis #AI #ML
Musharraf (in reply to the esoteric programmer):
It uses OptiSpeech, which I developed based on recent advances in neural TTS.
Piper is based on VITS, an architecture that dates back to 2021.
github.com/mush42/optispeech/
GitHub: mush42/optispeech, a lightweight end-to-end text-to-speech model.

Musharraf (in reply to the esoteric programmer):
The model is designed from the ground up to run on the CPU for use with a screen reader.
It takes a lot of experimentation to strike the right balance between model efficiency and output quality. But I'm getting there!
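For screen-reader use, the practical target is a real-time factor (RTF) well below 1.0 on a CPU, i.e. synthesis takes less time than the audio it produces. A minimal sketch of how one might measure this; the `synthesize` callable and sample rate here are illustrative assumptions, not part of OptiSpeech's API:

```python
import time

def real_time_factor(synthesize, text: str, sample_rate: int) -> float:
    """Wall-clock synthesis time divided by the duration of the generated audio.
    RTF < 1.0 means synthesis is faster than real time."""
    start = time.perf_counter()
    audio = synthesize(text)                  # expected: a sequence of samples
    elapsed = time.perf_counter() - start
    return elapsed / (len(audio) / sample_rate)

# Example with a stand-in synthesizer that emits one second of silence:
dummy = lambda text: [0.0] * 22050
rtf = real_time_factor(dummy, "hello world", sample_rate=22050)
```

The same measurement applied to a real model on a typical laptop CPU is what "efficiency" means in practice here.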
Musharraf (in reply to spacedoggy):
It is not based on VITS. While the underlying repo supports multiple model architectures, this particular run uses the ConvNeXt-TTS architecture:
ieeexplore.ieee.org/document/1…
ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-Based Fast End-to-End Sequence-to-Sequence Text-to-Speech and Voice Conversion
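The paper above has the details; as a rough sketch of the core building block, a 1-D ConvNeXt block is a depthwise convolution followed by LayerNorm, an inverted-bottleneck MLP, and a residual connection. The following assumes PyTorch, and the hyperparameters are illustrative, not taken from the paper or from OptiSpeech:

```python
import torch
from torch import nn

class ConvNeXtBlock1d(nn.Module):
    """1-D ConvNeXt block: depthwise conv -> LayerNorm ->
    pointwise expand -> GELU -> pointwise project, plus a residual."""

    def __init__(self, dim: int, expansion: int = 3, kernel_size: int = 7):
        super().__init__()
        self.dwconv = nn.Conv1d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, expansion * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)

    def forward(self, x):              # x: (batch, dim, time)
        residual = x
        x = self.dwconv(x)
        x = x.transpose(1, 2)          # (batch, time, dim) for norm/linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.transpose(1, 2)
        return residual + x
```

Because the heavy lifting is depthwise and pointwise convolutions rather than attention or recurrence, blocks like this tend to be fast on CPUs, which is the appeal for this use case.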
Musharraf (in reply to spacedoggy):
Glad to hear this!
For now, I'm not releasing checkpoints because they are still unstable.
While initial results are promising, I discovered an issue with the FFT parameters. I've fixed it, and I'm currently waiting for the server to become available so I can fine-tune with the corrected data.
Once the output quality is consistent, I'll publish an online demo and make the pretrained checkpoints available on Hugging Face.
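The post doesn't say which FFT parameter was off, but here is a hedged illustration of why such parameters matter: STFT settings determine how many mel frames the features have per second, so a mismatch means the acoustic features no longer line up with what the model was trained on. A minimal sketch, assuming librosa-style centered framing:

```python
def num_stft_frames(n_samples: int, hop_length: int) -> int:
    """Frame count for a centered STFT (librosa-style edge padding):
    one frame every hop_length samples, plus one for the padded edges."""
    return 1 + n_samples // hop_length

sr = 22050                                        # assumed sample rate
frames_a = num_stft_frames(sr, hop_length=256)    # 87 frames per second
frames_b = num_stft_frames(sr, hop_length=300)    # 74 frames per second
```

Training on features extracted with one hop length and fine-tuning with another silently changes the frame rate, which is exactly the kind of subtle corruption that only shows up as degraded output quality.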