Hey Mastodon!
#helpwanted
I've been quietly working on a fast and lightweight neural Text-To-Speech (TTS) model for NVDA/SAPI.
The next step is training the model, and that requires some serious GPU power. Unfortunately, those resources are a bit out of my reach right now.
This is where I could really use your help, if you're interested!
Musharraf
in reply to Musharraf • • •
- Help with training costs: I've been fortunate to receive a grant from Google's TRC program, but there are some additional expenses. Any contribution would be incredibly helpful.
- Donating spare GPU power or Colab credit: Even a little bit would be a huge boost!
Musharraf
in reply to Musharraf • • •All of my work on neural TTS is completely free and open-source.
github.com/mush42/sonata-nvda
github.com/mush42/sonata
Together, we can make high quality TTS technology a reality for more people.
GitHub - mush42/sonata-nvda: This add-on implements a speech synthesizer driver for NVDA using neural TTS models. It supports Piper
Musharraf
in reply to Musharraf • • •This means it is efficient and has low latency.
It is open-source:
github.com/mush42/optispeech
GitHub - mush42/optispeech: A lightweight end-to-end text-to-speech model
Pratik Patel
in reply to Musharraf • • •
Nick Giannak III
in reply to Musharraf • • •
Musharraf
in reply to Nick Giannak III • • •This is more efficient and lightweight. Compared to Piper, this model is more responsive and requires fewer system resources.
Also, this is a modern TTS implementation: I referred to papers published in 2023-2024.
Nick Giannak III
in reply to Musharraf • • •
Musharraf
in reply to Nick Giannak III • • •Currently I'm exclusively working on the model architecture, which will resolve some of the issues.
I'll train on freely available, high quality datasets, but creating a new dataset from scratch is beyond my current resources.
I'll leave this for later, and I can help anyone who wants to take up this task.
Nick Giannak III
in reply to Musharraf • • •
Musharraf
in reply to Nick Giannak III • • •A good dataset will definitely help a lot, and not only me, but any future developer who works in this field.
Also, a high quality dataset can establish a bridge between our community and academia, where major TTS breakthroughs happen. We give you our dataset to evaluate your models, and you allow us to use your great TTS model architectures.
Peter Vágner
in reply to Musharraf • •
Musharraf
in reply to Peter Vágner • • •In machine learning, a dataset is usually divided into two splits.
The 'train' split is the larger one, and is used as input for model training.
The 'val' split is relatively small, and it is used for evaluating how the model performs during training.
Musharraf
in reply to Peter Vágner • • •Given you have a list of wav files and the corresponding transcriptions, first you need to decide on the size of each split.
Depending on the size of your dataset, you can split it 95%-5% or 99%-1% for train and val respectively.
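As a rough sketch of the split described above, assuming the dataset is already a Python list of (wav path, transcription) pairs. The function name `split_dataset`, the fixed seed, and the file names are made up for illustration and are not taken from the OptiSpeech code.

```python
import random


def split_dataset(items, val_fraction=0.05, seed=1234):
    """Shuffle (wav_path, transcription) pairs and split them into train and val.

    val_fraction=0.05 gives a 95%-5% split, val_fraction=0.01 gives 99%-1%.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    if len(items) < 2:
        raise ValueError("need at least two items to make two splits")

    shuffled = list(items)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    # Keep at least one item in val and at least one in train.
    n_val = max(1, round(len(shuffled) * val_fraction))
    n_val = min(n_val, len(shuffled) - 1)

    val = shuffled[:n_val]
    train = shuffled[n_val:]
    return train, val


# Hypothetical data: 1000 utterances with placeholder transcriptions.
pairs = [(f"wavs/utt_{i:04d}.wav", f"transcription {i}") for i in range(1000)]
train, val = split_dataset(pairs, val_fraction=0.05)
print(len(train), len(val))  # 950 50
```

Shuffling before splitting keeps the val split from being just the last few recordings, which may all come from one session.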
Musharraf
in reply to Peter Vágner • • •I can help you with preparing your dataset.
Please DM me with any questions you have.
Tomecki
in reply to Musharraf • • •
Musharraf
in reply to Tomecki • • •Simply put:
Sonata is an inference engine that can theoretically drive any TTS model.
OptiSpeech is an actual model that generates speech, and the quality of the output depends on it.
Musharraf
in reply to Musharraf • • •
Bill Dengler
in reply to Musharraf • • •
Musharraf
in reply to Bill Dengler • • •Will DM you soon.
Luis Carlos
in reply to Musharraf • • •
Musharraf
in reply to Luis Carlos • • •Yep. It is intended to be integrated into Sonata, which already runs on Windows/Android/iOS.
Luis Carlos
in reply to Musharraf • • •
Musharraf
in reply to Luis Carlos • • •@luiscarlosgonzalez @NVAccess
I'm afraid I cannot. The whole point of this and Sonata is to create high-quality, but very efficient and lightweight neural TTS. Coqui-TTS models are neither efficient nor lightweight.
Luis Carlos
in reply to Musharraf • • •
Musharraf
in reply to Luis Carlos • • •Tortoise TTS is too heavy for a high-end server, let alone a standard computer or a mobile device.
This system is designed specifically for running on a standard CPU.
Scott
in reply to Musharraf • • •
Musharraf
in reply to Scott • • •Approx. $50 for a one-month Google Colab subscription.
Scott
in reply to Musharraf • • •
Roberto Perez
in reply to Musharraf • • •
Musharraf
in reply to Roberto Perez • • •Thanks for your contribution. Really appreciate it!
For this month, Pneuma Solutions provided GPU resources for training. But I still accept donations to ensure we have enough resources for future training needs.
I'm happy to receive your donations via my colleague's PayPal address:
paypal.me/geotts
or
Email:
info@geotts.ge
Please mention mush42/tts in the transaction note.
Pay Beka Gozalishvili using PayPal.Me
Ben Blatchford
in reply to Musharraf • • •