

Hey Mastodon!
#helpwanted

I've been quietly working on a fast and lightweight neural Text-To-Speech (TTS) model for NVDA/SAPI.

The next step is training the model, and that requires some serious GPU power. Unfortunately, those resources are a bit out of my reach right now.

This is where I could really use your help, if you're interested!


in reply to Musharraf :verified:

This is where I could really use your help, if you're interested!
- Help with training costs: I've been fortunate to receive a grant from Google's TRC program, but there are some additional expenses. Any contribution would be incredibly helpful.
- Donating spare GPU power or Colab credit: Even a little bit would be a huge boost!
in reply to Musharraf :verified:

All of my work on neural TTS is completely free and open-source.
github.com/mush42/sonata-nvda
github.com/mush42/sonata

Together, we can make high-quality TTS technology a reality for more people.

in reply to Musharraf :verified:

The model is designed for on-device text-to-speech.
This means it is efficient and has low latency.
It is open-source:
github.com/mush42/optispeech
in reply to Nick Giannak III

@nick
This is more efficient and lightweight. Compared to Piper, this model is more responsive and requires fewer system resources.
It is also a modern TTS implementation; I drew on papers published in 2023-2024.
in reply to Musharraf :verified:

Gotcha. I might throw you a few dollars to make it go, especially if you can come up with training data that won't have the pronunciation problems that existed with Piper.
in reply to Nick Giannak III

@nick
Currently I'm working exclusively on the model architecture, which will resolve some of those issues.
I'll train on freely available, high-quality datasets; creating a new dataset from scratch is beyond my current resources.
I'll leave that for later, and I can help anyone who wants to take up this task.
in reply to Musharraf :verified:

Hey, the devil is in the details. If we need a new dataset, then we'll see how we can go about funding it.
in reply to Nick Giannak III

@nick
A good dataset would definitely help a lot, not only me but any future developer working in this field.
Also, a high-quality dataset can build a bridge between our community and academia, where the major TTS breakthroughs happen: we give you our dataset to evaluate your models, and you let us use your great TTS model architectures.
in reply to Musharraf :verified:

@Musharraf :verified: Please, when preparing the dataset, what's the difference between 'train' and 'val'? If I have a single-speaker recording, what do I put in those folders?
in reply to Peter Vágner

@pvagner
In machine learning, a dataset is usually divided into two splits.
The 'train' split is the larger one and is used as input for model training.
The 'val' (validation) split is relatively small and is used to evaluate how the model is performing during training.
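Roughly speaking, for a single-speaker dataset it usually looks like this (the exact folder names can differ depending on the recipe you follow, so treat this only as an illustration):

train/  <- roughly 95-99% of your recordings and their transcripts
val/    <- the small held-out remainder, in the same format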
in reply to Peter Vágner

@pvagner
Assuming you have a list of wav files and their corresponding transcriptions, you first need to decide on the size of each split.
Depending on the size of your dataset, you can split it 95%-5% or 99%-1% between train and val respectively.
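If it helps, here is a rough Python sketch of one way to do that split. It assumes your audio lives in a wavs/ folder and your transcripts in an LJSpeech-style metadata.csv with "file_id|transcript" lines; those names are my assumptions for the example, so adapt them to whatever layout your training recipe actually expects.

import random
import shutil
from pathlib import Path

# Assumed inputs: wavs/<file_id>.wav plus a metadata.csv of "file_id|transcript" lines.
wav_dir = Path("wavs")
entries = [line.split("|", 1)
           for line in Path("metadata.csv").read_text(encoding="utf-8").splitlines()
           if line.strip()]

# Shuffle once so the validation set is not biased toward the end of the recording sessions.
random.seed(42)
random.shuffle(entries)

# 95% train / 5% val; use 99%/1% instead for very large datasets.
val_size = max(1, int(len(entries) * 0.05))
splits = {"val": entries[:val_size], "train": entries[val_size:]}

for name, split in splits.items():
    out_dir = Path(name)
    (out_dir / "wavs").mkdir(parents=True, exist_ok=True)
    with open(out_dir / "metadata.csv", "w", encoding="utf-8") as meta:
        for file_id, transcript in split:
            shutil.copy(wav_dir / f"{file_id}.wav", out_dir / "wavs" / f"{file_id}.wav")
            meta.write(f"{file_id}|{transcript}\n")
    print(f"{name}: {len(split)} utterances")

Run it from the folder that contains wavs/ and metadata.csv; it creates train/ and val/ folders, each with its own wavs/ subfolder and metadata.csv.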
in reply to Peter Vágner

@pvagner
I can help you with preparing your dataset.
Please DM me with any questions you have.
in reply to Tomecki

@tomecki
Simply put:
Sonata is an inference engine that can theoretically drive any TTS model.
OptiSpeech is an actual model that generates speech, and the quality of the output depends on it.
in reply to Luis Carlos

@luiscarlosgonzalez @NVAccess

I'm afraid I cannot. The whole point of this and Sonata is to create high-quality, but very efficient and lightweight, neural TTS. Coqui-TTS models are neither efficient nor lightweight.

in reply to Luis Carlos

@luiscarlosgonzalez
Tortoise TTS is too heavy even for a high-end server, let alone a standard computer or a mobile device.
This system is designed specifically for running on a standard CPU.
in reply to Musharraf :verified:

Hey, I'd like to contribute some financial support. Can you ballpark how much you'd need to spend to train a voice?
in reply to Scott

[Three replies hidden behind a "Money" content warning]

in reply to Roberto Perez

@rperez030
Thanks for your contribution. Really appreciate it!
For this month, Pneuma Solutions has provided GPU resources for training, but I'm still accepting donations to make sure we have enough resources for future training needs.
I'm happy to receive your donations via my colleague's PayPal address:
paypal.me/geotts
or by email:
info@geotts.ge
Please mention mush42/tts in the transaction note.
in reply to Musharraf :verified:

@alexhall I have a GPU that I would love to rent out for this project. How would I get started?