
So this looks like a high-quality, fast, natural, and open-source TTS system in Python. A key candidate for an #NVDA #addon. Unfortunately, I find #nvdasr addon development super confusing. Is there a good template to start from or something? github.com/thewh1teagle/kokoro-onnx


in reply to Samuel Proulx

Here's a much longer example of the quality of speech Kokoro TTS generates. I really do think it might be a decent #NVDA addon. The weird pauses are because I'm just giving it a big long string, rather than chunking it like I should. It generates this in real time on CPU, and faster on GPU. The code to generate it is as follows:
import soundfile as sf
from kokoro_onnx import Kokoro
from onnxruntime import InferenceSession

# Try the ROCm (GPU) provider first, falling back to CPU if it's unavailable
session = InferenceSession(
    "kokoro-v0_19.onnx",
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],
)
kokoro = Kokoro.from_session(session, "voices.json")
samples, sample_rate = kokoro.create(
    "He wasn't sleeping very well, and he knew the people around him noticed, but he didn't know what to do about it. He had quietly gone to Madame Pomfrey, who had regretfully told him that Dreamless Sleep was highly addicting and that while she could give him the occasional dose, it would have to be spread out enough to prevent it from becoming addicting – meaning he could only take it one night out of every two weeks or so. It was one night more of productive sleep than he'd be getting otherwise, so he still did it, but it didn't help the larger issue. He wasn't under the effects of any nightmare-inducing Curses, potions, or other magical ailments, so there was nothing for Madame Pomfrey to do. The nightmares were coming from his own mind, and she was not a Mind-Healer. She'd offered to try and connect Harry with one, but when Harry discovered that it involved having someone else quite literally entering his mind with magic and helping him sort out things like trauma he couldn't. If Harry couldn't even tell Hermione the extent of what he'd suffered at the Dursley's, he wasn't about to let a stranger into his mind to see it. Let alone the 'adventures' of his Hogwarts years. So the nightmares persisted, and with the poor quality of sleep serving as the first domino, everything else slowly began to fall. His grades weren't slipping yet, but he was struggling with the study schedule Hermione had set out for them and doing his homework took more effort, more energy that he didn't have.",
    voice="af_sarah",
    speed=1.0,
    lang="en-us",
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")
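The "weird pauses" mentioned above come from feeding the whole passage to the synthesizer as one string. A minimal sketch of sentence-level chunking, using only a stdlib regex split (the `chunk_text` helper and its `max_chars` parameter are hypothetical illustrations, not part of kokoro-onnx):

```python
import re

def chunk_text(text, max_chars=300):
    """Split text into sentence-sized chunks, packing adjacent short
    sentences together up to max_chars, so each chunk can be synthesized
    and played as soon as it's ready."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would overflow
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

text = ("He wasn't sleeping very well. He had quietly gone to Madame "
        "Pomfrey. So the nightmares persisted!")
for chunk in chunk_text(text, max_chars=60):
    print(chunk)
```

Each chunk would then be passed to `kokoro.create(...)` separately and the resulting audio queued for playback, so the model never sees one giant string.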

#nvda
in reply to Samuel Proulx

@FreakyFwoof Yeah, that sounds amazing. I would love to read stuff with that synthesiser.
in reply to Andre Louis

ha. I know very little about how we could get it compiled right into the add-on. (I know there was a discussion of this earlier, so if that build process for onnxruntime into the add-on succeeded, I'd love some basic copy.) For anyone wanting to try, I think looking at something like the Brailab driver might work: it's super minimal, and in the end all you're really going to use are the getters and setters for the synth driver (the way you do speech is obviously not at all like Brailab), then craft in the code to open the audio stream. But between the latest family emergency and work at Spotify with the new year / new projects, I'm afraid I'll be too swamped for a while to give it that truly comparative look. I'd also love to see a test of how quickly it can synthesize speech on slower CPUs, especially when that speech is interrupted mid-utterance: how does it handle stopping a stream and loading a new one, and is there a lot of latency? A simple Python test that just throws lots of speech chunks at it, stops, and starts would give us an idea of whether it's worth turning into a driver just yet.
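That "simple py test" could start as a timing harness like the sketch below. Everything here is a hypothetical stand-in: `benchmark_synth` and `fake_synth` are illustration names, and in a real run the synthesize callback would wrap `kokoro.create(...)` from the example above. Interrupting speech maps to simply abandoning the remaining chunks, so per-chunk latency approximates the worst-case stop/restart delay:

```python
import time

def benchmark_synth(synthesize, chunks):
    """Synthesize each chunk in turn and record wall-clock latency in
    seconds. Returns one latency per chunk."""
    latencies = []
    for chunk in chunks:
        start = time.perf_counter()
        synthesize(chunk)  # real test: kokoro.create(chunk, voice=..., ...)
        latencies.append(time.perf_counter() - start)
    return latencies

if __name__ == "__main__":
    # Stand-in synthesizer so the harness runs without a model file
    def fake_synth(text):
        time.sleep(0.01)

    chunks = ["Hello there.", "A short chunk.", "Another one."]
    for chunk, t in zip(chunks, benchmark_synth(fake_synth, chunks)):
        print(f"{t * 1000:6.1f} ms  {chunk}")
```

On a slow CPU, the number to watch is the per-chunk latency for short chunks: that is roughly how long NVDA would stall between a cancel and the next utterance starting.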
in reply to Tamas G

Sorry to hear about family emergencies, never nice to deal with. I hope things can be sorted out for the better.

Re slow CPUs though, that's where I come in. Even right now, I am using an Intel Core i5-3570K from 2012. It runs every synth very well, apart from Piper, which it struggles with due to the neural aspect of it. If my machine can run... whatever you guys end up coming up with (hopefully), then anything else should be a breeze.

in reply to Andre Louis

I have an even slower one. Yay for countries in the middle of... well, somewhere, and computers from 2009. Haha, if something can even run on that, I'd be surprised. How's that for a slow processor? It's pretty ancient. The synth sounds nice, yeah; I don't like how it reads "hashtag", but I guess that's me. There's also something about question marks it clearly missed, but I think it needs to be fed a bigger chunk of text to see if it'll sound better. Otherwise, for the quality... bleh, either my ears or something else don't consider it great in sound terms, but for a TTS, I guess it's good. Says the person who daily drives a TTS that came out in 2001. LOL.
in reply to Andre Louis

A synth that does English people no good, haha. And I have a Dell from 2009; it still runs a 32-bit version of Windows 10, so that tells you something. :D
in reply to Winter blue tardis🇧🇬🇭🇺

I also cannot tell you the full specs; the computer's not here, sadly. It has a removable battery, though, that gave up a long time ago. Then I fell down some stairs while carrying said computer, the pixels in the screen went poof, and no screen.
in reply to Peter Vágner

@pvagner @mush42 I'm not sure. I do kind of worry about a TTS developed by and for blind people, and whether it can be kept up to date and maintained.
in reply to Samuel Proulx

@Samuel Proulx I understand @Musharraf has made very significant progress, for example as compared to Piper TTS. To me it looks like it's much lighter for both training and using the trained model, even enhancing audio quality and intelligibility in the process. This is just my guess, but with such an achievement it's fine not to limit it to a blind audience exclusively. This is how I see #optispeech. However, I haven't played with Kokoro TTS, which is why I asked how much you like it compared to something else, perhaps Piper TTS if you know that one.
in reply to Peter Vágner

@pvagner @mush42 I like Kokoro much better than Piper. It sounds more natural with fewer artifacts.