in reply to the esoteric programmer

@esoteric_programmer
It converts the following text:
Every time I see someone light up, um, because of something I’ve made, it’s like, wow, a little piece of my inner child gets healed, you know? And, um, when...snip

To the attached speech.

in reply to Musharraf

So this does text completion and then generates speech using something like TTS? Is that correct so far? Or do you attach audio of something, and the model transcribes it, extracts its meaning (whatever counts as meaning here), and then concatenates your prompt text to that? If what I'm imagining is actually what's happening, that could create so, so many deepfakes, it's not even funny.
in reply to Musharraf

@mush42 @esoteric_programmer While the model is really, really good, I find it has problems converting text that's more than a few lines long. It will start splicing parts of the audio prompt into the result or just go kind of insane. I've only tried the HF space so far, but I want to try running it on my Mac tomorrow to see if I can get better results. If I could get this to read articles to me, that would be great.
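
One workaround that might be worth trying for the long-text problem (a rough sketch, not tested against this model): split the article into short chunks of whole sentences and synthesize each chunk separately. The function name `split_into_chunks` and the 200-character default are my own assumptions, not anything from the model's code or docs.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk
    rather than being cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go to whatever synthesis call the model exposes, with the same audio prompt each time, and the resulting clips would be joined in order. Whether that actually stops the splicing is a guess on my part.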