victor tsaran

10 months ago

Introducing Voicebox: The first generative AI model for speech to generalize across tasks with state-of-the-art performance

https://ai.facebook.com/blog/voicebox-generative-ai-model-speech/?utm_source=tldrai

Voicebox is a state-of-the-art speech generative model based on a new method proposed by Meta AI called Flow Matching.

Drew Mochak

in reply to victor tsaran • 10 months ago

I'm not super impressed with the audio quality, but the noise filter thing was neat.

victor tsaran

in reply to Drew Mochak • 10 months ago

@objectinspace Similarly, I'm not impressed with the voices. I guess the idea behind how they're generated is what impresses... But then, it's just a demo!!!

Drew Mochak

in reply to victor tsaran • 10 months ago

There was a paragraph (I don't have time to dig it up right now) about how it beats the current state-of-the-art TTS systems by a certain amount, and it named them. But I had never heard of them, so I had no frame of reference.

Timothy Wynn

in reply to victor tsaran • 10 months ago

What's the difference between this project and, say, Piper in terms of performance? #CC @ZBennoui
https://github.com/rhasspy/piper

GitHub - rhasspy/piper: A fast, local neural text to speech system

victor tsaran

in reply to Timothy Wynn • 10 months ago

@twynn @ZBennoui I really don't know...
