VibeVoice by Microsoft: a TTS designed for generating expressive, long-form, multi-speaker conversational audio up to 90 minutes #TTS #LLM microsoft.github.io/VibeVoice/
@mush42 I don’t think they were aiming for quality. Their goal seems to be conversational TTS in a long-form, podcast-style format, as on NotebookLM. There’s CSM from Sesame, but it doesn’t handle long-form well.