OK, I am a lot more impressed with Supertonic than I thought. It has its own text processor that handles some uncommon context-dependent cases, like phone numbers, technical units, and financial figures, which other commercial AI TTS engines like ElevenLabs, Gemini, OpenAI, and Microsoft Azure don't handle. Plus, unlike Sonata/Piper, it doesn't use eSpeak as a base.
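For anyone wondering what a "text processor" means here: a classic TTS front end rewrites written forms into spoken words before synthesis. Below is a minimal, hypothetical sketch of that rule-based style of normalization in Python. It is not Supertonic's code, and the patterns, the `UNITS` table, and every function name are assumptions for illustration only. It handles US-style phone numbers, dollar amounts, and a few units.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical unit table: abbreviation -> (singular, plural).
UNITS = {"km": ("kilometer", "kilometers"), "kg": ("kilogram", "kilograms"),
         "ms": ("millisecond", "milliseconds"), "GB": ("gigabyte", "gigabytes")}


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer below one million."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")
    raise ValueError("number too large for this sketch")


def expand_phone(match: re.Match) -> str:
    # Read each digit separately, with a comma (pause) between groups.
    groups = match.group(0).split("-")
    return ", ".join(" ".join(ONES[int(d)] for d in g) for g in groups)


def expand_money(match: re.Match) -> str:
    dollars = int(match.group(1).replace(",", ""))
    cents = int(match.group(2)) if match.group(2) else 0
    words = number_to_words(dollars) + (" dollar" if dollars == 1 else " dollars")
    if cents:
        words += " and " + number_to_words(cents) + (" cent" if cents == 1 else " cents")
    return words


def expand_unit(match: re.Match) -> str:
    value = int(match.group(1))
    singular, plural = UNITS[match.group(2)]
    return number_to_words(value) + " " + (singular if value == 1 else plural)


def normalize(text: str) -> str:
    """Rewrite phone numbers, dollar amounts and unit expressions as words."""
    # Phone numbers go first so their digits are not read as ordinary numbers.
    text = re.sub(r"\b\d{3}-\d{3}-\d{4}\b", expand_phone, text)
    text = re.sub(r"\$(\d+(?:,\d{3})*)(?:\.(\d{2}))?", expand_money, text)
    unit_pattern = r"\b(\d+)\s?(" + "|".join(map(re.escape, UNITS)) + r")\b"
    return re.sub(unit_pattern, expand_unit, text)


print(normalize("Call 555-123-4567 about the $12.50 fee for 5 km."))
```

Every rule here is hand-written, which is the opposite of the end-to-end approach, where a model trained on enough examples learns these mappings itself.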
miki
in reply to Alex Krier
Are you sure this is a text processor and not an end-to-end neural network?
If you train a neural net on human speech with plenty of examples of such patterns, it'll pick them up automatically, no special preprocessing necessary.
Alex Krier
in reply to miki
miki
in reply to Alex Krier
Peter Vágner
in reply to miki
miki
in reply to Peter Vágner