in reply to Jakob Rosin

@cachondo @kevinrj @FreakyFwoof Oh really, that’s interesting. Just a bit about how this is working under the hood, for people who might be curious. I believe it’s using a language model to synthesize the speech rather than the older neural approaches Personal Voice used until recently. This is the same technology ElevenLabs, Google, Microsoft, and Amazon are using, which is why it sounds so good and why you don’t need to provide as much training data. To my knowledge, this is the first time an LM-based TTS system has been deployed for screen reader use, and I’m wondering if later in the beta cycle they will add the newer Siri voices that are based on the same technology. It sounds really good, and one thing you didn’t mention during the demonstration: the voice even breathes at punctuation marks.