Now I have a lot more hope and enthusiasm for the Piper voices, so this weekend I will re-train Mac_alex, Keynote, and Votrax using a bit more data, and fine-tune from the HFC_male variant, which should minimize the cutting off of words. These will be released as new RT voices, and you will be able to install either type of voice to compare the differences between them.
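For anyone curious what that fine-tuning step looks like in practice, here's a rough sketch of the kind of invocation I mean. The paths are hypothetical and the flag names follow Piper's training docs, but they may differ between versions, so treat this as illustrative only:

```python
# Rough sketch of resuming Piper training from an existing checkpoint
# instead of training from scratch. Paths are hypothetical; flags are
# taken from Piper's TRAINING.md and may vary by version.
import subprocess

subprocess.run(
    [
        "python3", "-m", "piper_train",
        "--dataset-dir", "/data/mac_alex",  # preprocessed dataset (hypothetical path)
        "--resume_from_checkpoint", "/ckpt/hfc_male.ckpt",  # fine-tune from HFC_male
        "--accelerator", "gpu",
        "--devices", "1",
        "--batch-size", "32",
        "--max_epochs", "10000",
        "--checkpoint-epochs", "1",  # save often so snapshots can be compared
        "--precision", "32",
    ],
    check=True,
)
```

Starting from a strong base checkpoint like this, rather than random weights, is the whole point: the model already knows how to finish words, and the new data only has to teach it the voice.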


in reply to Peter Vágner

I'm thinking it has more to do with the quality of the training data and which model you initially fine-tune from. HFC_male, for example, appears to produce fewer cut-offs than fine-tuning from Kusal in my experience, so my guess is it's something related to the dataset. (But yes, sadly, even the Mac Alex voice has a slight amount of this on some, but not all, words - it's somewhat improved from prior models, but not fully solved.)
in reply to Peter Vágner

@pvagner Ooh yeah. I wonder what the quality of fine-tuned models for other languages is like. That could make a big difference too - I know there are Hungarian ones, so I may try to train a Brailab voice just for fun one of these days to test that out. Sourcing good-quality data for other languages is so much harder; I wish it wasn't. So you may need to try training from scratch with 2-3 hours of data to see if it can do better.
in reply to Peter Vágner

@pvagner @mush42 I do think it was an issue before streaming came around, but more so with some voices than others. Ryan Medium always felt like a much lower-quality model to me compared to something like HFC_male, which is very good at not cutting words off in the original model. I'm not sure if pre-emptively opening the audio stream is required; it would slow down processing, but maybe we'd get more consistent wording in exchange.
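To make that trade-off concrete, here's a minimal sketch of what I mean by pre-emptively opening the stream: start the output device and feed it a short silence primer before the first synthesized chunk arrives, so the onset of the first word isn't clipped while the device spins up. The library choice (sounddevice) and the 50 ms primer length are just my assumptions for illustration, not how any particular screen reader actually does it:

```python
# Sketch: open the audio output stream before synthesis produces any
# audio, and prime it with silence. The primer is the latency cost we
# pay in exchange for not losing the first phoneme of each utterance.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 22050  # typical rate for Piper medium-quality voices

def speak(chunks):
    """chunks: iterable of int16 numpy arrays from the synthesizer."""
    stream = sd.OutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16")
    stream.start()  # open the device *before* any audio is ready
    # ~50 ms of silence gives the device time to settle (assumed value)
    stream.write(np.zeros((int(SAMPLE_RATE * 0.05), 1), dtype=np.int16))
    for chunk in chunks:
        stream.write(chunk.reshape(-1, 1))  # mono frames, one column
    stream.stop()
    stream.close()
```

Whether that extra latency is acceptable for a screen reader is exactly the question - responsiveness matters as much as not swallowing word onsets.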