Hello Fediverse,

We are looking for Text-To-Speech (TTS) expertise to help or advise us on improving the default voice of the Linux desktop. :linux: 📣

Please reach out or boost :boost_love:

Thanks!

#Linux #tts #accessibility #a11y #GNOME #KDE #FreeSoftware #freedesktop #ml

in reply to Sonny

I'd at least use RHVoice everywhere I could, and then likely go the Piper TTS route (no idea whether it already has a Speech Dispatcher module, though).
Unknown parent

Lukáš Tyrychtr
Definitely keep it as the default, but offer alternatives, and document that they exist.
in reply to Lukáš Tyrychtr

@tyrylu @fireborn yeah, the more feedback I get, the more I begin to wonder if what we need isn't an easy way to discover and install speech synthesizers.

I would still like to have a better default though.

Unknown parent

Sonny
@fireborn do you have a link I can follow up on?
Unknown parent

Lukáš Tyrychtr
If it's just the pronunciation of some characters, a better Speech Dispatcher dictionary would likely go a long way, but I have no idea who should/would create it, and, of course, it would need countless locale-specific variants.
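
For anyone who wants to experiment, a quick way to hear how the current stack reads individual characters is the Python Speech Dispatcher bindings (python3-speechd); the module name below is an assumption, so swap in whatever "spd-say -O" lists:

    import speechd

    # Open an SSIP connection to the running Speech Dispatcher daemon.
    client = speechd.SSIPClient("pronunciation-test")
    client.set_output_module("espeak-ng")  # assumed module name
    client.set_punctuation(speechd.PunctuationMode.ALL)  # speak all punctuation
    client.speak("Testing: @ # _ ~ { }")
    client.char("@")  # spell out a single character
    client.close()
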
in reply to Sonny

I'm developing Sonata, which is a frontend for multiple neural TTS models. Currently it supports more than 30 languages.
Sonata provides TTS through a C library, a command-line app, a gRPC server, and Python bindings.
I'm optimizing Sonata for low-resource, high-responsiveness scenarios, such as screen reader usage.
An Android app that uses Sonata is currently being developed and will be released soon.
I'm very interested to know what I can offer.
Repo: https://github.com/mush42/sonata
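
Until the docs land, here is a purely hypothetical sketch of what the Python bindings might look like; every name below is made up rather than taken from the repo, so check https://github.com/mush42/sonata for the real API:

    # Hypothetical sketch only: "sonata", "load_voice", and "synthesize" are invented names.
    import sonata

    voice = sonata.load_voice("en_US-lessac-medium.onnx")  # assumed: load a Piper-style voice
    audio = voice.synthesize("Hello from a neural voice.")  # assumed: returns WAV bytes
    with open("hello.wav", "wb") as f:
        f.write(audio)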

in reply to Musharraf :verified:

@Musharraf :verified: If I am training a model for Piper, will I be able to use the same trained ONNX model with Sonata, or do I have to train again? The fact that the Android app is on the horizon motivates me further, I must say.
in reply to Peter Vágner

@pvagner
Yes. Existing ONNX models work fine.
You can also export the existing checkpoints with a different script for streaming speech in real time.
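
For reference, the same .onnx voice can be sanity-checked with Piper's own CLI (the model name is just an example; its matching .onnx.json must sit next to it):

    echo 'Same voice, different frontend.' | piper --model en_US-lessac-medium.onnx --output_file test.wav
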
in reply to Peter Vágner

@pvagner
PR pending:
https://github.com/rhasspy/piper/pull/255
in reply to Sonny

No docs yet, but since we support Piper voices, you can listen to voice samples from Piper's official demo page:
https://rhasspy.github.io/piper-samples/
in reply to Sonny

I'm working on a D-Bus-based spec and client library that would supersede the current platform APIs. I still need to blog/publicize/socialize it, but I'd love to talk about it at some point.

https://eeejay.github.io/libspiel/
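
A minimal sketch of driving it from Python via GObject Introspection, roughly following the project README at the time; the API is pre-1.0, so names may change:

    import gi

    gi.require_version("GLib", "2.0")
    gi.require_version("Spiel", "1.0")
    from gi.repository import GLib, Spiel

    loop = GLib.MainLoop()

    # Quit the main loop once the speaker goes back to idle.
    def on_speaking_changed(speaker, _pspec):
        if not speaker.props.speaking:
            loop.quit()

    speaker = Spiel.Speaker.new_sync(None)
    speaker.connect("notify::speaking", on_speaking_changed)
    speaker.speak(Spiel.Utterance(text="Hello from a Spiel provider."))
    loop.run()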

in reply to Eitan

The big idea is that each speech provider can be contained in a flatpak. That allows supporting extensibility on immutable distros, accommodating varying licenses and commercial speech engines, and not relying on a fragile set of scripts and dependencies as we do today.
in reply to Eitan

In addition, the API would be at parity with other platforms and allow things that Speech Dispatcher currently does not, like pausing (🤯), speech progress events, and concurrent synthesis (i.e. one program can't monopolize a global speech queue).
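
If the Speaker sketch above holds, pausing becomes a plain method call; pause()/resume() here are my guess at the eventual surface, not a confirmed API:

    speaker.pause()   # assumed method: suspend the current utterance
    speaker.resume()  # assumed method: continue where it left off
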
in reply to Sonny

I wouldn't call myself an "expert", but I am blind and use TTS and computers a lot and could provide feedback.
in reply to Lukáš Tyrychtr

Thanks for the feedback. We will look into making discovery/install/update/comparison of synthesizers more accessible.

Can you help me understand why you think espeak should remain the default?

From my side, I would like to encourage developers as much as possible to test their GUIs with the screen reader.
I believe the default espeak voice is off-putting.

in reply to Sonny

Up until last year, I was working on a TTS engine for a major cloud provider, and I currently focus on efficient AI systems (mainly memory-efficient NN training, though my broader research has also covered efficient inference on edge devices). I'm also a user of the GNOME desktop and, although not visually impaired, I would actually love an easy-to-use TTS with a natural voice integrated into the desktop. Maybe I could help?
in reply to Sonny

I am going to work on integrating Piper and/or Mimic3 into Linux distros, Termux, and the Android system. If you believe you'll benefit from this, please respond or email tts@autkin.net, as I am seeking sponsorship.
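
For context, the usual way Piper gets wired into a distro's speech stack today is Speech Dispatcher's generic module; a minimal sketch of such a config, where the model path, voice, and sample rate are assumptions for a "medium" voice:

    # /etc/speech-dispatcher/modules/piper-generic.conf (assumed path and model)
    GenericExecuteSynth "echo '$DATA' | piper --model /usr/share/piper/en_US-lessac-medium.onnx --output_raw | aplay -r 22050 -f S16_LE -t raw -"

    # and in speechd.conf:
    AddModule "piper-generic" "sd_generic" "piper-generic.conf"
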
in reply to Sonny

I'm curious to see what comes out of this!

I don't have any idea how to improve the core code, but I have played with eSpeak & rendered HTML/CSS to SSML.

I notice that there's a sharp distinction between the voices which sound natural vs the ones which give me more knobs to take advantage of the medium. I'd like to see that remedied!
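
For anyone curious what those knobs look like in practice, here is a small sketch sending SSML through the Python Speech Dispatcher bindings; whether the markup is honored depends entirely on the output module:

    import speechd

    ssml = (
        "<speak>"
        "Normal pace, then "
        '<prosody rate="slow" pitch="+15%">slower and higher,</prosody>'
        ' <break time="400ms"/> '
        '<emphasis level="strong">then emphatic.</emphasis>'
        "</speak>"
    )

    client = speechd.SSIPClient("ssml-demo")
    client.set_data_mode(speechd.DataMode.SSML)  # only SSML-aware modules honor the tags
    client.speak(ssml)
    client.close()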

in reply to Sonny

Hi there. I would strongly advise against going with neural text-to-speech engines for the default voice of Linux, as they are in most cases majorly unresponsive. At the very least, offer eSpeak, the current option, as an easily switchable alternative.
in reply to Sonny

Here is my research on existing projects: https://pad.nixnet.services/s/0qeHhUC1c (a bit out of date).

I think espeak-ng is outdated technology; the quality is not acceptable.

The best fully open-source TTS I know of is Coqui TTS, but the company is shutting down. Maybe you could still contract the people who previously worked on the same project at Mozilla.

https://github.com/coqui-ai/TTS
info@coqui.ai
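
Their Python API is small; a minimal sketch, where the model name is one example from their released model zoo:

    # pip install TTS
    from TTS.api import TTS

    # Downloads the model on first use, then synthesizes to a WAV file.
    tts = TTS("tts_models/en/ljspeech/vits")
    tts.tts_to_file(text="A fully open source neural voice.", file_path="out.wav")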

in reply to Sonny

We can contract BTW :)

Some examples of what we're interested in

• The state of speech synthesis on the Linux desktop and the various solutions

• What it would take to improve the espeak voice

• How well machine learning solutions could work locally, especially in relation to battery life and older hardware

in reply to Sonny

Nice.
For STT (so, the opposite direction), I found #vosk to be OK.
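
A minimal sketch of offline recognition with the Python bindings, assuming a downloaded model directory and a 16-bit mono WAV:

    # pip install vosk
    import wave
    from vosk import Model, KaldiRecognizer

    wf = wave.open("speech.wav", "rb")
    rec = KaldiRecognizer(Model("vosk-model-small-en-us-0.15"), wf.getframerate())
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        rec.AcceptWaveform(data)
    print(rec.FinalResult())  # JSON string with the recognized text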