Hello Fediverse,

We are looking for Text-To-Speech (TTS) expertise to help or advise us on improving the default voice of the Linux desktop. :linux: 📣

Please reach out or boost :boost_love:

Thanks!

#Linux #tts #accessibility #a11y #GNOME #KDE #FreeSoftware #freedesktop #ml

in reply to Sonny

I'd at least use RHVoice everywhere I could, and then likely go the Piper TTS route (no idea whether it already has a Speech Dispatcher module, though).
Unknown parent

Lukáš Tyrychtr
Definitely keep it as the default, but offer alternatives, and document that they exist.
in reply to Lukáš Tyrychtr

@tyrylu @fireborn yeah, the more feedback I get, the more I begin to wonder if what we need isn't an easy way to discover and install speech synthesizers.

I would still like to have a better default though.

Unknown parent

Sonny
@fireborn do you have a link I can follow up on?
Unknown parent

Lukáš Tyrychtr
If it's just the pronunciation of some characters, a better Speech Dispatcher dictionary would likely go a long way, but I have no idea who should/would create it, and, of course, it would need countless locale-specific variants.
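
For anyone who wants to experiment, a quick way to hear how the current stack reads individual characters is the Python Speech Dispatcher bindings (python3-speechd); the module name below is an assumption, so swap in whatever "spd-say -O" lists:

    import speechd

    # Open an SSIP connection to the running Speech Dispatcher daemon.
    client = speechd.SSIPClient("pronunciation-test")
    client.set_output_module("espeak-ng")  # assumed module name
    client.set_punctuation(speechd.PunctuationMode.ALL)  # speak all punctuation
    client.speak("Testing: @ # _ ~ { }")
    client.char("@")  # spell out a single character
    client.close()
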
in reply to Sonny

I'm developing Sonata, which is a frontend for multiple neural TTS models. Currently it supports more than 30 languages.
Sonata provides TTS through a C library, a command-line app, a gRPC server, and Python bindings.
I'm optimizing Sonata for low-resource, high-responsiveness scenarios, such as screen reader usage.
An Android app that uses Sonata is currently being developed and will be released soon.
I'm very interested to know what I can offer.
Repo: https://github.com/mush42/sonata
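
Until the docs land, here is a purely hypothetical sketch of what the Python bindings might look like; every name below is made up rather than taken from the repo, so check https://github.com/mush42/sonata for the real API:

    # Hypothetical sketch only: "sonata", "load_voice", and "synthesize" are invented names.
    import sonata

    voice = sonata.load_voice("en_US-lessac-medium.onnx")  # assumed: load a Piper-style voice
    audio = voice.synthesize("Hello from a neural voice.")  # assumed: returns WAV bytes
    with open("hello.wav", "wb") as f:
        f.write(audio)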

in reply to Musharraf :verified:

@Musharraf :verified: If I am training a model for Piper, will I be able to use the same trained ONNX model with Sonata, or do I have to train again? The fact that the Android app is on the horizon motivates me further, I must say.
in reply to Peter Vágner

@pvagner
Yes. Existing ONNX models work fine.
You can also export the existing checkpoints with a different script for streaming speech in real time.
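
For reference, the same .onnx voice can be sanity-checked with Piper's own CLI (the model name is just an example; its matching .onnx.json must sit next to it):

    echo 'Same voice, different frontend.' | piper --model en_US-lessac-medium.onnx --output_file test.wav
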
in reply to Peter Vágner

@pvagner
PR pending:
https://github.com/rhasspy/piper/pull/255
in reply to Sonny

No docs yet, but since we support Piper voices, you can listen to voice samples from Piper's official demo page:
https://rhasspy.github.io/piper-samples/
in reply to Sonny

I'm working on a D-Bus-based spec and client library that would supersede the current platform APIs. I still need to blog/publicize/socialize it, but I'd love to talk about it at some point.

https://eeejay.github.io/libspiel/
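
A minimal sketch of driving it from Python via GObject Introspection, roughly following the project README at the time; the API is pre-1.0, so names may change:

    import gi

    gi.require_version("GLib", "2.0")
    gi.require_version("Spiel", "1.0")
    from gi.repository import GLib, Spiel

    loop = GLib.MainLoop()

    # Quit the main loop once the speaker goes back to idle.
    def on_speaking_changed(speaker, _pspec):
        if not speaker.props.speaking:
            loop.quit()

    speaker = Spiel.Speaker.new_sync(None)
    speaker.connect("notify::speaking", on_speaking_changed)
    speaker.speak(Spiel.Utterance(text="Hello from a Spiel provider."))
    loop.run()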

in reply to Eitan

The big idea is that each speech provider can be contained in a flatpak. That allows supporting extensibility on immutable distros, accommodating varying licenses and commercial speech engines, and not relying on a fragile set of scripts and dependencies as we do today.
in reply to Eitan

In addition, the API would be at parity with other platforms and allow things that Speech Dispatcher currently does not, like pausing (🤯), speech progress events, and concurrent synthesis (i.e. one program can't monopolize a global speech queue).
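
If the Speaker sketch above holds, pausing becomes a plain method call; pause()/resume() here are my guess at the eventual surface, not a confirmed API:

    speaker.pause()   # assumed method: suspend the current utterance
    speaker.resume()  # assumed method: continue where it left off
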
in reply to Sonny

I wouldn't call myself an "expert", but I am blind and use TTS and computers a lot and could provide feedback.
in reply to Lukáš Tyrychtr

Thanks for the feedback. We will look into making discovery/install/update/comparison of synthesizers more accessible.

Can you help me understand why you think espeak should remain the default?

From my side, I would like to encourage developers as much as possible to test their GUIs with the screen reader.
I believe the default espeak voice is off-putting.

in reply to Sonny

Up until last year, I was working on a TTS engine for a major cloud provider, and I currently focus on efficient AI systems (mainly memory-efficient NN training, though my broader research has also covered efficient inference on edge devices). I'm also a user of the GNOME desktop and, although not visually impaired, I would actually love an easy-to-use TTS with a natural voice integrated into the desktop. Maybe I could help?
in reply to Sonny

I am going to work on integrating Piper and/or Mimic3 into Linux distros, Termux, and the Android system. If you believe you'll benefit from this, please respond or email tts@autkin.net, as I am seeking sponsorship.
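
For context, the usual way Piper gets wired into a distro's speech stack today is Speech Dispatcher's generic module; a minimal sketch of such a config, where the model path, voice, and sample rate are assumptions for a "medium" voice:

    # /etc/speech-dispatcher/modules/piper-generic.conf (assumed path and model)
    GenericExecuteSynth "echo '$DATA' | piper --model /usr/share/piper/en_US-lessac-medium.onnx --output_raw | aplay -r 22050 -f S16_LE -t raw -"

    # and in speechd.conf:
    AddModule "piper-generic" "sd_generic" "piper-generic.conf"
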
in reply to Sonny

I'm curious to see what comes out of this!

I don't have any idea how to improve the core code, but I have played with eSpeak & rendered HTML/CSS to SSML.

I notice that there's a sharp distinction between the voices which sound natural vs the ones which give me more knobs to take advantage of the medium. I'd like to see that remedied!
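
For anyone curious what those knobs look like in practice, here is a small sketch sending SSML through the Python Speech Dispatcher bindings; whether the markup is honored depends entirely on the output module:

    import speechd

    ssml = (
        "<speak>"
        "Normal pace, then "
        '<prosody rate="slow" pitch="+15%">slower and higher,</prosody>'
        ' <break time="400ms"/> '
        '<emphasis level="strong">then emphatic.</emphasis>'
        "</speak>"
    )

    client = speechd.SSIPClient("ssml-demo")
    client.set_data_mode(speechd.DataMode.SSML)  # only SSML-aware modules honor the tags
    client.speak(ssml)
    client.close()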

in reply to Sonny

Hi there. I would strongly advise against going with neural text-to-speech engines for the default voice of Linux, as they are in most cases majorly unresponsive. At the very least, offer eSpeak, the current option, as an easily switchable alternative.
in reply to Sonny

Here is my research on existing projects: https://pad.nixnet.services/s/0qeHhUC1c (a bit out of date).

I think espeak-ng is outdated technology; the quality is not acceptable.

The best fully open-source TTS I know of is Coqui TTS, but the company is shutting down. Maybe you could still contract the people who previously worked on the same project at Mozilla.

https://github.com/coqui-ai/TTS
info@coqui.ai
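
Their Python API is small; a minimal sketch, where the model name is one example from their released model zoo:

    # pip install TTS
    from TTS.api import TTS

    # Downloads the model on first use, then synthesizes to a WAV file.
    tts = TTS("tts_models/en/ljspeech/vits")
    tts.tts_to_file(text="A fully open source neural voice.", file_path="out.wav")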

in reply to Sonny

We can contract BTW :)

Some examples of what we're interested in

• The state of speech synthesis on the Linux desktop and the various solutions

• What it would take to improve the espeak voice

• How well machine learning solutions could work locally, especially in relation to battery life and older hardware

in reply to Sonny

Nice.
For STT (so, the opposite direction), I found #vosk to be OK.
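
A minimal sketch of offline recognition with the Python bindings, assuming a downloaded model directory and a 16-bit mono WAV:

    # pip install vosk
    import wave
    from vosk import Model, KaldiRecognizer

    wf = wave.open("speech.wav", "rb")
    rec = KaldiRecognizer(Model("vosk-model-small-en-us-0.15"), wf.getframerate())
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        rec.AcceptWaveform(data)
    print(rec.FinalResult())  # JSON string with the recognized text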