Items tagged with: SpeechToText


Speech to Text on Linux?!?!

I just found "Speech Note" when searching the Software app.
Installed it via flatpak.
flathub.org/apps/net.mkiol.Spe…

Used the "English (Vosk Large)" and "English (Vosk Small)" language model with very decent results. There are loads of models to choose from.
All processed locally. No network needed!
This is great!

#accessibility #SpeechToText #Linux #debian #flatpak #flathub #SpeechNote
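
For anyone curious what's driving those models: the Vosk models that Speech Note offers can also be used directly from Python via the vosk package. A minimal sketch, assuming a downloaded and unpacked model directory and a 16 kHz mono PCM WAV file (both paths are placeholders):

import json
import wave

from vosk import Model, KaldiRecognizer  # pip install vosk

# Paths are placeholders: point Model at an unpacked Vosk model directory,
# e.g. one of the English models downloadable from alphacephei.com/vosk/models.
model = Model("vosk-model-small-en-us-0.15")

wf = wave.open("recording.wav", "rb")  # expects 16 kHz, mono, 16-bit PCM
rec = KaldiRecognizer(model, wf.getframerate())

# Feed the audio in chunks; everything runs locally, no network needed.
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(json.loads(rec.Result())["text"])
print(json.loads(rec.FinalResult())["text"])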


I'm sure everyone who wants to know about this already does, but just in case anyone, particularly if #blind or #DeafBlind, has been looking for a local method of converting speech to text ... Whisper is an ML model from OpenAI which allows doing that. It can be used accessibly with all screen readers on Windows. Obviously this is great for those of us with impaired hearing; it is certainly far more accurate than any of the speech-to-text programs I've seen, needs no training, and can handle background noise quite well. The audio duration limits are set by your hard drive space and the amount of time you're willing to put into transcription; I've transcribed several hours of audio without difficulty, it just takes time. It's available on Windows using github.com/Softcatala/whisper-… which just seems to need Python. A GPU makes it faster, but it's usable on an i5 CPU. The model is also available online at freesubtitles.ai, though that requires payment or waiting for long periods to transcribe limited amounts of audio. Thanks to @Bryn@mindly.social for the pointer to whisper-ctranslate2. #whisper #SpeechToText
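
If you'd rather script it than use a command-line tool, here's a minimal sketch with the openai-whisper Python package (whisper-ctranslate2, mentioned above, aims to be a faster drop-in for the same workflow). The file name and model size are just examples:

import whisper  # pip install openai-whisper; also needs ffmpeg installed

# "medium" is a reasonable accuracy/speed compromise on CPU; "large" is
# more accurate but slower. The model downloads once, then runs offline.
model = whisper.load_model("medium")

# File name is a placeholder; most common audio formats work via ffmpeg.
result = model.transcribe("interview.mp3")

print(result["text"])  # the full transcript as one string
for seg in result["segments"]:  # timestamped segments
    print(f'[{seg["start"]:7.1f}s] {seg["text"]}')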


Mindblowing 🤯

#Whisper is an #openSource #speechRecognition model written in #Python by #OpenAI. I’ve just seen it in action. Extract an #mp3 from a video, run it through Whisper, and it turns every spoken word into text. It even does a very decent job in #Danish. Perfect for subtitling #TV and #video. I am very impressed.

github.com/openai/whisper

#ai #language #transcription #speechToText
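
The video-to-subtitle flow described above is easy to script: extract the audio track with ffmpeg, then pass it to Whisper with a language hint. A rough sketch (file names are placeholders; "da" is the code for Danish, and Whisper can also auto-detect the language if the hint is omitted):

import subprocess
import whisper

# Pull the audio track out of the video (file names are placeholders).
subprocess.run(["ffmpeg", "-y", "-i", "broadcast.mp4", "-vn", "audio.mp3"], check=True)

model = whisper.load_model("medium")
result = model.transcribe("audio.mp3", language="da")  # hint the spoken language

# Print the segments in a rough subtitle-like layout.
for seg in result["segments"]:
    print(f'{seg["start"]:8.2f} --> {seg["end"]:8.2f}  {seg["text"]}')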


Google publishes the source code for their TalkBack screen reader. GrapheneOS maintains a fork of it and includes it in GrapheneOS with the help of a blind GrapheneOS user who works on their own more elaborate fork. Eventually, we'd like to include more or all of their changes.

TalkBack depends on a text-to-speech (TTS) implementation being installed/configured/activated. It needs to have Direct Boot support to function before the first unlock of a profile. Google's TTS implementation supports this and can be used on GrapheneOS, but it's not open source.

We requested Direct Boot support from both prominent open source implementations:

RHVoice: github.com/RHVoice/RHVoice/iss…
eSpeak NG: github.com/espeak-ng/espeak-ng…

eSpeak NG recently added it, but it's not yet included in a stable release, and their licensing (GPLv3) is too restrictive for us.

RHVoice itself has acceptable licensing for inclusion in GrapheneOS (LGPL v2.1), but it has dependencies with restrictive licensing. Both of these projects also have non-free licensing issues for the voices. Neither provides anything close to a working out-of-the-box experience either.

Google's Speech Services app providing text-to-speech and speech-to-text works perfectly. Their proprietary accessibility services app with extended TalkBack and other services also works fine. However, many of our users don't want to use them and we need something we can bundle.

There aren't currently any usable open source speech-to-text apps. There are experimental open source speech-to-text implementations but they lack Android integration.

We also really need to make a brand new setup wizard with both accessibility and enterprise deployment support.

GrapheneOS still has too little funding and too few developers to take on these projects. These would be standalone projects able to be developed largely independently. There are similar standalone projects which we need to have developed in order to replace some existing apps.

AOSP provides a set of barebones sample apps with outdated user interfaces / features. These are intended to be replaced by OEMs, but we lack the resources of a typical OEM. We replaced AOSP Camera with our own app, but we still need to do the same with Gallery and other apps.

Google has started the process of updating the open source TalkBack, which only happens rarely. We've identified a significant problem: a major component has no published source code.

github.com/google/talkback/pul…

Google has been very hostile towards feedback / contributions for TalkBack...

This is one example of something seemingly on the right track significantly regressing. Another example is the takeover of the Seedvault project initially developed for GrapheneOS. It has deviated substantially from the original plans and lacks usability, robustness and security.

In the case of Seedvault, GrapheneOS designed the concept for it and one of our community members created it. It was taken over by a group highly hostile towards us and run into the ground. It doesn't have the intended design/features and lacks usability, security and robustness.

All of these are important standalone app projects for making GrapheneOS highly usable and accessible. What we need is not being developed by others, and therefore we need the resources, including funding and developers, to make our own implementations meeting our requirements.

#grapheneos #privacy #security #android #mobile #accessibility #texttospeech #speechtotext #talkback #blind #backup