Recently I have been playing with various GUIs for the Whisper transcription software, and Buzz has definitely won the showdown. It is almost completely keyboard accessible, give or take the toolbar, which needs exploring through NVDA's object navigation or an equivalent in your screen reader of choice. It handles downloading the models, FFmpeg conversion and everything else that would otherwise have required the command line, works with Whisper.cpp as far as I can tell, and can be localized to other languages.
Now I can finally listen to podcasts in all the languages I can't speak. I love it when technology enhances my access to knowledge and helps me do my work even better for those who benefit from it.
github.com/chidiwilliams/buzz
#Accessibility #Audio #Languages #OpenSource
GitHub - chidiwilliams/buzz: Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Paweł Masarczyk
in reply to Jens Bertrams
@Radiojens @radiorobbe I use the regular Whisper (actually, I think it's the Whisper.cpp implementation) with the large model. Here are the steps:
1. I import the file using Ctrl+O.
2. I set up the options for the transcription job as I like them: the mechanism is Whisper, the model is large, the language is set to automatic detection, and everything else is left at the defaults.
3. I click Run and wait. Eventually I am moved to the table where the progress of the task is reported.
4. I wait for it to finish, i.e. for the second column to say "Completed".
5. I navigate to the toolbar. I use the laptop layout of NVDA so I'll try to explain it using that keymap:
A. I move the navigator object to my system focus by pressing NVDA+Backspace;
B. I navigate out of the table object with NVDA+Shift+Up Arrow;
C. I then navigate two objects to the left by pressing NVDA+Shift+Left Arrow twice, so that I find the toolbar;
D. I step into that object with NVDA+Shift+Down Arrow;
E. I navigate to the right using NVDA+Shift+Right Arrow until I find the "Open Transcript" button;
F. I bring the mouse pointer to my navigator object with NVDA+Shift+M (in the laptop layout this command moves the mouse rather than the system focus);
G. I activate the button by pressing NVDA+Enter;
6. A new window opens where the text of the transcript is presented in an inaccessible edit field that you can't operate with the keyboard. Press Tab to reach the "Export" button. Activating it pops up a context menu where you can pick the format you need, and then you can save the file anywhere you choose.
I hope this helps. If not, and if you think it's a good idea, we could get in touch somewhere else and arrange a remote session so that I can try to see what the problem might be on your end.
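For anyone who would rather skip the toolbar navigation entirely, the same job (large model, automatic language detection, transcript exported to a file) can probably be done from the command line. The following is only a rough sketch, and it assumes the reference openai-whisper package and its `whisper` command are installed, which is separate from Buzz and not how Buzz works internally. The function name `build_whisper_command` and the file names are made up for illustration. Leaving out `--language` is what asks Whisper to detect the language itself.

```python
import shlex
from typing import List, Optional


def build_whisper_command(
    audio_path: str,
    model: str = "large",
    language: Optional[str] = None,
    output_format: str = "txt",
    output_dir: str = ".",
) -> List[str]:
    """Build the argument list for the openai-whisper `whisper` CLI.

    Hypothetical helper for illustration only; it is not part of Buzz.
    """
    command = [
        "whisper",
        audio_path,
        "--model", model,                  # same choice as the "large" model in step 2
        "--task", "transcribe",
        "--output_format", output_format,  # e.g. txt, srt or vtt
        "--output_dir", output_dir,
    ]
    # Only pass --language when one is given; without it Whisper
    # falls back to automatic language detection.
    if language is not None:
        command += ["--language", language]
    return command


if __name__ == "__main__":
    # "podcast.mp3" and "transcripts" are placeholder names.
    args = build_whisper_command(
        "podcast.mp3", output_format="srt", output_dir="transcripts"
    )
    # Print the command so it can be reviewed before running it.
    print(shlex.join(args))
```

To actually run it, the list can be handed to `subprocess.run(args, check=True)`. The finished transcript then lands in the output directory as an ordinary file, so the inaccessible edit field from step 6 is never involved.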