Interesting, when a modal dialog (nested within a `<main>` element or some sectioning content) opens in Chrome it seems to disregard its context, so elements like `<header>` and `<footer>` will have `banner` and `contentinfo` roles as if they were scoped to `<body>`. When opened non-modally, they have `sectionheader` and `sectionfooter` (new ARIA roles + mapping) which means that it is computing the roles based on the dialog’s context.

Safari and Firefox keep the context in both scenarios (i.e. modal and non-modal). I have no idea which is correct. Need to read more.
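To make the scenario concrete, here's a minimal sketch of the kind of markup involved (my own reconstruction, not the playground linked below):

```
<!-- Reconstruction for illustration only. -->
<main>
  <dialog id="d">
    <header>Header inside the dialog</header>
    <footer>Footer inside the dialog</footer>
  </dialog>
  <!-- Opened with showModal(): Chrome maps the header/footer to
       banner/contentinfo, as if they were scoped to <body>. -->
  <button onclick="document.getElementById('d').showModal()">Open modally</button>
  <!-- Opened with show(): Chrome maps them to sectionheader/sectionfooter. -->
  <button onclick="document.getElementById('d').show()">Open non-modally</button>
</main>
```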

Anyway, here’s a playground for the bug: knowler.dev/demos/MR4JmQW

#HTML #ARIA #accessibility

in reply to Nathan Knowler

Interesting indeed. To distill that further:
`data:text/html,<body><main role="none"><header>hi`
So it seems Chrome is using the accessibility tree for the context restriction here, not the DOM tree. That means `role="none"` and any other way of removing content from the accessibility tree, including modal dialogs, change the context. In contrast, Firefox uses the DOM tree in this case.
Normally, the accessibility tree is what you want to use for context because of generics, `role="none"`, etc. However, in this particular case, it seems more appropriate to do this based on the DOM. Hmm.

typst, an open-source document typesetting system that seems to be growing in popularity, now generates accessible, tagged PDFs by default! typst.app/blog/2025/typst-0.14…

Good news! It's getting harder to keep track of new #chatmail relays :)

Recently several public relays were added to the chatmail.at/relays list, in Warsaw, Helsinki, Romania and Barcelona.🧡

If you have #deltachat installed on mobile, you can go to any chatmail relay website in the list and click on a link there to create a chat profile.

It's wonderful to hear about collectives setting up chat infrastructure like the recent xat.fedi.cat from @fedicat and @eXOfasia

I've been using IndentNav for a while to write Python. Recently I installed BrowserNav and now get way more positional info about HTML elements. It took some getting used to, but now the beeps and tones help me get an idea of the physical layout of a site or electron interface.
It's similar to IndentNav, but it has more rules and works with web browsers instead of being focused on code in a text editor. Positioning is one of those pieces of information I'd forgotten how much I miss from my sighted days. It's especially helpful with API reference docs that rely on positional encoding. Since there's more info in the browser, I get beeps instead of precise indentation levels, which gives me a general idea of the structure.
#blind #nvda #nvaccess #browsernav #indentnav #accessibility #code

I love Joplin so much. I set up the self-hosted Joplin server at some point, and I have never, not even one time, had to touch it. It updates itself, it doesn't crash, it doesn't take up enormous amounts of system resources for no reason, and all the apps just work with it on my Windows, Mac, and iOS devices. I've even got other users on it, and it works just fine. And the apps are also good. Offline? No problem! You can still look at the last version of your stuff. Need to publish a quick, well-formatted single page to the web? Done with a snap!

in reply to Nick Giannak III

@nick @quinn You could also run cosmos-cloud.io. It'll run and update your Docker containers for you, configure your reverse proxies, manage your SSL certs, etc. But the nice thing is that it uses the standard methods to do all of those things, so unlike other server management GUIs, you can do stuff via the command line or via cosmos-cloud and it doesn't matter.
in reply to Matt Campbell

@matt @nick @quinn To be fair, it's not something you should do in an enterprise environment. In general, updates for things that aren't your hobby need to be deployed to staging, tested, and only then pushed to production. Watchtower just does an in-place update of the containers. But that's fine, and probably even better, for hobby projects.

Just heard about Bitrig, an iOS app that lets you "vibecode" apps in SwiftUI from your phone. It was created by some SwiftUI devs who used to work at Apple, and although I haven't tested it out yet, I have no reason to think it wouldn't be accessible. This could actually be a really fun way to learn, as you can not only see the final result but also view and edit the code. You can even share your new app with friends and deploy to TestFlight. Link to the main site: bitrig.app

For a few years I've been aware of this website that purports to be able to unlock shopping cart wheels using the speaker on your phone, but I finally had an excuse to try it, and I remembered in the moment.

A woman was outside the grocery store struggling to move her shopping cart, which was stuck because two of the wheels were locked.

I remembered the website! So I put my phone near the wheels and played the sound. The wheels unlocked like magic. She was very happy. So cool.

begaydocrime.com/

Apparently, the Zoom P4 Next has been released. It's so, so cool to see accessibility mentioned in the video as one of the advertised features. youtube.com/watch?v=Id73VO07C0…
#accessibility #recording #podcasting

I have a rather peculiar #Android problem.
- I use personalDNSfilter (zenz-solutions.de/personaldnsf…) to block ads system-wide. It's basically like running a local pi-hole using a local VPN.
- I would also like to use Orbot (#Tor) and run some apps (specifically Nextcloud) that don't natively support proxying through Orbot's VPN.

The problem is, Android won't let me run two VPNs at the same time. And blocking ads without a VPN would require rooting my phone, which I don't want to do. However:
- personalDNSfilter can expose the DNS server on port 5300 without using the VPN (which is useless in itself).
- Orbot can expose its HTTP and SOCKS proxy without using the VPN (which is also useless in itself).

Is there some way to set up a custom VPN that would combine these two things, i.e., let me route some apps through Orbot's proxy and use the local DNS server (provided by personalDNSfilter) at port 5300? I was looking at OpenVPN for Android (github.com/schwabe/ics-openvpn), but I'm honestly really confused. Help please? 😅 Boosts appreciated.

in reply to Jiří Eischmann

Hello @Razemix, with #AdGuardHome you can use it both inside and outside your LAN.
Here, I set it up as the DHCP server and it acts as the DNS resolver for all the endpoints on my local network. But I also declare each device with a unique identifier and set private DNS on all of them. Profiles for iOS devices can be generated from the #AGH dashboard.
My AGH serves the DoT, DoH and DoQ protocols. This way, strangers cannot use my resolver to poison it.
A VPN connection is not required in this setup to use your AGH outside your local network.
You need a domain name, a free certificate (Let’s Encrypt), two ports (443 & 853 on UDP & TCP) opened in your router and firewall, and a tiny script to update your DNS record if your WAN IP address is dynamic (see the sketch below).
Network ports are: #DoT (853/TCP), #DoQ (853/UDP), #DoH HTTP/2 (443/TCP), DoH HTTP/3 (443/UDP).
All your devices, and your family's, can use your personal secure DNS.
You can also completely replace the standard DNS client on all your computers with the #dnsproxy software developed by the AGH team. All your devices will then use secure DNS.
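A minimal sketch of such an update script, assuming nothing about your provider (the endpoint, token and hostname below are placeholders; every DNS provider's API differs):

```
# Hypothetical DDNS updater; UPDATE_URL, TOKEN and HOSTNAME are placeholders.
import urllib.request

UPDATE_URL = "https://dns.example.com/api/update"  # placeholder endpoint
TOKEN = "your-api-token"                           # placeholder credential
HOSTNAME = "agh.yourdomain.tld"                    # record to keep current

def wan_ip() -> str:
    # api.ipify.org returns the caller's public IP as plain text.
    with urllib.request.urlopen("https://api.ipify.org") as resp:
        return resp.read().decode().strip()

def update_record(ip: str) -> None:
    # The shape of the request is provider-specific; this is only the idea.
    req = urllib.request.Request(
        f"{UPDATE_URL}?host={HOSTNAME}&ip={ip}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    urllib.request.urlopen(req)

update_record(wan_ip())
```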

@sesivany

Well, thanks to github.com/Memotech-Bill/MakeD… - now I have a DAISY book of 380 tracks on my Brailliant, and it plays music! Ahaha, what a concept, Humanware: I'm playing my own MP3s freely on your device. Big middle finger to you. For a release, grab github.com/Memotech-Bill/MakeD… - it uses mostly Linux conventions. I built a Windows version that uses FFmpeg rather than id3tag to grab file durations and make a DAISY 2.02 book.
Oh, you want it? I do link back to the original source. I don't just write code willy-nilly without crediting a source. So here you go, my fork:
eurpod.com/makedaisy_ffmpeg.zi…
- just point it at a directory and it'll do all the magic for you. Throw it onto a Chameleon, Humanware eReader, Brailliant BI 20X or 40X, whatever; even any old DAISY player might play this if you can open ncc.html.
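The FFmpeg duration trick boils down to something like this (a sketch of the idea only, not the fork's actual code; assumes ffprobe is on PATH):

```
# Sketch: read a track's duration with ffprobe instead of id3tag.
import json
import subprocess

def track_duration_seconds(path: str) -> float:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(json.loads(out)["format"]["duration"])

print(track_duration_seconds("track01.mp3"))
```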

in reply to x0

@x0 Ooh, VBR I haven't tried yet; might experiment with that a little bit. Most stuff I have here just re-encodes into CBR rates. Too bad we can't poke inside the firmware to see what MP3 engine they use. Would be a shame if they now locked it down to just 64 kbps mono, but gosh darn it, they're never taking away my freedom to play any file I want on there now that audio's been let out of the bag :D
@x0

Saw on my timeline today that the Windows 11 25H2 installer removed the ability to launch Narrator from install media. Can anyone please confirm? That'd be an interesting thing for me to learn for sure right now, seeing as I've been trying to bring it up in KVM, failing, and debugging it as if Linux audio is the problem. To be fair, it often is, but maybe not this time.

We are proud to announce the go-live of our new website, izzyondroid.org/ 🥳

Although still modest, this site lets newcomers jump straight into #IzzyOnDroid and explains what makes our repository worth using. We also hope the new domain will increase visibility and reduce confusion.

We’d love to hear your thoughts here on the Fediverse. Encounter an issue? Please report it on our Codeberg issue tracker at codeberg.org/IzzyOnDroid/izzyo…. And, YES! — our website is #OpenSource 🤗

(1/3)

Today's AWS debacle is a perfect example of why, over the last few years, I have become less enthusiastic about Signal and more oriented toward federated or even P2P solutions like XMPP and Jami. I wrote about it already:

gagliardoni.net/#im_battle_202…

Signal was down for a few hours today, after an outage that affected AWS:

mastodon.world/@Mer__edith/115…

Let's ignore for a second the blind reliance on AWS or any other cloud provider. In a decentralized system, this would not have happened, or at least it would not have impacted so many users.

Yes, I am a cryptographer myself; I know that Signal's encryption is the best. But encryption is not everything. Availability issues, geopolitical troubles, the risk of enshittification, and limitations on users' freedom to use and control the software all lead to a lack of trust, even in a super-secure solution. And I say that with honest admiration for the folks at Signal, who are doing a great job.

May they prove me wrong over and over again.

#signal #im #aws #amazon #privacy #security #digitalsovereignty #selfhosting #fediverse #federation #p2p #enshittification #xmpp #jami #politics #opensource #freesoftware #libre


PSA: we're aware that Signal is down for some people. This appears to be related to a major AWS outage. Stand by.

Recent developments in VScan, the visual perception layer for the blind: it is now much more privacy-friendly, more universal, and finally in app stores!

Boy, this brings back memories! | JAWS for DOS 2.3 Basic Training Tapes : Henter-Joyce, Inc : Free Download, Borrow, and Streaming : Internet Archive archive.org/details/jfd-2.3-ba…

I'm quite proud of the fact that I usually find out about downtime of #bigtech services from the news.

For years, almost all of my personal online tools have been self-hosted or run in data centres of small regional providers. They aren't immune to downtime, but when they're down, it isn't half the internet.

Monocultures may seem efficient, but they aren't resilient.

#AWS #AWSOutage #Amazon

Just a quick note for the archives and for those of us who have hearing issues and need easy YouTube transcripts with #a11y to #screenreaders. I've generally used downsub.com to produce text, but it's becoming less usable, often saying "waiting" for a minute or two before giving an error with no specifics as to what the problem is. I wanted something else and found something similar, if not slightly better, at youtube-transcript.io/. It allows 25 transcripts a month free, though I'm not sure how it tracks usage since I didn't have to sign in.

Anyhow, it's quite simple: paste in your link, hit enter, and it comes back with a transcript in English for English videos. It appears to use the captions YT itself generates, since I compared the results with what I got from downsub and saw no difference. You can read the transcript on the page or copy it to the clipboard with a button, again on the page. It has 30-second timing markers by default when it's on the page, but not when you copy it, at least by default.

I'm still after an easy local transcript grabber, but this is a step forward for when downsub doesn't work for me, and I thought people might like to have it. #blind #braille
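For the local-grabber itch, the third-party youtube-transcript-api Python package might be a starting point; a hedged sketch (the package's API has changed across versions, so treat the call below as approximate):

```
# Sketch of a local transcript grabber; needs "pip install youtube-transcript-api".
from youtube_transcript_api import YouTubeTranscriptApi

def grab(video_id: str) -> str:
    # Returns a list of {"text", "start", "duration"} segments in older versions.
    segments = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
    return "\n".join(seg["text"] for seg in segments)

print(grab("VIDEO_ID_HERE"))  # placeholder video ID
```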

in reply to Cleverson

@clv0 Sorry, just to be clear, downsub is still working here as well, much of the time. I'd say it works a good half the time, if not a bit more. Sometimes it'll say "waiting" and in a few seconds, come up with the transcript. Sometimes it will come up with it immediately. Quite often, though, it'll just say waiting for a while then say "error". It's still usable, though, quite often, just not nearly always.

Hey! Did you know that you can now find all the official @xmpp news on Movim 😮!

mov.im/community/news.xmpp.org…

You can subscribe to their News feed using any XMPP account and by joining from any #Movim instance (a lot of them are listed on join.movim.eu/) ✨

Movim is building a complete social platform on top of XMPP. Any account that joins the platform can already create its own blog 😊

Movim Communities allows you to create spaces to publish with others around the topics you like 😸! This is what the XSF did with their own #XMPP News Community, hosted on their own server and accessible to anyone on the network.

100% standard and fully federated ❤️

Hi. So descam, the thing I made to describe your camera via Rust: well, someone said "why didn't you make it a website?", and it made a lot of sense. So thanks to a bit of tinkering and a lot of vibing we have descam.oriolgomez.com. The OpenAI API key is never sent anywhere; it's kept in your browser's local storage. I might add support for local models if there is demand, so you can run your phone images through your local Ollama or whatever.

PipeWire 1.6 Promises Bluetooth Audio Streaming for Hearing Aid Support lxer.com/module/newswire/ext_l…

Do you have a phone in a drawer at home that you no longer need? Bring it to the #OpenAlt mobile Linux booth and you'll help its development!
The following models, even damaged ones (as long as the mainboard is intact), will help us:

Pixel 3 (incl. XL), Pixel 3a (incl. XL), Pixel 4a, Pixel 6,
OnePlus 6, 6T,
Sony Xperia XZ2, XZ3,
Xiaomi Mi Mix 2S, 3, Pocophone F1, Mi 8 (incl. Pro, Explorer),
LG G7 ThinQ, V35 ThinQ,
Fairphone (any),
SHIFT (any),
Chromebooks (various)
Motorola Moto (various)
Samsung Galaxy (various)
Xiaomi (various)

When not using Google Play services (e.g. #GrapheneOS, #LineageOS users), #Signal can be a real battery drain. @mollyim with @unifiedpush on the other hand is extremely battery efficient.

Here's how to set this up, using #Nextcloud as the UnifiedPush provider: kroon.email/site/en/posts/moll…

Hacker gets annoyed at Amazon’s Kindle apps, reverse-engineers the Kindle web reader’s protocol (which basically sends each page as a set of glyphs in a deliberately broken variant of SVG). Such obscurity, much security.

blog.pixelmelt.dev/kindle-web-…

I need to reinstall Windows on my Surface tablet, and I wanted to see if I could use AI to get into the boot menu and change the boot order to try USB first. I have a ChatGPT subscription, so I used voice mode with the video camera on, gave it some information about what I was doing, and let it be my eyes for a while.
I was delighted when the assistant actually knew how to get into the boot menu. It told me I could navigate the menu with my volume keys and select an option with the power key, much like navigating an Android phone’s recovery menu. I had to remind it I was totally blind a couple of times, but eventually, it helped me select USB as the first boot device and choose the “Exit and Restart” option. This was quite a few steps, and I was genuinely impressed at how easily I was able to fix this problem without interacting with a real person.
I eagerly waited for the Windows installer to come up. It never did.
So I used OCR on the screen, and discovered that not a single thing ChatGPT claimed had happened … had actually happened.
It didn’t hallucinate just one thing. It hallucinated an entire multi-step interaction with the firmware of my tablet. It basically experienced a break from reality for about two minutes and started describing what it thought should happen, with no regard for what was actually happening.
Last week, the same app helped me learn the control panel of my heated mattress pad. It does work sometimes. But today, it led me on the wildest goose chase I’ve ever been on. I was actually trying to boot from the SD card, and as it turns out, that’s not even an option in the boot menu. But I made the mistake of telling ChatGPT exactly what I was trying to do, so it had all the material it needed to hallucinate a complex interaction convincingly.
Never let yourself forget the all-important “A” in “AI”. That intelligence is not artificial as in “synthetic”, it’s artificial as in “pretend”. No LLM has the slightest idea of what it’s doing or saying. The companies that create these models have the all-important task of trying to make their intelligence more convincing than every other company’s.
That means they work most of the time, but the rest of the time, they will confidently lie. And that lie might be a missing digit, or it might be a whole entire interaction with a device.
I called Aira and got it sorted. I actually had to touch the arrows on the touchscreen to rearrange the boot order. Yes, I had a keyboard connected. No, there is no documented way to rearrange boot devices on a Surface using the keyboard. Yes, everything about this is moronic. But it’s done, and once I get Windows onto my USB device, I’m pretty sure it will boot, because an actual human told me so.
I can’t even begin to enumerate the possible clusterfucks that could arise from AI weaving such complex webs of lies. Do not use this shit for anything mission-critical. Ever. Even if it told the truth the last 99 times. Eventually, it will lie. When it does, you’ll have no idea.

in reply to Simon Jaeger

Ahaha, this totally reminds me of trying to get Secure Boot disabled with GPT. It confidently kept saying I had turned it off, that I was in the Advanced tab and should just press arrow down a few times to find it. Of course, I was in the Boot tab, and I later resorted to asking it more limited questions, because the moment I gave away what I wanted to do, it would get way worse. Now I ask it things like, "Look for something with the label 'secure boot' and tell me how many arrow-downs away it is." I had a similar experience trying to get it to read the MAC address off a router, but at least there I could use something more local like Seeing AI and get an answer that I knew was as accurate as my ability to keep a good angle for it to read. I think that's the other problem with GPT and other AI: it doesn't (always) inform you like a human would that you need to move or tilt your phone. I've had it help me with that a few times, but unless you specifically ask whether your phone needs adjusting, it would rather make something up than tell you.

Tomorrow, I'll have had my Ableton Move for exactly one year, so I finished a silly sequence I started a few weeks ago. Four tracks. No overdubs or weird tricks with the drum rack. Just lots of step entry and automation. I call this one Happy Saw Times, because it started out as this happy F major sawtooth thing.

TIL: There's a W3C Candidate Recommendation Draft for CSS markup that conveys different properties of text and controls on the web via audio cues and changes to TTS volume, speech rate, tone, prosody and pronunciation, kind of like the attributed strings in iOS apps. It's called CSS Speech. w3.org/TR/css-speech-1/ #Accessibility #A11y #Blind
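A taste of what the draft proposes (property names are from the spec, but no shipping screen reader honours them today, so this is illustrative only):

```
/* Illustrative only; properties from the CSS Speech Level 1 draft. */
.account-number {
  speak-as: digits;            /* "1 2 3", not "one hundred twenty-three" */
}
.warning {
  cue-before: url(alert.wav);  /* earcon played before the text */
  voice-rate: slow;
  voice-pitch: low;
}
```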

in reply to Paweł Masarczyk

There are people who seem to feel really strongly about this being a good thing for screen reader users, and I must admit to being bewildered about why. Websites changing aspects of screen reader output may be equitable, if we compare it with the way webpages can alter visual presentation through fonts and other aspects. But to me it feels entirely inappropriate to cross that boundary between the browser as the user agent and accessibility software in order to interfere with very personal settings.

Meanwhile on iOS, the related accessibility attributes are being used to achieve outcomes nobody wants or needs, like spaces between all the digits of a credit card number. @miki @prism

in reply to James Scholes

I can see the point for e.g. text-to-speech APIs built into the browser, maybe even read-aloud features. But the case for screen reader compatibility seems to be built on the foundational assertion that SR output is monotonous and can't be "livened up" by brands.

As assertions go, I think that is both true and exactly how it should be. I don't use a screen reader for entertainment. I can think of few things more obnoxious than a marketing person thinking that my screen reader should "shout this bit."

Many web authors can't even label stuff correctly. Why on earth would we expect them to treat this sort of feature with informed respect? @miki @prism

in reply to Drew Mochak

@prism I think without ARIA or an equivalent (like more things built into the web platform), the web would've continued galloping forward with all the same UI widgets and design patterns but with no way to make them even halfway accessible, and we'd be left even more behind than we are now.

By contrast, I don't think the inability for a website to change the pitch of NVDA is a legitimate blocker to anything worthwhile. @Piciok @miki

in reply to James Scholes

@jscholes I have felt for a while that only having TTS for everything is pretty limiting. So, you know, I use Unspoken. Problem solved. I haven't really thought to myself, "self, it would be great if the website author could script some nonverbal feedback for me instead of what I am currently hearing," or anything like that. So this may well be a solution in search of a problem.
@Piciok @miki
in reply to Drew Mochak

@prism @jscholes @miki I don't see the point because everyone has different ways they like to hear things. People choose the verbosity and speech options that work for them and to have something override that would be irritating. I also feel that this is part of a larger conversation about the perceived need for sighted people to feel like our experience of the web is vastly different. This is why we have a lot of unnecessary context already and here is another example.
in reply to miki

@miki I think it's a trap to suggest that such problems should currently be solved only through speech properties and auditory cues within individual apps. Expressive semantics on the web have only been explored at a surface level so far, and it's a complete stretch to go from "We don't have the ARIA properties to convey complex information," to "Let's have every application implement its own beeps and boops."

Imagine having to learn the sound scheme for Gmail, then Outlook, then Thunderbird. Then going over to Slack where they also have unread state albeit for chat messages rather than emails, but they use an entirely different approach again.

All the while, braille users are getting nothing, and people who struggle to process sounds alongside speech are becoming more and more frustrated. Even if we assume that this is being worked on in conjunction with improvements to ARIA and the like, how many teams have the bandwidth and willingness to implement more than one affordance?

We've already seen this in practice: ARIA has braille properties, but how many web apps use them? Practically none, because getting speech half right and giving braille users an even more subpar experience is easier. Your own example highlights how few apps currently let you control things like verbosity and ordering of information.

CSS Speech could turn out even worse. A product team might opt to implement it instead of semantics because the two blind people they spoke to said it would work for them, and never mind the other few million for whom it doesn't. They'll be the people complaining that there's no alternative to the accessibility feature a team spent a month on and thought was the bee's knees.

@silverleaf57 @prism @Piciok

in reply to James Scholes

@jscholes @silverleaf57 @prism Efficiency, not equity.

Words are a precious resource, far more precious than even screen real estate. After all, you can only get a fairly limited number of them through a speaker in a second. We should conserve this resource as much as we can. That means as many other "side channels" as we can get: sounds, pitch changes, audio effects, stereo panning (when available) and much more.

Icon fatigue is real. "Me English bad, me no know what delete is to mean" is also real, and icons, pictograms and other kinds of pictures are how you solve that problem in sighted land.

Obviously removing all labels and replacing it with pictograms is a bad idea. Removing all icons and replacing them with text... is how you get glorified DOS UIs with mouse support, and nobody uses these.

in reply to miki

@jscholes @silverleaf57 @prism Everything said above also applies to braille; braille cells are even more precious than words through a speaker. It's a shame we can abbreviate "main landmark heading level 2" to something more sensible, but we can't abbreviate "unread pinned has attachment overdue" if those labels are not "blessed" by some OS accessibility API.
in reply to James Scholes

@miki Note that I'm specifically responding to your proposed use case here. You want beeps and boops, and I think you should have them. But:

1. I think you should have them in a centralised place that you control, made possible via relevant semantics.

2. I don't think the fact that some people like beeps and boops is a good reason to prioritise incorporating beeps and boops into the web stack in a way that can't be represented via any other modality.

@silverleaf57 @prism @Piciok

in reply to James Scholes

@jscholes @silverleaf57 @prism Centralized beeps and boops don't make much sense to me. Each app needs a different set, let's just consider important items on a list. That can mean "overdue", "signature required", "has unresolved complaints", "student not present", "compliance certification not granted" or something entirely different. We can't expect screen readers to have styles for all of these, just as we can't expect browsers to ship icons for all of these.
in reply to miki

@miki Sure. Or it can just mean "important" in a domain-specific way that's shared across apps in that domain. We should be taking advantage of that to make information presentation and processing more streamlined, before inventing an entirely new layer and interaction paradigm that hasn't been user tested and will require text alternatives anyway. @silverleaf57 @prism @Piciok
in reply to James Scholes

@miki As noted, I think people who can process a more efficient stream of information should have it available to them. That could be through a combination of normalised/centralised semantics, support for specialised custom cases, and multi-modal output.

My main concern remains CSS Speech being positioned as the only solution to information processing bottlenecks, which I think is a particularly narrow view and will make things less accessible for many users rather than more.

Good discussion, thanks for chatting through it. @silverleaf57 @prism @Piciok

in reply to James Scholes

@jscholes At the same time, I think the chances that CSS Speech completely takes over the industry and we all stop doing text role assignments are quite low.
explainxkcd.com/wiki/index.php…

So I am decidedly meh about this. It could help but probably won't.
@miki @silverleaf57 @Piciok

in reply to James Scholes

@jscholes @prism @miki @silverleaf57 I found the concept intriguing and am myself in two minds about it. On one hand, I wouldn't mind having the speech experience augmented by things that aren't words. I could imagine browsing a product's details page and reading up on all of its features with tiny earcons indicating whether a certain feature is supported or not, rather than hearing "Yes" and "No" every time. These could even be played at the same time as the readout begins. To be fair, I also don't mind having the pronunciation of tricky words that are important for proper understanding and functioning in a domain predefined, just so I could learn it. Character and number processing might come in handy too: recently there was an issue opened on the NVDA GitHub against a feature, asking that combinations of capital letters and digits be read as separate entities for the benefit of ham radio operators and their call signs. Some kinds of numbers I also find easier to remember when they come digit by digit, etc. The ability to define the spatial location of the voice on the stereo spectrum could be useful for presenting spatial relationships in some advanced web apps (thinking scientific contexts, design, web text and code editors, etc.).

As you say, however, I wouldn't expect this to be widely adopted by web devs who already struggle with the proper use of ARIA. Also, the trade-offs could be significant, especially if this becomes the sole way of conveying information: blind users with a profound hearing impairment could miss crucial information because it was read out too quietly, too fast, or with a pitch that takes away some of the frequencies they can still discern; neurodivergent people could be confused by sudden changes and unfamiliar sounds on top of the exotic keyboard shortcut choices they already have to remember, etc. This could create a situation similar to WCAG SC 1.4.1, where colour is used as the only way of conveying information.
in reply to Paweł Masarczyk

This already exists though, as a screen reader feature. Kind of. NVDA has an add-on called Unspoken that will replace the announcement of roles with various sounds; there's a different sound for checked vs. unchecked boxes, for instance. JAWS did (does?) something similar with the shareable schemes in the Speech and Sounds Manager. Granted, not a lot of people do this, but the ability is there if people want it. VO, TB and ChromeVox also have earcons; they're not used for this purpose, but they could be. Having this under the user's control rather than the author's does seem better. It prevents, for instance, a developer deciding to be super obtrusive with ads. I do see the potential for it to be good, since the author of the content would be able to convey more nuanced concepts... it just feels like a thing most people wouldn't use, and most of the people who'd try would end up being obnoxious about it.

@jscholes @miki @silverleaf57

in reply to Drew Mochak

@prism @jscholes @miki @silverleaf57 Yes, this is what I'm thinking too. Also, the add-ons are great. I experiment with Earcons and Speech Rules, which is another add-on with tons of customization. Bringing it in as a core feature would signal it as an industry standard though, and from there it would be possible to explore whether any external APIs could augment it in any way.
in reply to James Scholes

@jscholes @prism @miki @silverleaf57 As for this being widely adopted, I expect some CSS properties could be mapped to aural cues at the browser level, just like some HTML elements carry implicit ARIA properties by default. This would have to be carefully considered. Regarding sound cues: these would have to be based on some kind of familiarity principle, where the sounds are ones most users will already know, or they resemble the action they are supposed to represent; think emptying the Recycle Bin on Windows. I really like the approach of JAWS representing heading levels through piano notes in C major. It sounds logical, but on the other hand not everyone is able to recognize musical notes at random. I'm not convinced about the marketing value of this, I mean creating brand voices etc. It sounds fun but no more than that, at least in the screen reader context. I guess inclusion in advertising is another can of worms that might derail the discussion. I'm looking forward to when NVDA finally incorporates some kind of sound scheme system, because we will then be able to talk about some kind of standard, given that JAWS and, to some extent, VoiceOver and TalkBack make use of that already. I guess then the discussion could evolve around this being complementary to something like aria-roledescription or aria-brailleroledescription, assigning familiar sounds and speech patterns to custom-built controls.
in reply to James Scholes

@jscholes @prism @miki @silverleaf57 I think inviting @tink and @pixelate into the discussion is a great idea as they might have valuable insights on this. On a related note: something that's been running around my head is how many Emojis could be faithfully represented by sounds.
in reply to Paweł Masarczyk

@jscholes @prism @miki @silverleaf57 @tink So, I generally like beeps and boops. All shiny and stuff. But the web is made by sighted people, and they will get things wrong. I'd rather we have our own tools, like NVDA's Earcons add-on, and maybe have earcon packs for it to, for example, add aural highlighting for VS Code, or make-gmail-shiny, stuff like that.

Researchers pointed a satellite dish at the sky for 3 years and monitored what unencrypted data it picked up. The results were shocking: They obtained thousands of T-Mobile users' phone calls and texts, military and law enforcement secrets, much more: 🧵👇wired.com/story/satellites-are…

For the last 3 months I have been using VDO Ninja for all my remote interview and podcast recordings. Here is my article about it from a blind perspective, focused on accessibility and audio.

Have You Ever Wanted to Record an Interview or Podcast Online? You’ve probably faced a few challenges:
How to transmit audio in the highest possible quality?
How to connect in a way that doesn’t burden your guest with installing software?
And how to record everything, ideally into separate tracks?

The solution to these problems is offered by the open-source tool VDO Ninja.

What Is VDO Ninja


It’s an open-source web application that uses WebRTC technology. It allows you to create a P2P connection between participants in an audio or video call and gives you control over various transmission parameters.
You can decide whether the room will include video, what and when will be recorded, and much more.

In terms of accessibility, the interface is fairly easy to get used to — and all parameters can be adjusted directly in the URL address when joining.
All you need is a web browser, either on a computer or smartphone.

Getting Started


The basic principle is similar to using MS Teams, Google Meet, and similar services.
All participants join the same room via a link.
However, VDO Ninja distinguishes between two main types of participants: Guests and the Director.
While the guest has limited control, the director can, for example, change the guest’s input audio device (the change still must be confirmed by the guest).

A Few Words About Browsers


VDO Ninja works in most browsers, but I’ve found Google Chrome to be the most reliable.
Firefox, for some reason, doesn’t display all available audio devices, and when recording multiple tracks, it refuses to download several files simultaneously.

Let’s Record a Podcast


Let’s imagine we’re going to record our podcast, for example, Blindrevue.
We can connect using a link like this:

https://vdo.ninja/?director=Blindrevue&novideo=1&proaudio=1&label=Ondro&autostart=1&videomute=1&showdirector=1&autorecord&sm=0&beep

Looking at the URL more closely, we can see that it contains some useful instructions:
  • director – Defines that we are the director of the room, giving us more control. The value after the equals sign is the room name.
  • novideo – Prevents video from being transmitted from participants. This parameter is optional but useful when recording podcasts to save bandwidth.
  • proaudio – Disables effects like noise reduction, echo cancellation, automatic gain control, compression, etc., and enables stereo transmission.
    Be aware that with this setting, you should use headphones, as echo cancellation is disabled, and otherwise, participants will hear themselves.
  • label=Ondro – Automatically assigns me the nickname “Ondro.”
  • autostart – Starts streaming immediately after joining, skipping the initial setup dialog.
  • videomute – Automatically disables the webcam.
  • showdirector – Displays our own input control panel (useful if we want to record ourselves).
  • autorecord – Automatically starts recording for each participant as they join.
  • sm=0 – Ensures that we automatically hear every new participant without manually unmuting them.
  • beep – Plays a sound and sends a system notification when new participants join (requires notification permissions).

For guests, we can send a link like this:

https://vdo.ninja/?room=Blindrevue&novideo=1&proaudio=1&label&autostart=1&videomute=1&webcam

Notice the differences:
  • We replaced director with room. The value must remain the same, otherwise the guest will end up in a different room.
  • We left label empty — this makes VDO Ninja ask the guest for a nickname upon joining.
    Alternatively, you can send personalized links, e.g., label=Peter or label=Marek.
  • The webcam parameter tells VDO Ninja to immediately stream audio from the guest’s microphone; otherwise, they’d need to click “Start streaming” or “Share screen.”


How to Join


Simply open the link in a browser.
In our case, the director automatically streams audio to everyone else.
Participants also join by opening their link in a browser.
If a nickname was predefined, they’ll only be asked for permission to access their microphone and camera.
Otherwise, they’ll also be prompted to enter their name.

Usually, the browser will display a permission warning.
Press F6 to focus on it, then Tab through available options and allow access.

Controls


The page contains several useful buttons:

  • Text chat – Toggles the text chat panel, also allows sending files.
  • Mute speaker output – Mutes local playback (others can still hear you).
  • Mute microphone – Mutes your mic.
  • Mute camera – Turns off your camera (enabled by default in our example).
  • Share screen / Share website – Allows screen or site sharing.
  • Room settings menu (director only) – Shows room configuration options.
  • Settings menu – Lets you configure input/output devices.
  • Stop publishing audio and video (director only) – Stops sending audio/video but still receives others.


Adjusting Input and Output Devices


To change your audio devices:

  1. Activate Settings menu.
  2. Press C to jump to the camera list — skip this for audio-only.
  3. Open Audio sources to pick a microphone.
  4. In Audio output destination, select your playback device. Press the test button to try it.
  5. Close settings when done.


Director Options


Each guest appears as a separate landmark on the page.
You can navigate between them quickly (e.g., using D with NVDA).

Useful controls include:

  • Volume slider – Adjusts how loud each participant sounds (locally only).
  • Mute – Silences a guest for everyone.
  • Hangup – Disconnects a participant.
  • Audio settings – Adjusts their audio input/output remotely.


Adjusting Guest Audio


Under Audio settings, you can:

  • Enable/disable filters (noise gate, compressor, auto-gain, etc.).
  • View and change the guest’s input device — if you change it, a Request button appears, prompting the guest to confirm the change.
  • Change the output device, useful for switching between speaker and earpiece on mobile devices.


Recording


Our URL parameters define automatic recording for all participants.
Recordings are saved in your Downloads folder, and progress can be checked with Ctrl+J.

Each participant’s recording is a separate file.
For editing, import them into separate tracks in your DAW and synchronize them manually.
VDO Ninja doesn’t support single-track recording, but you can use Reaper or APP2Clap with a virtual audio device.

To simplify synchronization:

  1. Join as director, but remove autorecord.
  2. Wait for everyone to join and check audio.
  3. When ready, press Alt+D to edit the address bar.
  4. Add &autorecord, reload the page, and confirm rejoining.
  5. Recording now starts simultaneously for everyone.
  6. Verify this in your downloads.


Manual Recording


To start recording manually:

  1. Open Room settings menu.
  2. Go to the Room settings heading.
  3. Click Local record – start all.
  4. Check PCM recording (saves WAV uncompressed).
  5. Check Audio only (records sound without video).
  6. Click Start recording.


Important Recording Notes


  • Always verify that all guest streams are recording.
  • To end recordings safely, click Hangup for each guest or let them leave.
  • You can also toggle recording for each guest under More options → Record.
  • Files are saved as WEBM containers. If your editor doesn’t support it, you can convert them using the official converter.
  • Reaper can open WEBM files but may have editing issues — I prefer importing the OPUS audio file instead (see the sketch below).
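A hedged sketch of that extraction step (assumes ffmpeg is installed and on PATH; the Opus track is stream-copied, so nothing is re-encoded):

```
# Sketch: pull the Opus audio out of a WEBM recording without re-encoding.
import subprocess

def extract_opus(webm_path: str, opus_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-i", webm_path, "-vn", "-acodec", "copy", opus_path],
        check=True,
    )

extract_opus("guest-ondro.webm", "guest-ondro.opus")  # example file names
```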


Recommended Reading


In this article, I’ve covered only a few features and URL parameters.
For more details, check the VDO Ninja Documentation.

Do you know that you can use Subtitle Edit to transcribe audio? It has a relatively accessible GUI, and you can use Purfview's Faster-Whisper-XXL, CPP, CPP cuBLAS, or Const-me as engines. A longer post on how to use it follows:

Installing Subtitle Edit


Download the program from the developer’s website. Navigate to the level 2 heading labeled “Files.”
If you want to install Subtitle Edit normally, download the first file, labeled setup.zip.
There is also a portable version available, labeled SE_version_number.zip.

If you decide to use the portable version, extract it and move on to the next section of this article. The installation itself is standard and straightforward.

A Note on Accessibility


NVDA cannot automatically obtain focus in lists.
To find out which item in the list is currently selected, move down with the arrow key to change the item, then press NVDA+TAB to hear which one is focused.

Initial Setup


  • In the menu bar, go to Video and activate Audio to text (Whisper).
  • When using this feature for the first time, the program may ask whether you want to download FFMPEG. This library allows Subtitle Edit to open many audio and video files, so confirm the download by pressing Yes.
  • Subtitle Edit will confirm that FFMPEG has been downloaded and then ask whether you want to download Purfview's Faster-Whisper-XXL. This is the interface for the Whisper model that we’ll use for transcription, so again confirm by pressing Yes.
  • The download will take a little while.
  • Once it’s complete, you’ll see the settings window. Press Tab until you reach the Languages and models section. In the list, select the language of your recording.
  • Press Tab to move to the Select model option, and then again to an unlabeled button.
  • After activating it, choose which model you want to use. Several models are available:
    • Small models require less processing power but are less accurate.
    • Large models take longer to transcribe, need more performance and disk space, but are more accurate.
      I recommend choosing Large-V3 at this step.


  • Wait again for the model to finish downloading.


Transcribing Your First Recording


  • Navigate to the Add button and press Space to activate it.
  • A standard file selection dialog will open. Change the file type to Audio files, find your audio file on the disk, and confirm.
  • Activate the Generate button.
  • Now, simply wait. The Subtitle Edit window doesn’t provide much feedback, but you can tell it’s working by the slower performance of your computer—or, if you’re on a laptop, by the increased fan noise.
  • When the transcription is done, Subtitle Edit will display a new window with an OK button.


We Got Subtitles, So One More Step


In the folder containing your original file, you’ll now find a new file with the .srt extension.
This is a subtitle file—it contains both the text and the timing information. Since we usually don’t need timestamps for transcription, we’ll remove them in Subtitle Edit as follows:

  • Press Ctrl+O (or go to File → Open) to bring up the standard open file dialog. Select the .srt file you just got.
  • In the menu bar, open File → Export → Plain text.
  • Choose Merge all lines, and leave Show line numbers and Show timecode unchecked.
  • Press Save as and save the file normally.

If you’re transcribing multiple recordings, it’s a good idea to close the current subtitle file by starting a new project using Ctrl+N or by choosing File → New.
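If you'd rather skip the export dialog, the same flattening can be scripted; here is a minimal sketch (my own, not part of Subtitle Edit; assumes a UTF-8 .srt):

```
# Minimal sketch: drop SubRip cue numbers and timecodes, keep only the text.
def srt_to_text(path: str) -> str:
    kept = []
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            line = line.strip()
            if not line or line.isdigit() or "-->" in line:
                continue  # skip blank separators, cue numbers, timecode lines
            kept.append(line)
    return " ".join(kept)

print(srt_to_text("recording.srt"))
```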

Conclusion


Downloaded models can, of course, be reused, so future transcriptions will go faster.
In this example, I used Purfview's Faster Whisper. If you want to use a different model, you can select it from the model list, and Subtitle Edit will automatically ask whether you'd like to download it.

I decided to write a post where I talk about my experiences finding work as a blind person and attempt to give some general advice to blind people who are either looking for work or looking for a position that better aligns with their goals or values. I'm not sure why the strange URL; hopefully it doesn't cause problems. mikegorse.substack.com/p/4834h…

On my AMD Ryzen 7 8845HS mini PC, NVDA is a bit sluggish in some cases in Firefox; e.g. cursoring through messages in Gmail folders. For reasons I don't fully understand, setting the processor affinity to a single CPU core and setting the process priority to "above normal" helps significantly, even when the CPU is nearly idle. I don't currently have the time/energy to debug the root cause for this or write a proper add-on, but I wrote an NVDA global plugin to make the change for me automatically when NVDA starts. If it breaks something, you get to keep all the pieces.
```
import ctypes
import ctypes.wintypes  # not pulled in automatically by "import ctypes"

import globalPluginHandler

class GlobalPlugin(globalPluginHandler.GlobalPlugin):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Pseudo-handle for the current process; it doesn't need closing.
        p = ctypes.c_void_p(ctypes.windll.kernel32.GetCurrentProcess())
        # Pin NVDA to the first CPU core (affinity mask 0b1).
        ctypes.windll.kernel32.SetProcessAffinityMask(p, ctypes.c_void_p(1))
        # 0x00008000 is ABOVE_NORMAL_PRIORITY_CLASS.
        ctypes.windll.kernel32.SetPriorityClass(p, ctypes.wintypes.DWORD(0x00008000))
```
#nvdasr

Sending and receiving money has just become a whole lot easier!

Instant payments are now available to everyone in the eurozone.

⚡ Instant transfers 24/7, no waiting days for your money
💰 No extra fees, same price as regular payments
🔍 Free payee verification, ensuring IBAN and name match before sending
🛡️ Safer payments with daily checks to help prevent fraud and sanctions risks
🏦 More access, not just for banks but also fintechs and e-money institutions

For faster, safer payments than ever!
