
I hear that Jaws has image recognition from ChatGPT and Gemini now, and I have some thoughts.
First of all, I'm not a Jaws user; I haven't touched it in almost fifteen years. I'm not here to bash Jaws... not this time, anyway.

That said, it's (theoretically) really great that we now have a relatively lowest-common-denominator way to access ML-driven image recognition. Jaws itself is still quite the barrier to entry for many.
Someone pointed out that there is no need to insert API keys and such. Yes, also great for the masses.
Here's where things get interesting, though.

Someone is paying for that API access, at least in the case of ChatGPT. What are the limits? If they exist, are they reasonable limits? Are all Jaws users sharing the same tokens?

Also: mmm, more yummy metadata for both Google and OpenAI to chew on.
I bet government agencies will have a field day with this. I'm sure they totally want potentially sensitive stuff to find its way between the gently smiling jaws of these large corporations that only have our best interests at heart. Can you just feel the sarcasm here?

As you might be able to tell, I'm not such a fan of much of what surrounds the AI hype, and this brings up a point that keeps coming up. What are we trading for access? Furthermore, are we willing to sell our souls to read a thermostat?
In some cases, yeah, probably.
It's really hard not to turn this into an "I HATE EVERYTHING" post right now. We're in a place where so much potential exists, and most of it is powered by evil bastards.

In a few years, when most modern PCs have neural processing units (NPUs) built in to replace GPUs for ML processing, models with still ethically dubious sources can easily be run on local hardware, and the Jaws download is suddenly 16 GB, will this conversation be where it is now? Who knows?

in reply to Noof E. Noof

It's not about selling your soul, at least in this particular case. If you don't want to use it, you just don't show it images; that's it. But imagine a situation where a blind person lives alone (or a couple where both are blind), and they do need to recognize that darned image. Would you prefer to have a two-click way to do that? I'd still prefer to have it, although I agree with many of your points.
in reply to Noof E. Noof

One more point. If Be My AI and Freedom Scientific can strike a deal with OpenAI re fuss-free ChatGPT access, why can't @NVAccess do it? Maybe it's a matter of time - who knows? But NVAccess should also work towards that goal in order to keep NVDA relevant.
#accessibility
in reply to Amir

@amir @NVAccess So I think LLM integration should actually be the last thing on NV Access's mind while so much program and website infrastructure is completely inaccessible, but blind people unquestioningly buy into all of the AI hype, so, despite my wishes anyway, I think unfortunately this will be integrated at some point.
in reply to Robert Kingett, blind

@weirdwriter @amir @NVAccess Nah, I'm glad NV Access is handling the core screen reader; let add-ons do all the fancy stuff, especially now that we have an add-on store.
in reply to Devin Prater :blind:

@pixelate @weirdwriter @amir The key thing to look at with any new technology is what real-world benefit there is. In this case, image description could be an option. There are and have been various add-ons for NVDA to provide this functionality. We are definitely watching developments and how emerging technologies might best work for end users.
in reply to Charlotte Joanne

@Lottie @amir Oddly this didn't come through as a reply to my post, but I'm guessing it was? Not so much waiting for the right time - more that we aim in general to keep abreast of changes in technology. It depends on your exact goal, but the best use case we envisage for ChatGPT or similar in NVDA would be image description. In fact, there are already not one but two add-ons for NVDA using different technologies: "AI Content Describer", which uses GPT-4, & "XPoseImage Captioner", which uses BLIP.
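
For readers curious what an add-on like that actually does under the hood, here is a minimal sketch (my own illustration, not code from either add-on) of how a Python add-on might ask a vision-capable model for an image description over OpenAI's HTTP API. The model name, prompt, and the user-supplied API key are all assumptions for the example, not details of the real add-ons.

    # Minimal sketch: send an image to a vision-capable chat model and get a description back.
    # Assumes the user supplies their own API key; "gpt-4o" is an assumed model name.
    import base64
    import requests

    def describe_image(image_path: str, api_key: str) -> str:
        """Return a text description of the image at image_path."""
        with open(image_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("ascii")

        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": "gpt-4o",
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "text",
                         "text": "Describe this image for a blind user."},
                        {"type": "image_url",
                         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                    ],
                }],
                "max_tokens": 300,
            },
            timeout=60,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]

    # Example use: print(describe_image("screenshot.png", "sk-...your key..."))

The point of the sketch is simply that each request carries the image itself plus a key someone has to pay for, which is exactly where the earlier questions about shared tokens, limits, and data leaving your machine come from.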
in reply to NV Access

@NVAccess @amir I know they’re both fantastic! I thought you meant you were waiting to see whether to make this a core feature.
in reply to Charlotte Joanne

@Lottie @amir We probably wouldn't make it a core feature - not at this stage anyway - it's relying on a technology we have no control over (what if they shut down that service tomorrow?) and it would mean directly sending data from users to a third party, which would have privacy concerns for a lot of users. We have been through what, half a dozen different "services" for image description in various add-ons - absolutely a really useful feature, but a bit too changeable for core for now.
in reply to NV Access

@NVAccess @Lottie But they require subscriptions, which might prevent most users from gaining access to them. What JAWS and Be My AI provide is moving the burden of subscriptions away from users so that they can access AI for image descriptions - and follow-up questions - directly. Given the nature of NVAccess and its core philosophy, this rhymes well with what NVDA offers to the masses.