Final update: The developer is now on Mastodon via @andrew_guide.

Update: The developer has removed the ability to download Guide until the security issues mentioned in the linked thread are fixed.

Update: This product contains some code flaws that are concerning from a security perspective, beyond just giving control of your computer to an LLM. You might want to read this thread before installing the product: toot.cafe/@matt/114258349401221651

Update: I've exchanged some long emails with Andrew, the lead developer. He's open to dialogue and is moving the project in the right direction: well-scoped single tasks, more granular controls and permissions, etc. He doesn't strike me as an #AI maximalist, "AI can and should do everything, all the time" kind of guy. He's also investigating deeper screen reader interaction, to let the AI do just the things it's best at that we can't do ourselves. I stand by my thoughts that the project isn't yet ready for prime time. But as someone else in the thread said, I don't think it should be written off entirely as yet another "AI will save us from inaccessibility" hype train. There is, in fact, something here if it gets polished and scoped a bit more.

Just tried Guide for fun. It's supposed to be an app that uses #AI to help #blind folks get things done. I asked "Where are the best liver and onions in Ottawa?" It:
1. Decided it needed to search the web.
2. Thought that the "Stardew Access" icon on my desktop was a kind of web browser, so clicked it.
3. Imagined an "accept cookies" dialogue it needed to accept.
4. Decided that didn't work, so looked for Google Chrome (I don't have Chrome installed on that machine).
5. Finally opened Edge from the Start menu. By the way, it just...left Stardew open and running. Because apparently having Stardew Valley running in the background is a vital part of finding liver and onions in Ottawa.
6. Opened a random extension from my Edge toolbar (GoodLinks).
7. Clicked the address bar and loaded google.com, instead of just doing the search right from the address bar.
8. Got blocked because it couldn't sign into my Google account, even though it could have searched from the Google homepage without signing in.

To be fair to AI, that was the kind of open-ended task AI is terrible at. If I had asked it to check an inaccessible checkbox, or read a screenshot, or something, I'm sure it would have been fine.

Anyway, I'm still better at using a computer than an AI. So is my 87-year-old grandfather, for that matter. www.guideinteraction.com


There's a new product that has been gaining some buzz in the blind community, a Windows app called Guide that uses AI to perform tasks on your computer. It's pitched as a way to get around web accessibility problems in particular. I won't link to the thing itself, because I don't want to give it that validation, but I'll link to a previous discussion thread about it: fed.interfree.ca/notes/a5wf4ys…

I've spent some time taking this app apart. The level of shoddy work here is deeply disgusting. 1/?




in reply to Robert Kingett

@WeirdWriter Maybe not. But I'm not suffering under the delusion that the Luddites and anti-AI crowd are my friends any more than the gross capitalist, growth-at-all-costs tech bros are. As blind folks, we have to take what we can from both. Because both will screw us if it gets them an inch closer to their goals. So I keep the AI that actually works off the fediverse.
in reply to 🇨🇦Samuel Proulx🇨🇦

I will just leave this here, from a blind developer toot.cafe/@matt/11425836163340…


First, it's an Electron+Python monstrosity. Specifically, the Python backend runs as a web server on the local machine, and the Electron frontend connects to that local web server. Along with the size of Electron itself, the frontend app is about 27 MB, mostly a node_modules tree with no hint of tree-shaking / dead code elimination. The frontend JavaScript code is not minified at all, so once you extract the .asar file, it's easy to look at it. 2/?
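If you want to poke at an Electron app's frontend yourself, unpacking the .asar archive is a one-liner. A minimal sketch, assuming Node.js is installed and the archive uses the conventional app.asar name (the real path will vary per app; this is just how the inspection works, not anything specific to Guide):

```python
# Sketch: unpack an Electron app's .asar archive so the (unminified)
# frontend JavaScript can be read directly.
# One-time setup, assuming Node.js: npm install -g @electron/asar
import subprocess

# "app.asar" is the conventional archive name inside an Electron app's
# resources folder; adjust the path for the app you're inspecting.
subprocess.run(["asar", "extract", "app.asar", "extracted"], check=True)
print("Frontend source unpacked into ./extracted")
```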

in reply to James Scholes

@jscholes @Jage On a more serious note, I think the interface as presented is just way, way too generalist, and its freedom too unrestricted. Things I'd like to see:
1. It gets prompted with the name of the currently focused app, and the mouse is disallowed from leaving that app window (see the sketch after this list).
2. Bringing it up with alt+ctrl+g gives prompts of what tasks it thinks it could perform inside the current app, instead of the current general "ask me to do anything!"
3. It should have access to the DOM in browsers, so it isn't relying on screenshots alone when it's asked to do something on a webpage.
4. It really, really needs training on NVDA focus mode. It says to turn off speech, but that doesn't stop it from trying to type into an edit field when NVDA isn't in focus mode. It does this constantly.
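As a rough illustration of point 1, confining the mouse to the focused window is cheap to do on Windows. A minimal sketch using the Win32 ClipCursor API from Python via ctypes; this is my own illustration of the idea, not code that Guide actually ships:

```python
# Sketch: confine the mouse cursor to the currently focused window via the
# Win32 ClipCursor API. Windows-only; illustrative, not taken from Guide.
import ctypes
from ctypes import wintypes

user32 = ctypes.windll.user32

def clip_cursor_to_foreground_window():
    hwnd = user32.GetForegroundWindow()             # handle of the focused window
    rect = wintypes.RECT()
    user32.GetWindowRect(hwnd, ctypes.byref(rect))  # window bounds in screen coords
    user32.ClipCursor(ctypes.byref(rect))           # cursor can no longer leave them

def release_cursor():
    user32.ClipCursor(None)                         # restore free mouse movement
```

An agent wrapped like this simply couldn't wander off and click a Stardew Valley icon while looking for a web browser.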

I still don't think it'd be ready for prime time, but it would be closer.

in reply to starkraving666

@starkraving666 It's useful for a lot of things. Narrowly defined tasks like "describe this image" or "what is this control?" or "what does this say?" especially. But we're trying to create giant mega-models with all the data instead of tiny specialized models with the right data. I don't need the same AI that gives me alt text for images I can't see to also write me Python code and draw me a unicorn and tell me a story. If I want those things (and sometimes I do), they should be different specialized AIs that each do just that one thing. It's okay if the Python-writing AI can't read me the menu when I go out to eat.
in reply to aburka 🫣

We can hear exactly what it's doing with our screen readers. And there is a stop hotkey. I'm not concerned about safety. I'm more concerned about privacy, and correctly setting expectations. I work in tech, and I spend at least a couple of hours a day struggling with inaccessible apps and websites, and have to have a co-worker do certain tasks for me. Anything that can make this better is of interest to me. Sure, everyone should build accessible apps. But for the most part, nobody bothers. But I still have to work. However, solutions like this need to be strictly sandboxed to the app I actually need help with, not given access to the entire computer. Not only is this more secure, private, and safer, it also works better. The AI doesn't need to get distracted by the fact that I have Balatro installed, or that a Windows notification popped up. Just give it the current window.

Also, the examples for use need to be more like "Ask me to check a keyboard-inaccessible checkbox for you" or "I can help you read an image without alt text". The current examples are more like "Ask me to book a flight for you". In its current form, AI can't do that. The developer has strongly committed to fixing both of these things.