There's a new product that has been gaining some buzz in the blind community, a Windows app called Guide that uses AI to perform tasks on your computer. It's pitched as a way to get around web accessibility problems in particular. I won't link to the thing itself, because I don't want to give it that validation, but I'll link to a previous discussion thread about it: fed.interfree.ca/notes/a5wf4ys…

I've spent some time taking this app apart. The level of shoddy work here is deeply disgusting. 1/?


Final update: The developer is now on Mastodon via @andrew_guide.

Update: The developer has removed the ability to download Guide until the security issues mentioned in the linked thread are fixed.

Update: this product contains some code flaws that are concerning from a security perspective, beyond just giving control of your computer to an LLM. You might want to read this thread before installing the product: toot.cafe/@matt/114258349401221651

Update: I've exchanged some long emails with Andrew, the lead developer. He's open to dialogue, and moving the project in the right direction: well-scoped single tasks, more granular controls and permissions, etc. He doesn't strike me as an #AI maximalist, "AI can and should do everything all the time" kind of guy. He's also investigating deeper screen reader interaction, to let AI do just the things it's best at and that we can't do ourselves. I stand by my view that the project isn't yet ready for prime time. But as someone else in the thread said, I don't think it should be written off entirely as yet another "AI will save us from inaccessibility" hype train. There is, in fact, something here if it gets polished and scoped a bit more.

Just tried Guide for fun. It's supposed to be an app that uses #AI to help #blind folks get things done. I asked "Where are the best liver and onions in Ottawa?" It:
1. Decided it needed to search the web.
2. Thought that the "Stardew Access" icon on my desktop was a kind of web browser, so clicked it.
3. Imagined an "accept cookies" dialogue it needed to accept.
4. Decided that didn't work, so looked for Google Chrome (I don't have Chrome installed on that machine).
5. Finally opened Edge from the Start menu. By the way, it just...left Stardew open and running. Because apparently having Stardew Valley running in the background is a vital part of finding liver and onions in Ottawa.
6. Opened a random extension from my Edge toolbar (GoodLinks).
7. Clicked the address bar and loaded google.com, instead of just doing the search right from the address bar.
8. Got blocked because it couldn't sign into my Google account, even though it could just as easily have searched from the Google homepage.

To be fair to AI, that was the kind of open-ended task AI is terrible at. If I had asked it to check an inaccessible checkbox, or read a screenshot, or something, I'm sure it would have been fine.

Anyway, I'm still better at using a computer than an AI. So is my 87-year-old grandfather, for that matter. www.guideinteraction.com



in reply to Matt Campbell

First, it's an Electron+Python monstrosity. Specifically, the Python backend runs as a web server on the local machine, and the Electron frontend connects to that local web server. On top of the size of Electron itself, the frontend app is about 27 MB, mostly a node_modules tree with no hint of tree-shaking or dead-code elimination. The frontend JavaScript isn't minified at all, so once you extract the .asar file, it's easy to read. 2/?
in reply to Matt Campbell

But now let's talk about the Python backend. The first obvious question, of course, is what AI model it's using, and whether the inference is done locally or remotely. It's using Claude 3.7 Sonnet with its computer use feature. But here's the really crappy part: the connection to Claude, and to other services like Azure Speech and ElevenLabs (yes, both), is happening on the user's machine, using API keys embedded inside the application. 4/?
in reply to Matt Campbell

To spell it out, the problem with directly connecting to third-party services using API keys inside an application running on a user's machine is that you're just begging to have someone steal those keys and run up your bills. Without having your own server in the mix, there's no hope of reining in that usage of third-party services and tying it to some kind of authorization system. They do have an API server (on Azure) for the license/subscription, but as I said, that's easily circumvented. 5/?
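To make the risk concrete: once a key ships inside an app bundle, extracting it is a few lines of code. Here's a minimal sketch; "sk-ant-" is the conventional prefix on Anthropic keys, but the bundle contents and key below are made up for illustration:

```python
import re

# Anthropic API keys conventionally start with "sk-ant-"; scanning the
# extracted files of an app bundle for that prefix is all it takes.
KEY_PATTERN = re.compile(rb"sk-ant-[A-Za-z0-9_-]{20,}")

def find_embedded_keys(blob: bytes) -> list[bytes]:
    """Return anything in the blob that looks like an Anthropic API key."""
    return KEY_PATTERN.findall(blob)

# Illustrative fake data, NOT a real key.
fake_bundle = b"...config...\x00api_key = sk-ant-abc123DEF456ghi789jkl\x00..."
print(find_embedded_keys(fake_bundle))
```

That's the whole attack; no proxy server in the middle means no way to revoke or meter what a thief does with the key.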
in reply to Matt Campbell

The Python backend is packaged with PyInstaller. There's 30 MB (compressed) of Python bytecode in the executable, and then there's also an "_internal" directory with tons of dependencies, adding up to about 200 MB (uncompressed), again with no apparent attempt at eliminating dead code from the package. I readily admit that I'm perhaps overly obsessed with making non-bloated software, but come on. 6/?
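If you want to see for yourself where the weight in a bundle like that lives, a few lines of Python will audit it. This is a generic directory-size walk, nothing specific to Guide's layout:

```python
import os

def dir_sizes_mb(root: str) -> dict[str, float]:
    """Total size in MB of each top-level entry under root (e.g. an _internal dir)."""
    sizes: dict[str, float] = {}
    for entry in os.scandir(root):
        total = 0
        if entry.is_dir(follow_symlinks=False):
            # Sum every file under this dependency's directory.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    total += os.path.getsize(os.path.join(dirpath, name))
        else:
            total = entry.stat().st_size
        sizes[entry.name] = total / 1_000_000
    return sizes

# Usage: sort descending to see which dependencies dominate the package.
# for name, mb in sorted(dir_sizes_mb("_internal").items(), key=lambda kv: -kv[1]):
#     print(f"{mb:8.1f} MB  {name}")
```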
in reply to Matt Campbell

It wouldn't be right for me to knock the product for the bloat alone. But taken together with the direct use of third-party services in the app on the user's machine, and the actual functionality problems detailed in the thread I linked to, the whole thing smells of something hastily cobbled together to catch a ride on the AI hype train. If this is the accelerated future of software development that businesses want, then as I said, it's deeply disgusting, and kind of scary. 7/7
in reply to Matt Campbell

Perhaps I need to more explicitly call out what is actually the scariest part here: if you use this product, you're letting an application take control of your computer, using the output of a large language model as input. I know better than to describe an LLM as "just" a next-word predictor, because we've all seen how surprisingly powerful that can be. But still, it's all too common for LLMs to output things that don't make sense, especially when venturing outside their training.


in reply to Alex Hall

@alexhall Right? People are already getting utility from tools that can suggest shell commands, allow the user to verify them, and then run them. This app feels quite lazy in comparison.

How about something which tells me what has keyboard focus when my screen reader doesn't know, but otherwise lets me interact with the application myself? Integration via an NVDA add-on to auto-label controls? Local macros based on initial LLM integration? @matt

in reply to Matt Campbell

And yes, some of my early work as a young programmer could have been skewered like this. Admittedly I'm saying this from the safe distance of 20 years or so, but honestly, it should have been. College certainly didn't properly slaughter my ego as it should have, as @bcantrill discussed in his "Coming of Age" talk (youtube.com/watch?v=VzdVSMRu16…). As long as we don't sink to personal attacks and focus on substantive problems, I think healthy public criticism of publicly released work is OK.
in reply to Matt Campbell

Having exchanged emails with Andrew, Guide's developer, I really don't think this is a cash grab. He strikes me as well-intentioned. I was unaware of the coding issues, or that it runs an unsecured web server on the user's machine. From what you can see, is there anything to stop another rogue app from connecting to that web server and telling Claude to execute stuff on the user's machine? I was only really evaluating this from a functionality perspective, though I was somewhat surprised that some of the changes I proposed in my email weren't things he'd thought about previously. This feels to me like good ideas that didn't get enough time in the oven, combined with inexperience and good intentions. The telling thing, to me, will be what happens over the coming weeks. Will this go back in the oven for more cooking? Will we see the changes in safety and security that are needed?
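On the rogue-app question: by default, nothing stops another local process from talking to an unauthenticated localhost server. A self-contained sketch of the problem; the port, endpoint, and payload here are invented for illustration, not Guide's actual API:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class UnauthenticatedHandler(BaseHTTPRequestHandler):
    """Stand-in for a local backend that accepts commands with no auth check."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        # A real backend would hand this off to the LLM; here we just echo it.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"accepted: " + body)

    def log_message(self, fmt, *args):  # silence request logging for the demo
        pass

# Bind to an ephemeral port (0) and serve in the background.
server = HTTPServer(("127.0.0.1", 0), UnauthenticatedHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "rogue" client: any process on the machine can do this --
# no token, no origin check, nothing.
port = server.server_address[1]
req = Request(f"http://127.0.0.1:{port}/run",
              data=json.dumps({"task": "do something bad"}).encode(),
              headers={"Content-Type": "application/json"})
resp = urlopen(req).read()
print(resp)
server.shutdown()
```

The usual mitigation is a per-session secret: the backend generates a random token at startup, hands it only to its own frontend, and rejects any request without it.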
in reply to Matt Campbell

Indeed. Two things are quite important to mention, I think: 1. It's paid, currently $8 per month, and it's going to cost more once it gets established (and seriously, the VI community currently snaps up everything that's labeled AI and aimed at them). 2. I don't want to know how much of this was actually developed by AI instead of a real dev thinking about consequences.


in reply to Matt Campbell

But this is exactly what the businesses and the corporations want, no matter how much of a coding monstrosity it is, and it's exactly what so-called AI developers output. I would encourage you to read publications like Blood in the Machine for some actually salient tech criticism. I hate to say it, but this kind of program is exactly what the industry wants and has been pushing for years. Absolutely none of it is coded in an elegant way, and this is just yet another example out of a few dozen I can think of.
in reply to Matt Campbell

What do you think of VOCR and TypeAhead? They seem like much better implementations of a similar concept. VOCR uses OCR and other (non-LLM) techniques to make an inaccessible application navigable, using its own commands to move between and activate controls. In my experience it's definitely better than Screen Recognition on iOS. You can use an LLM to ask questions about an image or about the screen, or you can have an LLM divide the screen into areas and describe them separately, but these are just additional features. TypeAhead can use an LLM to move the VoiceOver focus to a specific element, no matter where it is in the application, and it can also directly perform tasks on your computer. However, for multi-step tasks, it always says what it's doing before it does it, and you can cancel it if it starts doing something you don't want. There are privacy concerns, because it uses GPT-3.5 or GPT-4, so I wish it supported Ollama or LM Studio, but overall I like it. You can also record your own macros and tell it to execute them. It has a free plan with GPT-3.5, and a $15-a-month plan with GPT-4. Both of these applications only support macOS currently, though.
in reply to Matt Campbell

Hey Matt, developer of Guide here. Thanks for taking the time to dive deep into Guide and share your analysis. You're right to call out the security issues. Right now, my main priority is tightening security, particularly around restricting local server connections, and I've temporarily pulled download links until these fixes are fully implemented. Expect an update in the next few days addressing this specifically.