Guide - AI assistant for people who are blind or low-vision

By Tara, 30 March, 2025


Hi all,
I've recently been introduced to Guide, an AI Windows app which takes over your keyboard and mouse and does things for you on your computer.
https://www.guideinteraction.com/
It's $8 a month at the moment with a seven-day free trial, and I'm keeping it around. The below might be a bit long, but here goes. Note that asking it to do something which involves multiple steps without giving it clear instructions doesn't work well. It goes all around the houses and gets things wrong.
A. I went to the Amazon fiction best sellers page, then opened the Guide app and asked it what was number 6 on the bestseller list. After you've asked your question, go back to the webpage or app you want to work with, and Guide will then take screenshots of your computer and either start scrolling to read stuff or click if necessary. It got the answer right. I then told it I wanted to buy the Kindle version of this book, which was the version displayed on the best seller page. So it opened the book page, clicked the 'buy now' button, and I got an email confirmation page from Amazon confirming my purchase. Amazon is accessible, so I don't really need to do this; I was just testing to see if it would get it right. It tells you what it's doing, so if you really don't like the path it's going down, you can hit CTRL + Alt + S to stop it. This is a global hotkey. You can then regroup, so to speak, and give it further instructions to get it back on track.
B. The ChatGPT Windows app and webpage no longer have an easily accessible way to rename chats. So, I opened the ChatGPT Windows app, opened a chat from my chat history and told Guide something like, 'I want to rename this chat to language practice'. It went around the houses a bit first, trying to work out where the 'rename' function would actually be. It decided to click on the chat name in the sidebar. I've now learned that all those links in the chat history are part of the sidebar. When it hovered over the sidebar with the mouse, the context menu with options like 'delete', 'archive', and 'rename' appeared, and NVDA started reading this menu out. Then it clicked on the 'rename' button, deleted the name already there, typed in the new one, and pressed Enter to confirm. And the new name was there. I tried this several times with varying results. Sometimes it worked; sometimes it would leave the old chat name in the 'rename' field and add the new name onto the end of it. But I've discovered that if you give it clear, numbered instructions on how to do a task, it'll complete things much more efficiently. For example:
1. Click on the chat name in the sidebar.
2. Click on the menu with 3 dots.
3. Click on the 'rename' icon.
4. Select all the text in the field with CTRL + A, then delete it.
5. Now type 'language practice' and press Enter to confirm.
If you don't want Guide to do the whole thing, because the only difficult part is actually getting to the menu, you could ask:
1. Click on the chat name in the sidebar.
2. Look for the menu with three dots and hover over it.
3. Wait 2 minutes.
Note that Guide automatically takes focus back to the Guide window when it completes a task. This can mean that the menu or whatever you're working with goes away completely when you go back to the app window. This is what happens with the ChatGPT app. Asking it to wait 2 minutes keeps the menu open long enough for me to press Enter on the 'rename' button, which puts me in the edit field, so I can type the new chat name and press Enter to confirm. Guide did tell me it couldn't use the wait function or something, but by the time it had worked that out and was telling me it was looking for another approach, I'd finished.
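For anyone who ends up saving prompts like the numbered lists above, here is a minimal Python sketch for keeping the steps in one place and turning them into text you can paste into the Guide chat. It is only an illustration: the names `build_prompt` and `rename_chat_steps` are made up for this example, nothing here talks to Guide itself, and the steps are just the ones from the rename example.

```python
def build_prompt(steps):
    """Turn a list of step strings into a numbered prompt, one step per line."""
    # Drop empty entries and stray whitespace so the numbering stays continuous.
    cleaned = [step.strip() for step in steps if step.strip()]
    return "\n".join(f"{number}. {step}" for number, step in enumerate(cleaned, start=1))


def rename_chat_steps(new_name):
    """Hypothetical helper: the rename steps from the post, with the new name filled in."""
    return [
        "Click on the chat name in the sidebar.",
        "Click on the menu with 3 dots.",
        "Click on the 'rename' icon.",
        "Select all the text in the field with CTRL + A, then delete it.",
        f"Now type '{new_name}' and press Enter to confirm.",
    ]


if __name__ == "__main__":
    # Print the prompt so it can be copied into the Guide chat window.
    print(build_prompt(rename_chat_steps("language practice")))
```

Running the script prints the five steps numbered 1 to 5, ready to copy. Keeping the steps as a plain list makes it easy to reuse the same task with a different chat name.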
C. I tried this on the British Airways High Life Cafe website, and in short, it didn't work. I wanted to order food for my flight home, and I asked it to search for a Coke. I just opened the homepage, since that site isn't great with a screen reader, and it couldn't find the Coke. It went to the 'soft drinks' category, then the 'fizzy drinks' category, great, but it didn't seem to grasp that it either needed to go through the next page and the one after to find the Coke, or it could change how many results appeared on a page, making the selection process a bit easier. It ended up trying out all sorts of things until I had totally the wrong things in my basket, so I deleted them and started again. The good thing it did do for me was describe the cafe and all the different categories, which I'd never really looked at before. From that I just selected each category and found the things I wanted a lot more easily anyway. When I went to pay, I needed to select my flight number from a fiddly dropdown menu, and scrolling through it took Guide far too long. There are a couple of hundred options there, I'd say. In the end I just did it myself since it was quicker.
Notes
This is what the instructions say about how it works.
How it works: When asked to complete an action, Guide takes control of your keyboard and mouse. To understand the state of your computer, it takes screenshots of your computer and sends them to the cloud for processing. Your data is not used for any model training purposes.
Hope you enjoy. Edited to add that after writing this, I went and found out how to hover over something with the mouse using NVDA. Trying this worked, it's NVDA key + Shift + M after finding the item you want to hover over. Make sure 'focus moves navigator object' is on by pressing NVDA + 7.

Comments

By Ash Rein on Sunday, March 30, 2025 - 00:24

This is very interesting. Even if it’s not getting it 100% of the time it’s at least giving us a taste of what it would be like in a few years. At some point we might not even need to necessarily use keyboard commands unless we want to. We could just tell the computer what to do and it’ll do it. I’m more excited about the potential than anything else.

By Tara on Sunday, March 30, 2025 - 01:24

Apparently if you put your screen reader in sleep mode it works better. I tried this, but I still like knowing where Guide is on the screen when NVDA reads stuff out. The interesting thing is that if I give it an instruction, and it can't find something, or the results are unexpected, it'll tell me, you asked for X, but I can't find X, I can only see Y and Z. It's not good at starting a task from scratch though. So I can't say to it, I want to buy X or Y Kindle book from Amazon if I don't have Amazon open. Somebody on Mastodon gave it a task from scratch, no apps open apart from Guide, and it just completely flunked it. And of course, because it's processing everything in the cloud, it's a lot slower than someone sighted clicking on stuff with a mouse, or a screen reader user using a computer efficiently. I'm still reeling from the fact it both got me to a menu and I could either control it if I wanted, or it could go into said menu and do exactly what I wanted with my instructions. What I like about it is that it tells me all the steps it's going through, so that when it does get something right, I can then give it the correct steps later as a list of instructions if I want to perform the same task. Edited to add that I've now found out how to hover over something with the mouse using NVDA and it works in this particular instance. But I still like Guide though, because it told me what it was doing, and gave me the idea to investigate how to do something with NVDA. I forgot about this command.

By sebno on Sunday, March 30, 2025 - 21:24

I've tried a few inaccessible applications, like Revo Uninstaller and a VST instrument in Reaper. With a clear prompt it performs well, but slowly :)
Combined with Golden Cursor, it could point at and save coordinates to click a button or something else.
For the VST, I asked it to select an instrument in the window and set the attack time and reverb length.
It performed the three steps with comments. Pretty fun, and a lot of potential.
Thanks for the info.

By Tara on Sunday, March 30, 2025 - 22:24

Wow, glad you got it to do something that really is inaccessible. I've heard a lot of virtual instruments aren't great with screen readers. The thing it's really slow at is scrolling, especially through drop down boxes and navigating through a lot of content on webpages. It's still early days though. I'll ask on the Slack channels about scrolling.

By Prateek Dujari. on Monday, March 31, 2025 - 20:24

Hi, thx much to Tara for this info. It seems from her very useful testing that Guide is too low in capability at the moment for me to patronize it. I mean, I will right away reject an AI solution like this if I'm required to spell out each step I want it to perform before it can do anything reliably and consistently. However, looking at the trend with AI apps/tools, I expect Guide to only improve and become more intelligent and capable, not requiring me to handhold it like it's a child.
You may be aware that ChatGPT Pro subscribers already have OpenAI's Computer-Using Agent (CUA) tool, called Operator, available, which is far more capable than Guide for this type of task, i.e. receiving a request from the user to find something on the web and accomplish some goal. You do not need to painfully spell out each step as you may need to do with Guide. I'm a ChatGPT Plus subscriber and expect Operator to become available to Plus subscribers in the near future without any additional cost. This has been the case before: new features are first introduced to Pro subscribers, and within a few months they appear for Plus subscribers.

By Tara on Tuesday, April 1, 2025 - 00:24

Hi,
Yes, maybe the Operator feature will come to the Plus version. I've got the Plus version too. I've actually started saving some Guide prompts, including one for solving and typing in the characters for a CAPTCHA, so I can just copy it into the chat when I need it. But yes, it would be nice if it took the initiative and just did stuff, rather than me having to give it detailed instructions. Hopefully in future versions, this won't be the case. It's using Claude 3.7's computer use thing. Shame it's not using the ChatGPT Operator. But maybe it's too expensive for the developer to implement.

By Brad on Tuesday, April 1, 2025 - 05:24

They're wanting $8 a month so it's not enough for a ChatGPT subscription.

At least I don't think it is, I've not used the service in a long long time.

By Tara on Tuesday, April 1, 2025 - 05:24

ChatGPT Plus costs $20, and the Pro version, the version with Operator, costs $200 at the moment. But according to their blog post about it, they're planning to roll out Operator to other tiers such as Plus and Enterprise in the future. I've asked the developer about this, but I'm sure he'll say it's just too expensive to run at the moment, and maybe the API for it isn't even available yet.

By Tara on Tuesday, April 1, 2025 - 17:57

Hi,
The download functionality has been taken away so the developer can work on some security fixes. Edited to add he'll be updating people with more info in the next few days.