[New Add-on] Vision Assistant Pro: Your Interactive AI Copilot for NVDA (Powered by Gemini)

By mahmood, 2 December, 2025

Forum

Windows

Hello AppleVis community,

I am Mahmood, and I am excited to share Vision Assistant Pro, a new open-source add-on for NVDA. This tool is designed to bring the intelligence of Google Gemini directly into your screen reader to solve digital challenges that usually require sighted assistance.

It is completely free to use (with your own API key) and focuses on interactivity rather than just static descriptions.

🌟 Key Features:

👁️ Interactive Vision (Object & Full Screen): Unlike standard OCR that just reads text, this feature lets you "see" and "ask."
- Object Vision: Take a snapshot of the specific control (icon, button, image) under your navigator cursor.
- Full Screen Vision: Scan the entire screen layout.
- The Best Part: After the initial description, you can chat with the AI. You are not limited to one description; you can ask follow-up questions like "Is there a save icon?", "Describe the chart in detail," or "What color is the button?"
🧠 Smart Translator (Auto-Swap): Instantly translates selected text. It creates a seamless bilingual experience by automatically detecting languages. If the source matches your target language, it intelligently swaps them.
🎙️ Smart Dictation: A powerful voice typing tool. It doesn't just transcribe; it listens, fixes your grammar, removes stutters (ums and ahs), adds punctuation, and types the polished text directly into your active window.
🔓 CAPTCHA Solver: Struggling with visual codes? Press a shortcut, and the AI will solve the math or read the characters and automatically type the result for you.
📄 Document QA: Have a PDF, TIFF, or text file? You can "chat" with your documents. Ask the AI to summarize them, extract specific data, or explain complex sections.
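
For readers curious about the auto-swap rule in Smart Translator, here is a minimal sketch of the idea. It is not the add-on's actual code: the function name `pick_target_language` and the `fallback` parameter are hypothetical, and the sketch assumes the language of the selected text has already been detected.

```python
def pick_target_language(detected: str, target: str, fallback: str) -> str:
    """Return the language code to translate into.

    Hypothetical helper: if the selected text is already written in the
    user's target language, translate into the fallback language instead.
    """
    # Compare only the primary subtag so "en-US" and "en" count as the same language.
    same_language = detected.split("-")[0].lower() == target.split("-")[0].lower()
    return fallback if same_language else target
```

With a target of English and a fallback of Persian, Persian text would be translated into English, while English text would be sent the other way.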

🛠️ Requirement: You need a free Google Gemini API Key to run this add-on.

📥 Download & Installation: You can download the add-on directly from GitHub:

Download Vision Assistant Pro v2026.07.15 (Direct Link)

Just open the downloaded file and confirm the installation in NVDA.

🔗 Project Source Code: https://github.com/mahmoodhozhabri/VisionAssistantPro

I developed this to help our community become more independent. I would love to hear your feedback and suggestions.

Best regards, Mahmood

Comments

Getting it to work

So I just installed the addon from the addons store, pasted in the API key and nothing seems to happen. When I tried to perform OCR on a PDF NVDA just says not connected.
I tried creating a new project in Google AI studio, same result.
Any idea on how to get it working?

@Nut

not connected? There is no such message in the plugin! Please check again.

@Stefan

Hello. Bulgarian has been added to the new version, 2.6. Enjoy.

Need help getting API key

I feel like I missed the first week of A.I. class and showed up on quiz day. When I go to Google Studio and hit the get API key button, it goes to an empty list of keys for imported projects. When I then hit the create API key, a modal pops up requiring a name and imported project--a combo box that's empty, with the "create" button disabled. Tried in Chrome and Firefox. Is this expected, and I have to create a project, or are browser extensions getting in the way, or what?
Thanks!

translate

It translates one sentence at a time and I have to press the key combination every time, which is not convenient. Make it like the nvda translator to turn on and off, instead of pressing a key combination for each sentence.

Re: Translate

Have to agree; real-time translation of this quality is just what the doctor ordered. Question is, is it feasible?

@mantanini

Hello. That's not the case! Unless you are translating in the browser! To do this, place the text in the clipboard and press 'y' instead of 't'.

Congrats, good project!

I would recommend not implementing automatic updates or any update system directly in the add-on, as this is done automatically by the NVDA add-on store, and users can enable or disable automatic updates there.

API key missing

Thanks for this most indispensable add-on. I just used the OCR to describe my full screen and got a very nice description. I tried it again a minute later to get another full-screen description, and it said "API key missing." What's up with that? Is there a quota limit on how much you can use the add-on, or am I missing something? By the way, I am using the very latest version of this add-on.

suggestions

Hi Mahmood! I would like to congratulate you on the excellent add-on. Keep up the good work! I have some suggestions that would make things much easier: the possibility of having a screen to place multiple prompts and then being able to switch with a shortcut key. Sometimes, I want to know what's in a photo, then something in a game, and it would be useful to already have these prompts predefined by the user and just keep switching. Another suggestion would be an option so that when it showed the result, it would just speak, without showing a dialogue or window. It's quite useful to know information quickly without having to switch windows. Thank you!

please read

Dear Users,

Please download and install the latest build of the add-on from the project's GitHub page, which is linked in the original post. It appears that a significant portion of users are still running outdated versions.

Due to time constraints, my ability to visit this thread frequently to respond to comments is limited. For any issues or suggestions, please raise them directly in the Issues section of the project's GitHub repository.

Best Regards,

Mahmood

Post updated

Hello everyone. The download link has been updated to version 3.1.0. Enjoy.

Error 400 when transcribing .mp3 audio

Hello, and congratulations on this fantastic add-on. I want to report that when I transcribe an .mp3 audio file, error 400 is reported, with a request to check my API key, which is of course correct. I tried changing the audio file's extension, for example to .wav, and the transcription works, so I believe it depends on how the .mp3 extension is seen by the API key. I hope you can fix it. Thank you very much.

@Maurizio

Hello, yes, this issue exists and will be fixed soon in version 3.5.0.

Huge congratulations and a request about video description

First of all, thank you for brilliantly fixing error 400 on .mp3 files. I also found the feature for describing videos from YouTube and Instagram URLs truly wonderful. It would be great if this could also be done for X and other platforms, and also for local videos played on one's own PC. I know there could be privacy issues; personally, if asked, I am willing to grant all the necessary permissions. It would be really wonderful. Endless, sincere congratulations and heartfelt thanks!

@Maurizio

Thank you so much for your kind words and feedback, Maurizio! I'm glad to hear the MP3 fix and video descriptions are working well for you.

Regarding your suggestions, I'll be adding support for X (Twitter) very soon. As for local video support, I am currently thinking about it and evaluating the best way to implement it.

Regarding the privacy concerns you mentioned, I want to reassure you that since this addon is open source, everything is transparent and there is no need to worry at all. Your trust and support mean a lot to me—thank you again!

Description Video TikTok

Hello again! I will never stop thanking you for this amazing add-on! I also hope you are well. If I am not mistaken, you are in Iran, and I know what is happening there; I sincerely hope you are as well as possible and that you are safe.
I would kindly ask whether it is possible to implement video descriptions for the TikTok platform as well. It would be truly incredible, given how well it works for YouTube, X, and Instagram, although on Instagram I ran into some download errors; in practice it seemed not to recognize some URLs. In any case, it is a wonderful function. Can it also be implemented for TikTok? Thank you very much and all the best, with great respect and sincerity!

@Maurizio

Hello, thank you for checking on me. I am well, but internet access has become very difficult, and it's unclear if I will be able to access it in the future. I will try my best. I have never worked with TikTok before. Please send a link to a video. If I can and my internet is stable, I will implement it.

updated

You can get the latest version from the original post.

updated

Hello everyone. The new version of the addon has been released. Support for TikTok videos has also been added!

v4.0.3

Hello everyone.
The new version has been released.
You can use it.

Awesome add-on

This is fantastic! I'd like to donate, but I don't use the platform that is provided. Maybe consider Paypal like NVDA uses? I also appreciate that this is fully accessible to braille users!

First thoughts and questions

I finally got around to trying this add-on, and so far it's excellent! It took me a minute or 2 to figure out the API key thing, but once I got that, I've tried a few modes, and am impressed with the speed and quality of the responses so far.

A few Questions

1. As a low vision user, I noticed that when you bring up the Settings screen for VA Pro, I can navigate through it via NVDA and keyboard, but nothing shows up on the screen. It would be helpful if this window showed up visually, like the rest of the windows do when you use the add-on. Not sure why nothing shows up for the Settings screen.

2. Can there be an option for a dark theme for the windows that appear for low vision users, or those who are light sensitive?

3. When I scan a file or document, does it do the whole thing, like multiple pages, or just the first visual page like when doing standard NVDA + R?

4. When scanning a video link, is there a rough video length we should limit it to? Just trying to figure out how long it would take to process a video. I fed it probably a waaay longer video than I should have, and got a key error. Understandable...

5. Is there a keyboard shortcut to bring up a menu of the add-on's functions directly, in case we forget a specific letter? I know you can do this via the add-on submenu, but maybe a direct menu command to bring us right there?

I think this is all I can think of for now, but I will definitely be playing with this more. I like it though. I'd also love to see a PayPal option for donations, as almost everyone has PayPal.

about donate

Hello everyone, I'm glad the addon has been useful to you. I try to always add more features to the addon. Regarding donations, unfortunately, I am in Iran and do not have access to PayPal. If you would like to support the project and do not have the option to support with cryptocurrency, you can purchase an Apple USA gift card and send it to me via private message. Thank you in advance for your support.

@Scott Davert

hi. I'm glad the addon has been useful to you. I try to always add more features to the addon. Regarding donations, unfortunately, I am in Iran and do not have access to PayPal. If you would like to support the project and do not have the option to support with cryptocurrency, you can purchase an Apple USA gift card and send it to me via private message. Thank you in advance for your support.

@Jesse Anderson

Hello to you, I'm glad the addon has been useful for you.
1. I am blind, but I don't think this section relates to the display, because it uses the default NVDA settings.
2. This also seems to use NVDA settings.
3. The choice is entirely yours. Check it out.
4. I think it's about 10 minutes.
5. You can assign any key to any section in Input Gestures.
Regarding donations, if you would like to support the project and cannot use crypto, you can purchase an American Apple gift card and send it to me privately on AppleVis.

Recognizing input from a scanner.

Hi, wondering if this addon could take input from a scanner OCR it and then drop it in a document or speak it out loud? I'm recommending NVDA to more and more clients and some of them use scanners. Thanks.

update

Hello everyone,

Vision Assistant v4.5 is now officially available! This update brings several significant improvements, including the new Advanced Prompt Manager, full proxy support for all API requests, and an enhanced Document Reader experience.

Download Instructions: Please always use the official download link provided in the original post at the top of this thread to ensure you are getting the latest verified version.

Support the Project: If you find this addon useful and would like to support its continued development, please consider making a donation. Due to international banking restrictions in my region, I am unable to use traditional platforms like PayPal. To address this, I have updated the "Donate" dialog within the addon settings with alternative ways to contribute, including Apple Gift Cards (US region) and Cryptocurrency.

Your support is greatly appreciated and helps keep the project evolving. Thank you all for your feedback and for being part of this journey!

mahmood

I cannot express how much I appreciate this add-on, and yourself for creating it. I have literally disabled and/or removed at least 5 separate add-ons & applications that had similar functionality, but individually. For example, I had a translation add-on, an OCR add-on, etc.

I installed the 4.5 update from an NVDA add-on notification, and ironically installed it before seeing this post. Anyways keep up the excellent work. This add-on is even outperforming the latest build of Be My Eyes for PC. Which is a shame because BME used to be amazing.

@Brian

Thank you so much for your incredibly kind words! Your feedback truly made my day and is exactly the kind of encouragement that fuels my passion for this project.

It's fantastic to hear that the plugin has been able to consolidate the functionality of multiple separate tools, like translation and OCR, into one seamless experience for you. That was precisely one of the core goals, and knowing it's making such a significant difference in your daily workflow is incredibly rewarding.

And your comparison to Be My Eyes for PC, especially given its previous reputation, is incredibly high praise. This kind of detailed and enthusiastic feedback is invaluable; it not only validates the hard work and countless hours put into development but also inspires me to continue improving and expanding its capabilities.

If you feel inclined to support the ongoing development of this project and my motivation to keep bringing new features and improvements, a donation would be greatly appreciated. Every contribution, no matter how small, helps sustain the effort and passion behind it.

Thank you once again for your wonderful words and for being such an enthusiastic user!

Update Failed

Hello. I just saw there was an update to this add-on this morning, along with an NVDA update. I was able to install the NVDA update, but every time I try updating the Vision Assistant Pro add-on, I get a download failed error. I will keep checking in and retrying the update. I'm guessing something simple changed with the NVDA update. Just FYI for now.

re: Update Failed

Hello. When you update an addon through NVDA, there is no difference compared to updating it through the addon itself. The update methods for both NVDA and addons are the same. However, it's strange that you receive an error, whereas you should be getting the message "You have the latest version".

Update Fixed

Thanks for the reply. I did just try the update again, and it went through this time. Thanks again.

update

Hello everyone,
Vision Assistant v4.6 is now officially available!
Download Instructions: Please always use the official download link provided in the original post at the top of this thread to ensure you are getting the latest verified version.
Support the Project: If you find this addon useful and would like to support its continued development, please consider making a donation. Due to international banking restrictions in my region, I am unable to use traditional platforms like PayPal. To address this, I have updated the "Donate" dialog within the addon settings with alternative ways to contribute, including Apple Gift Cards (US region) and Cryptocurrency.
Your support is greatly appreciated and helps keep the project evolving. Thank you all for your feedback and for being part of this journey!

IN-DEPTH DESCRIPTION OF IMAGES INCLUDING TEXT

Hello, and once again my most sincere congratulations on this add-on, which is becoming more and more indispensable for my daily work with NVDA. Since it is very important for me to have a complete and detailed description of images, I would like to suggest a feature. I know you can use Be My Eyes, for example, or CloudVision, which seems to have had problems, or Image Describer, but the latter also has problems with API keys and does not work very well. Given your skill and how well Vision Assistant Pro works, I am sure you will have no difficulty implementing this: a description of the image itself, including any writing it contains. The add-on already does this for the text in images and documents, but I would like a description of the images themselves to be added. It would be wonderful if one could then interact with the AI, as happens with videos and the other functions, to dig deeper with further questions.

For example, I use ImageMagick from the command line to insert my logos in various positions, and I have built many scripts for this. When I have an image and know its entire description, I can have the AI tell me where to put my 200x200 PNG logo, that is, where the image has free space so that the logo is clearly visible without overlapping or ruining elements of the image itself. Since your add-on can already describe the entire screen, I don't think this will be difficult for you to implement. Thank you sincerely for your wonderful work; I hope you will be able to implement this function.

Update to version 5.0

Hi everyone,
I’m happy to share that Version 5.0 of Vision Assistant is now live!
The highlight of this major update is the new Multi-Provider architecture, allowing you to switch between Gemini, OpenAI, Mistral, and Groq. Most importantly, it now supports Custom API URLs, giving you full control over your endpoints.
This version was developed under very difficult conditions, mostly locally and offline. Please refer to the main post of this thread for the download link and full details.
On a side note, since I am completely cut off from standard banking like PayPal or Stripe due to international restrictions, I’ve added Apple Gift Cards (US Region) as a donation bridge. For many in this community, this is the most accessible way to show appreciation for the thousands of hours put into this original work. You can send them directly to: [email protected].
Thank you for your continued support!
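
For anyone wondering how a custom API URL typically interacts with a provider choice, here is a minimal sketch. It is an illustration only, not the add-on's code: `DEFAULT_BASE_URLS` and `resolve_base_url` are hypothetical names, and the table lists the providers' publicly documented base URLs, which may differ from the defaults the add-on actually ships with.

```python
from typing import Optional

# Publicly documented base URLs for each provider. The add-on's own defaults
# are not stated in this thread, so treat this table as an assumption.
DEFAULT_BASE_URLS = {
    "gemini": "https://generativelanguage.googleapis.com",
    "openai": "https://api.openai.com/v1",
    "mistral": "https://api.mistral.ai/v1",
    "groq": "https://api.groq.com/openai/v1",
}


def resolve_base_url(provider: str, custom_url: Optional[str] = None) -> str:
    """Prefer a user-supplied URL; otherwise fall back to the provider default."""
    if custom_url and custom_url.strip():
        # Normalize so later path joins do not produce a double slash.
        return custom_url.strip().rstrip("/")
    try:
        return DEFAULT_BASE_URLS[provider.lower()]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
```

In this sketch a non-empty custom URL always wins, which is what lets a self-hosted or proxy endpoint stand in for any of the listed providers.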

updated to v5.5.1

Changes for 5.5 (The Automation Update)

AI Operator (Autonomous Control - Shift+A): This is the crown jewel of v5.5. Vision Assistant Pro has graduated from being a passive assistant to becoming your personal AI Operator. It doesn't just describe the screen—it takes command.
- How it works: You can now give verbal instructions to operate your PC. For example, in a completely inaccessible application where your screen reader stays silent, you can press Shift+A and type: "Click on the Settings button" or "Find the search field, type 'Latest News' and press enter." The AI visually identifies the elements, moves the mouse, and executes the task for you.
- Performance Note: This feature is optimized for Gemini 3.0 Flash (Preview), delivering incredibly fast and intelligent responses that can handle even the most complex UI layouts.
- ⚠️ API Usage Warning: Because the AI Operator needs to "see" exactly what's happening to be accurate, it sends a high-resolution screenshot with every step. Please note that frequent use will consume your API quota much faster than standard text-based tasks.
Visual UI Explorer (E): Tired of navigating through "unlabeled buttons"? Press E to activate the UI Explorer. The AI will scan the entire window and generate a list of every clickable element it sees—including icons, graphics, and menus. Simply pick an item from the list, and the AI Operator will click it for you. It’s like having an "accessible layer" on top of any app.
Context-Aware Smart File Action (F): The "F" key has been completely overhauled. It no longer assumes you only want OCR. When you select a single image, it now intelligently asks for your intent: you can choose a Detailed Visual Description to understand the scene or a Structured Text Extraction (OCR) for reading. The menu adapts dynamically based on the file type and your active AI engine.
Core Optimization: We've performed a deep cleanup of the add-on’s internal logic, removing unused legacy functions and redundant code. This results in a leaner, faster, and more reliable experience for all users.

Re: IN-DEPTH DESCRIPTION OF IMAGES INCLUDING TEXT

Hello Maurizio,
Thank you so much for your kind words and for sharing your detailed ideas with us!
I have great news: The exact feature you suggested has just been released in Vision Assistant Pro v5.5 (The Automation Update).
We have completely overhauled the "F" key (Smart File Action). Here is how you can use it for your specific needs:
1. Activate the Vision Assistant command layer (NVDA + Shift + V).
2. Press F and select your image file.
3. The addon will now intelligently ask for your intent. Choose "Detailed Visual Description".
4. Once the AI provides the initial description, a Chat Dialog will automatically open.
In this dialog, you can interact with the AI exactly as you described. You can ask follow-up questions such as: "I want to insert a 200x200 logo. Based on this image, what are the coordinates of the largest empty space where I can place it without overlapping any important elements?"
Since the AI analyzes the entire image and maintains the conversation history, it can guide you with high precision for your ImageMagick scripts.
Please make sure you have updated to the latest version (v5.5.2) for the best stability. I hope this feature helps you in your daily creative work.
Thank you again for your valuable feedback and suggestions!
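
As a rough illustration of the last step in that workflow, the coordinates suggested in the chat could be turned into an ImageMagick call like this. This is only a sketch: `composite_command` is a hypothetical helper, it assumes ImageMagick 7 (the `magick` executable) is installed, and coordinates suggested by an AI may be approximate, so it is worth checking the result.

```python
def composite_command(image, logo, x, y, output):
    """Build an ImageMagick 7 argument list that overlays `logo` on `image`.

    x and y are the pixel offsets of the logo's top-left corner,
    measured from the top-left corner of the base image.
    """
    if x < 0 or y < 0:
        raise ValueError("offsets must be non-negative in this sketch")
    # -geometry +X+Y positions the overlay; -composite merges the two images.
    return ["magick", image, logo, "-geometry", f"+{x}+{y}", "-composite", output]
```

The returned list can be passed to `subprocess.run(..., check=True)` to run the command without going through a shell.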

Re: updated to v5.5.1

I cannot express how much I enjoy this add-on! I think I have replaced something like 6 add-ons with this one.
Keep up the great work, mahmood. 😄

Accepting input from a scanner or external camera.

Hi, question: Is there any way I can connect an external scanner or camera and have Vision Assistant Pro OCR a document? I ask because, as an adaptive tech instructor, I work with students and employed people who sometimes need to OCR documents from external cameras or scanners. Thanks.

@JohnyTheHess

Hi @JohnyTheHess,

Thank you for this excellent question! This is a very important use case for students and professionals in the accessibility field.

While Vision Assistant Pro does not yet have a dedicated "Scan" button for direct hardware control, you can achieve high-quality results right now using these two methods:

1. **For External/Document Cameras:** You can open the standard **Windows Camera app** (or any camera preview software). Once the document is visible in the live feed, simply press the **'O' key (OCR Full Screen)** in Vision Assistant. The AI will analyze the live screen content and extract the text immediately with high precision.

2. **For Scanners & Files:** If you scan a document and save it as a PDF or Image, simply press the **'F' key (Smart File Action)**. This will open a standard file selection dialog where you can browse to and select the document. The add-on will then perform an advanced AI OCR on the file directly, meaning you don't need to open it in a PDF reader or image viewer first.

As an instructor, your insight is very valuable. Would a simple "Capture from Camera" shortcut be sufficient for your students, or do they typically require advanced scanner controls (like DPI and brightness settings) within the add-on itself?

Thank you for your feedback and support!

Great idea regarding scanners and cameras.

Hi Mahmood. Great ideas that I hadn't even thought of, thanks. Yes, just some way to capture and go. Usually, I don't teach all of the other controls unless absolutely necessary, and lately it just isn't necessary. So, wow! If that were possible it would certainly be a plus, and I appreciate your considering this. Plug in my imaging device, press a key and voila! I have the contents of the document on my screen.
I did not know I could highlight a PDF and just have it recognized. I thought I had to open it. Thanks again for this growing addon, which is really putting some amazing functionality into NVDA, the screen reader I now recommend first before anything else.

@JohnyTheHess

Hi Johny,

It’s truly rewarding to hear that Vision Assistant Pro has made NVDA your top recommendation! That is the greatest compliment a developer in this field can receive.

Regarding the camera feature: We are definitely on the same page. My goal is to keep it "Point and Shoot"—no complicated DPI or brightness settings. Just a simple shortcut to grab the frame and present the text as quickly as possible.

To clarify the **'F' key (Smart File Action)**: When you press it, the add-on opens a file selection dialog. This allows you to browse and select any image or PDF directly from your computer. The AI then processes it without you ever having to open the file in a separate viewer like Adobe Reader or a photo app. It’s a great way to quickly "peek" into documents.

**Update:** I have just released **Version 5.6**, which introduces a new **"None (Extract Text Layer)"** engine. For searchable PDFs, this engine extracts text locally and almost instantly, which will be a huge time-saver for your students.

Thank you for your continued support and for being such a strong advocate for accessibility!

Best regards,
Mahmood

regarding scanners and cameras.

Thanks for the info on the file dialog. Yes, as far as scanners and cameras go, you've got it, thanks! One more question: is there a document that describes all the things this addon can do? I know the help feature is available, but there are definitely some things, like the automated actions, that I don't understand. Thanks.

@JohnyTheHess

Hi Johny,

Thank you for the follow-up!

Regarding the documentation, the most detailed guide is built directly into the add-on. You can access it anytime by going to:
**NVDA Menu > Tools > Vision Assistant > Documentation**.

This will open the README file which explains all shortcuts and logic in detail.

To clarify the **"Automated Actions" (AI Operator - Shift+A)**:
Think of it as having a sighted assistant who can move the mouse for you. Instead of just hearing what is on the screen, you can give a direct command. For example:
* "Click on the 'Save' button."
* "Find the search bar and type 'Accessibility'."
* "Navigate through the installation wizard."

The AI analyzes the visual interface of the app, finds the exact coordinates of the element, and performs the click or typing for you. It is a game-changer for apps that have unlabeled buttons or are otherwise completely inaccessible to standard screen reading methods.
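
To make the "finds the coordinates and clicks" step more concrete, here is a minimal sketch of how a model's answer could be mapped to a screen position. It is not taken from the add-on: `box_center_to_pixels` is a hypothetical helper, and it assumes the model returns a bounding box as `[ymin, xmin, ymax, xmax]` on a 0 to 1000 scale, which is the convention Gemini documents for bounding boxes. Moving the mouse and clicking are left out.

```python
def box_center_to_pixels(box, screen_width, screen_height):
    """Convert a normalized bounding box to the pixel at its center.

    box is [ymin, xmin, ymax, xmax], each value on a 0..1000 scale (assumed).
    """
    ymin, xmin, ymax, xmax = box
    if not (0 <= xmin <= xmax <= 1000 and 0 <= ymin <= ymax <= 1000):
        raise ValueError(f"box out of range: {box!r}")
    # Midpoint of the box, scaled from 0..1000 to pixels (floor division).
    x = int((xmin + xmax) * screen_width // 2000)
    y = int((ymin + ymax) * screen_height // 2000)
    # Clamp so a box touching the right or bottom edge stays on screen.
    return min(x, screen_width - 1), min(y, screen_height - 1)
```

On a 1920x1080 screen, a box of `[100, 200, 300, 400]` maps to the point (576, 216), which is where a click would land in this sketch.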

I hope this helps! Let me know if you have any other questions as you explore these features.

Best regards,
Mahmood

Documentation.

Yes, it really does! I didn't realize that was available, thanks.

updated to v6.1.0

Hello everyone, the addon has been updated and you can get the latest version from the main post.

🌟 Support the Future of Vision Assistant Pro 🌟
Vision Assistant Pro is more than just an add-on; it is a mission to bridge the gap between AI and true accessibility. Every feature starts as an idea to solve real-world challenges for our community.

The Reality of Development:
Developing and testing a cloud-based AI tool in a region with extreme internet restrictions is a constant battle. To give you some perspective, a stable connection for development and rigorous testing often costs around $5 per gigabyte here.

Furthermore, cloud AI testing is not free. Running comprehensive test cycles for our visual features consumes active API credits—costing upwards of $10 per major test run out of my own pocket, which quickly accumulates over weeks of development.

Why Your Support is Needed:
To date, only three people have donated to this project. While I am deeply grateful for those contributions, more community involvement is essential to sustain the high costs of research, development, and testing infrastructure. Your support provides the resources needed to keep innovating and pushing the boundaries of what's possible.

How You Can Help:
Due to international sanctions, standard methods like PayPal are unavailable to me. I rely on the following alternative methods:

🍎 Apple US Gift Cards:
This is one of the most accessible ways to support. You can purchase them globally with low fees here:
🔗 https://www.mygiftcardsupply.com/shop/itunes-gift-cards/
📩 Please send the gift card code to: [email protected]

💎 Cryptocurrency (TON):
Wallet Address:
UQDoOOOoDYPP8eqWXVsjVyYzulY72JLZK1grPS_O2DbgVNsc

Thank you for being part of this journey and helping us make technology truly accessible for everyone!

Update to version 6.5.1

Hello everyone. The addon has been updated to version 6.5.1; you can download this version from the main post. An important feature added to it is the "live assistant", which allows you to ask the Gemini provider about anything displayed on your computer screen. Additionally, the status key "i" announces the current and remaining pages during OCR. I hope you enjoy this version.

Telegram Channel

Hello everyone. You can join the Vision Assistant Pro Telegram channel to stay quickly updated on the add-on and the features being added to it. Channel link:
https://t.me/visionassistantpro

getting API key

Hi there! I just discovered this addon today. Next week, I will be testing a website that could possibly lead to a job. I was hoping to use this addon in case images of text happen to crop up during my testing. However, I can't seem to get an API key. How can you get an API key for this addon? Thanks!