Hello AppleVis community,
I am Mahmood, and I am excited to share Vision Assistant Pro, a new open-source add-on for NVDA. This tool is designed to bring the intelligence of Google Gemini directly into your screen reader to solve digital challenges that usually require sighted assistance.
It is completely free to use (with your own API key) and focuses on interactivity rather than just static descriptions.
🌟 Key Features:
👁️ Interactive Vision (Object & Full Screen): Unlike standard OCR that just reads text, this feature lets you "see" and "ask."
- Object Vision: Take a snapshot of the specific control (icon, button, image) under your navigator cursor.
- Full Screen Vision: Scan the entire screen layout.
- The Best Part: After the initial description, you can chat with the AI. You are not limited to one description; you can ask follow-up questions like "Is there a save icon?", "Describe the chart in detail," or "What color is the button?"
🧠 Smart Translator (Auto-Swap): Instantly translates selected text. It creates a seamless bilingual experience by automatically detecting languages. If the source matches your target language, it intelligently swaps them.
🎙️ Smart Dictation: A powerful voice typing tool. It doesn't just transcribe; it listens, fixes your grammar, removes stutters (ums and ahs), adds punctuation, and types the polished text directly into your active window.
🔓 CAPTCHA Solver: Struggling with visual codes? Press a shortcut, and the AI will solve the math or read the characters and automatically type the result for you.
📄 Document QA: Have a PDF, TIFF, or text file? You can "chat" with your documents. Ask the AI to summarize them, extract specific data, or explain complex sections.
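For anyone curious how the Auto-Swap behavior in Smart Translator could work, here is a minimal sketch in Python. It is not the add-on's actual code: the function name `resolve_translation_direction` and the idea of a separate fallback language are assumptions made only for illustration, and language detection itself is assumed to have happened already.

```python
def resolve_translation_direction(detected_lang, target_lang, fallback_lang):
    """Return a (source, target) pair of language codes for one request.

    Hypothetical helper, not taken from Vision Assistant Pro.
    detected_lang: language code detected for the selected text.
    target_lang:   the language the user normally translates into.
    fallback_lang: the language to translate into when the text is
                   already written in target_lang (assumed setting).
    """
    # Normalize codes so that " EN " and "en" compare as equal.
    detected = detected_lang.strip().lower()
    target = target_lang.strip().lower()
    fallback = fallback_lang.strip().lower()

    if detected == target:
        # The text is already in the target language, so swap:
        # translate out of the target language into the fallback.
        return (target, fallback)

    # Normal case: translate from the detected language into the target.
    return (detected, target)


if __name__ == "__main__":
    # Persian text with English as the target: translate fa -> en.
    print(resolve_translation_direction("fa", "en", "fa"))
    # English text with English as the target: swap to en -> fa.
    print(resolve_translation_direction("en", "en", "fa"))
```

The point of the sketch is only that a single shortcut can serve both directions, because the direction is decided from the detected language rather than from a manual setting.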
🛠️ Requirement: You need a free Google Gemini API Key to run this add-on.
📥 Download & Installation: You can download the add-on directly from GitHub:
Download Vision Assistant Pro v3.1.0 (Direct Link)
Just open the downloaded file and confirm the installation in NVDA.
🔗 Project Source Code: https://github.com/mahmoodhozhabri/VisionAssistantPro
I developed this to help our community become more independent. I would love to hear your feedback and suggestions.
Best regards, Mahmood
Comments
Getting it to work
So I just installed the add-on from the Add-on Store, pasted in the API key, and nothing seems to happen. When I tried to perform OCR on a PDF, NVDA just said "not connected."
I tried creating a new project in Google AI Studio, with the same result.
Any idea on how to get it working?
@Nut
Not connected? There is no such message in the add-on! Please check again.
@Stefan
Hello. Bulgarian has been added to the new version, 2.6. Enjoy.
Need help getting API key
I feel like I missed the first week of A.I. class and showed up on quiz day. When I go to Google AI Studio and hit the "Get API key" button, it goes to an empty list of keys for imported projects. When I then hit "Create API key," a modal pops up requiring a name and an imported project--a combo box that's empty, with the "Create" button disabled. I tried in both Chrome and Firefox. Is this expected, and do I have to create a project first, or are browser extensions getting in the way, or what?
Thanks!
translate
It translates one sentence at a time, and I have to press the key combination every time, which is not convenient. Please make it work like the NVDA translator, which can be turned on and off, instead of requiring a key combination for each sentence.
Re: Translate
Have to agree; real-time translation of this quality is just what the doctor ordered. The question is, is it feasible?
@mantanini
Hello. That's not the case, unless you are translating in the browser! In that situation, place the text in the clipboard and press 'y' instead of 't'.
Congrats, good project!
I would recommend not implementing automatic updates or any update system directly in the add-on, as this is done automatically by the NVDA add-on store, and users can enable or disable automatic updates there.
API key missing
Thanks for this most indispensable add-on. I just used the OCR to describe my full screen and got a very nice description. I tried it again a minute later to get another full-screen description, and it said "API key missing." What's up with that? Is there a quota limit on how much you can use the add-on, or am I missing something? By the way, I am using the very latest version of this add-on.
suggestions
Hi Mahmood! I would like to congratulate you on the excellent add-on. Keep up the good work! I have some suggestions that would make things much easier: the possibility of having a screen to place multiple prompts and then being able to switch with a shortcut key. Sometimes, I want to know what's in a photo, then something in a game, and it would be useful to already have these prompts predefined by the user and just keep switching. Another suggestion would be an option so that when it showed the result, it would just speak, without showing a dialogue or window. It's quite useful to know information quickly without having to switch windows. Thank you!
please read
Dear Users,
Please download and install the latest build of the add-on from the project's GitHub page, which is linked in the original post. It appears that a significant portion of users are still running outdated versions.
Due to time constraints, my ability to visit this thread frequently to respond to comments is limited. For any issues or suggestions, please raise them directly in the Issues section of the project's GitHub repository.
Best Regards,
Mahmood
Post updated
Hello everyone. The download link has been updated to version 3.1.0. Enjoy.