I'm about to show you something that breaks every rule about how vision AI is "supposed" to work.
And when I say breaks the rules, I mean completely flips the whole thing upside down.
Here's What's Wrong With Every Vision AI App You've Ever Used
You point your camera.
You wait.
The AI speaks: "It's a living room with a couch and a table."
Cool story. But where's the couch? What color? How close? What's on it? What about that corner over there? That thing on the wall?
Want to know? Point again. Wait again. Ask again.
The AI decides what you need to know. You're stuck listening to whatever it decides to tell you. You don't get to choose. You don't get to dig deeper. You don't get to explore.
You're just a passenger.
So I built something that does the exact opposite.
What If Photos Were Like Video Games Instead of Books?
Forget books. Think video games.
In a game, you don't wait for someone to describe the room. You walk around and look at stuff yourself. You check the corners. You examine objects. You go back to things that interest you. You control what you explore and when.
That's what I built. But for photos. And real-world spaces.
You're not listening to descriptions anymore.
You're exploring them.
Photo Explorer: Touch. Discover. Control.
Here's how it works:
Upload any photo. The AI instantly maps every single object in it.
Now drag your finger across your phone screen.
Wherever you touch? That's what the AI describes. Right there. Instantly.
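For the technically curious, the idea can be sketched in a few lines. Assume the vision model returns labelled regions with normalized bounding boxes; a touch is then resolved to whichever region it lands in. The Region type and describeAtTouch function below are illustrative names only, not the app's actual code.

```typescript
// Illustrative sketch: labelled regions from a vision model,
// hit-tested against the user's touch point.
interface Region {
  label: string;   // e.g. "Red beach umbrella, slightly tilted"
  x: number;       // bounding box origin, normalized 0..1
  y: number;
  width: number;   // bounding box size, normalized 0..1
  height: number;
}

function describeAtTouch(
  regions: Region[],
  touchX: number,
  touchY: number,
  screenW: number,
  screenH: number
): string {
  const nx = touchX / screenW;
  const ny = touchY / screenH;
  // Prefer the smallest region containing the point, so a small detail
  // (the cooler) wins over the big region behind it (the beach).
  const hits = regions
    .filter(r => nx >= r.x && nx <= r.x + r.width &&
                 ny >= r.y && ny <= r.y + r.height)
    .sort((a, b) => a.width * a.height - b.width * b.height);
  return hits.length > 0 ? hits[0].label : "Nothing mapped here yet";
}
```

In a setup like this the map is built once per photo, so going back to a spot is just another local lookup - which is why the information can persist instead of vanishing.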
Let's Get Real:
You upload a photo from your beach vacation.
Touch the top of the screen:
"Bright blue sky with wispy white clouds, crystal clear, no storms visible"
Drag down to the middle:
"Turquoise ocean water with small waves rolling in, foam visible at wave crests, extends to horizon"
Touch the left side:
"Sandy beach, light tan color with visible footprints, a few shells scattered about"
What's that on the right? Touch there:
"Red beach umbrella, slightly tilted, casting dark shadow on sand beneath it"
Wait, what's under the umbrella? Touch that spot:
"Blue and white striped beach chair, appears unoccupied, small cooler beside it"
Go back to those shells - drag your finger back to the beach:
"Sandy beach, light tan color with visible footprints, a few shells scattered..."
See what just happened?
The information didn't vanish. You went back. You explored what YOU wanted. You took your time. You discovered that cooler the AI might never have mentioned on its own.
You're not being told about the photo. You're exploring it.
And here's the kicker: users are spending minutes exploring single photos. Going back to corners. Discovering tiny details. Building complete mental maps.
That's not an accessibility feature. That's an exploration engine.
Live Camera Explorer: Now Touch the Actual World Around You
Okay, that's cool for photos.
But what if you could do that with the real world? Right now? As you're standing there?
Point your camera at any space. The AI analyzes everything in real-time and maps it to your screen.
Drag your finger - the AI tells you what's under your finger:
• Touch left: "Wooden door, 7 feet on your left, slightly open"
• Drag center: "Clear path ahead, hardwood floor, 12 feet visible"
• Touch right: "Bookshelf against wall, 5 feet right, packed with books"
• Bottom of screen: "Coffee table directly ahead, 3 feet, watch your shins"
The world is now touchable.
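A rough sketch of how live analysis like this generally works in a browser, assuming frames are sampled from the camera and sent off for analysis: grab the rear camera with getUserMedia, copy frames onto a canvas, and hand each one to whatever builds the touchable map. The two-second interval and the onFrame callback are illustrative choices, not this app's real parameters.

```typescript
// Illustrative sketch: periodically sample camera frames in the browser
// and pass each one to the analyzer that builds the touchable object map.
async function startLiveCapture(onFrame: (frame: Blob) => void): Promise<void> {
  // Ask for the rear-facing camera; facingMode is a standard constraint.
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" },
  });

  const video = document.createElement("video");
  video.srcObject = stream;
  video.muted = true;        // required for autoplay on most mobile browsers
  video.playsInline = true;  // keep iOS Safari from going fullscreen
  await video.play();

  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("2D canvas not supported");

  // Sample a frame every couple of seconds (illustrative rate).
  setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    canvas.toBlob(blob => {
      if (blob) onFrame(blob); // e.g. send to the vision model for analysis
    }, "image/jpeg", 0.8);
  }, 2000);
}
```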
Real Scenario: Shopping Mall
You're at a busy mall. Noise everywhere. People walking past. You need to find the restroom and you're not sure which direction to go.
Old way? Ask someone, hope they give good directions, try to remember everything they said.
New way?
Point your camera down the hallway. Give it a few seconds.
Now drag your finger around:
• Touch left: "Store entrance on left, 15 feet, bright lights, appears to be clothing store"
• Drag center: "Wide corridor ahead, tiled floor, people walking, 30 feet visible"
• Touch right: "Information kiosk, 10 feet right, tall digital directory screen"
• Drag up: "Restroom sign, 25 feet ahead on right, blue symbol visible"
You just learned the entire hallway layout in 20 seconds.
Need to remember where that restroom was? Just touch that spot again. The map's still there.
Walk forward 20 feet, confused about where to go next? Point again. Get a new map. Drag your finger around.
But Wait - It Gets Wilder
Object Tracking:
Double-tap any object. The AI locks onto it and tracks it for you.
"Tracked: Restroom entrance. 25 feet straight ahead on right side."
Walk forward. The AI updates:
"Tracked restroom now 12 feet ahead on right."
Lost it? Double-tap again:
"Tracked restroom: About 8 steps ahead. Turn right in 4 steps. Group of people between you - stay left to avoid."
Zoom Into Anything:
Tracking that information kiosk? Swipe left.
BOOM. You're now exploring what's ON the kiosk.
• Touch top: "Mall directory map, large touchscreen, showing floor layout"
• Drag center: "Store listings, alphabetical order, bright white text on blue background"
• Touch bottom: "You are here marker, red dot with arrow, pointing to current location: level 2, near food court"
Swipe right to zoom back out. You're back to the full hallway view.
Read Any Text
Swipe up - the AI switches to text mode and maps every readable thing.
Now drag your finger:
• Touch here: "Restrooms. Arrow pointing right."
• Drag down: "Food Court level 3. Arrow pointing up."
• Touch lower: "Store hours: Monday to Saturday 10 AM to 9 PM, Sunday 11 AM to 6 PM"
Every sign. Every label. Every directory. Touchable. Explorable.
Scene Summary On Demand
Lost? Overwhelmed? Three-finger tap anywhere.
"Shopping mall corridor. Stores on both sides, restroom 25 feet ahead right, information kiosk 10 feet right, people walking in both directions. 18 objects detected."
Instant orientation. Anytime you need it.
Watch Mode (This One's Wild)
Two-finger double-tap.
The AI switches to Watch Mode and starts narrating live actions in real-time:
"Person approaching from left" "Child running ahead toward fountain" "Security guard walking past on right" "Someone exiting store carrying shopping bags"
It's like having someone describe what's happening around you, continuously, as it happens.
The Fundamental Difference
Every other app: AI decides → Describes → Done → Repeat
This app: You explore → Information stays → Go back anytime → You control everything
It's not an improvement.
It's a completely different paradigm.
You're Not a Listener Anymore. You're an Explorer.
Most apps make you passive.
This app makes you active.
• You decide what to explore
• You decide how long to spend there
• You discover what matters to you
• You can go back and check anything again
The AI isn't deciding what's important. You are.
The information doesn't disappear. It stays there.
You're not being helped. You're exploring.
That's what accessibility should actually mean.
Oh Right, There's More
Because sometimes you just need quick answers:
Voice Control: Just speak - "What am I holding?" "Read this." "What color is this shirt?"
Book Reader: Scan pages, explore line-by-line, premium AI voices, auto-saves your spot
Document Reader: Fill forms, read PDFs, accessible field navigation
Why a Web App? Because Speed Matters.
App stores = submit → wait 2 weeks → maybe approved → users update manually → some stuck on an old version for months.
Web app = fix bugs in hours. Ship features instantly. Everyone updated immediately.
Plus it works on literally every smartphone:
• iPhone ✓
• Android ✓
• Samsung ✓
• Google Pixel ✓
• Anything with a browser ✓
Install in 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Done. It's an app now.
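If you're wondering what "Add to Home Screen" actually relies on: it's standard Progressive Web App plumbing - a web app manifest linked from the page plus a registered service worker. The paths /manifest.webmanifest and /sw.js below are illustrative, not this app's actual files.

```typescript
// Generic PWA installability sketch (illustrative paths, not this app's code).
//
// The page links a manifest, e.g. in index.html:
//   <link rel="manifest" href="/manifest.webmanifest">
// The manifest declares the app name, icons, and display: "standalone",
// which is why the installed icon opens without browser chrome.

if ("serviceWorker" in navigator) {
  // A registered service worker enables offline caching and is part of the
  // installability criteria in most browsers.
  navigator.serviceWorker
    .register("/sw.js")
    .then(reg => console.log("Service worker registered at", reg.scope))
    .catch(err => console.error("Service worker registration failed:", err));
}
```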
The Price (Let's Be Direct)
30-day free trial. Everything unlocked. No credit card.
After that: $9.99 CAD/month
Why? Because the AI costs me money every single time you use it. Plus I'm paying for servers. I'm one person building this.
I priced it to keep it affordable while keeping it running and improving.
Safety Warning (Important)
AI makes mistakes.
This is NOT a replacement for your cane, guide dog, or mobility training.
It's supplementary information. Not primary navigation.
Never make safety decisions based solely on what the AI says.
The Real Point of This Whole Thing
For years, every vision AI app has said:
"We'll tell you what you're looking at."
I'm saying something different:
"Explore what you're looking at yourself."
Not one description - touchable objects you can explore for as long as you want.
Not one explanation - a persistent map you can reference anytime.
Not being told - discovering for yourself.
Information that persists. Exploration you control. Discovery on your terms.
People are spending 10-15 minutes exploring single photos.
Going back to corners. Finding hidden details. Building complete mental pictures.
That's not accessibility.
That's exploration.
That's discovery.
That's control.
And I think that's what we should have been building all along.
You can try out the app here:
http://visionaiassistant.com
Comments
ah OK
Ah OK. So what features can I use after the trial ends? And I wish the app had a lifetime subscription that never expires. Now that would be cool!
@ Stephen
Would be great if you can update the original post with the link. For someone who is newly discovering this. It'd be better to find it in the first go rather than having to look through the comments.
@Gokul
Great minds think alike! I was updating this as you made that comment :).
well, I hope we can share the screen and it can tell us the element or the
the subject said it all.
hope it can have a feature that can share the computer screen, then when we press the up and down arrows it will tell us the menu of the game
@ming
Hi Ming,
Thank you for your suggestion! I understand you're looking for a feature that can capture your computer screen and read out game menus when navigating with arrow keys.
This is a challenging request because this app is built as a progressive web app (PWA) that runs in your browser. Web browsers have significant security restrictions around screen capture - they can't directly access another application's screen content or intercept keyboard events from other programs like games. This is by design to protect user privacy and security.
Why we chose a web app approach:
• Universal accessibility: Works on any device (iPhone, Android, Windows, Mac) without separate downloads or app store approvals
• Instant updates: Everyone gets new features immediately without reinstalling
• No storage concerns: Doesn't take up device space like native apps
• Cross-platform: One codebase works everywhere
• Easier to maintain: We can focus on building features rather than managing multiple platform-specific versions
The technical challenges for your use case:
• Web apps are sandboxed and can't access content outside the browser tab for security reasons
• Intercepting arrow key presses from games would require system-level permissions that browsers don't have
• Reading game menus would need the game developer to provide accessibility APIs, which is outside our control
Game accessibility is ultimately dependent on game developers implementing accessible features in their games. Unfortunately, this isn't something a web app can bridge due to browser security limitations.
Watch mode is turning out to be incredible
This near-realtime monitoring of the camera feed is something almost the entire community has been wanting for some time now, and no app or wearable could do it reliably so far - and we have it near perfect here. So that's saying something. It's too much to hope for maybe, but I still hope that someday this same thing gets done by a smartglass camera...
AI Navigator: Because sometimes you need to go hands-free
Live Camera Mode is great - drag your finger around, explore what's in front of you, get detailed descriptions. But what if you just want to walk somewhere without touching your phone?
That's the whole point of AI Navigator.
Three-finger swipe down → just start talking.
No dragging. No tapping. Just conversation.
"Take me to the bathroom." Guides you step-by-step with live distance updates
"What's in front of me?" Describes everything ahead without you lifting a finger
"Where's the exit?" Spots it, tells you exactly where and how far
It reads building directory maps automatically, remembers every location, and keeps track of where you've been. Multi-floor buildings? No problem - it'll guide you to the elevator, tell you which floor to go to, and pick up navigation when you arrive.
It's proactive too. Spots spills, moving people, stairs - warns you before you get there.
Different tools for different moments. Sometimes you want to explore with your hands. Sometimes you just want to get somewhere. Now you can do both.
Testing it now. Updates coming soon.
PS: Goes without saying, but to go completely hands-free you would need some sort of mount :).
@Gokul
I'm building those features now so that when Meta hands out that nice little API I can work out how to implement it. I'm staying 5 steps ahead by getting the infrastructure in place, and we can go from there :).
Can you add an Italian translation too?
Hi, first of all congratulations on the app, very innovative. I noticed that you added several languages but Italian is missing. Could you add it? Thank you.
screensharing
Screen sharing could be used, like Google AI Studio does, to let the web app capture screen contents.
Android, lots of unneeded elements on page
When I install the web app, and open the app, the first four items on the page are "region," "list," "region," "region." Maybe those could be trimmed from the start page?
@ Devin Prater
Screen sharing in a browser is not a system-level capture mechanism, and it cannot be used to directly interpret or interact with the contents of other running applications such as games, as Ming is requesting.
On desktop systems, a web app can indeed request temporary access to capture a specific browser tab, application window, or monitor using the browser's built-in screen sharing permission prompt. That access is strictly limited to what the user manually selects, only remains active during that session, and provides nothing more than a raw video feed. However, it does not expose the actual interface structure, menu hierarchy, UI elements, focus states, or internal game data. In other words, it can show pixels, but it cannot understand what those pixels represent in a reliable, semantic way.
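To make that concrete, here's a minimal generic sketch of the screen-sharing API in question (not code from this app; the function name is illustrative). getDisplayMedia() hands back nothing but a video stream of whatever the user picked:

```typescript
// Minimal sketch of browser screen sharing: you get pixels, nothing more.
async function captureSharedPixels(): Promise<MediaStream> {
  // The browser shows its own picker; the page cannot choose the target,
  // and the permission only lasts for the current session.
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  // Frames from this stream could be drawn to a canvas and analyzed,
  // but the captured app's menus, focus states, and key presses are
  // never exposed to the page.
  return stream;
}
```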
In addition to that, a web app cannot intercept or listen to arrow-key input that is being handled by another application. The browser only receives keyboard input while it is the active, focused window. Once focus shifts to a game, the browser and any web app running inside it immediately lose access to those key events. This makes real-time "menu tracking" based on arrow keys technically impossible in a standard web environment.
On mobile platforms, especially iOS, the restrictions are even more severe. Safari and Progressive Web Apps cannot capture or analyze the screen contents of other apps at all. They also cannot run continuous background capture or monitoring processes. These limitations are enforced at the operating system level to protect user privacy, security, and data isolation between apps.
While some platforms demonstrate controlled screen capture in specific environments, those implementations operate within tightly constrained ecosystems and do not equate to universal, system-wide access for third-party web applications. A typical web app does not gain the ability to observe other programs or interpret their user interfaces simply by using screen sharing.
For a feature that reads game menus dynamically while navigating with arrow keys, the only technically valid approaches are:
• Native system-level accessibility services provided by the operating system
• Direct accessibility APIs implemented by the game developer
• A dedicated desktop application with elevated permissions and OS-level hooks
These capabilities are intentionally blocked from browsers and PWAs to prevent surveillance, input capture, and unauthorized data access across applications.
@ Devin Prater
Thanks for letting me know. I don't have an Android so I'm counting on my Android friends to yell at me if there are any bugs. I'll fix that with the next update :).
@ Ambro
I'll definitely look into this for you.
well, I tried it on my phone and I have some suggestions
well,
when I use my finger to explore around the screen it will tell me a desk, keyboard, screen and so on.
but I hope that it can describe more when I release my finger...
and also I tried it on my PC and it doesn't do too much,
because my computer doesn't have a camera.
@ming
Hi Ming,
What feature were you using? Was it Live Camera Explorer or Photo Explorer?
Did you double-tap on an object and then swipe left to zoom in? Were you able to do the tutorial, or did it pop up for you at all?
I need more details in order to help you better.
Also, yes it does need a camera to work.
ok...
let me try it again...
maybe I didn't explore it deeply enough.