Something Different is Coming
A Progressive Web App for Blind and Visually Impaired Users | Works on All Smartphones
I need to tell you about something I built. Not because it's "the best" or "revolutionary" – everyone says that. But because it works in a way that genuinely surprised me when I tested it.
The Problem I Kept Running Into
You know the drill with most vision AI apps:
Point your camera → AI speaks a sentence → that's it.
"It's a living room with a couch and a table."
Cool. But where's the couch exactly? What color? How far? What else is there? Can you tell me about that corner again?
You have to point again. Ask again. Wait again. Listen again.
You're always asking. The AI is always deciding what matters. You never get to just... explore.
What If Photos Worked Like Books?
Stay with me here.
When someone reads you a book, you can say "wait, go back." You can ask them to re-read that paragraph. You can spend five minutes on one page if you want. You control the pace of information.
But photos? Someone gives you one description and that's it. Take it or leave it. They decided what's important. They decided what to mention. They decided when you're done.
We thought: What if photos worked like books?
What if you could explore them at your own pace? Go back to parts that interest you? Discover details the other person missed? Spend as long as you want?
The 6×6 Grid: Your Photo, Your Exploration
Here's what we built:
Upload any photo. Any photo at all.
The AI divides it into 36 zones – a 6×6 grid covering every inch of the image.
Now drag your finger across your phone screen like you're reading a tactile graphic.
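If you're curious how that works under the hood, the core idea is simple enough to sketch. Here's a rough TypeScript sketch of mapping a touch point to one of the 36 zones – not the app's actual code, just the shape of the idea (the element ID, placeholder descriptions, and speech calls are illustrative):

```typescript
// Minimal sketch: map a touch point on the photo to one of 36 zones
// (6 columns × 6 rows) and speak that zone's description.
const GRID_SIZE = 6;
// In the real app these 36 strings come from the AI; here they're placeholders.
const descriptions: string[] = Array.from(
  { length: GRID_SIZE * GRID_SIZE },
  (_, i) => `Zone ${i + 1} description`
);
const photo = document.getElementById("photo") as HTMLElement;

function zoneIndexFromTouch(t: Touch, el: HTMLElement): number {
  const rect = el.getBoundingClientRect();
  // Normalize the touch position to 0..1 inside the image, clamped to its edges.
  const x = Math.min(Math.max((t.clientX - rect.left) / rect.width, 0), 0.999);
  const y = Math.min(Math.max((t.clientY - rect.top) / rect.height, 0), 0.999);
  return Math.floor(y * GRID_SIZE) * GRID_SIZE + Math.floor(x * GRID_SIZE); // 0..35
}

let lastZone = -1;
photo.addEventListener("touchmove", (e) => {
  e.preventDefault(); // keep the page from scrolling while the finger explores
  const zone = zoneIndexFromTouch(e.touches[0], photo);
  if (zone !== lastZone) {
    // Re-announce only when the finger crosses into a new zone.
    lastZone = zone;
    speechSynthesis.cancel();
    speechSynthesis.speak(new SpeechSynthesisUtterance(descriptions[zone]));
  }
}, { passive: false });
```

The important design choice is that nothing new is computed while you explore: all 36 descriptions already exist, so moving your finger is just a lookup.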
What This Actually Feels Like:
You're exploring a photo of your living room:
Start in the top-left corner – drag your finger there:
"Smooth cream-colored wall with matte finish, cool to imagine touching, painted evenly"
Slide your finger right:
"Large window with soft natural light streaming through, sheer white curtains that would feel delicate and silky between your fingers"
Down a bit:
"Polished oak coffee table, glossy surface that would feel smooth and slightly cool, rich honey-brown color"
To the left:
"Plush beige carpet, deep pile that looks like it would feel soft and springy underfoot, slightly worn in the center from foot traffic"
Wait, go back to that window – drag back up:
"Large window with soft natural light streaming through, sheer white curtains..."
You're in control. You decide what to explore. You decide how long to spend. You decide what matters.
Go to the bottom-right corner – what's there?
"Wooden bookshelf against the wall, dark walnut finish with visible grain, would feel smooth with slight ridges"
Move to the zone right above it:
"Books lined up on shelf, various colored spines, some leather-bound that would feel textured and aged"
This Changes Everything
You're not being told about the photo.
You're exploring it.
You can go back to that window five times if you want. You can ignore the couch and focus on the corner. You can trace the room's perimeter. You can jump around randomly.
It's your photo. You explore it your way.
And here's the thing: the information doesn't disappear. It's not one-and-done. It stays there, explorable, for as long as you want.
Now Take That Same Idea and Put It in Physical Space
You walk into a hotel room at midnight. You're exhausted. Strange space. No idea where anything is.
Usually? You either stumble around carefully, or ask someone to walk you through, or just... deal with it till morning.
New option:
Point your camera. Capture one frame. The AI maps it into a 4×4 grid.
Now drag your finger across your screen:
• Top-left: "Window ahead 9 feet with heavy curtains"
• Slide right: "Clear wall space"
• Keep going: "Closet with sliding doors 8 feet on the right"
• Bottom-left: "Clear floor space"
• Center-bottom: "Bed directly ahead 5 feet, queen size"
• Bottom-right: "Nightstand right side 4 feet with lamp and alarm clock"
You just mapped the entire room in 30 seconds. Without taking a step. Without asking someone. Without turning on any lights.
Want to know what's on the left side again? Drag your finger back over there. Want to double-check the right? Drag there.
The information stays right there on your screen. You can reference it. You can re-explore it. You can take your time understanding the space.
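Conceptually, that persistence is just a small data structure the touch layer keeps re-reading. A hypothetical shape, not the app's real schema:

```typescript
// Hypothetical shape of a captured room map: one camera frame in, a 4×4 grid of
// descriptions out. Nothing expires – the same object can be re-read on every touch.
interface RoomZone {
  row: number;          // 0 (top) .. 3 (bottom)
  col: number;          // 0 (left) .. 3 (right)
  description: string;  // e.g. "Bed directly ahead 5 feet, queen size"
}

interface RoomMap {
  capturedAt: number;   // timestamp of the single frame that was analyzed
  zones: RoomZone[];    // always 16 entries for a 4×4 grid
}

// Re-exploring is just another lookup – no new camera capture, no new AI call.
function describe(map: RoomMap, row: number, col: number): string {
  return map.zones.find(z => z.row === row && z.col === col)?.description
    ?? "Clear space";
}
```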
The Core Difference
Most apps: Point → Wait → AI decides what to tell you → Move on → Repeat
This app: Explore → Control the pace → Discover what matters to YOU → Information persists → Return anytime
That's not a small difference. That's a fundamentally different interaction model.
You're Not a Passive Receiver
You're an active explorer.
You don't wait for the AI to decide what's important in a photo. You decide which zone to explore.
You don't lose the room layout the moment it's spoken. It stays mapped on your screen.
You don't get one chance to understand. You can explore as long as you want, go back, re-check.
This is what "accessible" should actually mean: Not just access to information, but control over how you receive and interact with it.
I have big plans to expand this feature as well.
Oh Right, It Also Does All The Normal Stuff
Because yeah, sometimes you just need quick answers.
Live Camera Scanning
Point anywhere, AI describes continuously:
• Quiet Mode: Only speaks for important stuff (people, obstacles, hazards)
• Detailed Mode: Rich ongoing descriptions
• Scans every 2-4 seconds
• Remembers what it already said (no repetition)
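The "no repetition" part boils down to remembering what was already announced during the session. A minimal sketch of that idea, with `describeFrame` standing in for the actual vision-model call:

```typescript
// Sketch of the scan loop: describe the camera view every few seconds,
// but only announce items that haven't been announced recently.
declare function describeFrame(): Promise<string[]>; // e.g. ["person ahead", "open door left"]

const SCAN_INTERVAL_MS = 3000;       // somewhere in the 2-4 second range
const REPEAT_COOLDOWN_MS = 60_000;   // don't repeat the same item for a minute
const recentlyAnnounced = new Map<string, number>();

setInterval(async () => {
  const items = await describeFrame();
  const now = Date.now();
  for (const item of items) {
    const lastTime = recentlyAnnounced.get(item);
    if (lastTime === undefined || now - lastTime > REPEAT_COOLDOWN_MS) {
      recentlyAnnounced.set(item, now);
      speechSynthesis.speak(new SpeechSynthesisUtterance(item));
    }
  }
}, SCAN_INTERVAL_MS);
```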
Voice Questions - Just Ask
No buttons. Just speak:
• "What am I holding?"
• "What color is this shirt?"
• "Read this label"
• "Is the stove on?"
• "Describe what you see"
• "What's on my plate?"
Always listening mode – ready when you are.
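The always-listening part is the kind of thing the browser's speech recognition API makes possible. A rough sketch, with `answerQuestion` as a placeholder for the AI round trip – not the app's actual implementation:

```typescript
// Sketch of an always-listening loop using the browser's speech recognition.
// Support varies by browser (often exposed as webkitSpeechRecognition).
declare function answerQuestion(question: string): void; // placeholder for the AI call

const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
const recognition = new SpeechRecognitionImpl();

recognition.continuous = true;       // keep listening between utterances
recognition.interimResults = false;  // only act on finished phrases

recognition.onresult = (event: any) => {
  const last = event.results[event.results.length - 1];
  const phrase: string = last[0].transcript.trim();
  answerQuestion(phrase);            // e.g. "What am I holding?"
};

recognition.onend = () => recognition.start(); // restart if the browser stops listening
recognition.start();
```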
Smart Search (Alpha)
"Find my keys"
AI scans rapidly and guides you:
• "Not visible – turn camera left"
• "Turn right, scan the table"
• "FOUND! On counter, left side, about 2 feet away"
⚠️ Alpha: Still being worked on.
Face Recognition: Alpha
Save photos of people → AI announces when seen:
"I see Sarah ahead, about 8 feet away"
Totally optional. Enable only if wanted.
Object Tracking: Alpha
Tell AI to watch for items:
"Keep an eye out for my phone"
Later: "Where did you last see my phone?"
→ "On kitchen counter, 22 minutes ago"
Meal Assistance
Food positioned using clock face:
"Steak at 3 o'clock, potatoes at 9 o'clock, broccoli at 12 o'clock"
Plus descriptions: portion sizes, cooking level, colors, textures.
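The clock-face phrasing is just a conversion from an item's position on the plate to an hour. A rough sketch of that conversion, for illustration only:

```typescript
// Sketch: convert a food item's offset from the plate centre into a clock-face
// direction ("3 o'clock" = right side, "12 o'clock" = far side of the plate).
function clockPosition(dx: number, dy: number): string {
  // dx: +right / -left, dy: +toward the far edge / -toward you
  const angle = Math.atan2(dx, dy) * 180 / Math.PI;   // 0° = 12 o'clock, 90° = 3 o'clock
  let hour = Math.round(((angle + 360) % 360) / 30);  // 30° per hour on the dial
  if (hour === 0) hour = 12;
  return `${hour} o'clock`;
}

// e.g. an item directly to the right of centre:
console.log(clockPosition(1, 0)); // "3 o'clock"
```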
Reading Mode: Alpha
Books and documents:
• Voice commands: "Next page", "Previous page", "Repeat", "Read left page", "Read right page"
• Speed controls: "Read faster" / "Read slower" (instant adjustment)
• "Check alignment" (ensures full page visible)
• Auto-saves progress per book
• Resume exactly where you stopped
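The auto-save and resume behavior can be as simple as storing the last page per book on the device. A hypothetical sketch – the storage keys and shape are illustrative, not the app's real format:

```typescript
// Sketch: remember the reading position per book so "resume where you stopped" works.
interface ReadingProgress { page: number; updatedAt: number; }

function saveProgress(bookId: string, page: number): void {
  const progress: ReadingProgress = { page, updatedAt: Date.now() };
  localStorage.setItem(`reading-progress:${bookId}`, JSON.stringify(progress));
}

function loadProgress(bookId: string): ReadingProgress | null {
  const raw = localStorage.getItem(`reading-progress:${bookId}`);
  return raw ? (JSON.parse(raw) as ReadingProgress) : null;
}
```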
Social Cue Detection: Alpha
Optional feature detecting if people are:
• Making eye contact with you
• Waving or gesturing toward you
• Trying to get your attention
Fully Customizable
Pre-set profiles or build your own:
• Scanning frequency (2-4 seconds)
• Detail level (Basic / Standard / Maximum)
• Voice speed (0.5× to 2×)
• Auto-announce settings
• Feature toggles
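If it helps to picture it, a preset profile is basically a small bundle of those settings. A hypothetical shape, not the app's real configuration:

```typescript
// Hypothetical shape of a settings profile bundling the options above.
interface Profile {
  scanIntervalSeconds: 2 | 3 | 4;
  detailLevel: "basic" | "standard" | "maximum";
  voiceSpeed: number;                 // 0.5 to 2.0
  autoAnnounce: { people: boolean; obstacles: boolean; text: boolean };
  features: { faceRecognition: boolean; objectTracking: boolean; socialCues: boolean };
}

// Example preset: quick, quiet alerts while commuting.
const quietCommuter: Profile = {
  scanIntervalSeconds: 3,
  detailLevel: "basic",
  voiceSpeed: 1.25,
  autoAnnounce: { people: true, obstacles: true, text: false },
  features: { faceRecognition: false, objectTracking: false, socialCues: false },
};
```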
Why This is a Web App, Not an App Store App
Honest reason: We want to ship features fast, not wait weeks for approval.
Better reason:
App stores are gatekeepers. Submit update → wait 1-2 weeks → maybe get approved → maybe get rejected for arbitrary reasons → users manually update → some users stuck on old versions for months.
Progressive Web Apps are different:
Bug discovered? Fixed within hours. Everyone has it immediately.
New feature ready? Live for everyone instantly.
AI model improved? Benefits everyone right away.
No approval process. No waiting. No gatekeepers.
Plus it works everywhere:
• iPhone ✓
• Android ✓
• Samsung ✓
• Google Pixel ✓
• Any modern smartphone ✓
Same features. Same performance. Same instant updates.
Installation takes 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Appears like regular app
Done.
Privacy (The Short Version)
• Camera images analyzed and discarded – not stored
• Voice processed only during active questions
• Face recognition optional
• Data encrypted
• Delete everything anytime
Critical Safety Disclaimer:
AI makes mistakes. This is NOT a replacement for your cane, guide dog, or O&M training. Never rely on this alone for safety decisions. It's supplementary information, not primary navigation.
When Does This Launch?
Soon.
Final testing in progress.
When we officially release, you'll have access to all of the features, even though parts of the app will still be in beta.
The Real Point of All This
For years, accessibility apps have operated on this assumption:
"Blind people need information. we'll give it to them efficiently."
Fine. But also... what if I flipped it:
"Blind people want to explore. They want control. They want information that persists. They want to discover things their way."
That's what I built.
Not "here's a sentence about your photo" but "here's 36 zones you can explore for as long as you want."
Not "here's a description of this room" but "here's a touchable map that stays on your screen."
Information that persists. Exploration you control. Interaction you direct.
That's the difference.
One Last Thing
The photo grid gives you 36 descriptions per image. Detailed, sensory, rich descriptions.
So when it comes out, watch people explore single photos for 5-10 minutes.
Going back to corners. Discovering details. Building mental images. Creating memories of the image.
That's not just making photos accessible.
That's making photos explorable.
And I think that's better.
Coming Soon
Progressive Web App
Works on All Smartphones
Built for exploration, not just description
What do you think? Which feature interests you most? Questions? Thoughts? Comments below.
Comments
Ask about photos
One feature I would like to see in the future is the ability to ask questions while exploring photos. Even though the detail it gives is very rich, sometimes I find that it doesn't give me all of the specific details I'm looking for, so it would be nice to have some way to ask the AI for clarification about them.
@Zoe Victoria
Hi Zoe,
Yeah, the share sheet thing is a real trade-off, not gonna lie.
Those couple extra taps to save and upload? It's because we're web-based instead of a native app. And that comes with a downside (you found it) and an upside.
The upside: I can ship you improvements basically instantly. When you give me feedback, I can turn it around same-day or next-day. No waiting for app store reviews, no version updates, none of that. You just reload and it's there.
Native apps with share sheets can't do that. When users report issues, they're waiting weeks or even longer. When I improve the AI, you get it immediately. They don't.
So yeah, it's a slightly clunkier upload flow. But it means you're always using the best version of the tool, and I can respond to what you're telling me in real-time.
That's the trade. Not perfect, but I think it's the right one for where we are right now. I do, however, want to implement that feature in the future.
Thanks so much for that feedback. I'll also look into adding the feature you requested :).
looks great
Looking forward to testing this. Real-time, live reading is the part I'm most interested in, but I'm also curious to see how well the image exploration works with album art – on the front of a vinyl record sleeve, for example.
@Ashley
Hello Ashley,
I'm glad you're excited to try it all out!
You can try it out here:
http://visionaiassistant.com
Updates coming
Hello all,
I know there hasn't been an update in the last day. I've been working overtime, but I'm hoping to push updates out this weekend. It gets crazy at work around this time of year, it being the final quarter and all, but hopefully I'll have time to keep improving some of the features tomorrow or Sunday.
Cheers!
realtime navigation
@Stephen, sorry for not responding earlier; I was caught up in a medical emergency. About the realtime navigation feature: it seems to be stuck on my end. It asks for camera access, I grant it, and then it's stuck there. I tried turning VoiceOver off and everything, but nothing seems to happen. Is it me not doing something, perhaps?
@Gokul
Oh no, I hope everyone is OK. Don't even worry about real-time nav – I'm overhauling the entire thing and putting something much more boss in its place. I'm getting rid of Room Explorer mode and Realtime Navigation, and I'm going to use a live camera feed so that you can explore the room, get descriptions, navigate to objects like doors and chairs, and have contextual awareness of the object you're tracking... it's all coming in a very nice package, coming out shortly. I'm just putting the finishing touches on it now, then removing Room Explorer and Realtime Nav, and this big beast will take their place.
@Gokul
Once I'm done playing D&D I'll put the finishing touches on it and drop it for y'all.
Major Update: Live Camera Explorer - Your Feedback Shaped This
Hi everyone,
I hope this message finds you well. I wanted to take a moment to share some exciting updates with you all, and more importantly, to thank you for the incredible feedback you've been giving me. Every suggestion, every bug report, every feature request - I read them all, and they genuinely shape the direction of this app.
The Big News: Introducing Live Camera Explorer
After listening to your feedback over the past week, I realized something important: having Room Explorer Mode, Realtime Navigation, and the requests for live AI narration as separate features was creating unnecessary complexity. You told me the navigation felt disjointed, and I heard you loud and clear.
So I've combined everything into one powerful feature: Live Camera Explorer. This brings together room exploration, live object tracking, real-time narration, text detection, and scene awareness into a single, cohesive experience. Instead of jumping between different modes on the homepage, you now have one unified tool that adapts to what you need in the moment.
To streamline the experience, I've removed the separate Room Explorer and Realtime Navigation buttons from the homepage. I've also added "Contact Developer" directly to the main screen because your feedback matters, and I want to make it as easy as possible for you to reach me.
What Live Camera Explorer Can Do:
• Real-time Object Exploration: Drag your finger across the screen and instantly hear what objects are in view with spatial audio cues and voice feedback
• Object Tracking: Double-tap any object to track it with a continuous audio tone that changes with distance and proximity
• Zoom Into Details: Swipe left while tracking to zoom in and explore what's ON or INSIDE an object (like items on a table or what someone is wearing)
• Text Detection Mode: Swipe up to detect and read all text in view - signs, labels, documents, everything
• Scene Summaries: Three-finger tap for a complete spatial overview of your surroundings
• Watch Mode: Two-finger double-tap for live action narration - the AI continuously describes significant events happening in real-time
How We Made It Fast Enough:
The big challenge was speed. Traditional AI analysis took 3-5 seconds per request, which made real-time exploration impossible. Here's how we solved it:
Instead of analyzing the entire frame from scratch every time, the system now captures lightweight frames and sends them for continuous analysis every second in the background. By optimizing the image resolution (1280x720 instead of full 4K), using JPEG compression at 70% quality, and structuring the AI prompt to focus only on object detection with minimal processing, we cut the round-trip time down to approximately 1 second. The AI now returns simple object labels and boundaries instead of lengthy descriptions, which dramatically reduces processing time.
This means when you're exploring with Live Camera, you're getting near-real-time updates. As you change directions or move the camera, give it about a second to refresh - the objects you hear are based on what was in view just a moment ago. It's not perfect, but it's the fastest I've been able to make it work with current AI technology, and I think you'll find it incredibly useful.
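For the technically curious, the capture side of that loop looks roughly like this – a simplified sketch under the assumptions above, with the endpoint and helper names as placeholders rather than the real API:

```typescript
// Sketch: grab a lightweight frame from the live camera roughly once per second
// and send it for object detection. 1280x720 + JPEG at 70% keeps the payload small.
declare function updateTouchableOverlay(objects: unknown): void; // placeholder helper

const video = document.querySelector("video") as HTMLVideoElement; // live camera feed
const canvas = document.createElement("canvas");
canvas.width = 1280;   // downscale from the camera's native resolution
canvas.height = 720;
const ctx = canvas.getContext("2d")!;

async function analyzeFrame(): Promise<void> {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);  // lightweight 1280x720 frame
  const blob = await new Promise<Blob>((resolve) =>
    canvas.toBlob((b) => resolve(b!), "image/jpeg", 0.7)    // JPEG at 70% quality
  );
  const form = new FormData();
  form.append("frame", blob, "frame.jpg");
  // Placeholder endpoint: the prompt behind it asks only for object labels and
  // boundaries, not long prose, which keeps the round trip near one second.
  const res = await fetch("/api/detect-objects", { method: "POST", body: form });
  const objects = await res.json(); // e.g. [{ label: "door", box: [x, y, w, h] }, ...]
  updateTouchableOverlay(objects);  // refresh what a dragging finger will find
}

setInterval(analyzeFrame, 1000);    // continuous background analysis, about once per second
```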
Tutorial Included:
Because Live Camera Explorer has so many gestures and capabilities, I've added a comprehensive tutorial that walks you through everything step-by-step. First-time users will see this automatically, and you can always replay it from Settings > Live Camera Tutorial.
AI Mirror Mode - Check Your Appearance Independently
I've also added AI Mirror Mode so you can check how you look before heading out. It uses your front-facing camera with gesture controls. Swipe up with one finger and the AI analyzes your outfit, hair, facial expression, glasses position (if wearing them), and overall presentation - giving you specific, practical feedback like "your hair is slightly messy on the left side" or "your shirt collar is wrinkled." Swipe down to check your framing quality percentage. Swipe right to toggle between full guidance mode and quiet mode. Three-finger swipe left to exit. That's it - straightforward appearance feedback using simple gestures.
Important Notes & Known Issues:
The Start Button: I'll be honest - the start button for Live Camera can be a bit finicky right now. You may need to double-tap it and hold for just a split second to activate. I'm actively working on making this more reliable, but in the meantime, if it doesn't start the first time, just try the double-tap-and-hold technique.
Live Scene Description (Watch Mode): The continuous live narration feature is still being refined. It works, but I'd love to hear your feedback on how well it's detecting and describing actions in real-time. Please let me know what works and what doesn't - your real-world testing is invaluable.
Loading Indicators: I'm trying to implement audio cues when things are loading (like entering Photo Explorer or Live Camera modes), but it's being a bit temperamental. As a workaround, if you tap the middle of the screen during loading, it should announce "loading" or "in progress." Don't worry though - the system will definitely let you know when it's ready for you to explore.
Language Support Fixed: Several of you reported the app randomly speaking in English even when you had another language set, then reverting back to your chosen language. I believe I've patched this issue. The app should now continuously speak in whatever language you've set in Settings > Language & Region. If you still experience this problem, please reach out so I can investigate further.
PWA Caching Issue Resolved: For those using the Progressive Web App version, you may have noticed it was stubbornly holding onto older versions even after updates were released. Here's what was happening: the browser's service worker was aggressively caching app files for offline performance, but wasn't checking for new versions frequently enough. I've updated the cache invalidation strategy to force-check for updates every time you open the app, and clear old cached versions immediately when new ones are available. The PWA should now push the most recent update without requiring manual cache clearing.
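In service-worker terms, the gist of that fix looks something like the sketch below; the cache name and file paths are illustrative, not the app's actual code:

```typescript
// Page side: re-check for a new service worker on every launch instead of
// trusting whatever the browser has cached.
navigator.serviceWorker
  .register("/sw.js", { updateViaCache: "none" })
  .then((registration) => registration.update()); // force an update check right away

// Service-worker side (sw.js): when a new version activates, delete old caches
// immediately so stale app files can't be served again.
const CURRENT_CACHE = "vision-ai-cache-v2"; // illustrative version name

(self as any).addEventListener("activate", (event: any) => {
  event.waitUntil(
    caches.keys().then((names) =>
      Promise.all(names.filter((n) => n !== CURRENT_CACHE).map((n) => caches.delete(n)))
    )
  );
});
```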
Dark Mode Fixed: Dark mode for low vision users should now work consistently across both the web browser version and PWA. The issue was that dark mode styles were being applied at the component level, but when navigating to certain pages (especially Web Browser mode), the iframe and container elements weren't inheriting the dark mode context properly. I've moved the dark mode implementation up to the root layout level so it applies globally across all views.
Bugs & Feedback:
I'm sure there are still some bugs lurking around - that's just the nature of development, especially when trying to push the boundaries of what's possible with AI and accessibility. If you spot anything that doesn't work right, no matter how small, please reach out through the Contact Developer button on the homepage or message me directly or just yell at me on this thread. Every bug report helps make this better for everyone.
Your Feedback Matters:
I can't stress this enough - your suggestions and experiences directly shape this app. The Live Camera Explorer exists because you told me what you needed. The tutorial exists because you asked for better guidance. The language fixes, the dark mode improvements, the streamlined homepage - all of it came from listening to you.
So please, keep the feedback coming. Tell me what works, what doesn't, what you wish existed, what frustrates you. I read every message, and I'm constantly thinking about how to make this app serve you better.
Thank You:
Finally, I just want to say thank you. Thank you for your patience as I work through these complex features. Thank you for your detailed bug reports. Thank you for your enthusiasm and encouragement. Thank you for trusting me with something as important as your independence and accessibility.
You all are absolutely amazing, and knowing that this app makes even a small difference in your daily lives keeps me motivated to keep improving it. I'm working as hard as I can to make this the best AI vision assistant possible, and your support means the world to me.
Here's to making the world more accessible, one update at a time.
With gratitude, Stephen.
P.S. - As always, I'm here if you need anything. Don't hesitate to reach out.
Great features!! With some issues.
Hello, I really liked the new features. I'm just having a few small issues. The first is that in Live Camera Explorer mode, the system gives me descriptions in English instead of my language, which is Portuguese. The second is that when I double-tap with two fingers, it doesn't give me the real-time environment narration – it only says that the mode has been activated but doesn't return any description.
@Guilherme
Regarding your language, please try resetting Portuguese in the settings menu. It seems to be working on this end, so you might just have to reset it. As for the two-finger double tap, you're only supposed to get descriptions when something happens in front of you, like somebody picking something up or opening a door. It's really only meant to react during some sort of action. However, I'll give this feature more testing tomorrow and get back to you 😊. Thanks for the feedback.
Support line
I now have a dedicated support line for Vision AI assistant.
If you need help or want to give feedback you can reach me at the toll-free number below:
(866) 825-6177
Cheers! :).
Stephen, Re: Meta Smart Glasses
Hi Stephen,
I asked this a while back but never heard back, so I thought I would try again. In your PWA, there was an option to sync Meta smart glasses to the AI, but the steps require a head mount. May I ask why?
@ Brian
Oh my I’m so sorry if I missed that! So a couple reasons: web apps can’t access external devices and even if I was able to do a native iOS app, meta has restrictions on what applications are allowed to access the glasses camera. I was looking into it earlier and I knew that that question was gonna come up quite a bit, so that’s why I put all of the information about the Meta glasses in the app. I’m hoping one day we are able to use the camera, I think a lot of it is going to depend on when and if Meta opens it up for third-party access.
Re: Meta access
Fair enough. Hopefully the toolkit they are releasing, or have released, will allow for this.
@ Brian
Oh you better believe I’m keeping a close eye on that one! I have so many ideas for you guys if and when that happens!
Looking into native iOS app
i’ve gotten a lot of questions about when will this be turned into a fully native iOS app? The answer is I have no idea. I’ve been looking into it and apparently I need a Mac…. i’ve never used a MacBook I’ve always used windows. I was looking at purchasing a Mac and oh my word are they pricey lol. I did find some m1s for around 700 CAD but they only have 8 GB of ram. So I guess I have a few questions for my Mac users. How does the M1 chip hold up? Is it still supported with latest upgrades? Is 8 GB of RAM even going to be enough? How accessible is Xcode with voiceover? I need to know all things MacBook lol. Is Xcode easy to learn? You guys are phenomenal. Thanks so much for the support. It looks like we’re gaining just over 100 new users per week.
Black Friday
Black Friday sales are already happening on Amazon, and you can find some really good deals on eBay as well for refurbished models, which I would recommend over simply getting a used device. 🙂
Some examples for you...
2022 MacBook Air 13" M2 (8-Core GPU) 16GB RAM 256GB SSD $599
2024 Mac mini Desktop M4 chip 16GB Memory 256GB SSD Silver $479
@ Brian
Thanks so much for those. I'm leaning towards the MacBook – would 8 gigabytes of RAM be enough? I'm also a little torn on that one because it's 160-some-odd dollars in shipping to Canada, but it is the M2 chip, so even still it's not the worst deal.
Power and Portability
If you are asking about the MacBook Air I listed above, that particular model has 16 GB of RAM. You really don't want 8 GB of RAM these days. I mean, it is usable, but 16 or above is standard nowadays. On a side note, the Mac mini above would probably be a better workhorse for you, but in the end it all depends on your needs and use case.
HTH.
great work
I've just briefly tested this, and it's great work. I never knew apps built with platforms like Base44 could be so powerful! My initial feedback is that the social media functions should be stripped out altogether: the very last thing the world needs is yet another social platform, and the last thing the blind community needs is a specialised, segregated social platform. I would remove all of those features. Also, having a setting to toggle the voice verbosity would be great – for example, something that only announces results rather than functions. I'm testing on a Mac; I haven't tried a touch-based device yet, but I will do so and report back.
New feature
I will be dropping a new feature this weekend. Can anyone guess what it is? Happy Thanksgiving to all of my American friends!
Sorry for the delay
Hey guys sorry for the delay. Been testing this feature and it’s fighting me lol. I will have to delay the feature for now and try to get it out by next Sunday :).
Never mind my earlier comment it’s here!
OK, so I have managed to implement a physical book reader. You should now be able to pick up any print book and read it. Best of all, I was able to hook up the ElevenLabs API, so you can listen to the print book in your preferred voice! I've been reading my Dungeon Crawler Carl physical book set using this feature. It feels good lol. I will try to speed up the ElevenLabs response, but it is analyzing a whole page. Once you finish scanning a page and double-tap with two fingers to read, ElevenLabs will take 40 to 50 seconds to load. Granted, that's not the most optimal; I'm definitely gonna work on speeding it up if I can.
book reader
I mean, I wouldn't mind Eloquence reading it if it shortens the wait time.
@ Devin Prater
The option for system voices can also be used in the book reader settings, which should decrease the wait time. I can probably bring the ElevenLabs wait time down too; I'm just gonna have to tinker with it a bit lol.
Wow
So I’m going to remove the 11 labs API and switch to open AIAPI for the book reader. That 11 labs thing is super costly! OpenAI charges $15 where 11 laps would charge $300. So yeah let’s get that out of there lol.
Voices
Both OpenAI and Gemini have some cool voices. I don't know – if you could have Gemini in there, I guess it'd be even more cost-effective.
@Stephen, Re: OpenAI
Hi,
There is an add-on for NVDA called AI Content Describer. I mention this because in the settings for this add-on there is a list of LMs we can use, including something from OpenAI called "Pollinations".
According to the literature, it is a free LM to use without requiring an API key.
Just thought you might be able to look into this as a viable option. 😊
https://mcpmarket.com/server/pollinations-2
https://pollinations.ai/
@ Gokul and @ Brian
Thank you both for your suggestions. I’m looking into them now 😊.