Something Different is Coming
A Progressive Web App for Blind and Visually Impaired Users | Works on All Smartphones
I need to tell you about something I built. Not because it's "the best" or "revolutionary" – everyone says that. But because it works in a way that genuinely surprised me when I tested it.
The Problem I Kept Running Into
You know the drill with most vision AI apps:
Point your camera → AI speaks a sentence → that's it.
"It's a living room with a couch and a table."
Cool. But where's the couch exactly? What color? How far? What else is there? Can you tell me about that corner again?
You have to point again. Ask again. Wait again. Listen again.
You're always asking. The AI is always deciding what matters. You never get to just... explore.
What If Photos Worked Like Books?
Stay with me here.
When someone reads you a book, you can say "wait, go back." You can ask them to re-read that paragraph. You can spend five minutes on one page if you want. You control the pace of information.
But photos? Someone gives you one description and that's it. Take it or leave it. They decided what's important. They decided what to mention. They decided when you're done.
I thought: What if photos worked like books?
What if you could explore them at your own pace? Go back to parts that interest you? Discover details the other person missed? Spend as long as you want?
The 6×6 Grid: Your Photo, Your Exploration
Here's what I built:
Upload any photo. Any photo at all.
The AI divides it into 36 zones – a 6×6 grid covering every inch of the image.
Now drag your finger across your phone screen like you're reading a tactile graphic.
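For the technically curious, the core of this interaction is easy to picture: convert the finger's position into one of the 36 cells and speak that cell's cached description. Here is a minimal browser sketch under those assumptions; the zone text and the speak helper are placeholders, not the app's actual code.

```typescript
// Minimal sketch (not the app's actual code): map a finger position over a
// photo element to one of 36 zones in a 6x6 grid and speak that zone's text.

const GRID = 6; // 6 x 6 = 36 zones

// Placeholder: one AI-generated description per zone, row-major order.
const zoneDescriptions: string[] = new Array(GRID * GRID).fill("…");

let lastZone = -1;

function speak(text: string): void {
  // Web Speech API, available in all major mobile browsers.
  speechSynthesis.cancel();
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}

function handlePointerMove(event: PointerEvent, image: HTMLElement): void {
  const rect = image.getBoundingClientRect();
  // Normalize the finger position to the 0..1 range inside the image.
  const x = (event.clientX - rect.left) / rect.width;
  const y = (event.clientY - rect.top) / rect.height;
  if (x < 0 || x >= 1 || y < 0 || y >= 1) return; // finger left the image

  const col = Math.floor(x * GRID);
  const row = Math.floor(y * GRID);
  const zone = row * GRID + col;

  // Only speak when the finger crosses into a new zone.
  if (zone !== lastZone) {
    lastZone = zone;
    speak(zoneDescriptions[zone]);
  }
}

const photo = document.getElementById("photo");
if (photo) {
  photo.addEventListener("pointermove", (e) => handlePointerMove(e, photo));
}
```

Because the descriptions only need to be generated once per image, going back to a zone costs nothing, which is what makes the open-ended exploration described below possible.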
What This Actually Feels Like:
You're exploring a photo of your living room:
Start in the top-left corner – drag your finger there:
"Smooth cream-colored wall with matte finish, cool to imagine touching, painted evenly"
Slide your finger right:
"Large window with soft natural light streaming through, sheer white curtains that would feel delicate and silky between your fingers"
Down a bit:
"Polished oak coffee table, glossy surface that would feel smooth and slightly cool, rich honey-brown color"
To the left:
"Plush beige carpet, deep pile that looks like it would feel soft and springy underfoot, slightly worn in the center from foot traffic"
Wait, go back to that window – drag back up:
"Large window with soft natural light streaming through, sheer white curtains..."
You're in control. You decide what to explore. You decide how long to spend. You decide what matters.
Go to the bottom-right corner – what's there?
"Wooden bookshelf against the wall, dark walnut finish with visible grain, would feel smooth with slight ridges"
Move to the zone right above it:
"Books lined up on shelf, various colored spines, some leather-bound that would feel textured and aged"
This Changes Everything
You're not being told about the photo.
You're exploring it.
You can go back to that window five times if you want. You can ignore the couch and focus on the corner. You can trace the room's perimeter. You can jump around randomly.
It's your photo. You explore it your way.
And here's the thing: the information doesn't disappear. It's not one-and-done. It stays there, explorable, for as long as you want.
Now Take That Same Idea and Put It in Physical Space
You walk into a hotel room at midnight. You're exhausted. Strange space. No idea where anything is.
Usually? You either stumble around carefully, or ask someone to walk you through, or just... deal with it till morning.
New option:
Point your camera. Capture one frame. The AI maps it into a 4×4 grid.
Now drag your finger across your screen:
• Top-left: "Window ahead 9 feet with heavy curtains"
• Slide right: "Clear wall space"
• Keep going: "Closet with sliding doors 8 feet on the right"
• Bottom-left: "Clear floor space"
• Center-bottom: "Bed directly ahead 5 feet, queen size"
• Bottom-right: "Nightstand right side 4 feet with lamp and alarm clock"
You just mapped the entire room in 30 seconds. Without taking a step. Without asking someone. Without turning on any lights.
Want to know what's on the left side again? Drag your finger back over there. Want to double-check the right? Drag there.
The information stays right there on your screen. You can reference it. You can re-explore it. You can take your time understanding the space.
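For anyone curious how a web page gets that single frame: the browser can open the rear camera, draw one frame onto a canvas, and hand it off for analysis. A hedged sketch of that step follows; the /analyze endpoint and the 4×4 response format are illustrative guesses, not the app's real API.

```typescript
// Sketch only: capture a single frame from the phone's rear camera and send
// it off for analysis. The "/analyze" endpoint and its response format are
// hypothetical stand-ins for whatever the real backend does.

async function captureFrame(): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" }, // rear camera on phones
  });
  const video = document.createElement("video");
  video.srcObject = stream;
  video.muted = true;
  video.playsInline = true;
  await video.play();

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  stream.getTracks().forEach((track) => track.stop()); // release the camera

  return new Promise((resolve) =>
    canvas.toBlob((blob) => resolve(blob!), "image/jpeg", 0.8)
  );
}

// Hypothetical: post the frame and get back a 4x4 array of zone descriptions,
// e.g. grid[row][col] = "Bed directly ahead 5 feet, queen size".
async function mapRoom(): Promise<string[][]> {
  const frame = await captureFrame();
  const response = await fetch("/analyze?grid=4", { method: "POST", body: frame });
  return response.json();
}
```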
The Core Difference
Most apps: Point → Wait → AI decides what to tell you → Move on → Repeat
This app: Explore → Control the pace → Discover what matters to YOU → Information persists → Return anytime
That's not a small difference. That's a fundamentally different interaction model.
You're Not a Passive Receiver
You're an active explorer.
You don't wait for the AI to decide what's important in a photo. You decide which zone to explore.
You don't lose the room layout the moment it's spoken. It stays mapped on your screen.
You don't get one chance to understand. You can explore as long as you want, go back, re-check.
This is what "accessible" should actually mean: Not just access to information, but control over how you receive and interact with it.
I have big plans to expand this feature as well.
Oh Right, It Also Does All The Normal Stuff
Because yeah, sometimes you just need quick answers.
Live Camera Scanning
Point anywhere, AI describes continuously:
• Quiet Mode: Only speaks for important stuff (people, obstacles, hazards)
• Detailed Mode: Rich ongoing descriptions
• Scans every 2-4 seconds
• Remembers what it already said, so it doesn't repeat itself (see the sketch after this list)
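A rough sketch of what a scan loop like that could look like; describeFrame is a hypothetical stand-in for the real vision call, and the "important" filter here is just a toy heuristic.

```typescript
// Illustrative sketch, not the app's actual code: a timed scan loop that only
// announces observations it hasn't spoken before.

type Mode = "quiet" | "detailed";

const IMPORTANT = /person|obstacle|hazard|stairs|vehicle/i; // toy heuristic

const alreadySpoken = new Set<string>();

async function describeFrame(mode: Mode): Promise<string[]> {
  // Hypothetical: send the current camera frame to the vision backend and
  // get back short observations ("person ahead", "chair on the left", …).
  return [];
}

function speak(text: string): void {
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}

async function scanLoop(mode: Mode, intervalMs = 3000): Promise<void> {
  while (true) {
    const observations = await describeFrame(mode);
    for (const obs of observations) {
      const isNew = !alreadySpoken.has(obs);
      const matters = mode === "detailed" || IMPORTANT.test(obs);
      if (isNew && matters) {
        alreadySpoken.add(obs);
        speak(obs);
      }
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // 2-4 s
  }
}
```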
Voice Questions - Just Ask
No buttons. Just speak:
• "What am I holding?"
• "What color is this shirt?"
• "Read this label"
• "Is the stove on?"
• "Describe what you see"
• "What's on my plate?"
Always listening mode – ready when you are.
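Browsers already expose the building blocks for this: speech recognition in, speech synthesis out. Here is a sketch of the pattern, with askAI standing in for whatever the app actually calls behind the scenes.

```typescript
// Sketch of the pattern: speech recognition in, speech synthesis out.
// askAI() is a hypothetical stand-in for the app's real question-answering call.

const Recognition =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

async function askAI(question: string): Promise<string> {
  // Hypothetical: send the question plus the current camera frame to a backend.
  return `You asked: ${question}`;
}

function startListening(): void {
  if (!Recognition) return; // browser without the Web Speech API

  const recognizer = new Recognition();
  recognizer.continuous = true;      // keep listening between questions
  recognizer.interimResults = false; // only act on final transcripts

  recognizer.onresult = async (event: any) => {
    const question = event.results[event.results.length - 1][0].transcript;
    const answer = await askAI(question);
    speechSynthesis.speak(new SpeechSynthesisUtterance(answer));
  };

  recognizer.onend = () => recognizer.start(); // restart if the browser stops it
  recognizer.start();
}
```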
Smart Search (Alpha)
"Find my keys"
AI scans rapidly and guides you:
• "Not visible – turn camera left"
• "Turn right, scan the table"
• "FOUND! On counter, left side, about 2 feet away"
⚠️ Alpha: Still being worked on.
Face Recognition (Alpha)
Save photos of people → AI announces when seen:
"I see Sarah ahead, about 8 feet away"
Totally optional. Enable it only if you want it.
Object Tracking (Alpha)
Tell AI to watch for items:
"Keep an eye out for my phone"
Later: "Where did you last see my phone?"
→ "On kitchen counter, 22 minutes ago"
Meal Assistance
Food positioned using clock face:
"Steak at 3 o'clock, potatoes at 9 o'clock, broccoli at 12 o'clock"
Plus descriptions: portion sizes, cooking level, colors, textures.
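The clock-face wording falls out of simple geometry: take each item's position relative to the plate's center and round the angle to the nearest hour. A worked sketch, assuming the vision model has already supplied the coordinates.

```typescript
// Sketch: convert a detected food item's position on the plate to a clock hour.
// Screen coordinates: x grows to the right, y grows downward.

function clockPosition(
  itemX: number, itemY: number,   // detected center of the food item
  plateX: number, plateY: number  // center of the plate
): string {
  const dx = itemX - plateX;
  const dy = itemY - plateY;

  // Angle measured clockwise from 12 o'clock (straight "up" on the screen).
  let angle = Math.atan2(dx, -dy);     // radians, -PI..PI
  if (angle < 0) angle += 2 * Math.PI; // 0..2*PI

  const hour = Math.round(angle / (Math.PI / 6)) % 12; // 30 degrees per hour
  return `${hour === 0 ? 12 : hour} o'clock`;
}

// An item directly to the right of the plate's center:
// clockPosition(100, 0, 0, 0) -> "3 o'clock"
```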
Reading Mode (Alpha)
Books and documents:
• Voice commands: "Next page", "Previous page", "Repeat", "Read left page", "Read right page"
• Speed controls: "Read faster" / "Read slower" (instant adjustment)
• "Check alignment" (ensures full page visible)
• Auto-saves progress per book
• Resume exactly where you stopped
Social Cue Detection (Alpha)
Optional feature detecting if people are:
• Making eye contact with you
• Waving or gesturing toward you
• Trying to get your attention
Fully Customizable
Pre-set profiles or build your own (a sketch of what a profile might hold follows the list):
• Scanning frequency (2-4 seconds)
• Detail level (Basic / Standard / Maximum)
• Voice speed (0.5× to 2×)
• Auto-announce settings
• Feature toggles
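A profile like that boils down to a small settings object the app can save and swap. Here is a sketch of what one might hold, based on the list above; the field names are guesses, not the app's real schema.

```typescript
// Sketch: the kind of settings object a profile might hold. Field names are
// guesses for illustration, not the app's real schema.

interface Profile {
  name: string;
  scanIntervalSeconds: 2 | 3 | 4;               // scanning frequency
  detailLevel: "basic" | "standard" | "maximum";
  voiceRate: number;                            // 0.5 to 2.0
  autoAnnounce: boolean;
  features: {
    faceRecognition: boolean;
    objectTracking: boolean;
    socialCues: boolean;
  };
}

const quietCommute: Profile = {
  name: "Quiet commute",
  scanIntervalSeconds: 4,
  detailLevel: "basic",
  voiceRate: 1.25,
  autoAnnounce: false,
  features: { faceRecognition: false, objectTracking: true, socialCues: false },
};

// Persist across sessions and restore on launch.
localStorage.setItem("activeProfile", JSON.stringify(quietCommute));
const restored: Profile = JSON.parse(localStorage.getItem("activeProfile") ?? "{}");
```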
Why This is a Web App, Not an App Store App
Honest reason: I want to ship features fast, not wait weeks for approval.
Better reason:
App stores are gatekeepers. Submit update → wait 1-2 weeks → maybe get approved → maybe get rejected for arbitrary reasons → users manually update → some users stuck on old versions for months.
Progressive Web Apps are different:
Bug discovered? Fixed within hours. Everyone has it immediately.
New feature ready? Live for everyone instantly.
AI model improved? Benefits everyone right away.
No approval process. No waiting. No gatekeepers.
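That instant-update property comes from how PWAs are served: the app is an ordinary web page plus a service worker, so a new deploy is picked up the next time the page loads. Below is a generic sketch of the standard pattern, not necessarily this app's code; it assumes the service worker in sw.js calls skipWaiting() when a new version installs.

```typescript
// Generic PWA sketch, not necessarily this app's code: register a service
// worker and reload when a new deployment takes control of the page.
// Assumes sw.js calls self.skipWaiting() when a new version installs.

if ("serviceWorker" in navigator) {
  navigator.serviceWorker.register("/sw.js").then((registration) => {
    registration.update(); // check for a newer deployed version on startup
  });

  // Fires when a freshly installed service worker takes over the page.
  navigator.serviceWorker.addEventListener("controllerchange", () => {
    location.reload();
  });
}
```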
Plus it works everywhere:
• iPhone ✓
• Android ✓
• Samsung ✓
• Google Pixel ✓
• Any modern smartphone ✓
Same features. Same performance. Same instant updates.
Installation takes 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Appears like regular app
Done.
Privacy (The Short Version)
• Camera images analyzed and discarded – not stored
• Voice processed only during active questions
• Face recognition optional
• Data encrypted
• Delete everything anytime
Critical Safety Disclaimer:
AI makes mistakes. This is NOT a replacement for your cane, guide dog, or O&M training. Never rely on this alone for safety decisions. It's supplementary information, not primary navigation.
When Does This Launch?
Soon.
Final testing in progress.
When the app officially launches, you'll have access to all of its features, even though some parts of it will still be in beta.
The Real Point of All This
For years, accessibility apps have operated on this assumption:
"Blind people need information. we'll give it to them efficiently."
Fine. But also... what if I flipped it:
"Blind people want to explore. They want control. They want information that persists. They want to discover things their way."
That's what I built.
Not "here's a sentence about your photo" but "here's 36 zones you can explore for as long as you want."
Not "here's a description of this room" but "here's a touchable map that stays on your screen."
Information that persists. Exploration you control. Interaction you direct.
That's the difference.
One Last Thing
The photo grid gives you 36 descriptions per image. Detailed, sensory, rich descriptions.
So when it comes out, watch people explore single photos for 5-10 minutes.
Going back to corners. Discovering details. Building mental images. Creating memories of the image.
That's not just making photos accessible.
That's making photos explorable.
And I think that's better.
Coming Soon
Progressive Web App
Works on All Smartphones
Built for exploration, not just description
What do you think? Which feature interests you most? Questions? Thoughts? Comments below.
Comments
What Features Interest Me?
All of the above!
Given that more apps from an accessibility standpoint (me being an Android user) aren't web apps, your idea is a golden glass of fresh air here!
@ Trenton Matthews
Thank you so much. I did get very little sleep when getting the foundation built lol.
May be looking for beta testers
I may be looking for beta testers in the near future, more specifically for the Android side. I have an iPhone myself, so I can test the iPhone features. Let me know if this interests you 😊.
Agreed
Making this a universally accessible application is boss! I for one cannot wait to give it a try. 😎👍
Android beta tester
Good job!
I am ready to beta test it!
@ Brian
Thanks so much. Right now I'm building the app's own built-in screen reader, so you can turn off your device's screen reader while using the app itself. Using it myself, I don't like having my screen reader going plus the screen reader in the app also going. I'm also working on a feature where, while feeling through your photos, you can tap on an item and it'll expand that item so you can explore just that item, a bookshelf for example.
Look forward to it
Stephen, it sounds like a good way to do it. I've been having to ask questions of AI in a grid, or to be exact, several different kinds of grids such as thirds, to understand pictures I take and edit.
An issue I have with another explore by touch AI app is that there is no indication of where the image ends at the top and bottom of the iPhone screen, and as I work with several different aspect frames with lots of blue sky that the AI does not speak, I have to guess a lot. I would think this would not be an issue in your grid system.
I find myself constantly having to ask if the edge of the image cuts off part of a bird or other critter that the AI has said is in the picture; they almost never say this up front. I also have to ask a lot if something is in focus with one of my AI describers.
This reminds me of an app called Image-Explorer
@OldBear, what's this other explore-by-touch AI app you mentioned, if I may ask?
@ Enes Deniz
It should work wherever a touchscreen is implemented. It would be interesting to see whether it actually functions the way it's supposed to, but so far everything is working extremely well. I remember Image-Explorer… it was a pretty weak app. I can also adjust things for you guys on the fly if something seems broken or buggy, or if you want it to react differently.
Audio representations?
You can't add the option to provide audio feedback as the user moves their finger around the screen, can you? You know, the type and timbre, volume and other characteristics could indicate certain visual properties. I also thought of 3-D audio or spoken and perhaps even haptic feedback, but those might be more challenging to implement. I acknowledge that audio feedback requires the app to treat the image as a whole, since the user should hear continuous beeps or loops or blips as they move their finger across the screen and as colors shift and the level of brightness fluctuates, so you might need to develop a new underlying approach and redesign the interface for this to work properly.
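To make the idea concrete: the Web Audio API can map, say, the brightness of the pixel under the finger to pitch in real time. This is a purely illustrative sketch of the concept, assuming the photo has been drawn to a canvas; it is not something the app is confirmed to do.

```typescript
// Purely illustrative: sonify the brightness under the finger with the
// Web Audio API. Assumes the photo has already been drawn onto a <canvas>.

const audioCtx = new AudioContext();
const osc = audioCtx.createOscillator();
const gain = audioCtx.createGain();
osc.connect(gain).connect(audioCtx.destination);
gain.gain.value = 0; // silent until the finger touches the image
osc.start();

function brightnessAt(canvas: HTMLCanvasElement, x: number, y: number): number {
  const data = canvas.getContext("2d")!.getImageData(x, y, 1, 1).data;
  return (data[0] + data[1] + data[2]) / (3 * 255); // 0 = black, 1 = white
}

function onPointerMove(e: PointerEvent, canvas: HTMLCanvasElement): void {
  if (audioCtx.state === "suspended") audioCtx.resume(); // needs a user gesture

  const rect = canvas.getBoundingClientRect();
  const x = Math.floor(((e.clientX - rect.left) / rect.width) * canvas.width);
  const y = Math.floor(((e.clientY - rect.top) / rect.height) * canvas.height);
  const brightness = brightnessAt(canvas, x, y);

  // Brighter pixels produce a higher pitch (roughly 200 Hz to 1200 Hz).
  osc.frequency.setTargetAtTime(200 + brightness * 1000, audioCtx.currentTime, 0.02);
  gain.gain.setTargetAtTime(0.2, audioCtx.currentTime, 0.02);
}
```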
@ Enes Deniz
I'm working on haptic features for different textures, etc., but that's going to be really hard to get going, I think, it being a web app and all. But with the descriptions, for example when I upload a photo of my dog, I can feel where his ears are, his nose, his eyes; the voice also gives the expression of his eyes and what his nose looks like. I really want to try to give everyone an actual experience with a photo and not just hearing an AI's description of the entire photo. You should be able to explore it… I'm also implementing a Zoom feature where, if you double-tap on a certain portion of, let's say, the dog's nose, you can just explore his entire nose. It does work better with bookshelves, but right now I only have pictures of my dog lol. I also have a bunch of sunrise and sunset photos, and it's really cool being able to go through and really feel the sky.
@ Enes Deniz
I was talking about the Seeing AI app, in Descriptions > Browse Photos, or something like that. There's an Explore option that gives haptic and audio feedback on some of the larger objects in a photo if the process recognizes them.
Requesting a long shot...
Will this have Braille support, for those persons who are both deaf and blind?
@Stephen
So is it possible to add the option to zoom in or out and let the user explore by smaller or larger units/distances, even pixels? You know, this will be quite handy if you somehow implement audio cues/beeps. So what I'm talking about is something like a combination of different methods usable simultaneously. Let's say you're exploring a photo featuring a person. You'd get more detailed audio feedback as you slide your finger, but only when your finger moved over a different body part or clothing would you get spoken feedback. So this will require that the app detect individual objects and describe them only while your finger is on them, by taking into account the size and location of each object, rather than dividing each and every image into the same number of zones and treating every image as a grid. One object may span multiple zones on that artificial grid, or it might be so small that it fits within one zone, so that system unfortunately didn't sound so realistic and effective to me. The alternate method I'm proposing is more like that found in Image-Explorer in that respect. The audio cues should also be heard more naturally and continuously, so representing an image as a grid may prevent that. Let me try a different explanation to clarify my point further: Exploring an image represented as a grid sounds like navigating a table with a certain number of columns and rows. So it's more of jumping from one cell to an adjacent one as you slide your finger than exploring the entire image as a whole.
@ Enes Deniz
I am working to see if I can implement your suggestions right now 😊. Standby.
Sure thing.
Well, apparently this is where your app will excel. Whenever we have a suggestion, bug report, etc., we just fire away and you take care of everything without ever dealing with app store policies or having to submit your updates and wait for them to be approved.
@ Enes Deniz
Implementation successful. I’m sure it could be better but it’s one heck of a start!
@ Enes Deniz
Imagine a blind person who's never "seen" their child's face, as an example. You can now feel the shape of their nose, count their teeth, explore their smile lines, etc. It's pretty tough to feel the exact size of, let's say, an adult on a screen, but I'm hoping the Zoom feature can help with that a little as well. The Zoom feature is playing a little bit hard to get, but I'll get it.
@ Brian
I'm not against adding braille support at all. The problem with that is going to be which display they're using and whether or not I can get it to work across displays. I'm not sure how I can implement it effectively so they can really feel the braille that's representing ears, nose, eyes, etc., plus all the zoom features. I would be curious to know whether web apps are even accessible to braille display users anyway?
@Stephen
Now that I've left you to deal with my volley of suggestions, I'm beginning to think of how many different scenarios this app would be highly useful in: exploring the world map, taking or finding a photo of a street to get a better overview for easier navigation, or examining in detail a photo taken by a friend and posted on social media.
@ Enes Deniz
lol. I have so many ideas for this app and for you guys.
Ya, that
That's great.
What Enes Deniz describes, and I guess is now implemented, is much like the Seeing AI Explore feature, though that is very limited.
I use it when, for example, I am cropping a picture of a bird with its wings spread to make a photo printout on a specific size of paper, and I need to be sure the bird is large enough and in the desired spot without being cropped by the edges. I locate the bird, say it's in portrait orientation, and run my finger across the screen over and over until I have a good idea of where it is relative to the sides of the picture. It doesn't work as well for the top and bottom, but I can at least tell if it is in the top half or bottom half.
Having more specific details included in that would be a game changer.
@ OldBear
You tell me what you need and I’ll do my best to make it happen 😊.
sounds amazing but remember ....
This sounds amazing, but remember we need our imaginations to "see" an arm, "see" a plant, "see" what we are doing when, say, counting teeth. It will require people to have exceptional spatial awareness. Imagine a day, which will never come in my lifetime, where you can somehow physically feel a photo. Remember, we are examining the screen, which is fabulous, but it will require people, I guess, to imagine they are "in" the photo. Does that make sense?
So if, say, you have a picture of a dog in a living room, you'd need, I guess, to imagine you are in that living room physically to "feel" how the photo looks. I look forward to trialling this; will it be this year, I wonder?
I would have liked to explore my mum's house decorations, in particular her Christmas tree; it's huge, apparently, lol. It will be great to get a "feel" for my children's faces as well. Yes, I can touch them, but it will be great to get a feel for it.
I love the idea of the food feature as well. Say we are in a restaurant: if it can say your steak is at 2 o'clock and your fries (chips here in the UK) are at, say, 6 o'clock, for those who value that, it will be fabulous.
@ Karok
I hear ya. With the way I have it set up, you can feel the entire room. I'm also working on those sound cues that you can trace so you can feel how big an object is, and you can also zoom in on a specific object and just explore that object. So let's take your mom's Christmas tree. When you take a photo of the room, or she sends you a photo of the room, you can move your finger across the screen to find the Christmas tree. You can feel its shape and size as much as possible with sound cues, tap on the tree, and then you'll be able to feel the tree with all of the ornaments on the branches. Then you can actually tap on each ornament and feel it through sound and description. Right now I have two levels of Zoom programmed into it. I'm hoping to show it off within the next week or so… There may be some delays due to me fixing bugs, because unlike a lot of companies, if I'm going to release something, I want to make sure it at least functions decently lol.
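The zoom described here can be thought of as recursion: crop the tapped zone out of the image and run the same grid treatment on the crop. A hedged sketch of that idea; describeGrid and the flow around it are illustrative stand-ins, not the app's actual code.

```typescript
// Sketch: "zoom" as recursion. Crop the tapped zone out of the image and run
// the same grid treatment on the crop. describeGrid() is a hypothetical
// stand-in for the AI call; none of this is the app's actual code.

interface Region { x: number; y: number; width: number; height: number }

async function describeGrid(image: Blob, grid: number): Promise<string[]> {
  // Hypothetical backend call: returns grid*grid descriptions, row-major.
  return new Array(grid * grid).fill("…");
}

// Crop one zone out of the source image using an offscreen canvas.
async function cropZone(image: ImageBitmap, zone: Region): Promise<Blob> {
  const canvas = document.createElement("canvas");
  canvas.width = zone.width;
  canvas.height = zone.height;
  canvas.getContext("2d")!.drawImage(
    image,
    zone.x, zone.y, zone.width, zone.height, // source rectangle: the tapped zone
    0, 0, zone.width, zone.height            // destination: the whole canvas
  );
  return new Promise((resolve) =>
    canvas.toBlob((blob) => resolve(blob!), "image/jpeg", 0.9)
  );
}

// Tap-to-zoom: the tapped zone becomes its own 6x6 grid to explore.
async function zoomInto(image: ImageBitmap, zone: Region): Promise<string[]> {
  const crop = await cropZone(image, zone);
  return describeGrid(crop, 6); // the ornament, the nose, the bookshelf…
}
```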
This sounds really really awesome
I have to say, I find this app that you're talking about to be quite good. Could this thing help me identify menus on my Casio CTS 1000 V keyboard? I have trouble with the menus because there's no speech or clicks or beeps or anything. I had to use ally to help me pair the Bluetooth connection so I could use its speakers to stream stuff from my phone to it. It would also be cool if it could read me which styles are on the display, because this has no numbers either; it's buttons, a dial, and more buttons. I have an idea as to what some of the buttons do, but the menus and learning which styles are which is a little bit difficult, except for the pop styles and the rock styles.
@ Exodia
Perhaps. Let me just finish getting the core features out at least, and then I can work on lots of other features. The only two main ones I'm having problems with right now are location accuracy and search, and well… I guess I haven't tested the reading yet, so I can't say that's a problem, but everything else seems to be working pretty smoothly. Right now I'm just editing the finishing touches and trying to make the AI stay on point when you're browsing through your photo.
You can try it out here.
I've decided to open this up for a public alpha/beta trial. I want to be upfront about something that matters: this project is expensive to build and keep running. It costs me close to three hundred Canadian dollars every month just to maintain everything behind the scenes. I cover it by working full time, which is fine for now, although it limits how fast I can push new features.
I want people to try it without barriers, so the alpha/beta will stay public for a little while. It will not stay open forever, because the costs add up quickly. I am exploring options for the future, whether that is donations or a small subscription model; I want to find something that works for everyone. If this takes off and the community shows real interest, I would look at reducing my work hours so I can put more time into development.
I appreciate everyone who tests this, gives feedback, or even shows curiosity. This community can be tough to impress, and rightfully so (I know I am), which is exactly why I want your honest reactions. And please don't tell me it doesn't know your correct location… I know. It's a thorn in my side lol. Also, the search function in conversation mode doesn't work quite yet… it's something I'm working on. TBH, I got a little hyper-focused on photo exploration lol. You can find the link below.
http://visionaiassistant.com