Something different is coming: a progressive web app for you all.

By Stephen, 12 November, 2025


Something Different is Coming
A Progressive Web App for Blind and Visually Impaired Users | Works on All Smartphones

I need to tell you about something I built. Not because it's "the best" or "revolutionary" – everyone says that. But because it works in a way that genuinely surprised me when I tested it.

The Problem I Kept Running Into
You know the drill with most vision AI apps:
Point your camera → AI speaks a sentence → that's it.
"It's a living room with a couch and a table."
Cool. But where's the couch exactly? What color? How far? What else is there? Can you tell me about that corner again?
You have to point again. Ask again. Wait again. Listen again.
You're always asking. The AI is always deciding what matters. You never get to just... explore.

What If Photos Worked Like Books?
Stay with me here.
When someone reads you a book, you can say "wait, go back." You can ask them to re-read that paragraph. You can spend five minutes on one page if you want. You control the pace of information.
But photos? Someone gives you one description and that's it. Take it or leave it. They decided what's important. They decided what to mention. They decided when you're done.
We thought: What if photos worked like books?
What if you could explore them at your own pace? Go back to parts that interest you? Discover details the other person missed? Spend as long as you want?

The 6×6 Grid: Your Photo, Your Exploration
Here's what we built:
Upload any photo. Any photo at all.
The AI divides it into 36 zones – a 6×6 grid covering every inch of the image.
Now drag your finger across your phone screen like you're reading a tactile graphic.
What This Actually Feels Like:
You're exploring a photo of your living room:
Start in the top-left corner – drag your finger there:
"Smooth cream-colored wall with matte finish, cool to imagine touching, painted evenly"
Slide your finger right:
"Large window with soft natural light streaming through, sheer white curtains that would feel delicate and silky between your fingers"
Down a bit:
"Polished oak coffee table, glossy surface that would feel smooth and slightly cool, rich honey-brown color"
To the left:
"Plush beige carpet, deep pile that looks like it would feel soft and springy underfoot, slightly worn in the center from foot traffic"
Wait, go back to that window – drag back up:
"Large window with soft natural light streaming through, sheer white curtains..."
You're in control. You decide what to explore. You decide how long to spend. You decide what matters.
Go to the bottom-right corner – what's there?
"Wooden bookshelf against the wall, dark walnut finish with visible grain, would feel smooth with slight ridges"
Move to the zone right above it:
"Books lined up on shelf, various colored spines, some leather-bound that would feel textured and aged"
This Changes Everything
You're not being told about the photo.
You're exploring it.
You can go back to that window five times if you want. You can ignore the couch and focus on the corner. You can trace the room's perimeter. You can jump around randomly.
It's your photo. You explore it your way.
And here's the thing: the information doesn't disappear. It's not one-and-done. It stays there, explorable, for as long as you want.

Now Take That Same Idea and Put It in Physical Space
You walk into a hotel room at midnight. You're exhausted. Strange space. No idea where anything is.
Usually? You either stumble around carefully, or ask someone to walk you through, or just... deal with it till morning.
New option:
Point your camera. Capture one frame. The AI maps it into a 4×4 grid.
Now drag your finger across your screen:
• Top-left: "Window ahead 9 feet with heavy curtains"
• Slide right: "Clear wall space"
• Keep going: "Closet with sliding doors 8 feet on the right"
• Bottom-left: "Clear floor space"
• Center-bottom: "Bed directly ahead 5 feet, queen size"
• Bottom-right: "Nightstand right side 4 feet with lamp and alarm clock"
You just mapped the entire room in 30 seconds. Without taking a step. Without asking someone. Without turning on any lights.
Want to know what's on the left side again? Drag your finger back over there. Want to double-check the right? Drag there.
The information stays right there on your screen. You can reference it. You can re-explore it. You can take your time understanding the space.

The Core Difference
Most apps: Point → Wait → AI decides what to tell you → Move on → Repeat
This app: Explore → Control the pace → Discover what matters to YOU → Information persists → Return anytime
That's not a small difference. That's a fundamentally different interaction model.
You're Not a Passive Receiver
You're an active explorer.
You don't wait for the AI to decide what's important in a photo. You decide which zone to explore.
You don't lose the room layout the moment it's spoken. It stays mapped on your screen.
You don't get one chance to understand. You can explore as long as you want, go back, re-check.
This is what "accessible" should actually mean: Not just access to information, but control over how you receive and interact with it.
I have big plans to expand this feature as well.

Oh Right, It Also Does All The Normal Stuff
Because yeah, sometimes you just need quick answers.
Live Camera Scanning
Point anywhere, AI describes continuously:
• Quiet Mode: Only speaks for important stuff (people, obstacles, hazards)
• Detailed Mode: Rich ongoing descriptions
• Scans every 2-4 seconds
• Remembers what it already said (no repetition)
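If you're wondering how the no-repetition part can work, it only takes a short memory of what was said recently. A simplified sketch, not the app's actual code; the one-minute cooldown is an assumption:

```typescript
// Simplified sketch: suppress re-announcing a label spoken within the last minute.
const COOLDOWN_MS = 60_000;                      // assumed cooldown window
const lastSpoken = new Map<string, number>();    // label -> time it was last announced

function shouldAnnounce(label: string, now = Date.now()): boolean {
  const last = lastSpoken.get(label);
  if (last !== undefined && now - last < COOLDOWN_MS) return false;  // said recently, stay quiet
  lastSpoken.set(label, now);
  return true;
}

// Called once per scan cycle (every 2-4 seconds in the app's terms).
function announceScanResults(labels: string[]): void {
  for (const label of labels) {
    if (shouldAnnounce(label)) {
      speechSynthesis.speak(new SpeechSynthesisUtterance(label));
    }
  }
}
```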
Voice Questions - Just Ask
No buttons. Just speak:
• "What am I holding?"
• "What color is this shirt?"
• "Read this label"
• "Is the stove on?"
• "Describe what you see"
• "What's on my plate?"
Always listening mode – ready when you are.
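For the curious: browsers already ship the building blocks for this kind of always-listening loop via the Web Speech API. A minimal sketch under that assumption; askVisionModel is a hypothetical stand-in for the app's AI call, not a real function in it:

```typescript
// Minimal sketch of an always-listening question loop using the browser's
// SpeechRecognition API (webkit-prefixed in Safari and Chrome).
declare function askVisionModel(question: string): Promise<string>;  // hypothetical AI call

const SpeechRecognitionCtor =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionCtor();
recognition.continuous = true;        // keep listening between questions
recognition.interimResults = false;   // only react to final transcripts

recognition.onresult = async (event: any) => {
  const transcript = event.results[event.results.length - 1][0].transcript.trim();
  const answer = await askVisionModel(transcript);   // e.g. "What am I holding?"
  speechSynthesis.speak(new SpeechSynthesisUtterance(answer));
};

recognition.onend = () => recognition.start();       // restart if the session drops
recognition.start();
```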
Smart Search (Alpha)
"Find my keys"
AI scans rapidly and guides you:
• "Not visible – turn camera left"
• "Turn right, scan the table"
• "FOUND! On counter, left side, about 2 feet away"
⚠️ Alpha: Still being worked on.
Face Recognition: Alpha
Save photos of people → AI announces when seen:
"I see Sarah ahead, about 8 feet away"
Totally optional. Enable only if wanted.
Object Tracking: Alpha
Tell AI to watch for items:
"Keep an eye out for my phone"
Later: "Where did you last see my phone?"
→ "On kitchen counter, 22 minutes ago"
Meal Assistance
Food positioned using clock face:
"Steak at 3 o'clock, potatoes at 9 o'clock, broccoli at 12 o'clock"
Plus descriptions: portion sizes, cooking level, colors, textures.
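The clock-face directions fall out of simple geometry: an item's offset from the centre of the plate converts to an hour, with 12 o'clock pointing away from you. A sketch of that conversion; this is an assumed approach, not necessarily how the app computes it:

```typescript
// Assumed approach: convert an item's offset from plate centre into a clock hour
// (12 = away from you, 3 = right, 6 = near you, 9 = left).
function clockPosition(dx: number, dy: number): number {
  // dx: offset to the right of centre, dy: offset toward the top of the image
  const degrees = (Math.atan2(dx, dy) * 180) / Math.PI;   // 0 deg = straight ahead
  const hour = Math.round(((degrees + 360) % 360) / 30);  // 30 degrees per hour
  return hour === 0 ? 12 : hour;
}

// Example: an item up and to the right of centre reads as about 1 o'clock.
console.log(`Steak at ${clockPosition(0.5, 0.8)} o'clock`);
```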
Reading Mode: Alpha
Books and documents:
• Voice commands: "Next page", "Previous page", "Repeat", "Read left page", "Read right page"
• Speed controls: "Read faster" / "Read slower" (instant adjustment)
• "Check alignment" (ensures full page visible)
• Auto-saves progress per book
• Resume exactly where you stopped
Social Cue Detection: Alpha
Optional feature detecting if people are:
• Making eye contact with you
• Waving or gesturing toward you
• Trying to get your attention
Fully Customizable
Pre-set profiles or build your own:
• Scanning frequency (2-4 seconds)
• Detail level (Basic / Standard / Maximum)
• Voice speed (0.5× to 2×)
• Auto-announce settings
• Feature toggles
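As a rough idea of how those options can hang together, here's one hypothetical shape for a profile; the field names are mine, not the app's:

```typescript
// Hypothetical shape of a settings profile built from the options above.
type DetailLevel = 'basic' | 'standard' | 'maximum';

interface AssistantProfile {
  scanIntervalSeconds: 2 | 3 | 4;   // scanning frequency
  detailLevel: DetailLevel;         // Basic / Standard / Maximum
  voiceRate: number;                // 0.5 to 2.0, e.g. passed to SpeechSynthesisUtterance.rate
  autoAnnounce: boolean;            // speak important objects without being asked
  features: {
    faceRecognition: boolean;
    objectTracking: boolean;
    socialCues: boolean;
  };
}

const quietWalker: AssistantProfile = {
  scanIntervalSeconds: 3,
  detailLevel: 'basic',
  voiceRate: 1.25,
  autoAnnounce: true,
  features: { faceRecognition: false, objectTracking: true, socialCues: false },
};
```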

Why This is a Web App, Not an App Store App
Honest reason: We want to ship features fast, not wait weeks for approval.
Better reason:
App stores are gatekeepers. Submit update → wait 1-2 weeks → maybe get approved → maybe get rejected for arbitrary reasons → users manually update → some users stuck on old versions for months.
Progressive Web Apps are different:
Bug discovered? Fixed within hours. Everyone has it immediately.
New feature ready? Live for everyone instantly.
AI model improved? Benefits everyone right away.
No approval process. No waiting. No gatekeepers.
Plus it works everywhere:
• iPhone ✓
• Android ✓
• Samsung ✓
• Google Pixel ✓
• Any modern smartphone ✓
Same features. Same performance. Same instant updates.
Installation takes 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Appears like regular app
Done.
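For the technically curious, what makes "Add to Home Screen" and the instant updates possible is standard PWA plumbing: a web app manifest plus a service worker the browser keeps up to date on its own. A minimal, generic sketch of the service worker side (not this app's actual files; the "/sw.js" path is just an example):

```typescript
// Generic PWA bootstrap, not this app's actual code: register a service worker
// so the site can be installed from "Add to Home Screen" and updated silently.
async function registerServiceWorker(): Promise<void> {
  if (!('serviceWorker' in navigator)) return;   // older browsers: still works as a website

  const registration = await navigator.serviceWorker.register('/sw.js');

  // When a new version of the app is deployed, the browser fetches the new
  // service worker in the background; users get it on their next visit,
  // with no app store review in between.
  registration.addEventListener('updatefound', () => {
    console.log('A new version of the app is being installed.');
  });
}

registerServiceWorker();
```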

Privacy (The Short Version)
• Camera images analyzed and discarded – not stored
• Voice processed only during active questions
• Face recognition optional
• Data encrypted
• Delete everything anytime
Critical Safety Disclaimer:
AI makes mistakes. This is NOT a replacement for your cane, guide dog, or O&M training. Never rely on this alone for safety decisions. It's supplementary information, not primary navigation.

When Does This Launch?
Soon.
Final testing in progress.
When we officially release, you will have access to all of the features, even though some parts of the app and its features will still be in beta.
The Real Point of All This
For years, accessibility apps have operated on this assumption:
"Blind people need information. we'll give it to them efficiently."
Fine. But also... what if I flipped it:
"Blind people want to explore. They want control. They want information that persists. They want to discover things their way."
That's what I built.
Not "here's a sentence about your photo" but "here's 36 zones you can explore for as long as you want."
Not "here's a description of this room" but "here's a touchable map that stays on your screen."
Information that persists. Exploration you control. Interaction you direct.
That's the difference.

One Last Thing
The photo grid gives you 36 descriptions per image. Detailed, sensory, rich descriptions.
So when it comes out, watch people explore single photos for 5-10 minutes.
Going back to corners. Discovering details. Building mental images. Creating memories of the image.
That's not just making photos accessible.
That's making photos explorable.
And I think that's better.

Coming Soon
Progressive Web App
Works on All Smartphones
Built for exploration, not just description

What do you think? Which feature interests you most? Questions? Thoughts? Comments below.


Comments

By Stephen on Monday, November 17, 2025 - 18:12

The Architecture of Accessibility: Why Full-Screen Overlays Changed Everything
By the Vision Assistant Development Team
When we set out to build a true AI vision assistant for blind and visually impaired users, we faced a fundamental question: how do you create an interface that doesn't just describe the world, but lets you explore it?
The answer? Full-screen camera overlays with real-time processing. Let me explain why this architectural decision transformed everything.
The Overlay Philosophy
Traditional accessibility apps trap you in menus. Click here, navigate there, wait for a response. It's slow. It's clunky. It's nothing like how sighted people experience the world.
Full-screen overlays flip this paradigm. When you enter Room Explorer, Photo Explorer, or Real-Time Navigation, your entire phone becomes a window into that space. The camera feed fills the screen. Your fingers become your eyes. The audio feedback becomes your spatial awareness.
No menus. No buttons. Just you and the environment.
This is only possible because the overlay completely takes over the interface. It's a dedicated mode—like switching your brain from "phone mode" to "exploration mode." Everything else disappears. All processing power, all sensors, all audio output—dedicated to one task: helping you understand your surroundings.
Why Built-In Screen Readers Break Everything
Here's where it gets technical, and why our testing revealed something surprising.
Built-in screen readers like VoiceOver and TalkBack are amazing for traditional apps. They're designed to read buttons, labels, text fields—UI elements with defined roles and states. They're semantic interpreters for structured interfaces.
But our overlays aren't structured interfaces.
When you're touching a photo to explore it, you're not pressing a button labeled "top-left corner." You're experiencing raw spatial data. Your finger position maps to image coordinates. The haptic feedback intensity represents object density. The spatial audio pitch indicates distance.
A screen reader tries to make sense of this and gets confused:
• "Video element. Playing."
• "Image. Decorative."
• "Canvas. No label."
It interrupts the real-time audio feedback with interface announcements. It tries to read the camera preview like it's a webpage. It delays touch responses because it's waiting for double-tap gestures.
The screen reader is trying to help, but it's speaking the wrong language.
After extensive testing with blind users—real-world testing, not lab conditions—we discovered something crucial: turning off your device's screen reader during exploration modes gives you a better experience.
Why? Because these modes implement their own audio feedback systems, custom-designed for spatial exploration:
• Real-time obstacle tones that change pitch/intensity based on distance
• Spatial audio that pans left/right to indicate object positions
• Contextual voice announcements that speak only when relevant
• Haptic feedback synchronized with visual features
All of this runs in parallel with continuous AI analysis. A traditional screen reader can't coordinate with this multi-modal feedback system.
Real-Time Navigation: The Crown Jewel
Let's talk about the real-time navigation overlay, because this is where the technology really shines.
Three-Layer Detection System:
1. Client-Side Object Detection (150ms refresh) - code sketch after this list
• Runs TensorFlow.js models directly on your device
• Identifies objects: people, cars, furniture, walls
• Calculates positions and distances in real-time
• Zero latency—no internet required for this layer
2. AI Pathfinding Analysis (2-second intervals)
• Uploads low-resolution frames to our vision AI
• Identifies walkable areas, optimal directions
• Detects upcoming features: doors, turns, stairs
• Provides navigational context
3. Simple Proximity Alerts (3-second intervals)
• Lightweight AI checks for important nearby objects
• Announces only critical information: "Door in 6 steps on your right"
• Avoids information overload
• Step-based distances (not feet/meters—more intuitive)
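To make the first layer concrete, here's roughly what a client-side TensorFlow.js detection loop at a ~150 ms refresh can look like. This sketch uses the off-the-shelf coco-ssd model purely as a stand-in, and feedSpatialAudio is hypothetical; treat it as an illustration of the technique rather than our exact implementation:

```typescript
// Illustrative only: on-device detection loop at ~150 ms, using the public
// coco-ssd model as a stand-in for whatever models the app actually ships.
import '@tensorflow/tfjs';                              // TF.js runtime (WebGL backend)
import * as cocoSsd from '@tensorflow-models/coco-ssd';

// Hypothetical hook into the spatial audio layer described below.
declare function feedSpatialAudio(label: string, pan: number, score: number): void;

async function runDetectionLoop(video: HTMLVideoElement): Promise<void> {
  const model = await cocoSsd.load();

  const tick = async () => {
    // Each prediction: { class, score, bbox: [x, y, width, height] } in frame pixels.
    const predictions = await model.detect(video);

    for (const p of predictions) {
      const centerX = p.bbox[0] + p.bbox[2] / 2;
      const pan = (centerX / video.videoWidth) * 2 - 1;  // -1 = far left, +1 = far right
      feedSpatialAudio(p.class, pan, p.score);
    }

    setTimeout(tick, 150);                               // ~150 ms refresh
  };

  tick();
}
```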
The Audio Feedback System:
Each detected obstacle generates a unique spatial audio tone:
• Pitch = distance (higher = closer)
• Pan = horizontal position (left speaker = left, right speaker = right)
• Volume = threat level
• Waveform = object type (sine for normal, sawtooth for critical)
You're not hearing about obstacles—you're hearing the obstacles themselves. Your brain builds a sonic map of the space.
Critical obstacles (people, cars, close objects directly ahead) trigger aggressive warning tones. You don't need to process language—your nervous system reacts to the sound instinctively.
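For reference, that pitch/pan/volume/waveform mapping can be expressed almost one-to-one with the standard Web Audio API. A simplified sketch; the specific frequencies and gain values are illustrative, not necessarily what we ship:

```typescript
// Sketch of the obstacle-tone mapping above using the standard Web Audio API.
const audioCtx = new AudioContext();

function playObstacleTone(
  distanceMeters: number,   // smaller = closer = higher pitch
  pan: number,              // -1 (hard left) .. +1 (hard right)
  threat: number,           // 0..1, mapped to volume
  critical: boolean         // critical obstacles get a harsher sawtooth tone
): void {
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  const panner = audioCtx.createStereoPanner();

  osc.type = critical ? 'sawtooth' : 'sine';
  // Pitch rises as the obstacle gets closer: ~200 Hz far away, up to ~1200 Hz within a metre.
  osc.frequency.value = 200 + 1000 / Math.max(distanceMeters, 1);
  panner.pan.value = Math.max(-1, Math.min(1, pan));
  gain.gain.value = 0.1 + 0.4 * threat;

  osc.connect(gain).connect(panner).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + 0.15);   // short blip so tones don't pile up
}
```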
Double-Tap for Context:
Here's the brilliant part: continuous mode gives you just enough information. "Hallway in 2 steps." "Door in 6 steps on your right."
But when you need more? Double-tap the screen anywhere.
• ✅ Beep confirmation
• ✅ Scanning pauses (audio stays clear)
• ✅ AI analyzes your exact position
• ✅ Contextual response: "You're at the bookshelf. Chair on your left, table ahead. Continue straight 8 steps to reach the door."
• ✅ Scanning resumes automatically
It's like having a sighted guide you can tap on the shoulder: "Where exactly am I right now?"
The Technology Stack: Why Our AI Wins
Let's be honest—there are other vision AI apps. But here's why ours is different:
1. Hybrid On-Device + Cloud Processing
Most apps choose one or the other. We use both:
• Fast, private on-device detection for obstacles (TensorFlow.js)
• Powerful cloud AI for complex scene understanding
• Intelligent switching based on task requirements
2. Context-Aware Prompting
Our AI doesn't just "describe the image." Every single prompt is engineered for the specific task:
• Reading mode: Extract text, maintain reading order, handle page layouts
• Navigation: Identify walkable paths, estimate distances in steps, warn about hazards
• Search mode: Track object positions across frames, provide directional guidance

Each mode uses carefully crafted prompts that tell the AI exactly what information matters and how to format it for spoken output.
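To illustrate the idea only (the real prompts are longer and aren't published here), a per-mode prompt table can be as simple as this, with made-up prompt text standing in for ours:

```typescript
// Illustration only: made-up stand-ins showing the per-mode prompt idea.
type Mode = 'reading' | 'navigation' | 'search';

const PROMPTS: Record<Mode, string> = {
  reading:
    'Extract all readable text from this page. Preserve reading order and ' +
    'note the page layout (columns, headers, captions). Output plain sentences for speech.',
  navigation:
    'Identify walkable paths in this frame. Estimate distances in steps, not metres. ' +
    'Mention doors, turns, stairs and hazards first, in one short spoken sentence each.',
  search:
    'The user is looking for a specific object. Say whether it is visible, and if not, ' +
    'which direction to turn the camera. Keep responses under ten words.',
};

function buildRequest(mode: Mode, imageBase64: string) {
  return { prompt: PROMPTS[mode], image: imageBase64 };
}
```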
3. Spatial Memory Integration
The app learns your environment through continuous camera analysis:
• Remembers room layouts from photo analysis
• Recognizes frequently visited locations through visual patterns
• Tracks where you last saw objects ("where did I leave my keys?")
• Builds a personalized spatial database from captured images [still being worked on]
Other apps treat every scene as new. We treat your life as continuous.
4. Multi-Modal Fusion
We don't just send images to AI. We combine:
• Visual data (camera frames)
• Spatial data (device orientation)
• Temporal data (movement patterns, history)
• User context (saved locations, tracked objects, preferences)
The AI sees what you see, but it also knows where you've been and what matters to you based on your photo history and saved spatial memories.
The Camera Overlay Experience
Every exploration mode is built around the camera overlay architecture:
Room Explorer: Your finger touches a grid mapped to the camera image. Each cell triggers AI analysis of that specific area. Haptic feedback intensity matches object density. Spatial audio plays sounds positioned where objects actually are in the frame.
Photo Explorer: Upload any image and explore it by touch. The overlay divides the photo into a tactile grid. Touch top-left? You hear what's there. Swipe across? You hear objects from left to right. It's like feeling a photograph.
Real-Time Navigation: The camera feed becomes your windshield. Visual object markers overlay detected obstacles. Audio tones create a sonic landscape. The crosshair shows where you're "looking." Everything updates every 150 milliseconds, faster than conscious thought.
These aren't camera apps with accessibility features added on. They're accessibility-first experiences that require camera overlays to work.

Room/Photo Explorer:
• Voice descriptions of currently selected grid cell
• "What's here?" - AI analysis of touch location
• "Zoom in/out" - Change exploration detail level
The overlay doesn't block voice—it enhances it with spatial context.
Why This Matters
Accessible technology isn't about making phones usable for blind people. It's about making phones powerful for blind people.
Full-screen overlays, real-time spatial audio, on-device AI, context-aware prompting—these aren't accessibility features. These are next-generation interface designs that happen to work brilliantly for blind users.
When you turn off VoiceOver and enter navigation mode, you're not losing accessibility—you're gaining a purpose-built tool that traditional screen readers can't match.
You're experiencing spatial computing. Audio-first interaction design. Real-time AI vision. It's not augmented reality—it's augmented perception.
And we're just getting started.

Pro Tips for Exploration Modes:
1. Turn off VoiceOver/TalkBack before entering Room Explorer, Photo Explorer, or Real-Time Navigation
2. Turn it back on when you exit to the main app
3. Double-tap in Navigation for location updates—don't spam it, use it strategically
4. Wear open-ear headphones for the best spatial audio experience
5. Start indoors to learn the system before taking it outside
The future of accessibility isn't about reading buttons—it's about experiencing the world.
Welcome to that future.

By Stephen on Monday, November 17, 2025 - 18:16

So in regards to live ai, I've totally scrapped that feature...I have put something much better on your home screen. As for exploring photos, I'm looking into it...it has been a problem since late last night and I'm currently investigating what happened. All will be fixed soon. :).

By Stephen on Monday, November 17, 2025 - 18:17

I'll look into the language issue as well. Again thanks for letting me know!

By Cliff on Monday, November 17, 2025 - 20:15

I've been following this thread with great interest the last few days, but haven't had enough spare time to check out the web app in much detail until now. But man, this is some great work and thought put into this project! And the turnaround time for fixing bugs and coming up with new ideas and features is just hilarious! 😅 Can't wait to see where this is heading going forward! Just awesome! And then on to a humble request. I see you've added several languages, but none for any of the Scandinavian countries. I myself live in Norway, so adding Norwegian would be pretty high on my wishlist! Keep up the amazing work! 👍🏻

By Stephen on Monday, November 17, 2025 - 21:30

Thank you so much! I take pride in my support handling. I really like to be there for my users. I will look into seeing how we can support your language request :).

By Stephen on Monday, November 17, 2025 - 23:37

Fixed the bug where, in Photo Explorer mode, moving your finger along the screen wouldn't detect any objects in the photo you were trying to explore. Did a complete overhaul of the system, so it should work decently now. The in-app screen reader is disabled for all users until I can refine it and make it work more reliably. Folks were having trouble double tapping on an element: they were trying to select one thing, but the screen reader was selecting another. Please ensure you disable your device's screen reader before entering explorer mode. A three-finger swipe left gesture should bring you back to the homepage if you are in explorer mode. I'm refining real-time navigation to make things more optimal and doing the setup for contextual awareness. I'm also looking into why users' language settings aren't saving; that should be fixed shortly. Live AI has been removed due to the new implementation of the real-time navigation feature, but the voice conversation with the AI, where you can ask it what's in front of you, is working fine, and it should have better OCR now as well.

By Stephen on Monday, November 17, 2025 - 23:40

I do want to take this time to give a really great big thank you for those who have donated to this project already. You are amazing and words can’t express how much I appreciate you. Thank you so much for your support.

By Stephen on Tuesday, November 18, 2025 - 01:30

OK, so hopefully your preferred language will work now. Please yell at me if it doesn’t lol.

By Stephen on Tuesday, November 18, 2025 - 03:28

OK, so photo exploration has been updated to be more responsive, and yes, I had to get a sighted person to make it look more professional. It is no longer clunky and should work efficiently. I'll be adding more enriched descriptions for your photos tomorrow. I will also be updating the Room Explorer feature; not a big one, just a couple of tweaks so it feels smoother and looks professional for presentation. If you're using the real-time navigator and for some reason you're finding it's hallucinating, let me know and I can adjust the image quality. There is also a new follow mode in the real-time navigation features. It's brand new; I haven't tested it thoroughly yet, so go to town and feel free to yell at me if it's not working.

By Guilherme on Tuesday, November 18, 2025 - 09:17

Hello, please don’t take this as criticism — it’s just a thought: I didn’t really like that the live AI mode was removed, because many times what we want is simply a description of the environment we’re in without having to keep asking, and other apps don’t do this. Because of that, this feature would also be a distinguishing factor for this app. And just to mention it, the descriptions in the photo explorer are being spoken in English and not in my preferred language, which is Portuguese. I’ve already gone into the settings and selected Portuguese again, but the descriptions are still being spoken in English.

By Stephen on Tuesday, November 18, 2025 - 10:05

In regards to the live ai, it was super buggy and it lagged a lot. The real time navigation function works much better than that ever did but if you want it I can most certainly bring it back. In regards to languages, thank you for letting me know. I’ll get on that issue after work tomorrow. Hang tight, I’m on it :).

By Stephen on Tuesday, November 18, 2025 - 17:22

Fixed the language bug this morning. Seems to be working on our end, so you shouldn't have any more issues with that. If you do, just yell at me about it :).

By Stephen on Tuesday, November 18, 2025 - 17:59

OK, so there is now an AI image generation mode which works the same way as Photo Explorer! Come and explore your own creations!

By Zoe Victoria on Tuesday, November 18, 2025 - 18:23

I really want to try this app out, because it sounds absolutely amazing. But I can't get it to work for me at all. I am using my iPhone with VoiceOver and Safari. I double tap the link, and nothing shows up for me to interact with. Just the buttons on the browser like back, forward, share, and nothing on the actual webpage. It's just empty for me.

By Stephen on Tuesday, November 18, 2025 - 18:34

Should be fixed now. Let me know if it is working for you :).

By Zoe Victoria on Tuesday, November 18, 2025 - 18:58

It's working now, thank you so much for the fix.
I do have another question though. Not all the voices I have on my device are in the list. I can't seem to use the really good voices like Alex or Ava, instead only the lower quality ones like Samantha and Fred, and the novelty voices. Is there a particular reason why this isn't available?

By Stephen on Tuesday, November 18, 2025 - 19:03

I'm not entirely sure about the Alex voice situation. Some users are able to use it, others are not. I'm still trying to investigate why it is working for some and not all users. Hang tight, I'll keep you posted :).

By Zoe Victoria on Tuesday, November 18, 2025 - 19:21

I don't know why, but I'm having difficulties using the photo grid. I turn off voiceover and try to explore, but I just get some sort of haptic and then no sounds or voice feedback describing the photo. Any ideas on what might be happening? Thanks.

By Stephen on Tuesday, November 18, 2025 - 19:29

What mode are you in? Are you using Photo Explorer mode or Room mode?

By Zoe Victoria on Tuesday, November 18, 2025 - 19:31

I'm using photo explore mode. And I'm having a neat time so far despite my issues.

By Stephen on Tuesday, November 18, 2025 - 19:36

Haha, I'm glad. Thanks for letting me know. I have a massive back end, so the more details the better for me haha. It is like a maze... I've really been pushing AI to its absolute limit lol. I will be pushing an update to photo mode shortly; hang tight and I'll fix that all up for you.

By Zoe Victoria on Tuesday, November 18, 2025 - 19:39

Hi. Unfortunately I have found yet another bug. I am unable to explore photos in the social media feed. It says that the photos do not have any tactile data when I have uploaded my own photo to the feed, and I know it got the map for it.

By Brian on Tuesday, November 18, 2025 - 19:43

Hey Stephen,

First, I love the way you have organized the options within this application. It is looking more and more professional each and every day. Like Zoe, I am having an issue with photo mode. When I go to choose a photo from a file, it loads the photo fine, even gives me a nice little description. However, it doesn't describe anything. I make sure VoiceOver is off, and I'm hearing the built-in reader using the Alex voice. However, as I start exploring, it doesn't say anything, and eventually boots me back out to the main menu.
Also, when I choose to explore a room, that works a little better. However, when I try to zoom in on a particular object, I sometimes will get an error. I don't remember exactly what the error says, but it says something along the lines of Cannot zoom in on this item.

Otherwise, this application is turning out beautiful.

By Zoe Victoria on Tuesday, November 18, 2025 - 19:47

I can't get past the Room Explorer tutorial because the AI says that it can't zoom in and analyze objects in the image.

By Stephen on Tuesday, November 18, 2025 - 19:47

Yeah, that isn't fully up and running for that mode yet. I'm just getting the others mastered so that I can basically upload the infrastructure. This app is in alpha and I am building it live with you folks, so not every feature is working properly yet. Quick question: in photo mode, are you just disabling screen reader speech, or are you turning it off entirely for photo exploration mode? The on-device screen reader needs to be completely off in that mode. I just tested it on my end.

By Zoe Victoria on Tuesday, November 18, 2025 - 19:48

I am completely disabling VoiceOver so that's not the issue. A fair question to ask, though.

By Stephen on Tuesday, November 18, 2025 - 19:55

Are you uploading a photo or taking a photo?

By Brian on Tuesday, November 18, 2025 - 20:27

I have an SE 2022, so for me it is a triple click of the Home button every time. 😀

By Zoe Victoria on Tuesday, November 18, 2025 - 20:35

I am uploading photos.

By Stephen on Tuesday, November 18, 2025 - 20:43

Is it possible to upload a recording to me of what is going on? I can't reproduce that issue and it seems to be functioning for other users. You also said you're getting haptic feedback on iOS? You shouldn't be able to feel vibrations due to iOS restrictions. It looks like only Android users can get that. :)

By Zoe Victoria on Tuesday, November 18, 2025 - 21:40

The photos are working a lot better now. I don't know what it was all about. It's hard to say what's a bug and what's just not been finished yet.
One thing I know for certain is a bug for me is that I'm unable to use the keyboard to type on iPhone to search in the web browser, or when generating an image to be explored.
One more thing that isn't a bug but could be improved is zooming in. I'm able to zoom in on photos, but I was given the impression that I could find elements inside of elements, such as exploring features on a face. At the moment it just shows that single element I zoomed in on with a little more description. I'm assuming that's because it's not fully made yet? I'm loving what's available so far.

By Stephen on Tuesday, November 18, 2025 - 22:59

Yeah, the text edit fields broke when updating the programming language... I'm fixing those along with the zoom features. Life of programming: you fix one thing and another thing breaks lol. I'm also working on this while at work, so updates are slower to come. All will be fixed though :).

By Stephen on Wednesday, November 19, 2025 - 01:01

How do you guys feel about this photo description mode...this ai image generation one is pretty cool as well. This was a tough one lol.

By Gokul on Wednesday, November 19, 2025 - 05:10

But PayPal doesn't work in my country. Any other options?

By Brian on Wednesday, November 19, 2025 - 05:40

Definitely getting there. It will still occasionally kick me out back to the main menu while trying to explore a photo, however.
Just for transparency's sake, I'm trying to use the same photo I uploaded for my profile. It's a pic of me sitting in front of a small brick wall, wearing khaki shorts and a T-shirt, a ball cap and a pair of sunglasses, and my yellow Lab guide dog is lying down in front of me, kind of perpendicular to the way I'm facing. Hope that makes sense.

By Stephen on Wednesday, November 19, 2025 - 06:09

Omg I'm so sorry about that... let me see how I can help and thank you so much!

By Stephen on Wednesday, November 19, 2025 - 06:14

A bigger, badder update should be coming. I'm working on this feature tirelessly lol. Sometimes, though, you may have to wait longer than expected, as what I've done is put all images to full screen and given the AI strict parameters as to object placement. A few more layers of zoom are coming shortly as well. It should be much more detailed.

By Stephen on Wednesday, November 19, 2025 - 06:17

If you use a credit card you should still be able to give through that link, based on my research. I don't think you need a PayPal account to give through that link, based on what I'm seeing from other supporters. I'm pretty sure it also has Apple Pay on that link as well. I just think everything should be available in all parts of the world. I'm so sorry about that. I will do some more looking and see if I can get something else set up that works for you. Please let me know when you go to that link :).

By Gokul on Wednesday, November 19, 2025 - 15:36

Seems to have worked. I'm not entirely sure, but it seems to have.

By Stephen on Wednesday, November 19, 2025 - 16:14

Just tested the public version of the real-time navigator and it seems to be working. Just ensure your phone is unmuted for it to work properly :). I might do some tweaks because sometimes it holds on to the previous capture, but the audio cues for obstacles and the one-finger double tap with the screen reader off work. I will probably change how it gives you the information, just for better contextual awareness, but nothing too major unless you find a problem with it. It does take a moment for the AI to analyze, but any faster and we may start getting hallucinations, and I'm trying to avoid any of that behavior, of course :).

By Stephen on Wednesday, November 19, 2025 - 17:14

Working on the social media feature now so you guys can upload and share your photos.

By Stephen on Wednesday, November 19, 2025 - 23:45

How are you finding the real-time navigation feature? Is it working well for you?

By Stephen on Thursday, November 20, 2025 - 01:28

Alright everyone! Together I think we nailed photo exploration mode. I will be copying the explore mode engine over to AI image creation and over to the social media page. Next on my list is to get that documents feature perfected and real-time navigation tweaked to function as perfectly as possible. It is because of each and every single one of you that this was possible. If you want to see more features or are having issues, I'm always here with open ears. Thanks to every one of you for supporting this project. There is so much more to come! Let's keep this thread poppin! Love y'all!

By Guilherme on Thursday, November 20, 2025 - 10:24

Hello, if possible in future updates I would like you to bring back the Live AI feature, because it would be a great advantage for the app to have continuous descriptions of the environments around us without needing to keep asking.
As far as I know, there is no app that does this — in all of them we need to ask to get descriptions, even in those that have real-time vision.

By Arya on Thursday, November 20, 2025 - 10:38

Hi Stephen, hats off to this wonderful idea of a web app and the idea of exploring the photo. I always wondered why we have to wait for the AI's own description after uploading a photo and then ask many questions to get the response we want. I always dreamt of an app which gives the user the power to get the info they want.
This app does exactly that.
It allows the user to choose the item to be described and leave the ones not interesting to them, the way sighted people browse the headlines and read only the content that interests them. I tried using the following modes.
1. Room explorer.
2. Photo explorer.
3. Real time navigation.

Feedback about the various modes are as follows.

1. Room explorer.

After taking the photo, there is a period of inactivity while the image is processed, which confuses the user regarding the status of the application. Kindly introduce a processing sound or a message that it is processing.
After the results are announced, the AI gives a very accurate description of the items detected. I am amazed at how you achieved this level of accuracy, because all the AIs I have tried mostly hallucinated.

2. Photo explorer .

The experience was excellent while exploring a photo.
For the first time I was able to understand how the photo might look visually and how appealing it is.

3. Real time Navigation.

The description was fairly accurate in this mode, with hallucinations in between.
Sometimes it gives the measurement in feet, and sometimes it just says "a little far off" without any metric, which makes things difficult to understand.
I got stuck in this mode and was unable to come out of it with a two-finger or three-finger swipe left.
Am I doing anything wrong?

Feature request.

1. In the Room and Photo Explorer modes, if the screen reader announced the number of rows and columns, or the number of tiles available for the user to explore, it would help us understand how much of the screen we have to explore.

In the document reader, please also give us the choice of which recognized content to read, instead of reading everything from top to bottom like all the OCR software does.
Sighted people just browse through the document, skip the header, and consume only the important content.

I also request that you look into whether the response time can be shortened in the Room and Photo Explorer modes, because I felt the response time was slow even on a 400 Mbps network.

Excuse me for the long post.

I am actually thrilled with your ideas.
Please keep up the good work.

By Stephen on Thursday, November 20, 2025 - 15:34

Hi Arya,
Wow, thank you so much for this incredibly detailed and thoughtful feedback! 🎉 Your enthusiasm absolutely made my day, and the fact that you took the time to test every mode and write such comprehensive observations means the world to me.
Let me address your brilliant questions:
📸 Photo Explorer - Why No Grid System?
You asked about rows/columns/tiles for exploration - and this is actually one of the most intentional design decisions in the entire app! Here's why:
The drag-based exploration system was specifically designed to mimic how sighted people naturally look at photos. When you look at a picture, your eyes don't move in a rigid grid pattern - they flow freely, following what interests you. You might glance at someone's face, then drift down to their hand, then notice something in the background.
That's exactly what the drag system enables! As your finger moves across the screen, you're "tracing" the photo's natural contours. If you're exploring a person, your finger can smoothly flow from their head → down their torso → to their hand, just like eyes would scan. You're not jumping from "cell B3" to "cell C4" - you're experiencing the photo as a continuous space.
Why a grid would actually limit you:
• Arbitrary boundaries: A grid cell might awkwardly split someone's face in half, or group unrelated objects together
• Cognitive overhead: You'd need to remember "I'm in row 3, column 4" instead of just exploring naturally
• Breaks immersion: The magic of Photo Explorer is feeling like you're "touching" the actual scene, not navigating a spreadsheet
Think of it like this: Would you rather explore a sculpture by touching it freely, or by poking it only at pre-marked grid points? 😊
⏱️ Why Photo Processing Takes Time - The AI Magic Behind the Scenes
You mentioned the processing time, and I want to pull back the curtain on what's actually happening:
When you upload a photo, the AI isn't just tagging objects like "dog, person, tree." It's building an entire hierarchical map of the image with unlimited depth:
1. Level 0 (Scene): Identifies every individual object and person as separate entities - if there are 5 dogs, it creates 5 separate dog objects, not a "group of dogs"
2. Level 1 (Components): Breaks each object down into parts (person → head, torso, arms, legs, each dog → head, body, legs, tail)
3. Level 2 (Fine Details): Breaks those parts into sub-features (head → face, ears, hair, neck)
4. Level 3+ (Micro-Details): Goes even deeper (face → eyes, nose, mouth, cheeks, forehead)
Then for EACH of these hundreds of objects, it generates:
• Precise pixel-perfect boundaries so your finger hits exactly the right spot when you drag
• Tactile descriptions that describe texture, shape, and spatial relationships
• Immersive contextual cues for atmospheric details
• Zoom-level specific details so when you double-tap to zoom deeper, you get finer and finer features
This is why it takes 10-30 seconds - the AI is essentially creating a custom "tactile braille map" of your entire photo with 100-500+ touch-sensitive regions arranged in a hierarchical tree. We're working on optimizing this, but there's serious computational work happening to build that seamless exploration experience!
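To make that concrete, here's a rough sketch of the kind of data shape this process produces. The field names are simplified stand-ins for illustration, not the real code:

```typescript
// Hypothetical data shape for the hierarchical map described above
// (field names are simplified, not taken from the app).
interface ExplorableRegion {
  label: string;                 // "dog", "head", "left eye", ...
  level: number;                 // 0 = scene object, 1 = component, 2+ = finer detail
  bounds: { x: number; y: number; width: number; height: number };  // normalised 0..1
  description: string;           // spoken when the finger enters this region
  children: ExplorableRegion[];  // revealed when the user zooms into this region
}

// Finding what is under the finger at the current zoom level:
function regionAt(regions: ExplorableRegion[], x: number, y: number): ExplorableRegion | null {
  for (const r of regions) {
    const { bounds } = r;
    if (x >= bounds.x && x <= bounds.x + bounds.width &&
        y >= bounds.y && y <= bounds.y + bounds.height) {
      return r;
    }
  }
  return null;
}
```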
🏠 Room Explorer - The Hybrid Approach
You're absolutely right about the silent processing period - currently there's no automatic feedback, though you can touch the middle of the screen to hear "Analyzing..." We're working on making this automatic and more prominent so you're never left wondering.
Here's why we use the grid for Room Explorer but not Photo Explorer:
When you walk into a room, you need instant spatial orientation - "What's in front of me? What's to my left? What's far away?" The grid gives you that immediate 9-zone map (top-left, top-center, top-right, etc.) before the detailed AI analysis is even complete. It's like a quick mental sketch of the space.
Once you tap a grid cell and zoom in on a specific object, you then get the natural drag exploration for that object's details - but the initial grid helped you find it in the first place.
📄 Document Reader Improvements
You're spot on - we're actively working on selective reading! The next update will let you jump to specific sections (headlines, paragraphs, tables) instead of forcing top-to-bottom reading. Just like sighted people skim for what matters!
🔧 Response Time & Navigation Exit Bug
• Speed: We're optimizing the AI processing pipeline to show progressive results as they come in.
• Navigation mode exit: That two-finger swipe back gesture that's not working? Yeah, that's a nasty little bug I'm actively trying to squash! 🐛 It absolutely should work, and I'm prioritizing this fix. Could you email me which device/browser you're using? That'll help me kill this bug faster! I'm also working on the occasional hallucinations in this mode.
🌟 The Most Important Part
Everything you're experiencing is alpha software - which means YOUR feedback is literally shaping the final product. The fact that you're thinking about rows/columns, asking about measurement consistency, and requesting document skimming features? That's gold. Those observations go straight into the development roadmap.
You're not just a tester - you're a co-creator of this tool.
Please keep the feedback coming! Every hallucination you catch, every awkward interaction you notice, every feature you dream of - I want to hear it all.
Thank you for believing in this vision (pun intended 😊) and for helping make it better for the entire blind community.
Keep exploring!
Stephen

By Stephen on Thursday, November 20, 2025 - 19:08

hello Guilherme,
Thanks so much for the feedback and request.
I will put in this feature for you in the next update :).
Cheers!

By Zoe Victoria on Thursday, November 20, 2025 - 21:51

The one and only downside to this being a web app is that there's no way to explore a photo found anywhere else on my phone by sending it over through the share sheet. That's the only disadvantage this has compared to other image-describing tools.