Something Different is Coming
A Progressive Web App for Blind and Visually Impaired Users | Works on All Smartphones
I need to tell you about something I built. Not because it's "the best" or "revolutionary" – everyone says that. But because it works in a way that genuinely surprised me when I tested it.
The Problem I Kept Running Into
You know the drill with most vision AI apps:
Point your camera → AI speaks a sentence → that's it.
"It's a living room with a couch and a table."
Cool. But where's the couch exactly? What color? How far? What else is there? Can you tell me about that corner again?
You have to point again. Ask again. Wait again. Listen again.
You're always asking. The AI is always deciding what matters. You never get to just... explore.
What If Photos Worked Like Books?
Stay with me here.
When someone reads you a book, you can say "wait, go back." You can ask them to re-read that paragraph. You can spend five minutes on one page if you want. You control the pace of information.
But photos? Someone gives you one description and that's it. Take it or leave it. They decided what's important. They decided what to mention. They decided when you're done.
I thought: What if photos worked like books?
What if you could explore them at your own pace? Go back to parts that interest you? Discover details the other person missed? Spend as long as you want?
The 6×6 Grid: Your Photo, Your Exploration
Here's what I built:
Upload any photo. Any photo at all.
The AI divides it into 36 zones – a 6×6 grid covering every inch of the image.
Now drag your finger across your phone screen like you're reading a tactile graphic.
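For the technically curious, the core of this interaction is easy to picture: convert the finger's position into one of the 36 cells and speak that cell's cached description. Here is a minimal browser sketch under those assumptions; the zone text and the speak helper are placeholders, not the app's actual code.

```typescript
// Minimal sketch (not the app's actual code): map a finger position over a
// photo element to one of 36 zones in a 6x6 grid and speak that zone's text.

const GRID = 6; // 6 x 6 = 36 zones

// Placeholder: one AI-generated description per zone, row-major order.
const zoneDescriptions: string[] = new Array(GRID * GRID).fill("…");

let lastZone = -1;

function speak(text: string): void {
  // Web Speech API, available in all major mobile browsers.
  speechSynthesis.cancel();
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}

function handlePointerMove(event: PointerEvent, image: HTMLElement): void {
  const rect = image.getBoundingClientRect();
  // Normalize the finger position to the 0..1 range inside the image.
  const x = (event.clientX - rect.left) / rect.width;
  const y = (event.clientY - rect.top) / rect.height;
  if (x < 0 || x >= 1 || y < 0 || y >= 1) return; // finger left the image

  const col = Math.floor(x * GRID);
  const row = Math.floor(y * GRID);
  const zone = row * GRID + col;

  // Only speak when the finger crosses into a new zone.
  if (zone !== lastZone) {
    lastZone = zone;
    speak(zoneDescriptions[zone]);
  }
}

const photo = document.getElementById("photo");
if (photo) {
  photo.addEventListener("pointermove", (e) => handlePointerMove(e, photo));
}
```

Because the descriptions only need to be generated once per image, going back to a zone costs nothing, which is what makes the open-ended exploration described below possible.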
What This Actually Feels Like:
You're exploring a photo of your living room:
Start in the top-left corner – drag your finger there:
"Smooth cream-colored wall with matte finish, cool to imagine touching, painted evenly"
Slide your finger right:
"Large window with soft natural light streaming through, sheer white curtains that would feel delicate and silky between your fingers"
Down a bit:
"Polished oak coffee table, glossy surface that would feel smooth and slightly cool, rich honey-brown color"
To the left:
"Plush beige carpet, deep pile that looks like it would feel soft and springy underfoot, slightly worn in the center from foot traffic"
Wait, go back to that window – drag back up:
"Large window with soft natural light streaming through, sheer white curtains..."
You're in control. You decide what to explore. You decide how long to spend. You decide what matters.
Go to the bottom-right corner – what's there?
"Wooden bookshelf against the wall, dark walnut finish with visible grain, would feel smooth with slight ridges"
Move to the zone right above it:
"Books lined up on shelf, various colored spines, some leather-bound that would feel textured and aged"
This Changes Everything
You're not being told about the photo.
You're exploring it.
You can go back to that window five times if you want. You can ignore the couch and focus on the corner. You can trace the room's perimeter. You can jump around randomly.
It's your photo. You explore it your way.
And here's the thing: the information doesn't disappear. It's not one-and-done. It stays there, explorable, for as long as you want.
Now Take That Same Idea and Put It in Physical Space
You walk into a hotel room at midnight. You're exhausted. Strange space. No idea where anything is.
Usually? You either stumble around carefully, or ask someone to walk you through, or just... deal with it till morning.
New option:
Point your camera. Capture one frame. The AI maps it into a 4×4 grid.
Now drag your finger across your screen:
• Top-left: "Window ahead 9 feet with heavy curtains"
• Slide right: "Clear wall space"
• Keep going: "Closet with sliding doors 8 feet on the right"
• Bottom-left: "Clear floor space"
• Center-bottom: "Bed directly ahead 5 feet, queen size"
• Bottom-right: "Nightstand right side 4 feet with lamp and alarm clock"
You just mapped the entire room in 30 seconds. Without taking a step. Without asking someone. Without turning on any lights.
Want to know what's on the left side again? Drag your finger back over there. Want to double-check the right? Drag there.
The information stays right there on your screen. You can reference it. You can re-explore it. You can take your time understanding the space.
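For anyone curious how a web page gets that single frame: the browser can open the rear camera, draw one frame onto a canvas, and hand it off for analysis. A hedged sketch of that step follows; the /analyze endpoint and the 4×4 response format are illustrative guesses, not the app's real API.

```typescript
// Sketch only: capture a single frame from the phone's rear camera and send
// it off for analysis. The "/analyze" endpoint and its response format are
// hypothetical stand-ins for whatever the real backend does.

async function captureFrame(): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" }, // rear camera on phones
  });
  const video = document.createElement("video");
  video.srcObject = stream;
  video.muted = true;
  video.playsInline = true;
  await video.play();

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  stream.getTracks().forEach((track) => track.stop()); // release the camera

  return new Promise((resolve) =>
    canvas.toBlob((blob) => resolve(blob!), "image/jpeg", 0.8)
  );
}

// Hypothetical: post the frame and get back a 4x4 array of zone descriptions,
// e.g. grid[row][col] = "Bed directly ahead 5 feet, queen size".
async function mapRoom(): Promise<string[][]> {
  const frame = await captureFrame();
  const response = await fetch("/analyze?grid=4", { method: "POST", body: frame });
  return response.json();
}
```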
The Core Difference
Most apps: Point → Wait → AI decides what to tell you → Move on → Repeat
This app: Explore → Control the pace → Discover what matters to YOU → Information persists → Return anytime
That's not a small difference. That's a fundamentally different interaction model.
You're Not a Passive Receiver
You're an active explorer.
You don't wait for the AI to decide what's important in a photo. You decide which zone to explore.
You don't lose the room layout the moment it's spoken. It stays mapped on your screen.
You don't get one chance to understand. You can explore as long as you want, go back, re-check.
This is what "accessible" should actually mean: Not just access to information, but control over how you receive and interact with it.
I have big plans to expand this feature as well.
Oh Right, It Also Does All The Normal Stuff
Because yeah, sometimes you just need quick answers.
Live Camera Scanning
Point anywhere, AI describes continuously:
• Quiet Mode: Only speaks for important stuff (people, obstacles, hazards)
• Detailed Mode: Rich ongoing descriptions
• Scans every 2-4 seconds
• Remembers what it already said, so it doesn't repeat itself (see the sketch after this list)
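A rough sketch of what a scan loop like that could look like; describeFrame is a hypothetical stand-in for the real vision call, and the "important" filter here is just a toy heuristic.

```typescript
// Illustrative sketch, not the app's actual code: a timed scan loop that only
// announces observations it hasn't spoken before.

type Mode = "quiet" | "detailed";

const IMPORTANT = /person|obstacle|hazard|stairs|vehicle/i; // toy heuristic

const alreadySpoken = new Set<string>();

async function describeFrame(mode: Mode): Promise<string[]> {
  // Hypothetical: send the current camera frame to the vision backend and
  // get back short observations ("person ahead", "chair on the left", …).
  return [];
}

function speak(text: string): void {
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}

async function scanLoop(mode: Mode, intervalMs = 3000): Promise<void> {
  while (true) {
    const observations = await describeFrame(mode);
    for (const obs of observations) {
      const isNew = !alreadySpoken.has(obs);
      const matters = mode === "detailed" || IMPORTANT.test(obs);
      if (isNew && matters) {
        alreadySpoken.add(obs);
        speak(obs);
      }
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // 2-4 s
  }
}
```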
Voice Questions - Just Ask
No buttons. Just speak:
• "What am I holding?"
• "What color is this shirt?"
• "Read this label"
• "Is the stove on?"
• "Describe what you see"
• "What's on my plate?"
Always listening mode – ready when you are.
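Browsers already expose the building blocks for this: speech recognition in, speech synthesis out. Here is a sketch of the pattern, with askAI standing in for whatever the app actually calls behind the scenes.

```typescript
// Sketch of the pattern: speech recognition in, speech synthesis out.
// askAI() is a hypothetical stand-in for the app's real question-answering call.

const Recognition =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

async function askAI(question: string): Promise<string> {
  // Hypothetical: send the question plus the current camera frame to a backend.
  return `You asked: ${question}`;
}

function startListening(): void {
  if (!Recognition) return; // browser without the Web Speech API

  const recognizer = new Recognition();
  recognizer.continuous = true;      // keep listening between questions
  recognizer.interimResults = false; // only act on final transcripts

  recognizer.onresult = async (event: any) => {
    const question = event.results[event.results.length - 1][0].transcript;
    const answer = await askAI(question);
    speechSynthesis.speak(new SpeechSynthesisUtterance(answer));
  };

  recognizer.onend = () => recognizer.start(); // restart if the browser stops it
  recognizer.start();
}
```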
Smart Search (Alpha)
"Find my keys"
AI scans rapidly and guides you:
• "Not visible – turn camera left"
• "Turn right, scan the table"
• "FOUND! On counter, left side, about 2 feet away"
⚠️ Alpha: Still being worked on.
Face Recognition (Alpha)
Save photos of people → AI announces when seen:
"I see Sarah ahead, about 8 feet away"
Totally optional. Enable it only if you want it.
Object Tracking (Alpha)
Tell AI to watch for items:
"Keep an eye out for my phone"
Later: "Where did you last see my phone?"
→ "On kitchen counter, 22 minutes ago"
Meal Assistance
Food positioned using clock face:
"Steak at 3 o'clock, potatoes at 9 o'clock, broccoli at 12 o'clock"
Plus descriptions: portion sizes, cooking level, colors, textures.
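The clock-face wording falls out of simple geometry: take each item's position relative to the plate's center and round the angle to the nearest hour. A worked sketch, assuming the vision model has already supplied the coordinates.

```typescript
// Sketch: convert a detected food item's position on the plate to a clock hour.
// Screen coordinates: x grows to the right, y grows downward.

function clockPosition(
  itemX: number, itemY: number,   // detected center of the food item
  plateX: number, plateY: number  // center of the plate
): string {
  const dx = itemX - plateX;
  const dy = itemY - plateY;

  // Angle measured clockwise from 12 o'clock (straight "up" on the screen).
  let angle = Math.atan2(dx, -dy);     // radians, -PI..PI
  if (angle < 0) angle += 2 * Math.PI; // 0..2*PI

  const hour = Math.round(angle / (Math.PI / 6)) % 12; // 30 degrees per hour
  return `${hour === 0 ? 12 : hour} o'clock`;
}

// An item directly to the right of the plate's center:
// clockPosition(100, 0, 0, 0) -> "3 o'clock"
```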
Reading Mode (Alpha)
Books and documents:
• Voice commands: "Next page", "Previous page", "Repeat", "Read left page", "Read right page"
• Speed controls: "Read faster" / "Read slower" (instant adjustment)
• "Check alignment" (ensures full page visible)
• Auto-saves progress per book
• Resume exactly where you stopped
Social Cue Detection (Alpha)
Optional feature detecting if people are:
• Making eye contact with you
• Waving or gesturing toward you
• Trying to get your attention
Fully Customizable
Pre-set profiles or build your own (a sketch of what a profile might hold follows the list):
• Scanning frequency (2-4 seconds)
• Detail level (Basic / Standard / Maximum)
• Voice speed (0.5× to 2×)
• Auto-announce settings
• Feature toggles
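A profile like that boils down to a small settings object the app can save and swap. Here is a sketch of what one might hold, based on the list above; the field names are guesses, not the app's real schema.

```typescript
// Sketch: the kind of settings object a profile might hold. Field names are
// guesses for illustration, not the app's real schema.

interface Profile {
  name: string;
  scanIntervalSeconds: 2 | 3 | 4;               // scanning frequency
  detailLevel: "basic" | "standard" | "maximum";
  voiceRate: number;                            // 0.5 to 2.0
  autoAnnounce: boolean;
  features: {
    faceRecognition: boolean;
    objectTracking: boolean;
    socialCues: boolean;
  };
}

const quietCommute: Profile = {
  name: "Quiet commute",
  scanIntervalSeconds: 4,
  detailLevel: "basic",
  voiceRate: 1.25,
  autoAnnounce: false,
  features: { faceRecognition: false, objectTracking: true, socialCues: false },
};

// Persist across sessions and restore on launch.
localStorage.setItem("activeProfile", JSON.stringify(quietCommute));
const restored: Profile = JSON.parse(localStorage.getItem("activeProfile") ?? "{}");
```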
Why This is a Web App, Not an App Store App
Honest reason: I want to ship features fast, not wait weeks for approval.
Better reason:
App stores are gatekeepers. Submit update → wait 1-2 weeks → maybe get approved → maybe get rejected for arbitrary reasons → users manually update → some users stuck on old versions for months.
Progressive Web Apps are different:
Bug discovered? Fixed within hours. Everyone has it immediately.
New feature ready? Live for everyone instantly.
AI model improved? Benefits everyone right away.
No approval process. No waiting. No gatekeepers.
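That instant-update property comes from how PWAs are served: the app is an ordinary web page plus a service worker, so a new deploy is picked up the next time the page loads. Below is a generic sketch of the standard pattern, not necessarily this app's code; it assumes the service worker in sw.js calls skipWaiting() when a new version installs.

```typescript
// Generic PWA sketch, not necessarily this app's code: register a service
// worker and reload when a new deployment takes control of the page.
// Assumes sw.js calls self.skipWaiting() when a new version installs.

if ("serviceWorker" in navigator) {
  navigator.serviceWorker.register("/sw.js").then((registration) => {
    registration.update(); // check for a newer deployed version on startup
  });

  // Fires when a freshly installed service worker takes over the page.
  navigator.serviceWorker.addEventListener("controllerchange", () => {
    location.reload();
  });
}
```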
Plus it works everywhere:
• iPhone ✓
• Android ✓
• Samsung ✓
• Google Pixel ✓
• Any modern smartphone ✓
Same features. Same performance. Same instant updates.
Installation takes 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Appears like regular app
Done.
Privacy (The Short Version)
• Camera images analyzed and discarded – not stored
• Voice processed only during active questions
• Face recognition optional
• Data encrypted
• Delete everything anytime
Critical Safety Disclaimer:
AI makes mistakes. This is NOT a replacement for your cane, guide dog, or O&M training. Never rely on this alone for safety decisions. It's supplementary information, not primary navigation.
When Does This Launch?
Soon.
Final testing in progress.
When the app officially launches, you'll have access to all of its features, even though some parts of it will still be in beta.
The Real Point of All This
For years, accessibility apps have operated on this assumption:
"Blind people need information. we'll give it to them efficiently."
Fine. But also... what if I flipped it:
"Blind people want to explore. They want control. They want information that persists. They want to discover things their way."
That's what I built.
Not "here's a sentence about your photo" but "here's 36 zones you can explore for as long as you want."
Not "here's a description of this room" but "here's a touchable map that stays on your screen."
Information that persists. Exploration you control. Interaction you direct.
That's the difference.
One Last Thing
The photo grid gives you 36 descriptions per image. Detailed, sensory, rich descriptions.
So when it comes out, watch people explore single photos for 5-10 minutes.
Going back to corners. Discovering details. Building mental images. Creating memories of the image.
That's not just making photos accessible.
That's making photos explorable.
And I think that's better.
Coming Soon
Progressive Web App
Works on All Smartphones
Built for exploration, not just description
What do you think? Which feature interests you most? Questions? Thoughts? Comments below.
Comments
What Features Interest Me?
All of the above!
Given that more apps from an accessibility standpoint (me being an Android user) aren't web apps, your idea is a golden glass of fresh air here!
@ Trenton Matthews
Thank you so much. I did get very little sleep when getting the foundation built lol.
May be looking for beta testers
I may be looking for beta testers in the near future, more specifically for the Android side. I have an iPhone myself, so I can test the iPhone features. Let me know if this interests you 😊.
Agreed
Making this a universally accessible application is boss! I for one cannot wait to give it a try. 😎👍
Android beta tester
Good job!
I am ready to beta test it!
@ Brian
Thanks so much. Right now I'm building the app's own built-in screen reader, so you can turn off your device's screen reader while using the app itself. Using it myself, I don't like having my screen reader going plus the screen reader in the app also going. I'm also working on a feature where, while feeling through your photos, you can tap on an item and it'll expand that item so you can explore just that item, a bookshelf for example.
Look forward to it
Stephen, it sounds like a good way to do it. I've been having to ask questions of AI in a grid, or to be exact, several different kinds of grids such as thirds, to understand pictures I take and edit.
An issue I have with another explore by touch AI app is that there is no indication of where the image ends at the top and bottom of the iPhone screen, and as I work with several different aspect frames with lots of blue sky that the AI does not speak, I have to guess a lot. I would think this would not be an issue in your grid system.
I find myself constantly having to ask if the edge of the image cuts off part of a bird or other critter that the AI has said is in the picture; they almost never say this up front. I also have to ask a lot if something is in focus with one of my AI describers.
This reminds me of an app called Image-Explorer
@OldBear, what's this other explore-by-touch AI app you mentioned, if I may ask?
@ Enes Deniz
It should work wherever a touchscreen is implemented. It would be interesting to see whether it actually functions the way it's supposed to, but so far everything is working extremely well. I remember Image-Explorer… it was a pretty weak app. I can also adjust things for you guys on the fly if something seems broken or buggy, or if you want it to react differently.
Audio representations?
You can't add the option to provide audio feedback as the user moves their finger around the screen, can you? You know, the type and timbre, volume and other characteristics could indicate certain visual properties. I also thought of 3-D audio or spoken and perhaps even haptic feedback, but those might be more challenging to implement. I acknowledge that audio feedback requires the app to treat the image as a whole, since the user should hear continuous beeps or loops or blips as they move their finger across the screen and as colors shift and the level of brightness fluctuates, so you might need to develop a new underlying approach and redesign the interface for this to work properly.
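To make the idea concrete: the Web Audio API can map, say, the brightness of the pixel under the finger to pitch in real time. This is a purely illustrative sketch of the concept, assuming the photo has been drawn to a canvas; it is not something the app is confirmed to do.

```typescript
// Purely illustrative: sonify the brightness under the finger with the
// Web Audio API. Assumes the photo has already been drawn onto a <canvas>.

const audioCtx = new AudioContext();
const osc = audioCtx.createOscillator();
const gain = audioCtx.createGain();
osc.connect(gain).connect(audioCtx.destination);
gain.gain.value = 0; // silent until the finger touches the image
osc.start();

function brightnessAt(canvas: HTMLCanvasElement, x: number, y: number): number {
  const data = canvas.getContext("2d")!.getImageData(x, y, 1, 1).data;
  return (data[0] + data[1] + data[2]) / (3 * 255); // 0 = black, 1 = white
}

function onPointerMove(e: PointerEvent, canvas: HTMLCanvasElement): void {
  if (audioCtx.state === "suspended") audioCtx.resume(); // needs a user gesture

  const rect = canvas.getBoundingClientRect();
  const x = Math.floor(((e.clientX - rect.left) / rect.width) * canvas.width);
  const y = Math.floor(((e.clientY - rect.top) / rect.height) * canvas.height);
  const brightness = brightnessAt(canvas, x, y);

  // Brighter pixels produce a higher pitch (roughly 200 Hz to 1200 Hz).
  osc.frequency.setTargetAtTime(200 + brightness * 1000, audioCtx.currentTime, 0.02);
  gain.gain.setTargetAtTime(0.2, audioCtx.currentTime, 0.02);
}
```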
@ Enes Deniz
I'm working on haptic features for different textures, etc., but that's going to be really hard to get going, I think, it being a web app and all. But with the descriptions, for example when I upload a photo of my dog, I can feel where his ears are, his nose, his eyes; the voice also gives the expression of his eyes and what his nose looks like. I really want to try to give everyone an actual experience with a photo and not just hearing an AI's description of the entire photo. You should be able to explore it… I'm also implementing a Zoom feature where, if you double-tap on a certain portion of, let's say, the dog's nose, you can just explore his entire nose. It does work better with bookshelves, but right now I only have pictures of my dog lol. I also have a bunch of sunrise and sunset photos, and it's really cool being able to go through and really feel the sky.
@ Enes Deniz
I was talking about the Seeing AI app, in Descriptions > Browse Photos, or something like that. There's an Explore option that gives haptic and audio feedback on some of the larger objects in a photo if the process recognizes them.
Requesting a long shot...
Will this have Braille support, for those persons who are both deaf and blind?
@Stephen
So is it possible to add the option to zoom in or out and let the user explore by smaller or larger units/distances, even pixels? You know, this will be quite handy if you somehow implement audio cues/beeps. So what I'm talking about is something like a combination of different methods usable simultaneously. Let's say you're exploring a photo featuring a person. You'd get more detailed audio feedback as you slide your finger, but only when your finger moved over a different body part or clothing would you get spoken feedback. So this will require that the app detect individual objects and describe them only while your finger is on them, by taking into account the size and location of each object, rather than dividing each and every image into the same number of zones and treating every image as a grid. One object may span multiple zones on that artificial grid, or it might be so small that it fits within one zone, so that system unfortunately didn't sound so realistic and effective to me. The alternate method I'm proposing is more like that found in Image-Explorer in that respect. The audio cues should also be heard more naturally and continuously, so representing an image as a grid may prevent that. Let me try a different explanation to clarify my point further: Exploring an image represented as a grid sounds like navigating a table with a certain number of columns and rows. So it's more of jumping from one cell to an adjacent one as you slide your finger than exploring the entire image as a whole.
@ Enes Deniz
I am working to see if I can implement your suggestions right now 😊. Standby.
Sure thing.
Well, apparently this is where your app will excel. Whenever we have a suggestion, bug report, etc., we just fire away and you take care of everything without ever dealing with app store policies or having to submit your updates and wait for them to be approved.
@ Enes Deniz
Implementation successful. I’m sure it could be better but it’s one heck of a start!
@ Enes Deniz
Imagine a blind person who's never "seen" their child's face, as an example. You can now feel the shape of their nose, count their teeth, explore their smile lines, etc. It's pretty tough to feel the exact size of, let's say, an adult on a screen, but I'm hoping the Zoom feature can help with that a little as well. The Zoom feature is playing a little bit hard to get, but I'll get it.
@ Brian
I'm not against adding braille support at all. The problem with that is going to be which display they're using and whether or not I can get it to work across displays. I'm not sure how I can implement it effectively so they can really feel the braille that's representing ears, nose, eyes, etc., plus all the zoom features. I would be curious to know whether web apps are even accessible to braille display users anyway?
@Stephen
Now that I've left you to deal with my volley of suggestions, I'm beginning to think of how many different scenarios this app would be highly useful in: exploring the world map, taking or finding a photo of a street to get a better overview for easier navigation, or examining in detail a photo taken by a friend and posted on social media.
@ Enes Deniz
lol. I have so many ideas for this app and for you guys.
Ya, that
That's great.
What Enes Deniz describes, and I guess is now implemented, is much like the Seeing AI Explore feature, though that is very limited.
I use it when, for example, I am cropping a picture of a bird with its wings spread to make a photo printout on a specific size of paper, and I need to be sure the bird is large enough and in the desired spot without being cropped by the edges. I locate the bird, say it's in portrait orientation, and run my finger across the screen over and over until I have a good idea of where it is relative to the sides of the picture. It doesn't work as well for the top and bottom, but I can at least tell if it is in the top half or bottom half.
Having more specific details included in that would be a game changer.
@ OldBear
You tell me what you need and I’ll do my best to make it happen 😊.
sounds amazing but remember ....
This sounds amazing, but remember we need our imaginations to "see" an arm, "see" a plant, "see" what we are doing when, say, counting teeth. It will require people to have exceptional spatial awareness. Imagine a day, which will never come in my lifetime, where you can somehow physically feel a photo. Remember, we are examining the screen, which is fabulous, but it will require people, I guess, to imagine they are "in" the photo. Does that make sense?
So if, say, you have a picture of a dog in a living room, you'd need, I guess, to imagine you are in that living room physically to "feel" how the photo looks. I look forward to trialling this; will it be this year, I wonder?
I would have liked to explore my mum's house decorations, in particular her Christmas tree; it's huge, apparently, lol. It will be great to get a "feel" for my children's faces as well. Yes, I can touch them, but it will be great to get a feel for it.
I love the idea of the food feature as well. Say we are in a restaurant: if it can say your steak is at 2 o'clock and your fries (chips here in the UK) are at, say, 6 o'clock, for those who value that, it will be fabulous.
@ Karok
I hear ya. With the way I have it set up, you can feel the entire room. I'm also working on those sound cues that you can trace so you can feel how big an object is, and you can also zoom in on a specific object and just explore that object. So let's take your mom's Christmas tree. When you take a photo of the room, or she sends you a photo of the room, you can move your finger across the screen to find the Christmas tree. You can feel its shape and size as much as possible with sound cues, tap on the tree, and then you'll be able to feel the tree with all of the ornaments on the branches. Then you can actually tap on each ornament and feel it through sound and description. Right now I have two levels of Zoom programmed into it. I'm hoping to show it off within the next week or so… There may be some delays due to me fixing bugs, because unlike a lot of companies, if I'm going to release something, I want to make sure it at least functions decently lol.
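The zoom described here can be thought of as recursion: crop the tapped zone out of the image and run the same grid treatment on the crop. A hedged sketch of that idea; describeGrid and the flow around it are illustrative stand-ins, not the app's actual code.

```typescript
// Sketch: "zoom" as recursion. Crop the tapped zone out of the image and run
// the same grid treatment on the crop. describeGrid() is a hypothetical
// stand-in for the AI call; none of this is the app's actual code.

interface Region { x: number; y: number; width: number; height: number }

async function describeGrid(image: Blob, grid: number): Promise<string[]> {
  // Hypothetical backend call: returns grid*grid descriptions, row-major.
  return new Array(grid * grid).fill("…");
}

// Crop one zone out of the source image using an offscreen canvas.
async function cropZone(image: ImageBitmap, zone: Region): Promise<Blob> {
  const canvas = document.createElement("canvas");
  canvas.width = zone.width;
  canvas.height = zone.height;
  canvas.getContext("2d")!.drawImage(
    image,
    zone.x, zone.y, zone.width, zone.height, // source rectangle: the tapped zone
    0, 0, zone.width, zone.height            // destination: the whole canvas
  );
  return new Promise((resolve) =>
    canvas.toBlob((blob) => resolve(blob!), "image/jpeg", 0.9)
  );
}

// Tap-to-zoom: the tapped zone becomes its own 6x6 grid to explore.
async function zoomInto(image: ImageBitmap, zone: Region): Promise<string[]> {
  const crop = await cropZone(image, zone);
  return describeGrid(crop, 6); // the ornament, the nose, the bookshelf…
}
```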
This sounds really really awesome
I have to say, I find this app that you're talking about to be quite good. Could this thing help me identify menus on my Casio CTS 1000 V keyboard? I have trouble with the menus because there's no speech or clicks or beeps or anything. I had to use ally to help me pair the Bluetooth connection so I could use its speakers to stream stuff from my phone to it. It would also be cool if it could read me which styles are on the display, because this has no numbers either; it's buttons, a dial, and more buttons. I have an idea as to what some of the buttons do, but the menus and learning which styles are which is a little bit difficult, except for the pop styles and the rock styles.
@ Exodia
Perhaps. Let me just finish getting the core features out at least, and then I can work on lots of other features. The only two main ones I'm having problems with right now are location accuracy and search, and well… I guess I haven't tested the reading yet, so I can't say that's a problem, but everything else seems to be working pretty smoothly. Right now I'm just editing the finishing touches and trying to make the AI stay on point when you're browsing through your photo.
You can try it out here.
I've decided to open this up for a public alpha/beta trial. I want to be upfront about something that matters: this project is expensive to build and keep running. It costs me close to three hundred Canadian dollars every month just to maintain everything behind the scenes. I cover it by working full time, which is fine for now, although it limits how fast I can push new features.
I want people to try it without barriers, so the alpha/beta will stay public for a little while. It will not stay open forever, because the costs add up quickly. I am exploring options for the future, whether that is donations or a small subscription model; I want to find something that works for everyone. If this takes off and the community shows real interest, I would look at reducing my work hours so I can put more time into development.
I appreciate everyone who tests this, gives feedback, or even shows curiosity. This community can be tough to impress, and rightfully so (I know I am), which is exactly why I want your honest reactions. And please don't tell me it doesn't know your correct location… I know. It's a thorn in my side lol. Also, the search function in conversation mode doesn't work quite yet… it's something I'm working on. TBH, I got a little hyper-focused on photo exploration lol. You can find the link below.
http://visionaiassistant.com