In this episode of the AppleVis Extra podcast, hosts Dave Nason and Thomas Domville speak with Stephen Lovely, the creator of Vision AI Assistant, a rapidly emerging web-based accessibility tool designed primarily for blind and visually impaired users. Stephen explains the motivation behind the project, rooted in his own lived experience as a person who has been blind since birth, and how that perspective shaped every design decision. The discussion covers the app’s core philosophy of giving users control over what visual information they receive, rather than forcing them to listen to long, generic descriptions.
The conversation explores Vision AI Assistant’s major features in depth, including the Photo Explorer, which allows users to explore images by touch and zoom into specific areas for granular detail; Live Camera Mode, which provides near real-time environmental feedback and action detection; object tracking for navigation; sign and text reading via gesture-based interaction; physical book reading with page tracking; and optional voice commands. Stephen explains how the app leverages a progressive web app model to deliver instant updates across platforms, why he chose the Base44 language model, and how careful prompt engineering minimizes hallucinations while allowing medically descriptive output when needed.
The hosts and guest also discuss privacy considerations, data handling, accessibility trade-offs between web and native apps, and the financial realities of running AI-driven services. Stephen outlines future plans, including native app wrappers, potential integration with smart glasses, expanded social media accessibility, and a sustainable subscription model. The episode concludes with reflections on community-driven development, responsiveness, and the broader impact of having accessibility tools led by people with lived experience.
Guest contact information:
Website: https://visionaiassistant.com
Phone: 1-866-825-6177
Transcript
Disclaimer: This transcript was generated by AI Note Taker – VoicePen, an AI-powered transcription app. It is not edited or formatted, and it may not accurately capture the speakers’ names, voices, or content.
Dave: Hello there and welcome to another episode of the AppleVis Extra podcast. My name is Dave Nason and I am once again joined by my good friend Thomas Domville, also known as AnonyMouse. Today we're here to talk about an app that has somewhat exploded on the AppleVis website over the past few weeks called Vision AI Assistant. So yeah, thanks for joining me Thomas, it's going to be a really interesting conversation.
Thomas: Right, Dave, and this is going to be kind of an exciting one because this is something a little different than we typically do. Just because it's a web app and not like a native app per se, like we've typically done interviews with other developers, but with this technology, you said it right on. This has exploded. I mean, this has only been out for about a month. And if you haven't been on AppleVis, this is really something that you should take a look at. This is like the next level of things that we can do or AI can do for us visually that others are not doing. So I'm excited to do this interview with Stephen.
Dave: Yeah, absolutely. So rather than us trying to describe the app, let's dive in, chat to Stephen and get the lowdown on all of the great ideas he's had. So yeah, let's dive in. And hello, Stephen. Thanks so much for joining us. How are you today?
Stephen: I'm doing great. Thank you so much for having me.
Dave: So I think what we'll do, we'll come back and we'll speak about your background. We'll speak about the app in more detail and the future, all that kind of stuff. But first off, do you want to give us just a quick overview of what Vision AI Assistant actually is?
Stephen: So Vision AI Assistant is basically what you would really want to imagine from an AI assistant. I just released a live camera mode a while ago. So essentially what that does is you point your phone around, let's say a hallway or a room, and you can drag your finger along your phone screen. Find where different objects are located in regards to your position. And if you want to track, let's say, a door, you can double tap on that. And then, hypothetically, you know, once we're fully working, it will guide you directly to that door. It can also read text. So if you just do a quick flick up, it should read signs if there's a sign on that door. Basically, it gives you information in a way that you would want it versus having to listen to a full description.
Dave: Okay, so it's more of a, yeah, you're exploring photos rather than just getting a description of the whole photo.
Stephen: Yeah, so it's a more, I call it a DIY approach. Right? Because, I mean, one of the big things we have right now, regardless of what AI you're using, whether it's, you know, the meta glasses, which are fantastic, I'm wearing mine right now, whether it's the, you know, the Solos glasses or any sort of AI app, you get one full description. You can't control which information you want to get. You can only do that based off of follow-up questions, right? Whereas this, you can kind of feel around, feel what you're looking for, and that way it also gives you a better sense of your environment too.
Dave: I wonder then is, tell us a bit about you, and then we can kind of come back and talk about how you kind of came about all of this. Like what motivates you? You've obviously... You know, you're interested in coding. How did you get into all this kind of stuff? How did you get into coding and why did you decide to do this?
Stephen: I kind of went a little bit of a different approach. I mean, I took a little bit of coding when I was in high school. My biggest issue with coding was going character for character, line by line, using text-to-speech. As I'm sure you can imagine, that can be quite a pain. I'm using a platform for this called Base44, and it's more vibe coding. I'm not sure if you've heard of that. Yeah, so vibe coding is basically you have these ideas, right? You can speak in your regular language, and the AI can translate that into code.
Thomas: Really? So this whole thing you pieced together, you just tell it what you want and what you like about it, and then it codes it for you, and you put it together, and then it creates it.
Stephen: Exactly. I basically use the backend as, like, my diary, what I envision, and then it brings it, it helps me bring that to life. So it does the coding.
Dave: And it's not written in, because this is actually another important part of this, is this is a progressive web app as opposed to a native app.
Stephen: Yeah, I was looking into doing a native app. Some of the big problems with native apps I've found, I mean, there's a downside to both, obviously. But a native app, it takes very long to get things approved. You know, it could take up to seven days, according to Apple's website. That being said, that's if you get approved, right? So let's say something was wrong and they denied that update, so then it takes another seven days if you try to correct that specific thing that went wrong. Whereas this, using a progressive web app, I can push out updates immediately. I'm sure you guys have seen within minutes to hours.
Thomas: I mean, that way you can be certain that it works all across platforms. So we're looking here, for example, at iPhone slash Android. So that way the web app will work both ways without you having to submit it to both stores in different languages.
Stephen: Well, exactly. And that's where I was like, well, you know what? This might actually work out the best. I mean, inevitably down the road, I kind of want to get native apps, but that's going to be a bit of a more financial burden than the app already is.
Thomas: Right. Well, I am curious. So are you blind yourself or visually impaired in any way?
Stephen: Completely blind.
Thomas: And how long have you been blind?
Stephen: Since birth.
Thomas: Since birth.
Stephen: Yeah.
Thomas: So what gives you the inspiration to just do this, like, out of the blue? Because your day job is nothing related to computers in any way, like networking and coding, right?
Stephen: Exactly. So what inspires me, I think the most, I've always been someone who just kind of goes out on a whim and tries something out. And if it doesn't succeed, it doesn't succeed. If it does, it does. The one best example I can give you guys about that is back in 2017 when I got Columbo, my guide dog. Like the detective. He acts like it too. But back in 2017, once I got him, a few months after that, I decided to actually give up my permanent address and just backpack around the country for a few years. Just me, him, and the accessible devices that I have. So we lived at a different Airbnb, we stayed at a different hotel. It was kind of living on the road. You just kind of went where you wanted to. Time wasn't really a thing. And we were on the road for probably, I want to say, four or five years. And when I built this up, I was like, well, what would have made that even better than it already was? What would have been super helpful then? And that's kind of where my brain goes. And I did it very, very impulsively. I just gave my landlord notice. I was like, yeah, I'm out. Bye. I had a backpack of his stuff. I had a suitcase of my stuff. And we just flew around the country. We took buses around. We did nighttime travel, daytime travel. It was phenomenal. And that's actually how I ended up in a little small town, Sparwood, B.C. We have a population of less than 5,000 people. My spouse and I, we ended up buying her house a couple of years ago out here.
Thomas: What a story. I mean, seriously, that is an amazing story because you don't really hear that a lot. I mean, for anybody to be able to move around a lot like that and just to kind of explore yourself and to explore the country and to try to find where your niche is. And that sounds like the story of you.
Dave: You're living on vibes as well as coding on vibes.
Stephen: I need to change something to do with vibes. But I mean, oh, and that's the thing. And that's essentially what I learned is a lot of us were confined to specific boxes that society says that we're confined to. Right. And the reality is we're not. I like to push boundaries. I like to be in uncomfortable situations because then that makes me more comfortable if that makes any sense. Because then if I'm ever in a situation where I have to do something, I have that skill set to do it, regardless of what situation I'm in.
Thomas: I love it.
Stephen: That's where my impulsiveness comes from.
Thomas: And that's good. You know, that's good. And that's what drives you and this project of yours. And because of your drive into this project, my gosh, ever since you published this as kind of a side project, I mean, it has literally exploded. I'm sure the reception has been overwhelming for you because I have never seen so many comments in my life on AppleVis on a single thread like this. This is powerful.
Dave: And certainly, Thomas, certainly on the non-Apple platform part of the website, which is where this is because it's not technically an iPhone app. And still, it's like one of the most commented.
Stephen: The response is overwhelming. It's absolutely overwhelming because it was just a little side project. I was like, let's just have fun, post it, let's see what people think. I mean, I'm getting direct messages, the post is blowing up, I'm at work, my phone's just constantly going off. I'm like, oh no.
Thomas: This has got to be exciting for you. I mean, it's got to be thrilling. Let's talk about the app itself because there are people listening to this podcast who probably have no idea what we're talking about. And so, from the basics, you came onto the site and you have a side project called Vision AI Assistant. How about we dissect it a little bit?
Thomas: So because there's so much packed into this little web app, explain what a web app is for people, and then we'll kind of dive into each section of your web app in more detail.
Stephen: Yeah, so essentially what a web app is, you can basically do it with any website. So if you go to visionaiassistant.com and you tap the share button, you can add it to your home screen and it just functions like a regular app. So you don't know the difference. The great thing with that, as I mentioned earlier, you can just push updates right away and folks will receive that within minutes across all platforms, no exclusions, which is really nice.
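For readers curious about the mechanics behind this: a site becomes installable to the home screen when it ships a web app manifest and, typically, registers a service worker. The snippet below is a generic sketch of that registration step, assuming a hypothetical /sw.js file; it is not taken from visionaiassistant.com.

```ts
// Minimal sketch of a progressive web app's service worker registration.
// The file name /sw.js is a placeholder, not the app's actual code.
if ("serviceWorker" in navigator) {
  window.addEventListener("load", async () => {
    try {
      const registration = await navigator.serviceWorker.register("/sw.js");
      console.log("Service worker registered, scope:", registration.scope);
    } catch (err) {
      console.error("Service worker registration failed:", err);
    }
  });
}
```

Because the app itself is served from the web on every visit, an updated page reaches users as soon as they reload, which is the "updates within minutes" behavior Stephen describes.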
Dave: I think what's interesting is, though, from my experience using the app, it actually feels relatively native because often with web apps in the past, I've found it's telling me heading levels and it's telling me link, which you don't get in native apps. But this app actually is saying button. You know what I mean? It kind of has something of a native feel. Is that something you've specifically done?
Stephen: Well, being a completely blind person myself, I want to make sure that if it's good for me, then it's good for the blind community because if other completely blind people are using it, at least you have a completely blind person who's testing it and creating it. So my goal is to obviously ensure that the AI has no unlabeled buttons. The thing is, we do have that whole, you know, you have to turn your VoiceOver off in order to use our overlays. But that's where really the magic kind of comes in. I tried implementing a screen reader. Guys, that thing was a pain. I tried three times to build a screen reader and it worked, but just the gestures, it was super sensitive to the gestures. For me, I don't have a problem tapping that button three times just to turn it off to go through the overlay screens, but I like hearing that it feels like a native app because that is the goal, even though it's a web app. If somebody calls the number or they send me a message and they're like, hey, this isn't working, give me a second, I'll put that in place, make sure that that's working. It could be working within five to ten minutes versus having to wait weeks.
Thomas: So essentially, when you are on an iPhone, it's not a native app. It's not from the App Store. This is an actual website built as something called a web app to make it feel like an app within your phone. So you can make a shortcut to it, and then it creates a little shortcut right on your home screen. It feels like a native app when you double-tap and open it, because essentially it is. So let's talk about some of the features that are in the app. So let's start with the Photo Explorer. I mean, that's where you kind of started in the first place, so explain to people what you do exactly in the Photo Explorer and explain the grid style.
Stephen: Oh, gosh. It really evolved in the span of a month. It was crazy. So originally, it was the grid style, right? You'd have the 6x6 or whatnot. The reason why I wanted to do kind of a photo explorer mode, which basically started it all, is so you can go and explore photos. I mean, I had someone test it and she could actually feel through her family photos. Her mom sent her some pictures of their vacation and she can go exploring, double tap, zoom in and feel kind of what's around that area. Right. So you can actually experience it and not just hear it, if that makes any sense. The biggest problem with the grid system is that when we're standing in a room, let's say we're standing diagonally, the AI kind of leaves things out of place. So I've actually revamped that. I took out the grid system, and now when you put in a photo, the AI can sense where your finger is, and you're dragging your finger across the actual photo. So now it's more accurate when you're on top of a certain object. And getting the AI to separate different objects in the photos was so much fun.
Stephen: That's kind of where it is now, but it's really nice to be able to, you know, even, you know, a grandmother who kind of lost their sight, she can still experience, you know, the photos of her grandkids. Right. Instead of just hearing a description, it's there and gone within a second. This way, you know, she can open that up. She can put in the photo and kind of reminisce. Really kind of feel it.
Stephen: So if you wanted to, for example, feel, you know, let's say a facial expression, right? Because I set it up where the Photo Explorer mode can give you a lot more details. So when you double tap on, let's say, a face, or let's say you double tap on a tree, and then you kind of want to feel around that tree, feel the leaves, you can do that and then zoom in and kind of get a better sense of a leaf. So I've gotten it down to almost micro details, which is phenomenal.
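As a rough illustration of the finger-on-photo idea Stephen describes, a web page can translate a touch point on the displayed image into normalized image coordinates and ask a server-side model about that spot. The endpoint name and payload below are hypothetical, not Vision AI Assistant's actual API.

```ts
// Minimal sketch: map a finger position on the displayed photo to
// normalized image coordinates, then ask an AI endpoint to describe
// what sits under the finger. /api/describe-region is hypothetical.
async function describeUnderFinger(
  img: HTMLImageElement,
  touch: Touch,
  zoomLevel: number
): Promise<string> {
  const rect = img.getBoundingClientRect();
  // Normalize to 0..1 so the server works in image space regardless of
  // how large the photo is rendered on screen.
  const x = (touch.clientX - rect.left) / rect.width;
  const y = (touch.clientY - rect.top) / rect.height;

  const response = await fetch("/api/describe-region", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ x, y, zoomLevel }),
  });
  const data = await response.json();
  return data.description; // e.g. "a maple leaf, veins visible"
}
```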
Thomas: So essentially, to explain it to people, we have these typical AIs now. I kind of think of it like a mono and stereo kind of thing, because mono is what we've got now. The AI just describes that there's a family of five, mom's in the center, brothers and sisters to the side of that. And we just get one description. So it's mono, kind of described all at once. But you, in this Photo Explorer, took the photo not only to describe it to you, but then you can just put your finger on a section of the photo, and if you want more details, say, on this person, you can tap on that person and zoom in to get more details without having to do prompting, prompting, prompting, prompting, which is kind of an annoying thing. But you kind of took this to the next level. It's like you being in control of what you want to look at, your destiny, what you want out of the photo, just by using your finger. And when you zoom in, it describes that scenario or picture or person, right?
Stephen: Exactly.
Dave: You said to feel it. So is there haptic involved?
Stephen: I'm working on haptics. There's a bit of an issue when it comes to web apps and iOS. And this is where the pitfalls kind of fall in with web apps. So native apps, you can do that. You can have the haptic feedback. But on web apps, iOS won't let you access that. Android will. iOS does not. I want there to be haptic feedback, but until we get a native app, that's not possible. Our Android users, though, they have the haptic feedback.
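The platform gap Stephen mentions comes down to the web Vibration API: Android browsers expose navigator.vibrate, while iOS Safari does not, so a web app can only feature-detect and fall back silently. A minimal sketch:

```ts
// The Vibration API works in Android browsers but is not exposed by iOS
// Safari, so the best a web app can do is feature-detect.
function pulseIfSupported(durationMs = 20): void {
  if ("vibrate" in navigator) {
    navigator.vibrate(durationMs); // fires on Android, no-op elsewhere
  }
  // On iOS there is simply no web API to call; haptics need a native app.
}
```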
Thomas: Hmm. Okay. That's interesting. That's interesting. So regardless of the haptics, just having control of the photo, how far, how much zoom can we do on that?
Stephen: It depends on your camera. And this is where the tricky part comes into place, right? It's all going to depend on your camera. It's going to depend on the lighting. It's going to depend on the angle. And whether or not the AI can get down to that micro detail that you want with the camera that you're using or that was used for that photo.
Dave: So the zoom, there's no actual limit to the zoom as such. It's that it will go as far as the detail that it's able to see.
Stephen: Exactly. Exactly. Yeah. And through whatever camera took that photo.
Thomas: Makes sense. So the better the camera, the more magnification you have, and the more purity, richness, and quality in the photo, the better the results it's going to get.
Stephen: Yeah. And now cameras have become important to blind people like myself.
Thomas: Right. I mean, I never really thought about that. It's like, we never really cared about the quality. I mean, as long as we got the description we wanted. But in this case it makes it that much more important. Gosh. So with that, I'll be honest, this is so mind-blowing, because it's like you opened the door to a whole new approach that nobody has ever thought of. This is an ingenious idea. I absolutely love this so much. What's driving this whole engine? So what AI are you using to describe things and whatnot?
Stephen: I'm using the Base44 back engine. So I'm not using OpenAI. I'm not using Google Gemini. I'm actually using the Base44 LLM.
Thomas: So is that pretty limited?
Stephen: I found it more powerful. I mean, one of the comments on the AppleVis community even says it's better than Gemini and OpenAI. And that comment really kind of got me going, because I've been able to pull something off that no AI has been able to do. And that's another feature, but I'm sure we'll get to that.
Dave: Yeah, because I wouldn't have known something like Base44 had its own LLM. I would have assumed they were calling Gemini or OpenAI or Claude or something.
Stephen: You can get the APIs put in and use that, but they do something a little bit different, which is probably why Wix actually purchased them for $80 million.
Thomas: Are there limitations? As in, I know some of them, we'll just use ChatGPT, for example, are sometimes very sensitive about describing people, so they don't like that. Some of them don't like very sensitive things. Is there a limitation like that with this?
Stephen: Not as much as the other ones. There are some limitations, but it can describe things more anatomically, in a medical sense, right? Like if somebody sends you, you know, a less than desirable picture, for example, you're like, well, what did they send me? That's the reality. That stuff does happen, right? So it will describe certain things more anatomically and medically, whereas the other ones won't describe it at all.
Thomas: That's very... I love that term, medically. Very well said. I love that. It won't offer you an opinion, but it'll tell you what's there. You know, some will not tell you if there's a beard on that person. Some will not tell you what color eyes you've got. Just simple, basic things. Like, I want to know if you've got blue eyes. I want to know if you've got brown eyes. And sometimes I can't get that. And so I never knew where the limitation would come in. I wasn't getting that level of detail.
Stephen: I think that goes back to an example that kind of explains the AI's capabilities, right? When I was mentioning the anatomical, the medical. Right. It shows you that, yes, you can get some pretty decent descriptions. And I'm not sure if you folks have tested it out a lot yet, but we have little to no hallucinations.
Dave: That's awesome. Do you know why that is? Like, is there something specific about this model that, you know, rules that out?
Stephen: I think it's partially just the model that they developed, and I think it's partially because of my strict prompt engineering for the descriptions. Because if it's not a guarantee, like if it doesn't know when it's guessing, it's not allowed to guess. So this is both vibe coding and prompt engineering. And I have actually taken some prompt engineering courses.
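As a hedged illustration of the kind of "no guessing" rule Stephen describes, a system prompt might constrain the model like the constant below. The wording is invented for this example and is not the app's actual prompt.

```ts
// Hypothetical description rules; invented wording, not the app's prompt.
const DESCRIPTION_RULES = `
You are describing images for a blind user.
- Only state what is clearly visible in the image.
- If you are not certain about a detail, omit it; never guess.
- Keep live-camera responses to three words or fewer.
`;
```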
Thomas: So by you telling it, I am blind, I'm visually impaired, it's automatically more specific in these areas? Is that how that's been working, without us having to do it? Like we do with other AIs, where we have to be very descriptive about what we want out of it.
Stephen: The AI that helps me on the back end knows that this app is developed for blind people, so it helps me develop it accordingly, too.
Thomas: Cool. Let's talk about this other feature, this live camera explorer. Now, tell us about this. Now, I know this has gone through some generation, just like the photo explorer has as well.
Stephen: Yeah. So there's a reason for that. So originally it used to be room explorer. You would walk in, you would take a photo of the room and then you could scroll. But I was like, well, that doesn't make sense because we already have camera mode. All the newer iPhones have the camera button on the side anyways, so they can take their own photos and then upload them. I was like, what if we change it to live camera mode where they just move the phone around and then the screen will adjust. And so that's where it switched from room explorer to live camera mode.
Stephen: We've also with that been able, I say we, it's really me. It just sounds better to say we. But what I've been able to do is something that no company, no glasses, nothing has been able to do. And one of the users on the AppleVis website mentioned it too: when you double tap with two fingers, you get into that mode where it will automatically describe your surroundings, like actions. Right. So somebody's waving. It will look and it'll say, oh, somebody's waving, or the dog has a toy. So it will actually describe actions in real time, which is something the community has been wanting for a long time.
Dave: Is it sending a photo every half second or every two seconds or something?
Stephen: Every half second.
Dave: Yeah.
Thomas: Oh, wow. Wow. How is the processing that fast? I mean, you've got to have a really good internet connection to be able to do every half second.
Stephen: So the reason why it works so fast is because I've told it, especially in this type of scenario, like when you want to quickly get information about your surroundings, the trick is giving the user shorter feedback rather than longer feedback. So you can take the photo and be like, "person waving," versus, "I see someone in a black hoodie waving their hand at you." It takes a lot longer for the language model to get out all of those words versus two to three words. So that's how I was able to get it to come back much quicker.
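A minimal sketch of what a browser-side live-camera loop can look like, assuming a hypothetical /api/describe-frame endpoint that returns a very short phrase: capture a frame roughly every half second, upload a compressed JPEG, and speak the reply.

```ts
// Minimal live-camera loop sketch. The endpoint is hypothetical; only the
// browser APIs (getUserMedia, canvas, speechSynthesis) are real.
async function startLiveCamera(video: HTMLVideoElement): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" }, // rear camera
  });
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d")!;

  setInterval(async () => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    const frame = canvas.toDataURL("image/jpeg", 0.6); // keep uploads small

    const res = await fetch("/api/describe-frame", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ frame }),
    });
    const { text } = await res.json(); // e.g. "person waving"
    speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  }, 500); // roughly every half second
}
```

Keeping the model's reply to two or three words, as Stephen notes, is what keeps the round trip short enough to feel live.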
Thomas: It's more of a balance. It's a compromise of what you can get quicker. You know, this is what's really cool about this: you understand where we are when it comes to so-called live AI. It's not what it seems. It's not what it's made out to be, because even on the Meta glasses or whatever you want to use, Gemini Live, yeah, it's live, it's looking, but it's not doing anything until you prompt it. And you took this to the next step that we have all been waiting for.
Thomas: It's like, when are you people going to go to the next step, which we have seen and heard about in the past, where things are happening in actual real time? And so this is the part that kind of just wowed me. It's like, finally, finally, somebody understood exactly what we want. And it's so cool that you were able to work that in there.
Thomas: But with that, you took this a step further. Even beyond that, to get even more of the wow effect, I think you took out duplications, right? Duplicate announcements?
Stephen: Yeah, yeah. Because there's no point. If it tells you the door's on the right, you already know the door's on the right. There's, I mean, a command. You can double tap again if you forgot what it said. But otherwise, there's no point on duplicated messages. As a blind person myself, I find it really annoying when you're door on left, door on left, door on left.
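The duplicate suppression Stephen describes can be as simple as remembering the last phrase spoken and skipping repeats; a tiny sketch, not the app's actual code:

```ts
// Suppress repeated announcements: only speak when the description changes.
let lastSpoken = "";

function announce(text: string): void {
  if (text === lastSpoken) return; // skip "door on left, door on left, ..."
  lastSpoken = text;
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}
```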
Thomas: You do measurements as well, right?
Stephen: Yes. So the measurements are estimates based off of what it can see from the camera. So the thing that you've got to realize when you're using the camera is that when you're further back, the object is going to look smaller in the photo. Right. So the AI, it's not exact how many steps. It's a rough estimate. But as you get closer, the door, for example, will increase in size. So that's how the AI can tell that you're getting closer to the door, and it'll give you approximately how many feet based off of the size of the object.
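One way to illustrate the size-to-distance relationship Stephen is describing is the pinhole-camera approximation: distance is roughly the known real-world width times the focal length in pixels, divided by the apparent width in pixels. The numbers below are made-up examples, and nothing here implies this is the app's actual method.

```ts
// Pinhole-camera distance estimate: larger in the frame means closer.
function estimateDistanceFeet(
  realWidthFeet: number,     // e.g. a typical door is about 3 ft wide
  focalLengthPixels: number, // depends on the phone camera
  apparentWidthPixels: number
): number {
  return (realWidthFeet * focalLengthPixels) / apparentWidthPixels;
}

// Example: a 3 ft door appearing 160 px wide with an 800 px focal length
// comes out to roughly 15 ft away.
const roughDistance = estimateDistanceFeet(3, 800, 160); // 15
```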
Dave: Will it use LiDAR on those phones which have it?
Stephen: No. No, there's no LiDAR. It's strictly camera.
Thomas: You know, I don't know, guys. Dave, think about this. Dave, you finally got a live AI that we want. It refreshes every half second. It removes the duplication and now gives you an estimate. Yeah, it's not going to be exact, but, I mean, my God, this is so amazing. This is why I wanted you on here, because it's like the door is not just on the left, the door is about 15 feet away on the left.
Thomas: And so that's like another level that you clearly put a lot of emphasis into this. And you being blind just made our world better because you know what's important to us.
Stephen: You know, they have really, really good intentions. I don't like to speak ill of any company. It's always started with good intentions. The problem is paying attention to the community and really knowing what is going to be beneficial.
Stephen: There's so many things visually that that AI is going to miss that might be important to you.
Thomas: So speaking of Meta Ray-Ban glasses and all the ones that are out there now, I could hear, and you probably already heard all this, it's like, oh my God, now put this on the glasses. But there's limitation to that, isn't there?
Stephen: There is, yes. So I'm still waiting. I know the SDK has come out for Meta. However, you have to use a native app. So what I'm working on is kind of a workaround. I'm working on just building a native iOS app wrapper and then once you open the app it just leads you to the website, which may actually work. But that's going to require me to hire somebody to code it.
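One common way to build the kind of thin wrapper Stephen mentions is a tool such as Capacitor, whose configuration can point a native iOS or Android shell at a remote URL. This is a sketch of that idea, not Stephen's actual plan or code; the app identifier is a placeholder.

```ts
// capacitor.config.ts -- sketch of a thin native shell that simply loads
// the existing web app. The appId is a placeholder.
import type { CapacitorConfig } from "@capacitor/cli";

const config: CapacitorConfig = {
  appId: "com.example.visionaiassistant",
  appName: "Vision AI Assistant",
  webDir: "dist",
  server: {
    url: "https://visionaiassistant.com", // wrapper just loads the web app
  },
};

export default config;
```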
Thomas: No, that makes sense. And one last thing I want to mention about this live camera mode too is that obviously we need to make people aware that this is not to be used as a primary mobility tool. You still have to have a pretty darn good internet connection as you walk around. It's not instantaneous.
Stephen: Yeah. It's as quick as we can possibly get AI to date. But you're right, it's not instantaneous.
Thomas: I kind of saw maybe about a one second delay now.
Stephen: The refresh rate is less now. It was funny because I listened to when you talked about it originally. This was before I did the update. So it's not two seconds anymore. It's down to a second now based on how fast I can get the LLM to respond.
Dave: Plus, of course, you have to account for the fact that it has to have time to actually tell you the details.
Stephen: And that's why the response that you get from the AI in live camera mode is very short, very brief. No more than three words.
Dave: I can imagine, Thomas, you walk into a space and it's saying, duck.
Thomas: Left, right. Regardless, this is a vast improvement for what we have nowadays on anything. And that's what's so fascinating.
Thomas: Another feature that you put into this is object tracker. Now, explain to us this object tracker feature.
Stephen: Yeah, so for example, when you're walking through a mall, right, you have your phone in one hand, you're standing there and you're like, well, what's around me? And you find, let's say, a kiosk. You can double tap on the kiosk and the phone will help you navigate to it. So it'll only specifically track that one object. Everything else gets ignored. Once you get to that kiosk, the AI will scan it and then it will actually have a full internal map of the building during your session.
Stephen: And then if you needed to track, let's say the front doors, you can double tap and track that one specific object and ignore everything else. The AI will tell you if there's like a spill on the floor, if you need to go slightly right or slightly left.
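Conceptually, tracking one object and ignoring the rest can be modeled as a small session-scoped piece of state: remember the chosen label, and convert its horizontal position in each frame into a left/right/ahead cue. Everything in this sketch is hypothetical and only illustrates the idea.

```ts
// Hypothetical session-only object tracker: one target, everything else ignored.
interface Detection {
  label: string;
  centerX: number; // 0 (far left) .. 1 (far right) of the frame
}

class SessionTracker {
  private target: string | null = null;

  select(label: string): void {
    this.target = label; // e.g. "kiosk" or "front doors"
  }

  guidance(detections: Detection[]): string | null {
    if (!this.target) return null;
    const match = detections.find((d) => d.label === this.target);
    if (!match) return "target not in view";
    if (match.centerX < 0.4) return "slightly left";
    if (match.centerX > 0.6) return "slightly right";
    return "straight ahead";
  }
}
```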
Dave: Will it maintain that map over time or is that just during your session?
Stephen: Just during your session. I want to be careful with privacy and data concerns, so it's only during your live session.
Dave: That actually leads to a question about security and privacy.
Stephen: It gets discarded off of the back end as much as I can do. There is a privacy policy. Data does need to be collected for the AI to function, but photos do not stay on the system unless you save them. If you want your data removed, I can delete your account and your data is gone.
Dave: Are photos used for training AI in any way?
Stephen: That's based on the Base44 privacy agreement. Once I delete your data, it's gone from my system.
Thomas: That's nice. And you're not corporate. You're not making money off data collection.
Stephen: No.
Thomas: So what about text-based explorer?
Stephen: Are you talking about the book or the swipe up?
Thomas: Let's talk about both.
Stephen: The book feature is meant for physical books. I bought physical books and wanted to be able to read them. So you can open a book, point the camera, and it will read it. It can also track page numbers to help you know where you are.
Dave: I've definitely seen people asking for that.
Stephen: And the swipe up is more for signs. If you're walking and need to read a sign, store name, hours, washrooms, that's what it's there for.
Thomas: And you also included voice commands.
Stephen: Yes. Some people prefer voice. Some people prefer gestures. This app is meant for everyone. Different strokes for different folks.
Dave: Sometimes you don't want to be talking out loud, so touch is better.
Stephen: Exactly. Everything is under one roof. You don't have to jump between apps.
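Optional voice commands in a web app typically lean on the Web Speech API, which is available (webkit-prefixed) in Safari and Chrome. A minimal, hypothetical sketch; the command phrase is invented for the example.

```ts
// Web Speech API sketch: listen for a spoken phrase and trigger an action.
const Recognition =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

if (Recognition) {
  const recognizer = new Recognition();
  recognizer.continuous = true;
  recognizer.onresult = (event: any) => {
    const latest = event.results[event.results.length - 1][0].transcript;
    if (latest.toLowerCase().includes("read sign")) {
      // trigger the same action as the swipe-up gesture would
    }
  };
  recognizer.start();
}
```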
Thomas: Is everything cloud-based?
Stephen: Yes. It's all cloud-based.
Thomas: So what's your future roadmap?
Stephen: There's more coming. I want this to be the number one app people go to for accessibility needs. Eventually native apps, eventually glasses integration. My number one goal is getting it on glasses.
Dave: Are you looking at other glasses beyond Meta?
Stephen: I've looked at Echo Vision, but hardware is financially straining. For now, the iPhone is incredibly powerful.
Stephen: We actually hit our AI quota recently. I'm paying close to $500 Canadian a month now. But it's worth it for the community.
Dave: You did try a subscription model.
Stephen: There will be a subscription model. Probably $4.99 US a month. That would cover costs.
Thomas: That's incredibly reasonable.
Dave: You could have built this just for yourself, but you shared it.
Stephen: That's because blind products are rarely led by blind people. This was built by the community, for the community.
Thomas: Is there anything else we didn't cover?
Stephen: There's a social media feature. It's basically Instagram for blind people. You can explore photos people post. I want to eventually integrate this directly into Instagram and Facebook via Meta APIs.
Thomas: My brain is about to explode. This is incredible.
Dave: The response from the community has been amazing.
Stephen: I expected drama, but it's been 99.9% positive.
Dave: You set a standard for responsiveness.
Stephen: That's because it's a web app. No approval delays. Just me and my late-night thinking. Most features were built at 3 a.m.
Dave: When do you sleep?
Stephen: Good question.
Thomas: Where can people find this app and contact you?
Stephen: You can go to visionaiassistant.com. You can contact me there. I also have a toll-free number, 1-866-825-6177.
Dave: Great stuff. Thank you so much, Stephen.
Stephen: Thank you very much for having me.
Thomas: You have done a wonderful job. This is tremendous.
Stephen: Thank you so much.
Thomas: Thank you for coming on and we will chat with you again hopefully sometime in the future.
Stephen: Of course. Absolutely. Thank you again.
Thomas: All right.
Dave: Bye. That was Stephen Lovely from Vision AI Assistant. So, really interesting chat, Thomas.
Thomas: It's just remarkable, isn't it? This was a great interview simply because there was so much to cover. And the story behind it all just makes me feel so good.
Dave: It's a great example of someone building something based on lived experience.
Thomas: With everything going on, all these AIs, this is just a whole new level.
Dave: Yeah, absolutely. Encourage people to go check it out, add it to your home screen, and engage on AppleVis.
Dave: Thanks to you Thomas again for joining me, and thanks everybody for listening.
Comments
Listening
I always get terrified to hear my own voice, and I talk for a living lol. It was such a pleasure to meet you both. Thanks for having me on ❤️.