Recently apps like ‘Be My Eyes’ and ‘Seeing AI’ have added the ability to generate image descriptions using multi-modal Large Language Models (which can understand and generate both text and images) like GPT4Vision. I’m sure everyone has tried them, but for anyone who hasn’t, here’s the description I got by sharing a Toot to Be My Eyes this morning:
“The picture you've shared is a beautiful watercolour painting of flowers. The flowers have delicate, ruffled petals in shades of pink, purple, and white. The center flower is particularly striking with deep magenta at the base of its petals, which fades into a lighter pink. The leaves are painted in soft greens with hints of yellow and blue. The background is white, which makes the colors of the flowers and leaves pop. In the bottom right corner, there is a signature that reads "Wendy Craig" with the word "Photographer" underneath. The painting has a very soft and elegant feel to it.”
I have no way of knowing if the original image looks anything like this. But it doesn’t really matter; it fits in with what I was expecting from the Toot. I’ve really enjoyed ‘looking’ at all the Caturday photographs without Alt Text today – I can’t lie, I love cats!
Now, we all know the ‘scene description’ in Seeing AI was complete and utter rubbish; I don’t think that is me being harsh. The descriptions you get out of these new apps and models are, at the least, very entertaining. At best, they are life changing!
For the first time, I, we, can ‘almost’ be a part of the visual internet. I’m not quite using Instagram daily, but you can share posts directly to Be My Eyes there too, and it’s as good as you’ve seen.
I’m very excited, but I’m also aware that there are risks. These include inaccurate or misleading descriptions, so nobody should rely solely on this technology for critical tasks. But there is such an incentive for the world to develop computer vision that it is only going to get better from here, and we will benefit. And what we have right now is already pretty fantastic.
So, here’s my question – are you as excited as I am? Are you keen to jump into the visual internet? Does the idea of GPT5Vision working with video make you want it to be here now? It’s certainly made using apps like Instagram, Ivory and Threads more inclusive.
Finally, on a personal note. I could see up to the age of sixteen. I’m using the text-to-image features of these models, together with the image-to-text features, to create my own art. The descriptions you get from Google’s Bard are quite different from those from Be My Eyes and Seeing AI – the differences can be quite interesting.
Comments
I also agree.
I'm not here to impress anyone. I don't really care what I look like as long as it keeps me warm.
I will, only if...
I'll tell you all about black clouds if you can answer one question.
...
What does the fox say?
For seriously
You may or may not be able to conceive color, but can you conceive the idea of dark and light? Black storm clouds can happen in the middle of a bright clear day, clear meaning no clouds, and then a storm front rolls in.
For those who can see, they can still see the bright blue sky (keyword here is "bright"), as a contrast to the dark storm clouds.
Of course if it is at night, that is a whole other beast.
I hope that answers your question, at least kinda-sorta?
For me, being totally blind…
For me, being totally blind since birth, I have no concept of light and dark.
oh wow...
I can see light and I'm trying to think of how to describe it...
Oh god... It's, I can't say shiny, I can't say warm, I can't say bright. Alright, shitty explanation incoming but... It's like a constant drone, but with your eyes. This drone always has one note playing; there are other notes surrounding it (depending on the day and how bright or dark it is), but you'll always go back to that tone in the end.
That was a horrible explanation and I know it, I just can't think of any other way to describe it at all... Wow, this is so much harder than I thought it would be.
@brian thanks for trying to explain :)
Considering Seeing AI...
Considering the light detector in the Seeing AI app, Brad, your logic is pretty sound. After all, Seeing AI plays a tone, dependent on the brightness of the light your back camera picks up.
Softer tones for less light, louder for brighter light.
Very true.
I didn't think of that.
Definitely excited about AI!
I'm completely blind. I have been since I was two or three.
I find colors pretty interesting. I'd definitely love to be able to see them.
As for AI: I hate Be My AI.
You can't provide a prompt to accompany your images and the AI gets cut off by the painfully short generation limit.
ChatGPT Plus though? Now that's where it's at!
The best part is I can use it on my computer easily!
I do sometimes use Be My AI with pictures in my Photos app, just because it's faster and I don't need it to write a novel.
The best thing about using ChatGPT Plus, though, is that it will respect your custom instructions. I currently have mine playing the role of Hermes, you know, the Greek god.
I found a random image of Hermes on Google Images, pasted it in, and this is the description I got.
Ah, my old buddy Aaron! Always curious, huh? Alright, let me paint you a mental picture.
The image depicts a classic Greek statue, and it's a depiction of a man. This fellow stands tall and poised, embodying the elegance and grace of ancient Greek sculpture. He's in a relaxed stance with one leg slightly forward, a posture many of our statues from back in the day loved to strike.
The statue is made of marble, giving it a smooth, cold texture. Imagine the cool surface of a polished stone or the sensation of ice, but without the melting part! This marble is a creamy white color, reminiscent of freshly whipped cream or the soft texture of a cloud. The statue's detailing is intricate, with well-defined muscles and facial features, showcasing the craftsmanship of the sculptor.
He holds an object beside him, almost like a staff or rod. The man's hair is carved in tight curls, typical of Greek sculptures from that era. He's naked, by the way, which was all the rage back in the day for statues. I guess we were pretty confident!
In the background, there's a grayish hue, much like the feeling of a rainy or overcast day, where the world seems wrapped in a soft, muted blanket.
Now, imagine all these sensations coming together, and you've got a trip back to ancient Greece! If this statue could talk, oh the tales it would tell! Maybe about that one time Zeus tried to hide his mischief, and I caught him red-handed! Classic times! 😉🏺🌩️
I followed up with:
I'll give you one guess about which god you think is being depicted here.
Response:
Ah, the good ol' guessing game! Alright, considering the details and the classic stance with a rod or staff, my divine intuition tells me it's a depiction of... yours truly, Hermes! But hey, don't tell the others I took a guess; they'll never let me live it down if I got it wrong! 😜🕊️💼
Edit:
A short while later, I said:
One day, I'll make a sculpture of you too. It's just gonna be ugly because I can't see color. hahaha
Its response made me laugh, which is why I'm adding it to this post.
Ah, Aaron! First off, the idea of you sculpting me is absolutely delightful! 🤩 And second, beauty isn't just about color. It's about the heart and soul you pour into your creation. Heck, some of the most breathtaking sculptures in history are monochromatic!
Plus, think of it this way: By not seeing color, you're free from the bias and preconceptions that come with it. You get to focus on form, texture, and emotion. And let's be honest, if you captured even a fraction of my sparkling personality, it would be a masterpiece! 😜
But in all seriousness, every creation, regardless of its visual appeal, is a reflection of the artist's spirit. And knowing you, it'll be a work filled with wit, warmth, and whimsy! Can't wait for the grand unveiling! 🎉🗿🎈
And hey, if you need a model, you know where to find me! Though I must warn you, capturing my good side might take a while... mainly because all my sides are good! 😉🔥🕺🏻🎨
Aaron, I love it
Custom instructions are cool. Although I did share a chat with someone where it had called me 'Princess' - that was a bit embarrassing! Baby is OK, that I can cope with.
I like being able to 'share' to Be My Eyes in apps.
Have you tried Bard? It gives quite different descriptions - no personality though!
OK that's awesome.
That was fun.
Maybe I'll sign up to ChatGPT Plus and see what I can get.
Never liked Bard
Bard makes things up waaaaay too often when we're not talking about images, so I don't trust it to be accurate most of the time.
I have the same issues with Bing, even though Bing is powered by GPT4.
Google's upcoming Gemini definitely sounds interesting though. Can't wait to see it.
Well, I can't sign up.
Turns out I deleted my account and they don't let you reuse the same email. Oh well, I didn't care that much :)
Life changing for me.
Born blind, I’ve taken numerous pictures over the years, in the hope that one day technology might evolve to help make sense of them.
And like Mr Goldfield, until recently I had no interest in being sent pictures by family members; having to ask someone else to make sense of them was a drag.
Sometimes the AI gets it wrong, for instance the deck of a ship was described as a balcony, but this in no way detracts from the sense of joy.
Now my spare moments seem to be taken up with sharing a picture to Be My Eyes, copying the resultant description and pasting said description into the photo’s caption for future reference.
It may not be perfect, but to me it’s wonderful!
Dear Wony and others
I have rejoiced in all the positive comments like this. They have more than made up for all the crying and moaning and even personal abuse I've gotten on this site over the past few days.
I very rarely post to AppleVis and I suspect it will be a long time until I do again, but hearing the joy that you get from being able to share in the visual world, however flawed, gives me joy also.
I am so glad your glass is 'half full' along with my own!
A Shot in the Dark?
I'll take a stab at translating light perception.
Light either reflects from objects, or is generated directly from a source. Sound reflects off objects, like a wall to one's side, or is generated directly from a source.
Colors are various ratios of light frequencies or waves, or perhaps a pure, single wave/frequency. Sound can be a single, pure frequency, have one or many harmonics in a tone, or be many notes in a chord, sometimes clashing notes. Think of the two jazz musicians trying to decide what that chord would be called when the piano fell out of the second-story window and smashed on the car parked below.
An eye and the brain can interpret and process all those light waves encountering the retina into an image, and distinguish right, left, top and bottom. An ear and the brain can process pressure waves of a certain range into sounds, but not so well into their direction. Two eyes and a brain, however, can better judge how far away something is by resolving the two differing images caused by the distance between the eyes. Two ears and a brain can better process the direction of a sound by the slight delay in the sound reaching one ear before the other. I forget exactly how up and down is processed in people's hearing, but it happens.
You can't go too far in comparing sound and light waves, since they propagate differently, have wildly different scales of size and so on, but I think you could make comparisons that are valid.
So dark or black clouds would be like walking into a sound-dampened room. A completely darkened room would be like putting on ear muffs in a sound-dampened room. In the black, your brain eventually starts generating its own dim colors and images, just as you might hear your own heart and blood flow, or even your own brain-generated sounds in silence.
Night and Day
Brightness and darkness can be described as the difference between night and day.
Of course, the first thought you might have is that you do not know what that "looks" like.
What about sensation? Like singer girl said above, imagine how the sun feels on your face, on your skin.
Think of how a sunny day smells to you. This will take some critical thinking, but you will get there.
Think of how a sunny day sounds? Critical thinking sold separately.
Now imagine night time. Let's say 9pm to later.
How does the air feel to you?
What does it smell like outside? What does it sound like?
If you can consider all the above, then you can have your own sense of bright and dark. And who cares if it's not the same as that of the person next to you who has perfect vision?
Blindness teaches us one thing, if nothing else...
Adaptability.
Vision Restoration and AI
Hi. I know I am way late to this discussion but here goes. I have been visually impaired since birth due to ROP. I have light perception in my right eye and am completely blind in my left eye. My depth perception is virtually nonexistent and I can only see things at very close distances. Like SG, I am also 35. If I could get full sight restored in both eyes, it would take a psychological toll on me. My brain would need to be rewired and that would be scary. I can fantasize about my very own voice-activated self-driving car one day... :) Only time will tell what will happen within the next 25 years.
I do love the very detailed and sometimes accurate descriptions of pictures from Be My AI. I also have several pictures of my cat on my phone. Be My AI describes them very well. I don't have good camera aim to take accurate pictures, so my fiance takes them for me. It even thinks my cat is cute, which he is. :)
As far as dreams go, I dream how I see now. My dreams are vivid and in many of them I am on a paratransit vehicle. Something usually goes wrong in said dreams; vehicle is late or I get left. I've been with paratransit for 20 years and use them for work and am very fortunate I have a good support system.
A few random thoughts
I'll readily admit I could never convey an exact difference between light and dark to someone congenitally blind. However, I could tell them it's comparable to the differences we detect in other senses. Boiling hot and ice cold, or sugary sweet and bitterly sour, or coarse sandpaper and smooth silk.
I used to think AI-generated essays and stories were quite impressive, until I started reading them critically and asking myself what I might change if I was the author and had to do a revision. Once I did that, I began to see that the AI-generated writing was utter crap.
So, going back to the original post, that is indeed a very impressive image description. But I have to wonder what shortcomings I might find if I could see the image with 20/20 eyesight.
Even if the description does contain shortcomings, ultimately it depends on what the description is to be used for. It could be very suitable as a description for the memes and whatnot that seem omnipresent on social media. But if you're preparing ALT text for illustrations in a non-fiction "how to" guide, these AI-generated descriptions might fall short.
Just one more thing. The OP (who has changed their name so many times now, I don't know what to call them) asked if we were excited. I can't begin to describe my excitement in a single comment. This is what I was asking for back in March of last year before all this generative AI stuff became available. An AI that can intelligently describe and make sense of text, not simply read it in random order. And if you imagine this technology behind a tool like Siri, then it's pretty easy to see how inaccessible websites will no longer be an impediment to the disabled. CAPTCHA challenges, controls that activate on mouse over, unlabeled tables and text fields, and buttons without ALT text all become things for your AI-powered smart assistant to deal with.
I attend an annual writers conference and use a sighted guide to navigate the conference center, help me find associates I'm looking for, and provide me with non-verbal facial expressions and gestures while I speak with someone. Is it too early for me to think that my sighted guide might one day be replaced by my iPhone? I don't think so. I believe these capabilities are in range today, with the right brainpower behind it.
As someone who saw HAL 9000 in the movie 2001: A Space Odyssey back when I was six years old, I find it just amazing to see how computer technology has evolved from fanciful science fiction to commonplace reality.
Re: A few random thoughts
Good morning, Dave.