OpenAI, hot on the heels of Gemini 2.0, has announced that video and screen sharing are coming to ChatGPT's advanced voice mode for Plus and Enterprise users, with the rollout happening this week. Europe, unfortunately, will not be included for now.
https://mashable.com/article/openai-brings-video-to-chatgpt-advanced-voice-mode
Video sharing is the magic we've been waiting for: the ability to point our phones at things and have a real-time conversation with ChatGPT about what it is seeing. Andy posted an excellent video back in May showing what it was capable of:
This is an example of Be My Eyes integration. Maybe our new benevolent overlords can comment on when this new power will be native to the Be My Eyes app?
All very exciting, though. It hasn't arrived here in the UK yet, but I'll be checking over the next few days.
Updated to include external article.
Comments
only for advanced users
I just canceled my GPT membership for advanced mode! Any idea if this'll be rolled out to regular members as well? Like people who use the free version?
Hoping it'll be available in the UK
On OpenAI's help page, it says EU, not Europe, so I'm hoping that we'll see it in the UK in the next week. Here is the exact wording:
"Video, screen share, and image upload capabilities will be available to all Team users and most Plus and Pro users, except for those in the European Union, Switzerland, Iceland, Norway, and Liechtenstein. We expect this rollout to be completed over the next week. Usage of video and screen share capabilities is limited for all eligible plans on a daily basis. Usage of image uploads counts towards your plan's usage limits."
The full article can be found here:
https://help.openai.com/en/articles/10271060-12-days-of-openai-release-updates
The specificity of EU and…
The specificity of the EU and those other non-EU European countries does mean it will be out in the UK.
Regarding it coming to non-advanced users, I assume so, but advanced is always going to be the place where the new toys come out first. It probably won't be for a good while. Is advanced voice mode available on the free version?
And here I go
Obsessively checking the app every five minutes.
Not here yet
Saw the livestream; neither the live video nor Santa is available for me yet, and no app update either so far. Hopefully it'll get here in a few days' time. And yeah, it'd be interesting to know more about any possible Be My Eyes integration.
So, will the third bus be Llama? What do you all think? Meta is usually a little late to the party, but... will we have live video capabilities in a few months' time?
I have it!
And we are live!!! OMG, and all it took was me upgrading to Pro, and now ChatGPT can see.
pro subscribers must be getting it first
Ah, so Pro subscribers must be first to get it, which I guess makes sense?
I don't have that kind of money though lol.
Hopefully it isn't too long a wait for the rest of us?
I think Llama is getting it, but how fast that trickles down to
I think Llama is getting it soon, but how long that takes to trickle down to the glasses is anyone's guess. Also, we are likely to have the same issues with reading text, identifying faces and people, and so on. The exciting part of OpenAI getting it out is the ability to prompt the AI that I'm blind and want things described another way, and that, unless I misunderstood, it will be coming to BME.
Pro? Isn't that £200 a month? Ouch! They did say over the next week, so six days now, but, let's face it, OpenAI aren't the most reliable when it comes to promises and deadlines.
I've got it
Well, I actually have it now!
Just decided to check before going to sleep.
Will do more testing tomorrow.
I also have it
I'm going to test it out soon.
Got it but
I got access and turned the camera on, but it says it can't access the camera and can't see a thing. Do I need to go into Settings and give the app permissions or something?
Realtime monitoring
Got it to work, and here's an interesting observation: the native ChatGPT app doesn't do real-time monitoring. It's not that it cannot; rather, it won't, as this seems to be a limitation placed on it.
So basically, you can't hail a taxi with the help of ChatGPT, unless the OpenAI-BME partnership comes up with a blind-specific version without this limitation inside BME.
Yeah, I found the same. I…
Yeah, I found the same. I did a test of asking it to tell me when I dropped a coffee pod on my worktop. It said it saw it but didn't notify me.
Also, the imaging seems quite slow. I heard it takes a photograph each second, but I'm not sure if that's the case. It's probably down to server load.
Re: ChatGPT not seeing anything, it might be an obvious question, but are your lights on? It needs light to see. I did the same thing. I'm the weird guy in flat 98 who lives in perpetual darkness, like some vampire.
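Incidentally, if the one-photo-a-second guess is right, you can roughly approximate that behaviour yourself outside the app. Here's a minimal sketch, assuming the openai and opencv-python packages and a vision-capable model; the model name, prompt wording, and one-second cadence are my own illustrative choices, not OpenAI's actual pipeline:

# Rough sketch only: poll one camera frame per second and ask a
# vision-capable model a yes/no question about each frame via the
# standard chat completions API. Not the ChatGPT app's real pipeline.
import base64
import time

import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def frame_to_data_url(frame) -> str:
    """Encode an OpenCV frame as a base64 JPEG data URL."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("could not encode frame")
    return "data:image/jpeg;base64," + base64.b64encode(jpeg.tobytes()).decode()

def target_in_frame(data_url: str, target: str) -> bool:
    """Ask the model whether the target is visible in a single frame."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any vision-capable model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Is {target} clearly visible in this image? Answer only yes or no."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    )
    return "yes" in resp.choices[0].message.content.lower()

camera = cv2.VideoCapture(0)
try:
    while True:
        ok, frame = camera.read()
        if ok and target_in_frame(frame_to_data_url(frame), "a coffee pod"):
            print("Spotted it!")  # swap in a spoken alert for real use
            break
        time.sleep(1)  # roughly the one-frame-per-second cadence mentioned above
finally:
    camera.release()

Crude and slow compared to a proper live feed, but it does notify you, which is exactly the bit the app currently refuses to do.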
How do you know if you have it?
I have the subscription, but not the $200 monthly Pro. I think some of you have the same subscription as me, at about $20 monthly. My app had an update yesterday, but I don't seem to have this.
I tried Gemini and it was pretty cool, but it would be nice if the ChatGPT app that I pay for would do it.
If you're subscribed to Plus
You should get it. Just tap to switch to voice mode, and it should ask you something basically amounting to whether you want the video thing. Even if it doesn't, just look for a video camera button once voice mode turns on.
I found going into the app…
I found going into the app switcher and closing the app before starting it again helped. I also got the Santa voice... utterly deranged!
re: Realtime monitoring
Same observations here.
When I ask, "Tell me when you see someone in the camera frame," it confirms it will do this. But when I stand in front of the camera, it doesn't notify me.
Hope this will be possible in the future. But for now, this is freaky cool already.
Two years ago I never thought this would be possible!
Apple shortcut to start vision mode
Hello,
Does anyone know if there is a way to make a shortcut to start talking with the advanced voice mode with vision turned on?
Yeah, it would be so convenient to use such a shortcut.
I hope OpenAI releases such a feature/shortcut in the first place, so it's not a hassle to use the live audio/video stream with ChatGPT.
To be honest
This is just the beginning of what's going to be possible.
I now have it, and just wow!
This is pretty cool, and I bet will get even better.
Thank goodness I can tell it to stop talking like Santa, though. I'm just not that cheerful. LOL
It's not super accurate when just describing a scene yet. It said that the hallway outside my office was the sky. But I know it's just going to get better and better.
Not calibrated for blindness
The lack of accuracy in scene descriptions etc. is mostly because of this lack of calibration; I bet with a bit of specific fine-tuning and training on specific data, it'd be super wonderful. In the imminent agentic future, what do you all think of an AI agent for the visually impaired?
It does use memory so might…
It does use memory, so it might be worth telling it you are blind and then listing off your description requirements.
I suspect it is using an older GPT model which is less data-heavy, because it's taking however many pictures per second.
It's the Be My Eyes implementation that will be the next step, I think.
Saying that, the PiccyBot dev says they (sorry, not sure of gender) are looking at putting it in a WhatsApp chat, so you could call it and have it working through your Ray-Bans, which would be, to quote my inner teenager, sick.
@ Gokul
It is not giving me inaccurate or bad scene descriptions at all… I actually have found, however, that some voices work better with the vision than others.
this shouldn't need to be said, but I'm going to say this anyway
OK guys, if you are having issues, this may be a good place to talk about it without having anything done about it, but the best thing to do is report it through the app, especially if ChatGPT is not continuously looking out when you tell it to look for something. As they have just released the feature, they're going to be paying extra attention to any complaints that come in about it. If only one of us is reporting it, obviously they're not going to take it too seriously, as it's only one out of many. But also, for the love of God, please be respectful when doing it…
Your point about reporting…
Your point about reporting the issue is a good one. I would question your sign-off, though; it's verging on patronising. We are individuals with our own voices, so we may report as we wish. It also assumes that people will be rude or aggressive, which is problematic: those who are rude and aggressive will not listen to your suggestion for civility, and those who will be civil don't need to be told.
I understand your intention; there have been enough bin fires here to show that, when frustrated, there is often an outpouring of anger. But, and I think this is what may have made me twitch a little, neither I nor anyone else is a representative of blindness. I speak with my own voice on my own behalf.
Sorry to pick up on it so strongly. Obviously it touched on something under the surface that I have been thinking about of late.
But yes, reporting the trigger issue is a great call. Thank you.
Just to clarify the…
Just to clarify the reporting process, so you don't do what I did and search about in Settings for a means of reporting... after you finish your session, hit the thumbs down, and then you can say what the issue is.
@Stephen
The quality of the descriptions/answers (to be more specific) differs with the voice one is using? That's interesting. In that case, which voice gives better (subjective, of course) descriptions?
Sent it through the contact form… I got a reply within hours
I got an email back from OpenAI within hours of going to Settings, then Help Center, and filling out the contact form:
Jay
Hello,
Thank you for reaching out to OpenAI Support.
Thank you for sharing your feedback with us! We're thrilled to hear that you're enjoying the new video mode for advanced voice; your appreciation truly means a lot. At the same time, we're sorry to hear about the challenges you're facing.
Regarding the issue where ChatGPT acknowledges the request but doesn't notify you when something appears in the frame, we understand how this can be challenging, especially when the model doesn't follow through as expected.
Here are a few factors that could be contributing to the issue and steps you can try:
Model Limitations: While the advanced voice and video capabilities are powerful, they may not always perfectly detect objects or notify you in real time. The model might require clearer prompts or frames for improved accuracy.
Prompt Specificity: Try providing more specific instructions or breaking down the task into smaller steps. For example, instead of asking ChatGPT to notify you when it sees your dog, you could first ask it to describe what it sees in the frame, followed by a request to notify you.
Lighting and Clarity: Ensure the video feed is clear and well-lit. Poor lighting, blurry visuals, or fast camera movements may make it difficult for the model to identify objects reliably.
If the problem continues, please provide additional details or examples of the prompts youāre using. This information will help us further investigate and work toward a solution.
We appreciate your patience and understanding in this matter.
Have a nice day!
Best,
Jay
OpenAI Support
@ Gokul
I have found Arbor to be the most reliable.
Based on their reply,
it seems to be a live video feed…
that sounded like a very…
That sounded like a very ChatGPT response, which, given the context, makes sense.
Generalized
I do agree with Ollie here. This seems like a very generalized response without, how do I put it, much sincerity to it? I mean, GPT itself says that it is not permitted to do live monitoring, whichever way I prompt it.
@Stephen, did you mention specifically in your message that this is important to you since you're visually impaired?
Not that your feedback won't…
Not that your feedback won't be logged. Most likely they use ChatGPT to collate responses or something similar. It would surprise me, though, if an AI company wrote out a response longhand.
I don't think this changes anything. We should still put forward such requests.
feedback
I too sent a message about this.
Hopefully this can be improved upon in the future, but even now I've found it to be quite useful for identifying and reading things.
I look forward to seeing what happens in the future, but I am surprised we haven't heard so much as a word from Be My Eyes at this point, since the CEO himself had mentioned he'd been using it for months.
Perhaps they need to wait until video is part of the Realtime API?
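For anyone curious what that plumbing looks like today: the Realtime API currently streams audio and text over a WebSocket, with video still unannounced. Here's a minimal sketch of the handshake, assuming the Python websockets package and the gpt-4o-realtime-preview model name; how video frames would eventually be sent is pure guesswork until OpenAI documents it, so this only requests text:

# Sketch of connecting to the OpenAI Realtime API over a WebSocket.
# The events below exist in the current audio/text API; no video
# events are shown because none are documented yet. Assumes the
# websockets package and an OPENAI_API_KEY environment variable.
import asyncio
import json
import os

import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # additional_headers is the newer websockets keyword;
    # older releases of the library call it extra_headers.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Request a text-only response to keep the example simple.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Describe what you can do in one sentence.",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                print()
                break

asyncio.run(main())

If Be My Eyes is waiting on anything, it's presumably an officially supported way to push camera frames into a session like this one.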
@ Gokul
I mean, that isn't very generalized, as they are answering a specific question, but we shall see. No, I didn't tell them I'm blind, and there is a reason for that: I've noticed with companies that playing the blind card always has the opposite effect, with responses like "we are sorry, but this isn't meant to help blind people navigate", etc. Honestly, not telling a company I'm blind also got me a job at a company I've been working at for three years now, lol. That's a fun story. I just don't see why being blind matters; it's not doing what I want it to do, and that is what matters. I mean, it is pretty damn good for what it is, but that one fix would make it perfect. Gokul, you may want to try using a different voice, as some voices work better with vision than others. Please see my comment above.
If you have access,
Also switch the model over to o1 pro.