New AI app for describing images and video: PiccyBot

Laszlo, thanks for noticing the DeepSeek addition. It's the 7B model that I installed locally on one of my own servers. So not very powerful, as this server is not the best. It is more a proof of concept. One of the good things about it is that I have full control over it. I love open source and DeepSeek was clearly built on top of Meta's Llama, with a lot of smart optimisation steps.
The version I am running for PiccyBot only describes images, for video it will default to Gemini at the moment.

Now the stage is set.. With these kind of open source models available, it shouldn't be too expensive to train a model specifically tailored for blind and low vision use.

Another point is the censorship. At the moment the model will still walk the Chinese government rules and limit output that way. I am sure there will soon be models that will strip these restrictions. The current local model may be less censored as far as sexuality and such, still have to check that.

I have also updated PiccyBot, it should be more stable now, earlier it could get 'stuck' after many requests. It also includes a push notification to tell you when the processing of a video is finished. And you can minimise PiccyBot now while it is processing. It will play the description in the background even when you are continuing with another app.

Another development is the PiccyBot WhatsApp service. Particularly useful for Meta Rayban users who are banned from the 'look and tell' function. Sending a video or image to PiccyBot on WhatsApp will result in an audio description. Bit slow and somewhat clunky but at least it will enable handsfree video descriptions while wearing the glasses.

Good luck with the app guys, let me know how things work for you?

WhatsApp service

This sounds great - is it available now? If so, how do I use it?

Please a Mac Version

Please make it available on the MacOS. We need an app like this.

A model for VI

@martijn Exactly that's what I am excited about! even with deepseek, everything is open source and available out there isn't it? Speaking of which, what about Llama?

re: New update

thanks for the new update! I have tried it and can confirm that the audio will continue to play even when you lock your phone or go to another app. However, if I lock my phone or minimize the app and go to another app while processing, it seems to stop processing because when I come back to the screen, all it shows me is retry and no description was generated.
incidentally, I don't know. Have anyone requested this feature yet, but it would be nice to have a setting where we can set the app to auto retry when description fails or fail to mix audio, etc., waiting for 4:5 minute and then only to come back and have to manually hit retry again and again gets a little tedious. especially now, if the goal is to allow us to have it processing in the background, it makes sense if it would auto retry when fail. Maybe not indefinitely? Maybe auto retry five times or something and then sent a notification that says it has failed five times, please check the video, or something like that?

was there an AI censorship cracked down or something? ROFL

as of yesterday, both llama and Mistral are acting very frigid and refuse to describe or answer anything even remotely sensitive. They act like they're stuck on GPT four. Anyone else experiencing this?

No more gemini experimental?

I have just opened Piccybot, went into settings and discovered that AI model was somehow set to Mistral Pixtral. I have used Gemini Experimental 1206 since it appeared in the list. I browsed through the available models and saw that there was no such model listed any more and probably that's why the selection changed to Mistral Pixtral. Does the disappearance of Gemini experimental seem permanent, because there is no more experimental version of Gemini or for other reasons? Which is the preferred model as of now then thatgenerates the most close descriptions to the superb ones Gemini experimental used to? Google Pro 2.0 maybe?

Google experimental replacement

Yes, I would say that it is 2.0 Pro that is the replacement for the experimental model.

Whatsapp integration and various updates

Hi guys,

There have been some model updates as you noticed, with Google Gemini 2.0 replacing some of the earlier models. The experimental one has been replaced by Gemini 2.0 Pro as already indicated by blindpk.

The main update is the Whatsapp support, specifically handy for users of Meta Glasses, who have been locked out of the 'look and tell' feature unless they are in the US.

I think any method to access other ways to describe images and videos handsfree is useful. I am using Whatsapp to connect PiccyBot to the glasses, and even though this may be a clumsy method, it does work.

Use PiccyBot on your Meta Glasses via Whatsapp
Steps to follow :

1. Register the PiccyBot WhatsApp Service: https://piccybot.com/register
2. After registration, you can use the PiccyBot WhatsApp Service to describe images and videos. Save the contact ‘PiccyBot’ to your Whatsapp contacts.
3. In the Meta View app, go to Settings, then "Communication," and connect via WhatsApp
4. Your device is now ready for hands-free usage
5. Put on your glasses, then use the following prompts to send a message to PiccyBot via WhatsApp
For photos:
* "Hey Meta, take a photo and send it to PiccyBot."
* "Hey Meta, take a photo and WhatsApp it to PiccyBot."
* "Hey Meta, take a photo." After the photo is taken, you’ll hear a click sound. Then say, "Hey Meta, send the latest photo to PiccyBot."
For videos:
* First, say "Hey Meta, take a video," and the glasses will start recording. To stop recording, say, "Hey Meta, stop." After it stops, say, "Hey Meta, send the latest video to PiccyBot."
* Note that your Meta Glasses will capture a video of no more than 15 seconds.
* After sending the media, your glasses will ask, "Send photo/video to PiccyBot?" Respond with a confirming statement, like "Yes." Then it will reply, "Sending photo/video."

The message (image or video) is then sent to the PiccyBot account on WhatsApp.
Once the media is sent, you will receive an audio message saying, "PiccyBot is processing your image/video, please wait."
After processing, an audio description will play. To play the audio hands-free, ensure WhatsApp on your phone isn’t open. Your Meta Glasses will receive a notification about the audio message and say, "Voice message from PiccyBot." To listen, say, "Play the voice message," and the audio will play on your glasses.

For a video description of the process, please check this video by Dave Taylor-Page, with whom I have been working to get this done: https://www.youtube.com/watch?v=2KBH3y64rHk

Good luck with PiccyBot, let me know what you think?

Problem registering with WhatsApp

I read your instructions above and immediately registered for the WhatsApp service as I'm very excited by the prospect of using this on my Meta Ray-bans.

However, I couldn't figure out how to get Piccybot into my WhatsApp contacts. Then I watched the video and it sounds like I should have registered on my phone, not my Mac. I believe at the end of registration it should have given me the option to add the contact.

I tried again on my phone and was told that the number was already taken. Is there another way to add the contact?

this is brilliant!

Going to try this! Even if you have access to meta's look and tell feature, this'll still be worthwhile to have since 1: it can describe videos and 2: the image descriptions will be superior.

Fanominal

Hello I just added the Piccy Bot service to my What's app on my phone. This is great and Piccy Bot is superior; the voice is great on the what's app version too :)

I think Martijn just solved…

I think Martijn just solved Meta ... 🤯

WhatsApp registration problem

I have registered using the link in the instructions above. Got a registration successful message on web, but no contact to add to my WhatsApp.
Tried again and got message that number is already registered.
So how can I make the connection now?
Like many, keen to try this out asap as have really missed look and see on Meta Raybans since vpn option was throttled.

Registration Problem

I'm also experiencing a problem with registering. I registered using the link above. I used my iPhone to register. After hitting submit, I received a registration success message. A PiccyBot contact never opened, so I was unable to save it to my contacts. I tried to reregister, but received a message saying my number was already registered. How can this be fixed?

Mrs B, Earle, Mr Grieves,…

Mrs B, Earle, Mr Grieves, please add the PiccyBot contact separately if you missed it during setup somehow. This is the PiccyBot Whatsapp business account: https://api.whatsapp.com/send/?phone=917736089657

I will be enhancing this further. Translating the messages for non-English users as a first step. Then adding the option to ask follow up questions (like in the regular PiccyBot app).

Let me know how it works for you?

Added The Contact

Thank you. I added the contact. Now I just have to test it.

Maybe

Maybe change the number already registered message to include instructions for if someone has this issue?

Also just in case anyone gets confused when discussing technical matters in future Meta blocked VPN use they didn't throttle it, throttling means limiting the speed of something. Example, a person who runs their internet connection at 100% 24/7 may find their internet connection being throttled.

@Icosa

Yes, a poor choice of words on my part. Meta killed the VPN option.

@Martjin

Thanks for sharing that contact link. It works fine.
This does seem like a useful workaround, although there’s obviously an issue around the speed of response.
Appreciate many use cases and you can’t tailor this to everyone, but I think I would prefer the WhatsApp bot to provide much shorter answers. Perhaps that might also help with response time?

If there were scope for anyone subscribing to the app to have an element of control over the type of responses that come through WhatsApp that might be one solution…

Thank you so much for the work you have done and continue to do in this visual interpretation space. It’s coming on leaps and bounds and just makes me more excited to see what’s around the corner…

Meta Ray-bans not understanding

I added the contact and called it "the pixies" (sorry).

So first time, I asked the glasses to send a photo to the pixies. I think it asked if I wanted it to send it to someone else in my contacts via WhatsApp. I said no. But I think what it did was send it to that person using messages instead. Fortunately it was only a test picture of my dog but I did get a confused reply.

I tried again, saying "send photo to the pixies". It asked me to confirm if I wanted to send it to the pixies, so I said yes. It sent it through messages, but it did reach the right place as I could see it getting rejected.

So I did the same and added "on whatsapp" to the end. It then asked me if I wanted to send it to some random WhatsApp group which had a name that couldn't have sounded less like the pixies if it tried.

So I went into WhatsApp and sent a test message to that contact so it now appears in my Chats list. I then repeated the command to send via WhatsApp but it is absolutely determined to send it to this other random group instead. Fortunately when I say "no" it is not sending it anyway.

I'll keep messing about when I have a little more time - maybe I need to delete it all and start again, or try a more sensible contact name. I thought "PiccyBot" may be tricky for the glasses to understand which is partly why I used the pixies instead. That and because I am an imbecile of course.

How is everyone else finding this? When I get it working I think it is going to be amazing.

Working Fine For Me

Now that I have everything set correctly, it is working fine for me. I named the contact PiccyBot. I have to pronounce it as PeekyBot to get it to work. Now that I've started doing that, it will work every time.

sharing screenshots and pictures?

Hello,
I just wanted to say how much I love this app. I love the current functions and how it constantly expands so I'm excited for the future.
My question is is there a way to share screenshots or pictures from apps like reddit, dystopia and so on? every time I try it just blanks out on me. One of the ways I've found to use AI is to read comics through marvel unlimited. Previously this wasn't doable so I'd love an easier way to do this as currently I'm sending screenshots through be my eyes and there doesn't seem to be a shortcut to do this so it's pretty slow.

Sholdn't we have this in the first post?

@Martijn, would be useful for late-comers if you edit the first post to add this functionality.

Shouldn't we have this in the first post?

@Martijn, would be useful for late-comers if you edit the first post to add this functionality.

Out of curiosity

which service and model does the WhatsApp integration use for description? I asked because I noticed a slight degradation in the quality of the descriptions generated.

I'd love to see this system expand

I've tried this on a few videos from my iPhone, and, despite its personal commentary making everything sound like it's the best thing ever, it worked really well. I used it to describe a cutscene on a videogame, taken with my phone's camera from my monitor. The only problem is I first have to play the video, then have it analyzed. No big deal, I can wait. But what I'd love is a way to do this on my PC directly, either by having it share the screen with me and describe the video from start to end, OR even if I record the video myself first, then load it into the app to have it described later. Using the phone as a middle man so to speak is a bit awkward. I tried sharing the video to dropbox, then trying to share that video with the bot, but unless I take the video myself or specifically have it on my camera roll, it doesn't seem to work that way either. So what I'm saying is a few more sharing options, or having it on a PC would be extremely welcome. Overall though it's a fascinating bit of technology. Haha, except once when I had it read some scrolling paragraphs of text summmarizing part of a story, and it completely made up something completely different, both in its summary, and when I asked it to read out the full text. That was ... weird.

@Remy

If you subscribe to the app, there is an option to turn off the personality of the voice. This makes the descriptions far less opinionated. You may already be doing that, but I thought I would mention it. I found the personality mode amusing for a short space of time, and my wife loved being described as some sort of celebrity, but I find the app is so much better with it turned off.

Service used

Gokul, the PiccyBot Whatsapp service uses GPT4o for images and Gemini 2.0 Flash Lite for videos. These are not the current best models, but they are generally superior to Meta AI. Using the PiccyBot app will give you best results though.

DeepSeek

Hello guys! I have a suggestion, although I don't know if it would be very useful since I haven't tested this llm: Since DeepSeek is open source, would it be possible to download it so that we can process images or videos on the device itself, having more speed and privacy?

Way too big for that I am afraid

The Deepseek variant in question is Janus Pro, as the R1 and earlier families of Deepseek (v2, v3) are all text-only models, i.e. don't understand images. Janus Procontains 7 billion parameters. That means even with parameter size reduction (called quantisation) it won't fit into iPhone memory. By the way Mistral Pixtral and Llama 3 (also in Piccybot subscription version) are open-source models too, but also too big for iPhone memory, as they consist of at least 7 billion parameters or more too.

when the selected model is busy,

does the app automatically use an alternate model regardless of the setting? sometimes when i tried to have a photo described using different models, i got identical descriptions, almost word by word.

if that's the case, i suggest it would be useful the user bing notified which model is actually being used, it is useful for users to evaluate which model is best suited for specific type of images or tasks.

re: when the selected model is busy, Bookmark

I agree, I have seen the same situation where I clearly switched three or four different AI but it's generating the same response. insidentally, all day long today. I am getting this message, "Server is overloaded. Please try again after some time." is it just me or is something broken or is the app that popular that we can't even get through?

LaBoheme, Privateai, can you…

LaBoheme, Privateai, can you tell me which models were playing up for you? PiccyBot falls back on GPT4o when a model is not providing a response. It is all working at the moment for me though. Yesterday may have been rough as there were about 700 new people trying the Whatsapp service. If this continues I will upscale the server and database to cope with it.

so when selected model is busy

it does send the query to alternate models? well, that just answers my question.

what i like to seed is a notification, like a prompt, telling the user which model is actually being used. i find it very helpful for me to know which model is good for certain kind of images or tasks.

re: LaBoheme, Privateai, can you… Bookmark

i mainly switch between amazon pro, mistrel and lama. lately all 3 of them respond as if google gemini's being used lol

Locally hosted models?

@martijn I was thinking would it be possible to host a visual llm like say Deepseek r1 locally for piccibot in order to give more unsensored image descriptions?

to Gokul

Though I am not exactly Martin, but I do know the answer to your proposition, that's why I write it so that you may read it earlier. Gokul, the unfortunate thing is that if a model is trained to do censorship then most of the time it comes from the model itself. That means the answers will be censored no matter how you run the model, locally or in the cloud somewhere. In this case the only thing that can help if someone builds an uncensored version. It happens sometimes, e.g. with earlier versions of text-only Llama models, but it takes considerable amounts of time and computing resources, and I see such uncensored versions less and less often. Unfortunately I see the trend moves towards more and more censored models, and not towards freer ones. That is definitely a pity, but the world seems to be in such state nowadays.
There are some fortunate cases when running locally can indeed reduce or avoid censorship. These are the cases when some textual filters are applied to the questions or to the answers which are not built into the model itself. Sometimes it also happens that another model does the filtering, and not the main one.
Last, but not least as I stated in a quite recent post, DeepSeek R1 is not a visual model: it is text-only. The model that is capable of understanding images in the DeepSeek family is called Janus and that is what Piccybot offers for subscribers recently.

Janus yes

My bad there. What I meant was Janus pro only.

I believe

I believe earlier in the discussion it was stated that picciebot's deepseek option was being run on the developer's own machine rather than the standard cloud version. Modifying the model to be less censored is an entirely different matter as was stated and isn't as simple as changing an option in the settings, most of what we would consider settings options are effectively built into the model instead of being something you can change. It's part of the nature of training, you either train it to be censored or you don't.

Fine-tuning

That's exactly what I was implying; since we do have state-of-the-art open source models available now, shouldn't we think of fine-tuning something for this and this purpose only?

unable to register for whatsapp service

Hi, I am giving my name , e-mail address , selected my country and entered my phone number in the whatsapp registration form of piccy bot and hitting the submit form button.
I am not getting any response from the page about the status of my registration. am I missing out any thing?

Whatsapp and local models

Arya, I will look at the server not accepting entries. It has been busy, I will migrate to a better setup but it will take some time. Try it again after some time please.

The local open source models offer promise but as said, are still too large for most phones. And having them properly uncensored is indeed a matter of someone building on a new dataset. It will happen, but at the moment it would be a risky premise, as it would be expensive and there is a good chance that the overall quality will be so surpassed by newer regular models by the time it is done..

the new Grok AI

Wondering if there's a chance to add Grok to the list. From my interaction with this AI, it is not heavily censered and can output allot of text. It can also describe images, but the X/Twitter platform which is uses for uploading photos does restrict contents when you do it through the APP. It is actually kind of funny that they censor the photos you can upload through them, but the AI itself can generate crazy erotic contents. I uploaded a perfectly boring ordinary photo, and asked the AI to give it a "lewd" description, and it came back with some truly creative filth LOL.

New models

Privateai, correct, I am keeping a close eye on Grok, but so far their API doesn't support multimodal input, disappointingly. As soon as it does, I am hoping this model will result in less censored content.
I did add Claude Sonnet 3.7 to the available models in PiccyBot today. Please try it out and let me know what you think?

great app for video describing

The App is great for having videos described. That is the benefid of this app.

great app for video describing

This App is great for having videos described.

Audio description?

Hello Martijn!
I don't know if it would be possible, but would it be possible to describe what is happening at the exact moment when processing the video, like an audio description? Almost like seeing ai, but without the pause to describe.
Of course, it would have to give less information, but I think it would be amazing if it could do that. Reading the text or listening while the video is playing like this ends up causing synchronization.

New AI app for describing images and video: PiccyBot

Options

Comments